Suffix Trie

A suffix trie is a radix trie that stores information about a single string and exports a huge amount of structural information about that string. It is able to perform these stuctural operations by storing all the possible suffixes of the given text, hence the name “Suffix Trie”.

../../_images/suffix_trie.gif

⏱ Time-Complexity

The following table sums up all the different public functionality in this class and also provides the worst-case time complexity along side with the optimal time complexity that I will try to reach in future releases Insha’Allah. Generally, we are going to use the following indicators in the table:

  • n is the number of nodes currently in the suffix trie.
  • m is the length of the passed string.
Method Description Worst-case Optimal
__init__() Initializes the suffix trie. O(n^2) O(n)
__len__() Returns the number of nodes in the suffix trie. O(n) O(1)
__repr__() Represents the suffix trie as a string. O(n) O(n)
__iter__() Iterates over the suffix trie. O(n) O(n)
get_height() Gets the suffix trie’s height. O(n) O(n)
get_depth() Gets the suffix trie’s depth. O(1) O(1)
count_leaf_nodes() Counts all leaf nodes in the suffix trie. O(n) O(n)
to_list() Converts the suffix trie instance to a normal list. O(n) O(n)
has_substring() Searches the suffix trie for a substring. O(m) O(m)
to_suffix_array() Converts the suffix trie to a suffix array. O(nlog(n)) O(1)
get_longest_common_substring() Retrieves the longest common substring in the suffix trie. O(m) O(m)
get_longest_repeated_substring() Retrieves the longest repeated substring in the suffix trie. O(m) O(m)
get_lowest_common_ancestor() Retrieves the lowest common ancestor in the suffix trie. O(m) O(m)
count_pattern_occurrences() Counts the occurrences of a pattern in the suffix trie. O(m) O(m)

☕️ API

Here are all of the public methods that can be used with SuffixTrie() objects:

class extra.trees.suffix_trie.SuffixTrie(word)

A suffix trie is a radix trie that stores information about a single string and exports a huge amount of structural information about that string. It is able to perform these stuctural operations by storing all the possible suffixes of the given text, hence the name “Suffix Trie”.

__init__(word)

Creates a SuffixTrie() object using the given word in quadratic time-complexity!!

Parameters:

word (str) – The string that we want to create our SuffixTrie() from its suffixes.

Raises:
  • ValueError: It can be raised due to one of these cases: – 1. If given word is None. 2. If the given word is an empty string.
  • TypeError: It can be raised due to one of these cases: – 1. If word is an Extra object. 2. If the type of the given word isn’t a string.

Example

>>> st = SuffixTrie("banana")
>>> type(st)
<class 'extra.trees.suffix_trie.SuffixTrie'>
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
__iter__()

Iterates over the SuffixTrie() instance and returns a generator of the string values stored at the different nodes in breadth-first manner.

Yields:str – The string value stored at each node in the instance.

Example

>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> for value in st:
...     print(value, end=',')
banana$,a,na,$,na,$,na$,$,na$,$,
__len__()

Gets the number of nodes within the SuffixTrie() instance.

Returns:The length of the SuffixTrie() instance. Length is the number of nodes in the instance.
Return type:int

Examples

>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> len(st)
11
__repr__()

Represents the SuffixTrie() instance as a string.

Returns:The string-representation of the SuffixTrie() instance.
Return type:str

Examples

>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
count_leaf_nodes()

Counts the number of leaf nodes in the SuffixTrie() instance. Leaf nodes are the trie nodes that have no children.

Returns:A positive integer representing the number of leaf nodes in the SuffixTrie().
Return type:int

Example

>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.count_leaf_nodes()
7
count_pattern_occurrences(pattern)

Counts the number of times a certain sequence of characters appear in the SuffixTrie().

Parameters:pattern (str) – The pattern or the sequence of characters to find its number of occurrences.
Returns:The number of occurrences this pattern appear in the SuffixTrie().
Return type:int

Example

>>> st = SuffixTrie("banana")
>>> st.count_pattern_occurrences("ana")
2
>>> st.count_pattern_occurrences("a")
3
>>> st.count_pattern_occurrences("bann")
0
get_depth()

Gets the depth of the SuffixTrie() instance.

Returns:A positive integer representing the depth of the given SuffixTrie().
Return type:int

Example

>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.get_depth()
0
get_height()

Gets the height of the SuffixTrie() instance. The trie’s height is the number of edges between the root and the furthest leaf node.

Returns:A positive integer representing the height of the instance.
Return type:int

Example

>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.get_height()
3
get_longest_common_substring()

Gets the longest common substring in the SuffixTrie(), or LCS for short. LCS is the longest substring that occurs at least twice.

Returns:A list of longest commong substring(s) found in the SuffixTrie()
Return type:list

Example

>>> st = SuffixTrie("banana")
>>> st.get_longest_common_substring()
['ana']
>>>
>>> st = SuffixTrie("abcpqrabpqpq")
>>> st.get_longest_common_substring()
["pq", "ab"]

Note

Longest Common Substring is the same as Longest Repeated Substring which means that the following two methods are exactly the same:

  • get_longest_common_substring().
  • get_longest_repeated_substring().
get_longest_repeated_substring()

Gets the longest repeated substring in the SuffixTrie(), or LRS for short. LRS is the longest substring that occurs at least twice.

Returns:A list of longest reapeated substring(s) found in the SuffixTrie()
Return type:list

Example

>>> st = SuffixTrie("banana")
>>> st.get_longest_repeated_substring()
['ana']
>>>
>>> st = SuffixTrie("abcpqrabpqpq")
>>> st.get_longest_repeated_substring()
["pq", "ab"]

Note

Longest Repeated Substring is the same as Longest Common Substring which means that the following two methods are exactly the same:

  • get_longest_common_substring().
  • get_longest_repeated_substring().
get_lowest_common_ancestor(i, j)

Gets lowest common ancestor between two given suffixes defined by their indices in the SuffixTrie().

Parameters:
  • i (int) – The start index of the first suffix.
  • j (int) – The start index of the second suffix.s
Returns:

The lowest common ancestor between the two suffixes.

Return type:

str

Raises:
  • TypeError: – If either one of the given indices were not an integer.
  • ValueError: – If the value of the given indices were invalid, like negative number of more than the length of the base string.

Example

>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.get_lowest_common_ancestor(2, 4)
'na'
>>> st.get_lowest_common_ancestor(0, 6)
''
>>> st.get_lowest_common_ancestor(1, 5)
'a'
has_substring(substr)

Searches the SuffixTrie() for the given substring and returns True if the whole substring exists and False if not.

Parameters:substr (str) – The substring to be searched for in the SuffixTrie() instance.
Returns:Returns True if the substring exists in the SuffixTrie() instance and False if not.
Return type:bool

Example

>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.has_substring("ban")
True
>>> st.has_substring("ab")
False
to_list()

Converts the SuffixTrie() instance to a list where values will be inserted in breadth-first manner.

Returns:A list object containing the same elements as the SuffixTrie() instance.
Return type:list

Example

>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.to_list()
['banana$', 'a', 'na', '$', 'na', '$', 'na$', '$', 'na$', '$']
to_suffix_array()

Converts the SuffixTrie() to a suffix array. A suffix array is an array that stores the starting indices of all suffix in the given string sorted alphabetically.

Returns:The suffix array.
Return type:list

Example

>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.to_suffix_array()
[6, 5, 3, 1, 0, 4, 2]
>>> # indices associated with the following suffixes:
>>> # ['$', 'a', 'ana', 'anana', 'banana', 'na', 'nana']