A suffix trie is a radix trie that stores information about a single string and exports a huge amount of structural information about that string. It is able to perform these stuctural operations by storing all the possible suffixes of the given text, hence the name “Suffix Trie”.
The following table sums up all the different public functionality in this class and also provides the worst-case time complexity along side with the optimal time complexity that I will try to reach in future releases Insha’Allah. Generally, we are going to use the following indicators in the table:
Method | Description | Worst-case | Optimal |
---|---|---|---|
__init__() | Initializes the suffix trie. | O(n^2) | O(n) |
__len__() | Returns the number of nodes in the suffix trie. | O(n) | O(1) |
__repr__() | Represents the suffix trie as a string. | O(n) | O(n) |
__iter__() | Iterates over the suffix trie. | O(n) | O(n) |
get_height() | Gets the suffix trie’s height. | O(n) | O(n) |
get_depth() | Gets the suffix trie’s depth. | O(1) | O(1) |
count_leaf_nodes() | Counts all leaf nodes in the suffix trie. | O(n) | O(n) |
to_list() | Converts the suffix trie instance to a normal list. | O(n) | O(n) |
has_substring() | Searches the suffix trie for a substring. | O(m) | O(m) |
to_suffix_array() | Converts the suffix trie to a suffix array. | O(nlog(n)) | O(1) |
get_longest_common_substring() | Retrieves the longest common substring in the suffix trie. | O(m) | O(m) |
get_longest_repeated_substring() | Retrieves the longest repeated substring in the suffix trie. | O(m) | O(m) |
get_lowest_common_ancestor() | Retrieves the lowest common ancestor in the suffix trie. | O(m) | O(m) |
count_pattern_occurrences() | Counts the occurrences of a pattern in the suffix trie. | O(m) | O(m) |
Here are all of the public methods that can be used with SuffixTrie() objects:
extra.trees.suffix_trie.
SuffixTrie
(word)¶A suffix trie is a radix trie that stores information about a single string and exports a huge amount of structural information about that string. It is able to perform these stuctural operations by storing all the possible suffixes of the given text, hence the name “Suffix Trie”.
__init__
(word)¶Creates a SuffixTrie() object using the given word in quadratic time-complexity!!
Parameters: | word (str) – The string that we want to create our SuffixTrie() from its suffixes. |
---|---|
Raises: |
|
Example
>>> st = SuffixTrie("banana")
>>> type(st)
<class 'extra.trees.suffix_trie.SuffixTrie'>
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
__iter__
()¶Iterates over the SuffixTrie() instance and returns a generator of the string values stored at the different nodes in breadth-first manner.
Yields: | str – The string value stored at each node in the instance. |
---|
Example
>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> for value in st:
... print(value, end=',')
banana$,a,na,$,na,$,na$,$,na$,$,
__len__
()¶Gets the number of nodes within the SuffixTrie() instance.
Returns: | The length of the SuffixTrie() instance. Length is the number of nodes in the instance. |
---|---|
Return type: | int |
Examples
>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> len(st)
11
__repr__
()¶Represents the SuffixTrie() instance as a string.
Returns: | The string-representation of the SuffixTrie() instance. |
---|---|
Return type: | str |
Examples
>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
count_leaf_nodes
()¶Counts the number of leaf nodes in the SuffixTrie() instance. Leaf nodes are the trie nodes that have no children.
Returns: | A positive integer representing the number of leaf nodes in the SuffixTrie(). |
---|---|
Return type: | int |
Example
>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.count_leaf_nodes()
7
count_pattern_occurrences
(pattern)¶Counts the number of times a certain sequence of characters appear in the SuffixTrie().
Parameters: | pattern (str) – The pattern or the sequence of characters to find its number of occurrences. |
---|---|
Returns: | The number of occurrences this pattern appear in the SuffixTrie(). |
Return type: | int |
Example
>>> st = SuffixTrie("banana")
>>> st.count_pattern_occurrences("ana")
2
>>> st.count_pattern_occurrences("a")
3
>>> st.count_pattern_occurrences("bann")
0
get_depth
()¶Gets the depth of the SuffixTrie() instance.
Returns: | A positive integer representing the depth of the given SuffixTrie(). |
---|---|
Return type: | int |
Example
>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.get_depth()
0
get_height
()¶Gets the height of the SuffixTrie() instance. The trie’s height is the number of edges between the root and the furthest leaf node.
Returns: | A positive integer representing the height of the instance. |
---|---|
Return type: | int |
Example
>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.get_height()
3
get_longest_common_substring
()¶Gets the longest common substring in the SuffixTrie(), or LCS for short. LCS is the longest substring that occurs at least twice.
Returns: | A list of longest commong substring(s) found in the SuffixTrie() |
---|---|
Return type: | list |
Example
>>> st = SuffixTrie("banana")
>>> st.get_longest_common_substring()
['ana']
>>>
>>> st = SuffixTrie("abcpqrabpqpq")
>>> st.get_longest_common_substring()
["pq", "ab"]
Note
Longest Common Substring is the same as Longest Repeated Substring which means that the following two methods are exactly the same:
- get_longest_common_substring().
- get_longest_repeated_substring().
get_longest_repeated_substring
()¶Gets the longest repeated substring in the SuffixTrie(), or LRS for short. LRS is the longest substring that occurs at least twice.
Returns: | A list of longest reapeated substring(s) found in the SuffixTrie() |
---|---|
Return type: | list |
Example
>>> st = SuffixTrie("banana")
>>> st.get_longest_repeated_substring()
['ana']
>>>
>>> st = SuffixTrie("abcpqrabpqpq")
>>> st.get_longest_repeated_substring()
["pq", "ab"]
Note
Longest Repeated Substring is the same as Longest Common Substring which means that the following two methods are exactly the same:
- get_longest_common_substring().
- get_longest_repeated_substring().
get_lowest_common_ancestor
(i, j)¶Gets lowest common ancestor between two given suffixes defined by their indices in the SuffixTrie().
Parameters: |
|
---|---|
Returns: | The lowest common ancestor between the two suffixes. |
Return type: | str |
Raises: |
|
Example
>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.get_lowest_common_ancestor(2, 4)
'na'
>>> st.get_lowest_common_ancestor(0, 6)
''
>>> st.get_lowest_common_ancestor(1, 5)
'a'
has_substring
(substr)¶Searches the SuffixTrie() for the given substring and returns True if the whole substring exists and False if not.
Parameters: | substr (str) – The substring to be searched for in the SuffixTrie() instance. |
---|---|
Returns: | Returns True if the substring exists in the SuffixTrie() instance and False if not. |
Return type: | bool |
Example
>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.has_substring("ban")
True
>>> st.has_substring("ab")
False
to_list
()¶Converts the SuffixTrie() instance to a list where values will be inserted in breadth-first manner.
Returns: | A list object containing the same elements as the SuffixTrie() instance. |
---|---|
Return type: | list |
Example
>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.to_list()
['banana$', 'a', 'na', '$', 'na', '$', 'na$', '$', 'na$', '$']
to_suffix_array
()¶Converts the SuffixTrie() to a suffix array. A suffix array is an array that stores the starting indices of all suffix in the given string sorted alphabetically.
Returns: | The suffix array. |
---|---|
Return type: | list |
Example
>>> st = SuffixTrie("banana")
>>> st
ROOT
├── banana$ ⟶ 0
├─┬ a
│ ├─┬ na
│ │ ├── na$ ⟶ 1
│ │ └── $ ⟶ 3
│ └── $ ⟶ 5
├─┬ na
│ ├── na$ ⟶ 2
│ └── $ ⟶ 4
└── $ ⟶ 6
>>> st.to_suffix_array()
[6, 5, 3, 1, 0, 4, 2]
>>> # indices associated with the following suffixes:
>>> # ['$', 'a', 'ana', 'anana', 'banana', 'na', 'nana']