Pattern Searching using Suffix Tree


Trie − A trie is a tree-based data structure used to store and retrieve a dynamic set of strings.

Compressed Trie − A compressed trie is a space-optimised variation of the trie. Like any trie, it shares common prefixes between the stored strings.

In addition, every node with only one child is merged with that child, so a chain of single-child nodes collapses into a single edge labelled with a substring.

Suffix Tree − A suffix tree is a compressed trie built from all suffixes of a given string. Each edge is labelled with a substring of the string, and each path from the root spells out a prefix of some suffix. The root node represents the empty string, and each leaf node corresponds to one suffix of the string.

Creating a Suffix Tree for a Given String

  • Generate all the suffixes for the given string.

  • For example, take the word “world” with a terminator \0 appended −

Suffixes of “world\0” are:
world\0
orld\0
rld\0
ld\0
d\0
\0
  • Treating each suffix as a separate word, build a compressed trie from them.
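
The first step above can be sketched in a few lines of C++. This is only an illustration: the function name allSuffixes is hypothetical, and the sketch represents the lone terminator suffix as an empty string instead of appending \0.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns every suffix of `text`, longest first.
// The final entry is the empty string, standing in for the lone "\0" suffix.
std::vector<std::string> allSuffixes(const std::string& text) {
    std::vector<std::string> suffixes;
    suffixes.reserve(text.size() + 1);
    for (std::size_t i = 0; i <= text.size(); ++i) {
        // substr(text.size()) is valid and yields the empty string
        suffixes.push_back(text.substr(i));
    }
    return suffixes;
}
```

Calling allSuffixes("world") yields world, orld, rld, ld, d and the empty string, which matches the list above without the explicit terminator.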

Problem Statement

Given an input string ‘str’ and a pattern string ‘ptr’, use a suffix tree to determine whether ‘ptr’ occurs in ‘str’ and, if it does, report the (0-based) indices at which it occurs.

Sample Example 1

Input: str = “aabcdaaabcdbabc”
ptr = “abc”
Output: 1, 7, 12

Explanation

The pattern “abc” is present at indices 1, 7, and 12 in the input string.

Sample Example 2

Input: str = “minimization”
ptr = “ma”
Output: False

Explanation

The pattern “ma” is not found in the input string.
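
The expected outputs in both samples can be cross-checked with a plain scan before any suffix structure is built. The sketch below is a reference check only, not the suffix tree method; the function name naiveOccurrences is hypothetical, and treating an empty pattern as "no match" is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the 0-based start index of every (possibly overlapping) occurrence
// of `pattern` in `text`, using std::string::find as a brute-force baseline.
std::vector<int> naiveOccurrences(const std::string& text, const std::string& pattern) {
    std::vector<int> positions;
    if (pattern.empty()) return positions;  // assumption: empty pattern reports nothing
    std::size_t pos = text.find(pattern);
    while (pos != std::string::npos) {
        positions.push_back(static_cast<int>(pos));
        pos = text.find(pattern, pos + 1);  // resume one character later
    }
    return positions;
}
```

For the first sample, naiveOccurrences("aabcdaaabcdbabc", "abc") returns 1, 7 and 12; for the second, naiveOccurrences("minimization", "ma") returns an empty list.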

Solution Approach

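To search for ptr, start at the root and match the first character of the pattern against the root's children. If a matching child exists, continue from that child with the rest of the pattern. If at any point no child matches the next character, the pattern is not present in the string. Once the whole pattern has been consumed, every suffix that passes through the current node starts with ptr, so the indices stored at that node give all occurrences.

For simplicity, the code below builds an uncompressed suffix trie (one character per edge) rather than a compressed suffix tree. For every suffix passing through a node, the node stores the index just past the last matched character, which is why the search subtracts the pattern length to recover the start position.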

Pseudocode

class Node
   children[256]: Node array
   ind: List of integers

   constructor()
      ind <- create new empty list of integers
      for i from 0 to 255
         children[i] <- NULL

   function insertSuffix(suffix: string, index: integer)
      ind.push_back(index)
      if suffix.length() > 0
         cIndex <- suffix.at(0)
         if children[cIndex] is NULL
            children[cIndex] <- create new Node
         children[cIndex].insertSuffix(suffix.substr(1), index + 1)

   function search(pat: string): List of integers
      if pat.length() is 0
         return ind
      if children[pat.at(0)] is not NULL
         return children[pat.at(0)].search(pat.substr(1))
      else
         return NULL


class SuffixTree
   root: Node

   constructor(txt: string)
      root <- create new Node
      for i from 0 to txt.length() - 1
         root.insertSuffix(txt.substr(i), i)

   function search(ptr: string)
      ans <- root.search(ptr)
      if ans is NULL
         print "Pattern not found"
      else
         for each i in ans
            print "Pattern found at position " + (i - ptr.length())

Example: C++ Implementation

The following code searches for a pattern in a string using a suffix tree.

#include <cstddef>
#include <iostream>
#include <list>
#include <string>
using namespace std;

// Defining node of the Suffix tree
class Node{
private:
   Node *children[256];
   list<int> *ind;
public:
   Node(){
      ind = new list<int>;
      for (int i = 0; i < 256; i++) {
         children[i] = NULL;
      }
   }
   // Inserting new suffix to the tree
   void insertSuffix(string suffix, int index){
      ind->push_back(index);
      if (suffix.length() > 0){
         // Index with unsigned char so bytes >= 128 cannot give a negative index
         unsigned char cIndex = suffix.at(0);
         if (children[cIndex] == NULL)
            children[cIndex] = new Node();
         children[cIndex]->insertSuffix(suffix.substr(1), index + 1);
      }
   }
   // Pattern Searching in subtree
   list<int> *search(string pat){
      if (pat.length() == 0)
         return ind;
      unsigned char c = pat.at(0);  // unsigned so bytes >= 128 index correctly
      if (children[c] != NULL)
         return children[c]->search(pat.substr(1));
      else
         return NULL;
   }
};

// Definition of Suffix Tree
class SuffixTree {
private:
   Node root;
public:
   SuffixTree(string txt){
      for (int i = 0; i < static_cast<int>(txt.length()); i++)
         root.insertSuffix(txt.substr(i), i);
   }
   // Function for searching a pattern in the tree
   void search(string ptr){
      list<int> *ans = root.search(ptr);
      if (ans == NULL)
         cout << "Pattern not found" << endl;
      else {
         list<int>::iterator i;
         int ptrLength = ptr.length();
         for (i = ans->begin(); i != ans->end(); i++){
            cout << "Pattern found at position " << *i - ptrLength << endl;
         }
      }
   }
};
int main(){
   string str = "aabcdaaabcdbabc";
   string ptr = "abcx";
   SuffixTree Tree(str);
   cout << "Searching for " << ptr << endl;
   Tree.search(ptr);
   return 0;
}

Output

Searching for abcx
Pattern not found

Time Complexity

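Suffix Tree Construction − O(N^2) node insertions in the worst case, where N is the length of the input string, because the suffix starting at index i can add up to N − i nodes. The implementation above also copies the remaining suffix with substr(1) at every level of the recursion, which adds up to O(N^3) character copies; indexing into the original string instead avoids them.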

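Pattern Searching − O(M) node visits, where M is the length of the pattern, plus the time needed to print the matching positions. The recursive search above also copies the remaining pattern with substr(1) at each step, which adds O(M^2) character copies in this implementation.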

Space Complexity

Suffix Tree Construction − O(N^2) nodes in the worst case, where N is the length of the input string. Each node in the implementation above also holds an array of 256 child pointers, so the constant factor is large.
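
The quadratic growth can be observed directly by counting nodes. The sketch below is illustrative only: CountNode and countSuffixTrieNodes are hypothetical names, and it uses a std::map per node instead of the 256-entry array from the implementation above to stay small.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal node: children keyed by character.
struct CountNode {
    std::map<char, std::unique_ptr<CountNode>> children;
};

// Builds an uncompressed suffix trie of `text` and returns the number of
// nodes it contains, including the root.
std::size_t countSuffixTrieNodes(const std::string& text) {
    CountNode root;
    std::size_t nodes = 1;  // the root
    for (std::size_t start = 0; start < text.size(); ++start) {
        CountNode* cur = &root;
        for (std::size_t j = start; j < text.size(); ++j) {
            std::unique_ptr<CountNode>& child = cur->children[text[j]];
            if (!child) {
                child.reset(new CountNode());  // new node for an unseen character
                ++nodes;
            }
            cur = child.get();
        }
    }
    return nodes;
}
```

For a string of N distinct characters such as "abcd", no two suffixes share a prefix, so the count is N(N+1)/2 + 1 (11 for N = 4). A string made of one repeated character, such as "aaaa", needs only N + 1 nodes.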

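Pattern Searching − The recursive search shown above is not O(1): it uses O(M) stack frames, each holding its own copy of the remaining pattern. An iterative search avoids this and needs only O(1) extra space.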
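
A minimal sketch of a search that walks the trie in a loop rather than recursing, so that it needs no extra space beyond a cursor. The names TrieNode, buildSuffixTrie and findAll are hypothetical. Unlike the implementation above, this sketch stores each suffix's start index directly at every node below the root, so no subtraction is needed at lookup, and an empty pattern yields an empty list.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 256> children;  // all null initially
    std::vector<int> starts;  // start index of every suffix passing through here
};

// Builds an uncompressed suffix trie for `text` without recursion or substr copies.
std::unique_ptr<TrieNode> buildSuffixTrie(const std::string& text) {
    std::unique_ptr<TrieNode> root(new TrieNode());
    for (std::size_t start = 0; start < text.size(); ++start) {
        TrieNode* cur = root.get();
        for (std::size_t j = start; j < text.size(); ++j) {
            unsigned char c = static_cast<unsigned char>(text[j]);
            if (!cur->children[c]) cur->children[c].reset(new TrieNode());
            cur = cur->children[c].get();
            cur->starts.push_back(static_cast<int>(start));
        }
    }
    return root;
}

// Follows one edge per pattern character. Returns nullptr when the pattern
// does not occur, otherwise the list of 0-based start positions.
const std::vector<int>* findAll(const TrieNode& root, const std::string& pattern) {
    const TrieNode* cur = &root;
    for (char ch : pattern) {
        unsigned char c = static_cast<unsigned char>(ch);
        cur = cur->children[c].get();
        if (cur == nullptr) return nullptr;
    }
    return &cur->starts;
}
```

With the text "aabcdaaabcdbabc", findAll(*root, "abc") points at the list 1, 7, 12, and findAll(*root, "abcx") returns nullptr.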

Conclusion

In conclusion, the suffix tree is a powerful data structure for storing and querying strings. It supports a range of string operations, including substring search, pattern matching, and prefix/suffix queries. Once the structure is built, finding a pattern takes O(M) node visits, where M is the length of the pattern string, independent of the length of the text. The simple uncompressed construction shown here pays for this with O(N^2) time and space in the worst case.

Updated on: 03-Nov-2023
