Knuth-Morris-Pratt Algorithm

Knuth-Morris-Pratt (KMP) is a pattern searching algorithm that scans the text from left to right. When part of the pattern repeats inside the pattern itself (a prefix that also appears as a suffix of some sub-pattern), KMP uses that property to avoid re-comparing characters it has already matched. This improves the time complexity, including in the worst case.

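The time complexity of KMP is O(n + m), where n is the length of the text and m is the length of the pattern: O(m) to build the prefix array and O(n) to scan the text.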
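To make the linear bound concrete, the following sketch counts character comparisons for a naive search and for a KMP scan on the same input. The helper names naiveComparisons and kmpComparisons are hypothetical and exist only for this illustration; the KMP loop is a compact variant of the usual scan, not the exact listing used elsewhere in this article.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: comparisons made by a naive search that restarts
// the pattern at every possible text position.
long naiveComparisons(const std::string& text, const std::string& pattern) {
    long count = 0;
    if (pattern.empty() || pattern.size() > text.size()) return count;
    for (std::size_t start = 0; start + pattern.size() <= text.size(); ++start) {
        for (std::size_t k = 0; k < pattern.size(); ++k) {
            ++count;
            if (text[start + k] != pattern[k]) break;
        }
    }
    return count;
}

// Hypothetical helper: text-versus-pattern comparisons made by a KMP scan
// (comparisons spent building the prefix table are not counted).
long kmpComparisons(const std::string& text, const std::string& pattern) {
    long count = 0;
    if (pattern.empty()) return count;

    // Prefix table: prefix[i] is the length of the longest proper prefix
    // of pattern[0..i] that is also a suffix of it.
    std::vector<std::size_t> prefix(pattern.size(), 0);
    for (std::size_t i = 1, len = 0; i < pattern.size();) {
        if (pattern[i] == pattern[len]) prefix[i++] = ++len;
        else if (len != 0) len = prefix[len - 1];
        else prefix[i++] = 0;
    }

    std::size_t i = 0, j = 0;
    while (i < text.size()) {
        ++count;
        if (text[i] == pattern[j]) {
            ++i; ++j;
            if (j == pattern.size()) j = prefix[j - 1];  // match found, keep scanning
        } else if (j != 0) {
            j = prefix[j - 1];                           // fall back, i stays put
        } else {
            ++i;
        }
    }
    return count;
}
```

Every comparison in the KMP scan either advances i or shrinks j, and j can only shrink as often as it grew, so the count never exceeds 2n. For a text of 1000 'A' characters and a pattern of 50 'A' characters followed by 'B', the naive count is 950 × 51 = 48450, while the KMP count stays below 2000.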

Input and Output

Input:
Main String: “AAAABAAAAABBBAAAAB”, Pattern: “AAAB”
Output:
Pattern found at location: 1
Pattern found at location: 7
Pattern found at location: 14

Algorithm

findPrefix(pattern, m, prefArray)

Input − The pattern, the length of the pattern, and an array to store the prefix lengths

Output − The array filled so that entry i holds the length of the longest proper prefix of pattern[0..i] that is also a suffix of it

Begin
   length := 0
   prefArray[0] := 0

   for each character index ‘i’ of pattern from 1 to m - 1, do
      if pattern[i] = pattern[length], then
         increase length by 1
         prefArray[i] := length
      else
         if length ≠ 0 then
            length := prefArray[length - 1]
            decrease i by 1
         else
            prefArray[i] := 0
   done
End
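As a small illustration of the pseudocode above, here is a sketch that returns the prefix array as a std::vector so the values can be inspected directly. The name prefixTable is a hypothetical helper, not part of the original listing, and it uses a while loop instead of decrementing the loop index.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: table[i] is the length of the longest proper prefix
// of pattern[0..i] that is also a suffix of pattern[0..i].
std::vector<int> prefixTable(const std::string& pattern) {
    std::vector<int> table(pattern.size(), 0);
    int length = 0;                      // length of the currently matched prefix
    std::size_t i = 1;                   // table[0] is always 0
    while (i < pattern.size()) {
        if (pattern[i] == pattern[length]) {
            table[i++] = ++length;       // extend the matched prefix
        } else if (length != 0) {
            length = table[length - 1];  // fall back and retry the same i
        } else {
            table[i++] = 0;              // no prefix matches at this index
        }
    }
    return table;
}
```

For the pattern "AAAB" this yields 0 1 2 0, and for "ABABCABAB" it yields 0 0 1 2 0 1 2 3 4. An empty pattern gives an empty table.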

kmpAlgorithm(text, pattern)

Input − The main text and the pattern to search for

Output − The locations where the pattern is found

Begin
   n := size of text
   m := size of pattern
   i := 0
   j := 0
   call findPrefix(pattern, m, prefArray)

   while i < n, do
      if text[i] = pattern[j], then
         increase i and j by 1
      if j = m, then
         print the location (i - j), where the matched pattern starts
         j := prefArray[j-1]
      else if i < n AND pattern[j] ≠ text[i] then
         if j ≠ 0 then
            j := prefArray[j - 1]
         else
            increase i by 1
   done
End

Example

#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Fills prefArray so that prefArray[i] is the length of the longest proper
// prefix of pattern[0..i] that is also a suffix of pattern[0..i].
// prefArray must already have room for m entries, with m >= 1.
void findPrefix(const string &pattern, int m, vector<int> &prefArray) {
   int length = 0;
   prefArray[0] = 0;     //first place is always 0 as no proper prefix exists

   for(int i = 1; i<m; i++) {
      if(pattern[i] == pattern[length]) {
         length++;
         prefArray[i] = length;
      }else {
         if(length != 0) {
            length = prefArray[length - 1];
            i--;     //cancel the loop's increment so the same index is retried
         }else
            prefArray[i] = 0;
      }
   }
}

// Stores the starting index of every match (overlapping ones included) in locArray.
void kmpPattSearch(const string &mainString, const string &pattern, vector<int> &locArray) {
   int n = mainString.size();
   int m = pattern.size();
   int i = 0, j = 0;
   locArray.clear();
   if(m == 0 || m > n)
      return;     //empty pattern, or pattern longer than the text

   vector<int> prefixArray(m);    //prefix array, same size as the pattern
   findPrefix(pattern, m, prefixArray);

   while(i < n) {
      if(mainString[i] == pattern[j]) {
         i++; j++;
      }

      if(j == m) {
         locArray.push_back(i-j);    //pattern found starting at position i-j
         j = prefixArray[j-1];       //continue from the longest matching prefix
      }else if(i < n && pattern[j] != mainString[i]) {
         if(j != 0)
            j = prefixArray[j-1];    //fall back in the pattern, keep i
         else
            i++;
      }
   }
}

int main() {
   string str = "AAAABAAAAABBBAAAAB";
   string patt = "AAAB";
   vector<int> locationArray;
   kmpPattSearch(str, patt, locationArray);

   for(int loc : locationArray) {
      cout << "Pattern found at location: " << loc << endl;
   }
}

Output

Pattern found at location: 1
Pattern found at location: 7
Pattern found at location: 14
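
One way to sanity-check the positions printed above is to compare them with a reference built on std::string::find. The sketch below is not KMP; it is a plain baseline, and referencePositions is a hypothetical helper name introduced only for this check.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: all starting positions of pattern in text, found by
// calling std::string::find repeatedly. Not KMP, only a baseline for testing.
std::vector<std::size_t> referencePositions(const std::string& text,
                                            const std::string& pattern) {
    std::vector<std::size_t> positions;
    if (pattern.empty()) return positions;   // treat an empty pattern as "no matches"
    std::size_t pos = text.find(pattern);
    while (pos != std::string::npos) {
        positions.push_back(pos);
        pos = text.find(pattern, pos + 1);   // step by one to keep overlapping matches
    }
    return positions;
}
```

For the main string "AAAABAAAAABBBAAAAB" and the pattern "AAAB", this baseline returns 1, 7 and 14, which agrees with the output above. Stepping by one position after each hit matters: searching "AAAA" for "AA" returns 0, 1 and 2, including the overlapping matches that KMP also reports.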
raja
Published on 09-Jul-2018 10:46:30