Knuth-Morris-Pratt Algorithm

KMP Algorithm for Pattern Matching

The KMP algorithm solves the pattern matching problem, which is the task of finding all the occurrences of a given pattern in a text. It is particularly useful when the text is long or the pattern occurs many times. For instance, if the text is "aabbaaccaabbaadde" and the pattern is "aabbaa", then the pattern occurs twice in the text, at indices 0 and 8.

The naive solution to this problem is to compare the pattern with every substring of the text that has the same length as the pattern, starting from the leftmost position and moving rightwards. This takes O(n*m) time in the worst case, where 'n' is the length of the text and 'm' is the length of the pattern.
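
As a minimal sketch of this brute-force approach (the function name `naive_search` is illustrative and not part of the original tutorial), the following Python code tries every starting position and compares character by character. It is run here on the text "aabbaaccaabbaadde" with the pattern "aabbaa".

```python
def naive_search(text, pattern):
    """Return every index where pattern starts in text (brute force)."""
    n, m = len(text), len(pattern)
    matches = []
    if m == 0:
        return matches  # assumption: an empty pattern yields no matches
    # try each possible starting position in the text
    for start in range(n - m + 1):
        k = 0
        # compare the pattern with the text character by character
        while k < m and text[start + k] == pattern[k]:
            k += 1
        if k == m:
            matches.append(start)
    return matches


print(naive_search("aabbaaccaabbaadde", "aabbaa"))  # [0, 8]
```

After a mismatch the comparison restarts from the beginning of the pattern at the next text position, which is the source of the redundant work that KMP avoids.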

When we work with long text documents, the naive (brute-force) approach performs many redundant comparisons. To avoid such redundancy, Knuth, Morris, and Pratt developed a linear-time string-matching algorithm known as the KMP (Knuth-Morris-Pratt) pattern matching algorithm.

How does KMP Algorithm work?

The KMP algorithm scans the text from left to right. It uses a prefix function to avoid unnecessary comparisons while searching for the pattern. For each index i of the pattern, this function stores the length of the longest proper prefix of pattern[0..i] that is also a suffix of it, which is known as the LPS (longest prefix suffix) value. The following steps are involved in KMP algorithm −

Compute the prefix function (the LPS value for every index of the pattern).
Slide the pattern over the text for comparison.
If all the characters match, we have found a match.
If not, use the prefix function to skip the unnecessary comparisons. Suppose the mismatch occurs at index j of the pattern. If j is 0, move on to the next character in the text. Otherwise, take the LPS value of the character just before the mismatch (index j - 1) and resume the comparison from that index of the pattern against the same text character. If that LPS value is '0', the comparison resumes from index 0 of the pattern.
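
To make the prefix function concrete, the sketch below computes the LPS array for two short patterns. The helper name `build_lps` is an assumption made for illustration; it follows the same idea as the prefix function used in the full programs later on this page.

```python
def build_lps(pattern):
    """lps[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0  # length of the prefix matched so far
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            # pattern[i] extends the current prefix
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            # fall back to the next shorter candidate and retry the same i
            length = lps[length - 1]
        else:
            # no prefix ends at this position
            lps[i] = 0
            i += 1
    return lps


print(build_lps("AAABC"))   # [0, 1, 2, 0, 0]
print(build_lps("aabbaa"))  # [0, 1, 0, 0, 1, 2]
```

For "AAABC", a mismatch at index 3 (the 'B') means the search looks up the LPS value at index 2, which is 2, and resumes from index 2 of the pattern instead of starting over.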

The KMP algorithm takes O(n + m) time and O(m) space. It is faster than the naive solution because it never moves backward in the text. A text character may be compared more than once, but the search phase makes at most 2n character comparisons in total.
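
The difference can be observed by counting character comparisons on an input that is unfavourable for the naive method. The sketch below is only an illustration: the function names `count_naive` and `count_kmp` and the test input are assumptions, and the KMP count covers the search phase only, not the construction of the LPS array.

```python
def count_naive(text, pattern):
    """Count character comparisons made by the brute-force search."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        for k in range(m):
            comparisons += 1
            if text[start + k] != pattern[k]:
                break
    return comparisons


def count_kmp(text, pattern):
    """Count text-versus-pattern comparisons in the KMP search phase."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # build the LPS array
    lps = [0] * m
    length, i = 0, 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]
        else:
            i += 1
    # search, counting one comparison per loop iteration
    comparisons = 0
    i = j = 0
    while i < n:
        comparisons += 1
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                j = lps[j - 1]  # continue looking for further matches
        elif j != 0:
            j = lps[j - 1]      # reuse the already matched prefix
        else:
            i += 1
    return comparisons


text = "A" * 1000
pattern = "A" * 9 + "B"
print(count_naive(text, pattern), count_kmp(text, pattern))
```

On this input the naive count grows with n*m, while the KMP count stays below 2n.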

Let's understand the input-output scenario of a pattern matching problem with an example −

Input:
main String: "AAAABCAAAABCBAAAABC" 
pattern: "AAABC"
Output:
Pattern found at position: 1
Pattern found at position: 7
Pattern found at position: 14
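
The positions in this scenario can be cross-checked independently of KMP. The sketch below uses Python's built-in `str.find`, which is not a KMP implementation; the helper name `find_all` is hypothetical and exists only for this check.

```python
def find_all(text, pattern):
    """Collect every starting index of pattern in text using str.find."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # search again one position further, so overlapping matches are kept
        start = text.find(pattern, start + 1)
    return positions


print(find_all("AAAABCAAAABCBAAAABC", "AAABC"))  # [1, 7, 14]
```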

Example

The following example practically illustrates the KMP algorithm for pattern matching.

The implementation is given below in C, C++, Java, and Python, in that order.

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
// function to build the prefix (LPS) array for the pattern
void prefixSearch(const char* pat, int m, int* pps) {
   // length of the longest proper prefix that is also a suffix so far
   int length = 0; 
   // a single character has no proper prefix
   pps[0] = 0;      
   int i = 1; 
   while(i < m) { 
      // check whether pat[i] extends the current prefix
      if(pat[i] == pat[length]) { 
         length++; 
         // store the length in the prefix array  
         pps[i] = length;  
         i++; 
      }else if(length != 0) { 
         // fall back to the next shorter candidate prefix and retry the same i
         length = pps[length - 1]; 
      }else { 
         // no prefix matches here, store 0 in the prefix array
         pps[i] = 0; 
         i++; 
      }
   }
}
// function to search for pattern
void patrnSearch(char* orgnString, char* patt, int m, int *locArray, int *loc) {
   int n, i = 0, j = 0; 
   n = strlen(orgnString); 
   *loc = 0; // initialize the location index
   // an empty pattern or one longer than the text cannot match
   if(m == 0 || m > n) return; 
   // allocate memory for the prefix (LPS) array
   int* prefixArray = (int*)malloc(m * sizeof(int)); 
   if(prefixArray == NULL) return; // allocation failed
   // calling prefix function to fill the prefix array
   prefixSearch(patt, m, prefixArray); 
   while(i < n) { 
       // checking if main string character matches pattern string character
      if(orgnString[i] == patt[j]) { 
         // increment both i and j
         i++; 
         j++; 
      }
      // if j and m are equal pattern is found
      if(j == m) { 
         // store the location of the pattern
         locArray[*loc] = i-j; 
         (*loc)++; // increment the location index
          // update j to the previous prefix value
         j = prefixArray[j-1];
      // checking if i is less than n and the current characters do not match
      }else if(i < n && patt[j] != orgnString[i]) { 
         if(j != 0) 
            // update j to the previous prefix value
            j = prefixArray[j-1]; 
         // if j is zero    
         else 
            i++; // increment i
      }
   }
   free(prefixArray); // free the memory of the prefix array
}
int main() {
    // declare the original text
   char* orgnStr = "AAAABCAEAAABCBDDAAAABC"; 
   // pattern to be found
   char* patrn = "AAABC"; 
   // get the size of the pattern
   int m = strlen(patrn);
   // array to store the locations of the pattern
   int locationArray[strlen(orgnStr)]; 
   // to store the number of locations
   int index; 
   // calling pattern search function
   patrnSearch(orgnStr, patrn, m, locationArray, &index); 
   // to loop through location array
   for(int i = 0; i<index; i++) { 
      // print the location of the pattern
      printf("Pattern found at location: %d\n", locationArray[i]); 
   }
}

#include <iostream>
#include <string>
#include <vector>
using namespace std; 
// function to build the prefix (LPS) array for the pattern
void prefixSearch(const string& pattern, int m, vector<int>& storePrefx) {
   // length of the longest proper prefix that is also a suffix so far
   int length = 0; 
   // a single character has no proper prefix
   storePrefx[0] = 0;      
   int i = 1; 
   while(i < m) { 
      // check whether pattern[i] extends the current prefix
      if(pattern[i] == pattern[length]) { 
         length++; 
         // store the length in the prefix array  
         storePrefx[i] = length;  
         i++; 
      }else if(length != 0) { 
         // fall back to the next shorter candidate prefix and retry the same i
         length = storePrefx[length - 1]; 
      }else { 
         // no prefix matches here, store 0 in the prefix array
         storePrefx[i] = 0; 
         i++; 
      }
   }
}
// function to search for pattern; match locations are appended to locArray
void patrnSearch(const string& orgnString, const string& patt, vector<int>& locArray) {
   int n = orgnString.size(); 
   int m = patt.size(); 
   int i = 0, j = 0; 
   locArray.clear(); 
   // an empty pattern or one longer than the text cannot match
   if(m == 0 || m > n) return; 
   // std::vector replaces the variable-length array, which is not standard C++
   vector<int> prefixArray(m);  
   // calling prefix function to fill the prefix array
   prefixSearch(patt, m, prefixArray); 
   while(i < n) { 
      // checking if main string character matches pattern string character
      if(orgnString[i] == patt[j]) { 
         // increment both i and j
         i++; 
         j++; 
      }
      // if j and m are equal pattern is found
      if(j == m) { 
         // store the location of the pattern
         locArray.push_back(i - j); 
         // update j to the previous prefix value
         j = prefixArray[j-1];
      // checking if i is less than n and the current characters do not match
      }else if(i < n && patt[j] != orgnString[i]) { 
         if(j != 0) 
            // update j to the previous prefix value
            j = prefixArray[j-1]; 
         // if j is zero    
         else 
            i++; // increment i
      }
   }
}
int main() {
   // declare the original text
   string orgnStr = "AAAABCAEAAABCBDDAAAABC"; 
   // pattern to be found
   string patrn = "AAABC"; 
   // vector to store the locations of the pattern
   vector<int> locationArray; 
   // calling pattern search function
   patrnSearch(orgnStr, patrn, locationArray); 
   // to loop through the stored locations
   for(int location : locationArray) { 
      // print the location of the pattern
      cout << "Pattern found at location: " << location << endl; 
   }
}

import java.io.*; 
// class to implement the KMP algorithm
public class KMPalgo {
   // function to find prefix
   public static void prefixSearch(String pat, int m, int[] storePrefx) {
      // length of the longest proper prefix that is also a suffix so far
      int length = 0;
      // a single character has no proper prefix
      storePrefx[0] = 0;
      int i = 1;
      while (i < m) {
         // check whether the character at i extends the current prefix
         if (pat.charAt(i) == pat.charAt(length)) {
            length++;
            // store the length in the prefix array
            storePrefx[i] = length;
            i++;
         } else if (length != 0) {
            // fall back to the next shorter candidate prefix and retry the same i
            length = storePrefx[length - 1];
         } else {
            // no prefix matches here, store 0 in the prefix array
            storePrefx[i] = 0;
            i++;
         }
      }
   }
   // function to search for pattern
   public static int patrnSearch(String orgnString, String patt, int[] locArray) {
      int n, m, i = 0, j = 0;
      n = orgnString.length();
      m = patt.length();
      // an empty pattern or one longer than the text cannot match
      if (m == 0 || m > n) {
         return 0;
      }
      // array to store the prefix (LPS) values
      int[] prefixArray = new int[m];
      // calling prefix function to fill the prefix array
      prefixSearch(patt, m, prefixArray);
      int loc = 0; // initialize the location index
      while (i < n) {
         // checking if main string character matches pattern string character
         if (orgnString.charAt(i) == patt.charAt(j)) {
            // increment both i and j
            i++;
            j++;
         }
         // if j and m are equal pattern is found
         if (j == m) {
            // store the location of the pattern
            locArray[loc] = i - j;
            loc++; // increment the location index
            // update j to the previous prefix value
            j = prefixArray[j - 1];
            // checking if i is less than n and the current characters do not match
         } else if (i < n && patt.charAt(j) != orgnString.charAt(i)) {
            if (j != 0)
               // update j to the previous prefix value
               j = prefixArray[j - 1];
               // if j is zero
            else
               i++; // increment i
         }
      }
      return loc;
   }
   public static void main(String[] args) {
      // declare the original text
      String orgnStr = "AAAABCAEAAABCBDDAAAABC";
      // pattern to be found
      String patrn = "AAABC";
      // array to store the locations of the pattern
      int[] locationArray = new int[orgnStr.length()];
      // calling pattern search function
      int index = patrnSearch(orgnStr, patrn, locationArray);
      // to loop through location array
      for (int i = 0; i < index; i++) {
         // print the location of pattern
         System.out.println("Pattern found at location: " + locationArray[i]);
      }
   }
}

# function to find prefix
def prefix_search(pattern, m, store_prefx):
   # length of the longest proper prefix that is also a suffix so far
   length = 0
   # a single character has no proper prefix
   store_prefx[0] = 0
   i = 1
   while i < m:
      # check whether pattern[i] extends the current prefix
      if pattern[i] == pattern[length]:
         length += 1
         # store the length in the prefix array
         store_prefx[i] = length
         i += 1
      elif length != 0:
         # fall back to the next shorter candidate prefix and retry the same i
         length = store_prefx[length - 1]
      else:
         # no prefix matches here, store 0 in the prefix array
         store_prefx[i] = 0
         i += 1
# function to search for pattern
def pattern_search(orgn_string, patt, loc_array):
   n = len(orgn_string)
   m = len(patt)
   i = j = loc = 0
   # an empty pattern or one longer than the text cannot match
   if m == 0 or m > n:
      return loc
   # array to store the prefix (LPS) values
   prefix_array = [0] * m
   # calling prefix function to fill the prefix array
   prefix_search(patt, m, prefix_array)
   while i < n:
      # checking if main string character matches pattern string character
      if orgn_string[i] == patt[j]:
         # increment both i and j
         i += 1
         j += 1
      # if j and m are equal pattern is found
      if j == m:
         # store the location of the pattern
         loc_array[loc] = i - j
         loc += 1  # increment the location index
         # update j to the previous prefix value
         j = prefix_array[j - 1]
      # checking if i is less than n and the current characters do not match
      elif i < n and patt[j] != orgn_string[i]:
         if j != 0:
            # update j to the previous prefix value
            j = prefix_array[j - 1]
         else:
            i += 1  # increment i
   return loc
# main function
def main():
   # declare the original text
   orgn_str = "AAAABCAEAAABCBDDAAAABC"
   # pattern to be found
   patrn = "AAABC"
   # array to store the locations of the pattern
   location_array = [0] * len(orgn_str)
   # calling pattern search function
   index = pattern_search(orgn_str, patrn, location_array)
   # to loop through location array
   for i in range(index):
      # print the location of the pattern
      print("Pattern found at location:", location_array[i])
# call the main function
if __name__ == "__main__":
   main()

Output

Pattern found at location: 1
Pattern found at location: 8
Pattern found at location: 17
