Knuth-Morris-Pratt Algorithm
KMP Algorithm for Pattern Matching
The KMP algorithm is used to solve the pattern matching problem, which is the task of finding all the occurrences of a given pattern in a text. It is particularly useful when the text is long or when the pattern occurs many times. For instance, if the text is "aabbaaccaabbaadde" and the pattern is "aabbaa", then the pattern occurs twice in the text, at indices 0 and 8.
The naive solution to this problem is to compare the pattern with every possible substring of the text, starting from the leftmost position and moving rightwards. This takes O(n*m) time, where 'n' is the length of the text and 'm' is the length of the pattern.
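As a point of comparison, the naive approach described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name naive_search and the sample strings are assumptions chosen for this sketch, and an empty pattern is assumed to produce no matches.

```python
def naive_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    positions = []
    if m == 0:
        return positions  # assumption: an empty pattern yields no matches
    # try every alignment of the pattern against the text
    for start in range(n - m + 1):
        j = 0
        # compare character by character until a mismatch or a full match
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            positions.append(start)
    return positions


print(naive_search("abababa", "aba"))  # [0, 2, 4]
```

Each of the n - m + 1 alignments may cost up to m comparisons, which is where the O(n*m) bound comes from.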
When we work with long text documents, the naive (brute force) approach performs many redundant comparisons. To avoid such redundancy, Knuth, Morris, and Pratt developed a linear-time string-matching algorithm known as the KMP pattern matching algorithm. It is also referred to as the Knuth-Morris-Pratt pattern matching algorithm.
How does KMP Algorithm work?
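The KMP algorithm scans the text from left to right. It uses a prefix function, computed from the pattern alone, to avoid unnecessary comparisons while searching. For each index i of the pattern, this function stores the length of the longest proper prefix of the pattern that is also a suffix of the first i + 1 characters of the pattern. This length is known as the LPS (longest prefix suffix) value. The following steps are involved in the KMP algorithm −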
Compute the prefix function (the LPS array) for the pattern.
Slide the pattern over the text, comparing characters from left to right.
If all the characters of the pattern match, an occurrence has been found.
If a mismatch occurs, use the prefix function to skip the unnecessary comparisons. Look up the LPS value of the pattern character just before the mismatched one. If that value is 0, resume the comparison from index 0 of the pattern at the current text character. If it is greater than 0, resume from the pattern index equal to that LPS value, again at the current text character. If the mismatch occurs at index 0 of the pattern, simply move on to the next character of the text.
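The first step, building the LPS array, can be sketched on its own as shown here. The helper name compute_lps and the second sample pattern are assumptions made for this illustration. The logic follows the usual fallback scheme: extend the current prefix-suffix on a match, otherwise fall back to the next shorter one.

```python
def compute_lps(pattern):
    """Return the LPS array: lps[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0  # length of the prefix-suffix ending at the previous index
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            # the current prefix-suffix grows by one character
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            # fall back to the next shorter prefix-suffix and retry
            length = lps[length - 1]
        else:
            # no prefix-suffix ends at index i
            lps[i] = 0
            i += 1
    return lps


print(compute_lps("AAABC"))      # [0, 1, 2, 0, 0]
print(compute_lps("ABABCABAB"))  # [0, 0, 1, 2, 0, 1, 2, 3, 4]
```

For "AAABC", a mismatch at the 'B' (index 3) after matching "AAA" means the search can resume from pattern index 2 rather than from index 0, because the last two matched characters "AA" are already a prefix of the pattern.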
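The KMP algorithm takes O(n + m) time and O(m) space. It is faster than the naive solution because it never moves backwards in the text: after a mismatch it uses the LPS values to resume the comparison instead of re-checking characters that are already known to match. As a result, the search phase performs at most about 2n character comparisons.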
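One rough way to see the difference in cost is to count character comparisons for both approaches on a repetitive input. The sketch below is only an illustration under stated assumptions: the function names (lps_table, kmp_count, naive_count) and the test input are hypothetical, and both search functions assume a non-empty pattern.

```python
def lps_table(pattern):
    """Build the LPS array for a non-empty pattern."""
    lps = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]
        else:
            i += 1
    return lps


def kmp_count(text, pattern):
    """Return (positions, comparisons) for a KMP scan of text."""
    n, m = len(text), len(pattern)
    lps = lps_table(pattern)
    positions, comparisons = [], 0
    i = j = 0
    while i < n:
        comparisons += 1  # one text/pattern character comparison
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                positions.append(i - m)
                j = lps[j - 1]
        elif j != 0:
            j = lps[j - 1]  # shift the pattern; the text index stays put
        else:
            i += 1
    return positions, comparisons


def naive_count(text, pattern):
    """Return (positions, comparisons) for the brute force scan."""
    n, m = len(text), len(pattern)
    positions, comparisons = [], 0
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == m:
            positions.append(start)
    return positions, comparisons


text = "A" * 1000 + "B"
pattern = "A" * 10 + "B"
print("naive:", naive_count(text, pattern))
print("kmp:  ", kmp_count(text, pattern))
```

Both functions report the same match position, but on this input the brute force count grows roughly with n*m while the KMP count stays within 2n.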
Let's understand the input-output scenario of a pattern matching problem with an example −
Input:
main String: "AAAABCAAAABCBAAAABC"
pattern: "AAABC"
Output:
Pattern found at position: 1
Pattern found at position: 7
Pattern found at position: 14
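The positions in this scenario can be cross-checked with Python's built-in str.find, which is handy as a reference when testing a hand-written matcher. The helper name find_all is an assumption for this sketch; it is not part of the KMP algorithm itself.

```python
def find_all(text, pattern):
    """Collect every (possibly overlapping) match position using str.find."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # continue one character further so overlapping matches are kept
        start = text.find(pattern, start + 1)
    return positions


print(find_all("AAAABCAAAABCBAAAABC", "AAABC"))  # [1, 7, 14]
```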
Example
The following programs implement the KMP algorithm for pattern matching in C, C++, Java, and Python.
// C implementation
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// build the LPS (prefix) array for the pattern
void prefixSearch(const char* pat, int m, int* pps) {
    int length = 0;   // length of the previous longest prefix-suffix
    int i = 1;
    pps[0] = 0;       // a single character has no proper prefix
    while (i < m) {
        if (pat[i] == pat[length]) {
            // the prefix-suffix grows by one character
            length++;
            pps[i] = length;
            i++;
        } else if (length != 0) {
            // fall back to the next shorter prefix-suffix and retry
            length = pps[length - 1];
        } else {
            // no prefix-suffix ends at position i
            pps[i] = 0;
            i++;
        }
    }
}

// find every occurrence of patt in orgnString
void patrnSearch(const char* orgnString, const char* patt, int m, int* locArray, int* loc) {
    int n = (int)strlen(orgnString);
    int i = 0, j = 0;
    *loc = 0;   // number of matches found so far
    if (m == 0 || m > n)
        return; // nothing to search for
    // allocate memory for the prefix array
    int* prefixArray = (int*)malloc(m * sizeof(int));
    if (prefixArray == NULL)
        return; // allocation failed
    prefixSearch(patt, m, prefixArray);
    while (i < n) {
        // advance both indices while the characters match
        if (orgnString[i] == patt[j]) {
            i++;
            j++;
        }
        if (j == m) {
            // the whole pattern matched; store its start position
            locArray[*loc] = i - j;
            (*loc)++;
            // continue the search using the prefix value
            j = prefixArray[j - 1];
        } else if (i < n && patt[j] != orgnString[i]) {
            if (j != 0)
                // shift the pattern using the prefix value
                j = prefixArray[j - 1];
            else
                // mismatch at the first pattern character
                i++;
        }
    }
    free(prefixArray);
}

int main() {
    // the original text
    const char* orgnStr = "AAAABCAEAAABCBDDAAAABC";
    // pattern to be found
    const char* patrn = "AAABC";
    int m = (int)strlen(patrn);
    // array to store the locations of the pattern
    int locationArray[strlen(orgnStr)];
    // number of locations found
    int index;
    patrnSearch(orgnStr, patrn, m, locationArray, &index);
    for (int i = 0; i < index; i++) {
        printf("Pattern found at location: %d\n", locationArray[i]);
    }
    return 0;
}
// C++ implementation
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// build the LPS (prefix) array for the pattern
void prefixSearch(const string& pattern, int m, vector<int>& storePrefx) {
    int length = 0;     // length of the previous longest prefix-suffix
    int i = 1;
    storePrefx[0] = 0;  // a single character has no proper prefix
    while (i < m) {
        if (pattern[i] == pattern[length]) {
            // the prefix-suffix grows by one character
            length++;
            storePrefx[i] = length;
            i++;
        } else if (length != 0) {
            // fall back to the next shorter prefix-suffix and retry
            length = storePrefx[length - 1];
        } else {
            // no prefix-suffix ends at position i
            storePrefx[i] = 0;
            i++;
        }
    }
}

// find every occurrence of patt in orgnString
void patrnSearch(const string& orgnString, const string& patt, vector<int>& locArray, int& loc) {
    int n = orgnString.size();
    int m = patt.size();
    int i = 0, j = 0;
    loc = 0;    // number of matches found so far
    if (m == 0 || m > n)
        return; // nothing to search for
    // vector to store the prefix values (standard C++ has no variable-length arrays)
    vector<int> prefixArray(m);
    prefixSearch(patt, m, prefixArray);
    while (i < n) {
        // advance both indices while the characters match
        if (orgnString[i] == patt[j]) {
            i++;
            j++;
        }
        if (j == m) {
            // the whole pattern matched; store its start position
            locArray[loc] = i - j;
            loc++;
            // continue the search using the prefix value
            j = prefixArray[j - 1];
        } else if (i < n && patt[j] != orgnString[i]) {
            if (j != 0)
                // shift the pattern using the prefix value
                j = prefixArray[j - 1];
            else
                // mismatch at the first pattern character
                i++;
        }
    }
}

int main() {
    // the original text
    string orgnStr = "AAAABCAEAAABCBDDAAAABC";
    // pattern to be found
    string patrn = "AAABC";
    // vector to store the locations of the pattern
    vector<int> locationArray(orgnStr.size());
    // number of locations found
    int index = 0;
    patrnSearch(orgnStr, patrn, locationArray, index);
    for (int i = 0; i < index; i++) {
        cout << "Pattern found at location: " << locationArray[i] << endl;
    }
    return 0;
}
// Java implementation
public class KMPalgo {
    // build the LPS (prefix) array for the pattern
    public static void prefixSearch(String pat, int m, int[] storePrefx) {
        int length = 0;     // length of the previous longest prefix-suffix
        int i = 1;
        storePrefx[0] = 0;  // a single character has no proper prefix
        while (i < m) {
            if (pat.charAt(i) == pat.charAt(length)) {
                // the prefix-suffix grows by one character
                length++;
                storePrefx[i] = length;
                i++;
            } else if (length != 0) {
                // fall back to the next shorter prefix-suffix and retry
                length = storePrefx[length - 1];
            } else {
                // no prefix-suffix ends at position i
                storePrefx[i] = 0;
                i++;
            }
        }
    }

    // find every occurrence of patt in orgnString; returns the number of matches
    public static int patrnSearch(String orgnString, String patt, int[] locArray) {
        int n = orgnString.length();
        int m = patt.length();
        int i = 0, j = 0;
        int loc = 0;    // number of matches found so far
        if (m == 0 || m > n) {
            return 0;   // nothing to search for
        }
        // array to store the prefix values
        int[] prefixArray = new int[m];
        prefixSearch(patt, m, prefixArray);
        while (i < n) {
            // advance both indices while the characters match
            if (orgnString.charAt(i) == patt.charAt(j)) {
                i++;
                j++;
            }
            if (j == m) {
                // the whole pattern matched; store its start position
                locArray[loc] = i - j;
                loc++;
                // continue the search using the prefix value
                j = prefixArray[j - 1];
            } else if (i < n && patt.charAt(j) != orgnString.charAt(i)) {
                if (j != 0)
                    // shift the pattern using the prefix value
                    j = prefixArray[j - 1];
                else
                    // mismatch at the first pattern character
                    i++;
            }
        }
        return loc;
    }

    public static void main(String[] args) {
        // the original text
        String orgnStr = "AAAABCAEAAABCBDDAAAABC";
        // pattern to be found
        String patrn = "AAABC";
        // array to store the locations of the pattern
        int[] locationArray = new int[orgnStr.length()];
        int index = patrnSearch(orgnStr, patrn, locationArray);
        for (int i = 0; i < index; i++) {
            System.out.println("Pattern found at location: " + locationArray[i]);
        }
    }
}
# Python implementation

# build the LPS (prefix) array for the pattern
def prefix_search(pattern, m, store_prefx):
    length = 0          # length of the previous longest prefix-suffix
    store_prefx[0] = 0  # a single character has no proper prefix
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            # the prefix-suffix grows by one character
            length += 1
            store_prefx[i] = length
            i += 1
        elif length != 0:
            # fall back to the next shorter prefix-suffix and retry
            length = store_prefx[length - 1]
        else:
            # no prefix-suffix ends at position i
            store_prefx[i] = 0
            i += 1


# find every occurrence of patt in orgn_string; returns the number of matches
def pattern_search(orgn_string, patt, loc_array):
    n = len(orgn_string)
    m = len(patt)
    i = j = loc = 0
    if m == 0 or m > n:
        return 0  # nothing to search for
    # list to store the prefix values
    prefix_array = [0] * m
    prefix_search(patt, m, prefix_array)
    while i < n:
        # advance both indices while the characters match
        if orgn_string[i] == patt[j]:
            i += 1
            j += 1
        if j == m:
            # the whole pattern matched; store its start position
            loc_array[loc] = i - j
            loc += 1
            # continue the search using the prefix value
            j = prefix_array[j - 1]
        elif i < n and patt[j] != orgn_string[i]:
            if j != 0:
                # shift the pattern using the prefix value
                j = prefix_array[j - 1]
            else:
                # mismatch at the first pattern character
                i += 1
    return loc


def main():
    # the original text
    orgn_str = "AAAABCAEAAABCBDDAAAABC"
    # pattern to be found
    patrn = "AAABC"
    # list to store the locations of the pattern
    location_array = [0] * len(orgn_str)
    index = pattern_search(orgn_str, patrn, location_array)
    for i in range(index):
        print("Pattern found at location:", location_array[i])


if __name__ == "__main__":
    main()
Output
Pattern found at location: 1
Pattern found at location: 8
Pattern found at location: 17