- DSA - Home
- DSA - Overview
- DSA - Environment Setup
- DSA - Algorithms Basics
- DSA - Asymptotic Analysis
- Data Structures
- DSA - Data Structure Basics
- DSA - Data Structures and Types
- DSA - Array Data Structure
- DSA - Skip List Data Structure
- Linked Lists
- DSA - Linked List Data Structure
- DSA - Doubly Linked List Data Structure
- DSA - Circular Linked List Data Structure
- Stack & Queue
- DSA - Stack Data Structure
- DSA - Expression Parsing
- DSA - Queue Data Structure
- DSA - Circular Queue Data Structure
- DSA - Priority Queue Data Structure
- DSA - Deque Data Structure
- Searching Algorithms
- DSA - Searching Algorithms
- DSA - Linear Search Algorithm
- DSA - Binary Search Algorithm
- DSA - Interpolation Search
- DSA - Jump Search Algorithm
- DSA - Exponential Search
- DSA - Fibonacci Search
- DSA - Sublist Search
- DSA - Hash Table
- Sorting Algorithms
- DSA - Sorting Algorithms
- DSA - Bubble Sort Algorithm
- DSA - Insertion Sort Algorithm
- DSA - Selection Sort Algorithm
- DSA - Merge Sort Algorithm
- DSA - Shell Sort Algorithm
- DSA - Heap Sort Algorithm
- DSA - Bucket Sort Algorithm
- DSA - Counting Sort Algorithm
- DSA - Radix Sort Algorithm
- DSA - Quick Sort Algorithm
- Matrices Data Structure
- DSA - Matrices Data Structure
- DSA - Lup Decomposition In Matrices
- DSA - Lu Decomposition In Matrices
- Graph Data Structure
- DSA - Graph Data Structure
- DSA - Depth First Traversal
- DSA - Breadth First Traversal
- DSA - Spanning Tree
- DSA - Topological Sorting
- DSA - Strongly Connected Components
- DSA - Biconnected Components
- DSA - Augmenting Path
- DSA - Network Flow Problems
- DSA - Flow Networks In Data Structures
- DSA - Edmonds Blossom Algorithm
- DSA - Maxflow Mincut Theorem
- Tree Data Structure
- DSA - Tree Data Structure
- DSA - Tree Traversal
- DSA - Binary Search Tree
- DSA - AVL Tree
- DSA - Red Black Trees
- DSA - B Trees
- DSA - B+ Trees
- DSA - Splay Trees
- DSA - Range Queries
- DSA - Segment Trees
- DSA - Fenwick Tree
- DSA - Fusion Tree
- DSA - Hashed Array Tree
- DSA - K-Ary Tree
- DSA - Kd Trees
- DSA - Priority Search Tree Data Structure
- Recursion
- DSA - Recursion Algorithms
- DSA - Tower of Hanoi Using Recursion
- DSA - Fibonacci Series Using Recursion
- Divide and Conquer
- DSA - Divide and Conquer
- DSA - Max-Min Problem
- DSA - Strassen's Matrix Multiplication
- DSA - Karatsuba Algorithm
- Greedy Algorithms
- DSA - Greedy Algorithms
- DSA - Travelling Salesman Problem (Greedy Approach)
- DSA - Prim's Minimal Spanning Tree
- DSA - Kruskal's Minimal Spanning Tree
- DSA - Dijkstra's Shortest Path Algorithm
- DSA - Map Colouring Algorithm
- DSA - Fractional Knapsack Problem
- DSA - Job Sequencing with Deadline
- DSA - Optimal Merge Pattern Algorithm
- Dynamic Programming
- DSA - Dynamic Programming
- DSA - Matrix Chain Multiplication
- DSA - Floyd Warshall Algorithm
- DSA - 0-1 Knapsack Problem
- DSA - Longest Common Sub-sequence Algorithm
- DSA - Travelling Salesman Problem (Dynamic Approach)
- Hashing
- DSA - Hashing Data Structure
- DSA - Collision In Hashing
- Disjoint Set
- DSA - Disjoint Set
- DSA - Path Compression And Union By Rank
- Heap
- DSA - Heap Data Structure
- DSA - Binary Heap
- DSA - Binomial Heap
- DSA - Fibonacci Heap
- Tries Data Structure
- DSA - Tries
- DSA - Standard Tries
- DSA - Compressed Tries
- DSA - Suffix Tries
- Treaps
- DSA - Treaps Data Structure
- Bit Mask
- DSA - Bit Mask In Data Structures
- Bloom Filter
- DSA - Bloom Filter Data Structure
- Approximation Algorithms
- DSA - Approximation Algorithms
- DSA - Vertex Cover Algorithm
- DSA - Set Cover Problem
- DSA - Travelling Salesman Problem (Approximation Approach)
- Randomized Algorithms
- DSA - Randomized Algorithms
- DSA - Randomized Quick Sort Algorithm
- DSA - Karger’s Minimum Cut Algorithm
- DSA - Fisher-Yates Shuffle Algorithm
- Miscellaneous
- DSA - Infix to Postfix
- DSA - Bellmon Ford Shortest Path
- DSA - Maximum Bipartite Matching
- DSA Useful Resources
- DSA - Questions and Answers
- DSA - Selection Sort Interview Questions
- DSA - Merge Sort Interview Questions
- DSA - Insertion Sort Interview Questions
- DSA - Heap Sort Interview Questions
- DSA - Bubble Sort Interview Questions
- DSA - Bucket Sort Interview Questions
- DSA - Radix Sort Interview Questions
- DSA - Cycle Sort Interview Questions
- DSA - Quick Guide
- DSA - Useful Resources
- DSA - Discussion
Kasai's Algorithm
Kasai's algorithm is used for constructing the longest common prefix (also referred to as LCP) array from a given suffix array and a text. Once we construct the LCP array, we can efficiently search for a pattern within the given text. We have discussed several algorithms that can solve pattern-matching problems efficiently including the KMP algorithm, the Boyer-Moore algorithm, and the Rabin-Karp algorithm. In this tutorial, we will explore Kasai's algorithm.
How Kasai's Algorithm works?
To understand Kasai' algorithm, we first need to learn the two core concepts of this algorithm −
Suffix Array − This is an array that stores the starting indices of all the suffixes present within a given text in lexicographic order.
LCP Array − As the name suggests, it is the longest common prefix (in short LCP) of two strings is the longest string that is a prefix of both strings.
The Kasai's algorithm is based on the following observation −
If the LCP of two suffixes starting at positions i and j is k, then the LCP of the suffixes starting at i+1 and j+1 is at least k-1, unless one of them is the last suffix in the suffix array. This is because the relative order of the characters in the suffixes remains the same after removing the first character unless they reach the end of the text. Therefore, we can use this property to compute the LCP values in a linear scan of the suffix array, starting from the first suffix and keeping track of the current LCP value in a variable k.
Whenever we move to the next suffix pair, we decrement k by one and then increment it as long as the characters at positions i+k and j+k match. To find the next suffix pair, we use an inverse array that maps each suffix index to its position in the suffix array.
Let's consider the input-output scenario for Kasai's algorithm −
Input: string: "AABAAABCEDBABCDDEBC" Output: Suffix Array: 0 1 9 3 8 2 14 10 4 11 5 15 7 12 13 6 Common Prefix Array: 1 2 3 0 4 1 2 2 0 1 1 1 1 0 1 0
Example
The following example practically demonstrates the Kasai's algorithm in different programming languages.
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
// Defining a structure to represent suffix
struct suffixes {
int index;
int rank[2];
};
// function to compare two suffixes
int compare(const void* a, const void* b) {
struct suffixes* suf1 = (struct suffixes*)a;
struct suffixes* suf2 = (struct suffixes*)b;
// If first rank is same
if(suf1->rank[0] == suf2->rank[0]) {
// Compare second rank
return (suf1->rank[1] < suf2->rank[1]) ? -1 : 1;
}else {
return (suf1->rank[0] < suf2->rank[0]) ? -1 : 1;
}
}
// function to build suffix array
int* createSuffArray(char* orgnlString, int n) {
struct suffixes suffArray[n];
for (int i = 0; i < n; i++) {
suffArray[i].index = i;
// Rank based on character itself
suffArray[i].rank[0] = orgnlString[i] - 'a';
// Next rank is next character
suffArray[i].rank[1] = ((i+1)<n)?(orgnlString[i+1]-'a'):-1;
}
// Sorting the suffixes
qsort(suffArray, n, sizeof(struct suffixes), compare);
int index[n];
for (int k = 4; k < 2*n; k = k*2) {
int currRank = 0;
int prevRank = suffArray[0].rank[0];
suffArray[0].rank[0] = currRank;
index[suffArray[0].index] = 0;
// to assign rank and index values to first suffix
for (int i = 1; i < n; i++) {
if (suffArray[i].rank[0] == prevRank && suffArray[i].rank[1] == suffArray[i-1].rank[1]) {
prevRank = suffArray[i].rank[0];
// If same as previous rank, assign the same new rank
suffArray[i].rank[0] = currRank;
} else{
prevRank = suffArray[i].rank[0];
// increment rank and assign
suffArray[i].rank[0] = ++currRank;
}
index[suffArray[i].index] = i;
}
for (int i = 0; i < n; i++) {
int nextIndex = suffArray[i].index + k/2;
suffArray[i].rank[1] = (nextIndex < n)? suffArray[index[nextIndex]].rank[0]: -1;
}
qsort(suffArray, n, sizeof(struct suffixes), compare);
}
// to store indexes of all sorted suffixes
int* suffixVector = (int*)malloc(n * sizeof(int));
for (int i = 0; i < n; i++)
suffixVector[i] = suffArray[i].index;
return suffixVector;
}
// applying Kasai's algorithm to build LCP array
int* kasaiAlgorithm(char* orgnlString, int* suffixVector, int n) {
// To store lcp array
int* longPrefix = (int*)malloc(n * sizeof(int));
// To store inverse of suffix array elements
int* suffixInverse = (int*)malloc(n * sizeof(int));
// to fill values in suffixInverse[] array
for (int i=0; i < n; i++)
suffixInverse[suffixVector[i]] = i;
int k = 0;
for (int i=0; i<n; i++) {
if (suffixInverse[i] == n-1) {
k = 0;
continue;
}
int j = suffixVector[suffixInverse[i]+1];
while (i+k<n && j+k<n && orgnlString[i+k]==orgnlString[j+k])
k++;
longPrefix[suffixInverse[i]] = k;
if (k>0)
k--;
}
return longPrefix;
}
void displayArray(int* vec, int n) {
for (int i = 0; i < n; i++)
printf("%d ", vec[i]);
printf("\n");
}
int main() {
char orgnlString[] = "AAABCAEAAABCBDDAAAABC";
int n = strlen(orgnlString);
int* suffArray = createSuffArray(orgnlString, n);
printf("Suffix Array is: \n");
displayArray(suffArray, n);
// calling function to build LCP array
int* prefixCommn = kasaiAlgorithm(orgnlString, suffArray, n);
// Print the LCP array
printf("Common Prefix Array is: \n");
displayArray(prefixCommn, n);
return 0;
}
#include<iostream>
#include<vector>
#include<algorithm>
using namespace std;
// Defining a structure to represent suffix
struct suffixes {
int index;
int rank[2];
};
// function to compare two suffixes
bool compare(suffixes suf1, suffixes suf2) {
// If first rank is same
if(suf1.rank[0] == suf2.rank[0]) {
// Compare second rank
if(suf1.rank[1] < suf2.rank[1])
return true;
else
return false;
}else {
if(suf1.rank[0] < suf2.rank[0])
return true;
else
return false;
}
}
// function to build suffix array
vector<int> createSuffArray(string orgnlString) {
int n = orgnlString.size();
suffixes suffArray[n];
for (int i = 0; i < n; i++) {
suffArray[i].index = i;
// Rank based on character itself
suffArray[i].rank[0] = orgnlString[i] - 'a';
// Next rank is next character
suffArray[i].rank[1] = ((i+1)<n)?(orgnlString[i+1]-'a'):-1;
}
// Sorting the suffixes
sort(suffArray, suffArray+n, compare);
int index[n];
for (int k = 4; k < 2*n; k = k*2) {
int currRank = 0;
int prevRank = suffArray[0].rank[0];
suffArray[0].rank[0] = currRank;
index[suffArray[0].index] = 0;
// to assign rank and index values to first suffix
for (int i = 1; i < n; i++) {
if (suffArray[i].rank[0] == prevRank && suffArray[i].rank[1] == suffArray[i-1].rank[1]) {
prevRank = suffArray[i].rank[0];
// If same as previous rank, assign the same new rank
suffArray[i].rank[0] = currRank;
} else{
prevRank = suffArray[i].rank[0];
// increment rank and assign
suffArray[i].rank[0] = ++currRank;
}
index[suffArray[i].index] = i;
}
for (int i = 0; i < n; i++) {
int nextIndex = suffArray[i].index + k/2;
suffArray[i].rank[1] = (nextIndex < n)? suffArray[index[nextIndex]].rank[0]: -1;
}
sort(suffArray, suffArray+n, compare);
}
// to store indexes of all sorted suffixes
vector<int>suffixVector;
for (int i = 0; i < n; i++)
suffixVector.push_back(suffArray[i].index);
return suffixVector;
}
// applying Kasai's algorithm to build LCP array
vector<int> kasaiAlgorithm(string orgnlString, vector<int> suffixVector) {
int n = suffixVector.size();
// To store lcp array
vector<int> longPrefix(n, 0);
// To store inverse of suffix array elements
vector<int> suffixInverse(n, 0);
// to fill values in suffixInverse[] array
for (int i=0; i < n; i++)
suffixInverse[suffixVector[i]] = i;
int k = 0;
for (int i=0; i<n; i++) {
if (suffixInverse[i] == n-1) {
k = 0;
continue;
}
int j = suffixVector[suffixInverse[i]+1];
while (i+k<n && j+k<n && orgnlString[i+k]==orgnlString[j+k])
k++;
longPrefix[suffixInverse[i]] = k;
if (k>0)
k--;
}
return longPrefix;
}
void displayArray(vector<int> vec) {
vector<int>::iterator it;
for (it = vec.begin(); it < vec.end() ; it++)
cout << *it << " ";
cout << endl;
}
int main() {
string orgnlString = "AAABCAEAAABCBDDAAAABC";
vector<int>suffArray = createSuffArray(orgnlString);
int n = suffArray.size();
cout<< "Suffix Array is: "<<endl;
displayArray(suffArray);
// calling function to build LCP array
vector<int>prefixCommn = kasaiAlgorithm(orgnlString, suffArray);
// Print the LCP array
cout<< "Common Prefix Array is: "<<endl;
displayArray(prefixCommn);
}
import java.util.Arrays;
public class Main {
// Defining a class to represent suffix
static class suffixes {
int index;
int[] rank = new int[2];
}
// method to compare two suffixes
static int compare(suffixes suf1, suffixes suf2) {
// If first rank is same
if (suf1.rank[0] == suf2.rank[0]) {
// Compare second rank
if (suf1.rank[1] < suf2.rank[1])
return -1;
else
return 1;
} else {
if (suf1.rank[0] < suf2.rank[0])
return -1;
else
return 1;
}
}
// method to build suffix array
static int[] createSuffArray(String orgnlString) {
int n = orgnlString.length();
suffixes[] suffArray = new suffixes[n];
for (int i = 0; i < n; i++) {
suffArray[i] = new suffixes();
suffArray[i].index = i;
// Rank based on character itself
suffArray[i].rank[0] = orgnlString.charAt(i) - 'a';
// Next rank is next character
suffArray[i].rank[1] = ((i + 1) < n) ? (orgnlString.charAt(i + 1) - 'a') : -1;
}
// Sorting the suffixes
Arrays.sort(suffArray, Main::compare);
int[] index = new int[n];
for (int k = 4; k < 2 * n; k = k * 2) {
int currRank = 0;
int prevRank = suffArray[0].rank[0];
suffArray[0].rank[0] = currRank;
index[suffArray[0].index] = 0;
// to assign rank and index values to first suffix
for (int i = 1; i < n; i++) {
if (suffArray[i].rank[0] == prevRank && suffArray[i].rank[1] == suffArray[i - 1].rank[1]) {
prevRank = suffArray[i].rank[0];
// If same as previous rank, assign the same new rank
suffArray[i].rank[0] = currRank;
} else {
prevRank = suffArray[i].rank[0];
// increment rank and assign
suffArray[i].rank[0] = ++currRank;
}
index[suffArray[i].index] = i;
}
for (int i = 0; i < n; i++) {
int nextIndex = suffArray[i].index + k / 2;
suffArray[i].rank[1] = (nextIndex < n) ? suffArray[index[nextIndex]].rank[0] : -1;
}
Arrays.sort(suffArray, Main::compare);
}
// to store indexes of all sorted suffixes
int[] suffixVector = new int[n];
for (int i = 0; i < n; i++)
suffixVector[i] = suffArray[i].index;
return suffixVector;
}
// applying Kasai's algorithm to build LCP array
static int[] kasaiAlgorithm(String orgnlString, int[] suffixVector) {
int n = suffixVector.length;
// To store lcp array
int[] longPrefix = new int[n];
// To store inverse of suffix array elements
int[] suffixInverse = new int[n];
// to fill values in suffixInverse[] array
for (int i = 0; i < n; i++)
suffixInverse[suffixVector[i]] = i;
int k = 0;
for (int i = 0; i < n; i++) {
if (suffixInverse[i] == n - 1) {
k = 0;
continue;
}
int j = suffixVector[suffixInverse[i] + 1];
while (i + k < n && j + k < n && orgnlString.charAt(i + k) == orgnlString.charAt(j + k))
k++;
longPrefix[suffixInverse[i]] = k;
if (k > 0)
k--;
}
return longPrefix;
}
static void displayArray(int[] vec) {
for (int i : vec)
System.out.print(i + " ");
System.out.println();
}
public static void main(String[] args) {
String orgnlString = "AAABCAEAAABCBDDAAAABC";
int[] suffArray = createSuffArray(orgnlString);
System.out.println("Suffix Array is: ");
displayArray(suffArray);
// calling method to build LCP array
int[] prefixCommn = kasaiAlgorithm(orgnlString, suffArray);
// Print the LCP array
System.out.println("Common Prefix Array is: ");
displayArray(prefixCommn);
}
}
# Defining a class to represent suffix
class Suffix:
def __init__(self):
self.index = 0
self.rank = [0, 0]
# function to compare two suffixes
def compare(a, b):
if a.rank[0] == b.rank[0]:
if a.rank[1] < b.rank[1]:
return -1
else:
return 1
else:
if a.rank[0] < b.rank[0]:
return -1
else:
return 1
# function to build suffix array
def createSuffArray(orgnlString):
n = len(orgnlString)
suffArray = [Suffix() for _ in range(n)]
for i in range(n):
suffArray[i].index = i
suffArray[i].rank[0] = ord(orgnlString[i]) - ord('a')
suffArray[i].rank[1] = ord(orgnlString[i + 1]) - ord('a') if ((i + 1) < n) else -1
suffArray = sorted(suffArray, key=lambda x: (x.rank[0], x.rank[1]))
ind = [0]*n
k = 4
while k < 2*n:
rank = 0
prev_rank = suffArray[0].rank[0]
suffArray[0].rank[0] = rank
ind[suffArray[0].index] = 0
for i in range(1, n):
if suffArray[i].rank[0] == prev_rank and suffArray[i].rank[1] == suffArray[i - 1].rank[1]:
prev_rank = suffArray[i].rank[0]
suffArray[i].rank[0] = rank
else:
prev_rank = suffArray[i].rank[0]
rank += 1
suffArray[i].rank[0] = rank
ind[suffArray[i].index] = i
for i in range(n):
nextIndex = suffArray[i].index + k//2
suffArray[i].rank[1] = suffArray[ind[nextIndex]].rank[0] if (nextIndex < n) else -1
suffArray = sorted(suffArray, key=lambda x: (x.rank[0], x.rank[1]))
k *= 2
suffixVector = [0]*n
for i in range(n):
suffixVector[i] = suffArray[i].index
return suffixVector
# applying Kasai's algorithm to build LCP array
def kasaiAlgorithm(orgnlString, suffixVector):
n = len(suffixVector)
longPrefix = [0]*n
suffixInverse = [0]*n
for i in range(n):
suffixInverse[suffixVector[i]] = i
k = 0
for i in range(n):
if suffixInverse[i] == n - 1:
k = 0
continue
j = suffixVector[suffixInverse[i] + 1]
while i + k < n and j + k < n and orgnlString[i + k] == orgnlString[j + k]:
k += 1
longPrefix[suffixInverse[i]] = k
if k > 0:
k -= 1
return longPrefix
# Function to print an array
def displayArray(vec):
for i in vec:
print(i, end=' ')
print()
def main():
orgnlString = "AAABCAEAAABCBDDAAAABC"
suffArray = createSuffArray(orgnlString)
print("Suffix Array is: ")
displayArray(suffArray)
prefixCommn = kasaiAlgorithm(orgnlString, suffArray)
print("Common Prefix Array is: ")
displayArray(prefixCommn)
if __name__ == "__main__":
main()
Output
Suffix Array is: 15 0 7 16 17 1 8 2 9 18 5 19 3 10 12 4 11 20 14 13 6 Common Prefix Array is: 3 5 5 2 4 4 4 3 3 3 0 2 2 2 0 1 1 1 1 0 0