- Design and Analysis of Algorithms
- Home
- Basics of Algorithms
- DAA - Introduction
- DAA - Analysis of Algorithms
- DAA - Methodology of Analysis
- Asymptotic Notations & Apriori Analysis
- Time Complexity
- Master’s Theorem
- DAA - Space Complexities
- Divide & Conquer
- DAA - Divide & Conquer
- DAA - Max-Min Problem
- DAA - Merge Sort
- DAA - Binary Search
- Strassen’s Matrix Multiplication
- Karatsuba Algorithm
- Towers of Hanoi
- Greedy Algorithms
- DAA - Greedy Method
- Travelling Salesman Problem
- Prim's Minimal Spanning Tree
- Kruskal’s Minimal Spanning Tree
- Dijkstra’s Shortest Path Algorithm
- Map Colouring Algorithm
- DAA - Fractional Knapsack
- DAA - Job Sequencing with Deadline
- DAA - Optimal Merge Pattern
- Dynamic Programming
- DAA - Dynamic Programming
- Matrix Chain Multiplication
- Floyd Warshall Algorithm
- DAA - 0-1 Knapsack
- Longest Common Subsequence
- Travelling Salesman Problem | Dynamic Programming
- Randomized Algorithms
- Randomized Algorithms
- Randomized Quick Sort
- Karger’s Minimum Cut
- Fisher-Yates Shuffle
- Approximation Algorithms
- Approximation Algorithms
- Vertex Cover Problem
- Set Cover Problem
- Travelling Salesperson Approximation Algorithm
- Graph Theory
- DAA - Spanning Tree
- DAA - Shortest Paths
- DAA - Multistage Graph
- Optimal Cost Binary Search Trees
- Heap Algorithms
- DAA - Binary Heap
- DAA - Insert Method
- DAA - Heapify Method
- DAA - Extract Method
- Sorting Techniques
- DAA - Bubble Sort
- DAA - Insertion Sort
- DAA - Selection Sort
- DAA - Shell Sort
- DAA - Heap Sort
- DAA - Bucket Sort
- DAA - Counting Sort
- DAA - Radix Sort
- Searching Techniques
- Searching Techniques Introduction
- DAA - Linear Search
- DAA - Binary Search
- DAA - Interpolation Search
- DAA - Jump Search
- DAA - Exponential Search
- DAA - Fibonacci Search
- DAA - Sublist Search
- Complexity Theory
- Deterministic vs. Nondeterministic Computations
- DAA - Max Cliques
- DAA - Vertex Cover
- DAA - P and NP Class
- DAA - Cook’s Theorem
- NP Hard & NP-Complete Classes
- DAA - Hill Climbing Algorithm
- DAA Useful Resources
- DAA - Quick Guide
- DAA - Useful Resources
- DAA - Discussion

# Longest Common Subsequence

The longest common subsequence problem is finding the longest sequence which exists in both the given strings.

But before we understand the problem, let us understand what the term subsequence is −

Let us consider a sequence S = <s_{1}, s_{2}, s_{3}, s_{4}, …,s_{n}>. And another sequence Z = <z_{1}, z_{2}, z_{3}, …,z_{m}> over S is called a subsequence of S, if and only if it can be derived from S deletion of some elements. In simple words, a subsequence consists of consecutive elements that make up a small part in a sequence.

### Common Subsequence

Suppose, * X* and

*are two sequences over a finite set of elements. We can say that*

**Y***is a common subsequence of*

**Z***and*

**X***, if*

**Y***is a subsequence of both*

**Z***and*

**X***.*

**Y**### Longest Common Subsequence

If a set of sequences are given, the longest common subsequence problem is to find a common subsequence of all the sequences that is of maximal length.

### Naïve Method

Let * X* be a sequence of length m and

*a sequence of length n. Check for every subsequence of*

**Y***whether it is a subsequence of*

**X***, and return the longest common subsequence found.*

**Y**There are **2 ^{m}** subsequences of

*. Testing sequences whether or not it is a subsequence of*

**X***takes O(n) time. Thus, the naïve algorithm would take O(n2*

**Y**^{m}) time.

## Longest Common Subsequence Algorithm

Let X=<x_{1},x_{2},x_{3}....,x_{m}> and Y=<y_{1},y_{2},y_{3}....,y_{m}> be the sequences. To compute the length of an element the following algorithm is used.

**Step 1** − Construct an empty adjacency table with the size, n × m, where n = size of sequence **X** and m = size of sequence **Y**. The rows in the table represent the elements in sequence X and columns represent the elements in sequence Y.

**Step 2** − The zeroeth rows and columns must be filled with zeroes. And the remaining values are filled in based on different cases, by maintaining a counter value.

**Case 1**− If the counter encounters common element in both X and Y sequences, increment the counter by 1.**Case 2**− If the counter does not encounter common elements in X and Y sequences at T[i, j], find the maximum value between T[i-1, j] and T[i, j-1] to fill it in T[i, j].

**Step 3** − Once the table is filled, backtrack from the last value in the table. Backtracking here is done by tracing the path where the counter incremented first.

**Step 4** − The longest common subseqence obtained by noting the elements in the traced path.

### Pseudocode

In this procedure, table **C[m, n]** is computed in row major order and another table **B[m,n]** is computed to construct optimal solution.

Algorithm: LCS-Length-Table-Formulation (X, Y) m := length(X) n := length(Y) for i = 1 to m do C[i, 0] := 0 for j = 1 to n do C[0, j] := 0 for i = 1 to m do for j = 1 to n do if xi = yj C[i, j] := C[i - 1, j - 1] + 1 B[i, j] := ‘D’ else if C[i -1, j] ≥ C[i, j -1] C[i, j] := C[i - 1, j] + 1 B[i, j] := ‘U’ else C[i, j] := C[i, j - 1] + 1 B[i, j] := ‘L’ return C and B

Algorithm: Print-LCS (B, X, i, j) if i=0 and j=0 return if B[i, j] = ‘D’ Print-LCS(B, X, i-1, j-1) Print(xi) else if B[i, j] = ‘U’ Print-LCS(B, X, i-1, j) else Print-LCS(B, X, i, j-1)

This algorithm will print the longest common subsequence of **X** and **Y**.

### Analysis

To populate the table, the outer for loop iterates m times and the inner **for** loop iterates **n** times. Hence, the complexity of the algorithm is **O(m,n)**, where **m** and **n** are the length of two strings.

### Example

In this example, we have two strings * X=BACDB* and

*to find the longest common subsequence.*

**Y=BDCB**Following the algorithm, we need to calculate two tables 1 and 2.

Given n = length of X, m = length of Y

X = BDCB, Y = BACDB

### Constructing the LCS Tables

In the table below, the zeroeth rows and columns are filled with zeroes. Remianing values are filled by incrementing and choosing the maximum values according to the algorithm.

Once the values are filled, the path is traced back from the last value in the table at T[4, 5].

From the traced path, the longest common subsequence is found by choosing the values where the counter is first incremented.

In this example, the final count is 3 so the counter is incremented at 3 places, i.e., B, C, B. Therefore, the longest common subsequence of sequences X and Y is BCB.

## Analysis

To populate the table, the outer for loop iterates m times and the inner for loop iterates n times. Hence, the complexity of the algorithm is O(m,n), where m and n are the length of two strings.

### Example

Following is the final implementation to find the Longest Common Subsequence using Dynamic Programming Approach −

#include <stdio.h> #include <string.h> int max(int a, int b); int lcs(char* X, char* Y, int m, int n){ int L[m + 1][n + 1]; int i, j, index; for (i = 0; i <= m; i++) { for (j = 0; j <= n; j++) { if (i == 0 || j == 0) L[i][j] = 0; else if (X[i - 1] == Y[j - 1]) { L[i][j] = L[i - 1][j - 1] + 1; } else L[i][j] = max(L[i - 1][j], L[i][j - 1]); } } index = L[m][n]; char LCS[index + 1]; LCS[index] = '\0'; i = m, j = n; while (i > 0 && j > 0) { if (X[i - 1] == Y[j - 1]) { LCS[index - 1] = X[i - 1]; i--; j--; index--; } else if (L[i - 1][j] > L[i][j - 1]) i--; else j--; } printf("LCS: %s\n", LCS); return L[m][n]; } int max(int a, int b){ return (a > b) ? a : b; } int main(){ char X[] = "ABSDHS"; char Y[] = "ABDHSP"; int m = strlen(X); int n = strlen(Y); printf("Length of LCS is %d\n", lcs(X, Y, m, n)); return 0; }

### Output

LCS: ABDHS Length of LCS is 5

#include <bits/stdc++.h> using namespace std; int max(int a, int b); int lcs(char* X, char* Y, int m, int n){ int L[m + 1][n + 1]; int i, j, index; for (i = 0; i <= m; i++) { for (j = 0; j <= n; j++) { if (i == 0 || j == 0) L[i][j] = 0; else if (X[i - 1] == Y[j - 1]) { L[i][j] = L[i - 1][j - 1] + 1; } else L[i][j] = max(L[i - 1][j], L[i][j - 1]); } } index = L[m][n]; char LCS[index + 1]; LCS[index] = '\0'; i = m, j = n; while (i > 0 && j > 0) { if (X[i - 1] == Y[j - 1]) { LCS[index - 1] = X[i - 1]; i--; j--; index--; } else if (L[i - 1][j] > L[i][j - 1]) i--; else j--; } printf("LCS: %s\n", LCS); return L[m][n]; } int max(int a, int b){ return (a > b) ? a : b; } int main(){ char X[] = "ABSDHS"; char Y[] = "ABDHSP"; int m = strlen(X); int n = strlen(Y); printf("Length of LCS is %d\n", lcs(X, Y, m, n)); return 0; }

### Output

LCS: ABDHS Length of LCS is 5

import java.util.*; import java.lang.*; public class LCS { static int max(int n1, int n2) { if(n1>n2) { return n1; } else { return n2; } } public static int lcs(char arr1[], char arr2[], int m ,int n) { if(m == 0 || n == 0) { return 0; } if(arr1[m-1] == arr2[n-1]) { return 1 + lcs(arr1,arr2,m-1,n-1); } else { return max(lcs(arr1, arr2, m, n-1), lcs(arr1,arr2, m-1,n)); } } public static void main(String[] args) { LCS lcs = new LCS(); String str1,str2; str1 = "ABSDHS"; str2 = "ABDHSP"; char ch[] = str1.toCharArray(); char ch1[] = str2.toCharArray(); int len1 = ch.length; int len2 = ch1.length; System.out.print("Length of LCS is: " + lcs(ch, ch1, len1, len2)); } }

### Output

Length of LCS is: 5

def lcs(X, Y, m, n): L = [[None]*(n+1) for a in range(m+1)] for i in range(m+1): for j in range(n+1): if (i == 0 or j == 0): L[i][j] = 0 elif (X[i - 1] == Y[j - 1]): L[i][j] = L[i - 1][j - 1] + 1 else: L[i][j] = max(L[i - 1][j], L[i][j - 1]) l = L[m][n] LCS = [None] * (l) a = m b = n while (a > 0 and b > 0): if (X[a - 1] == Y[b - 1]): LCS[l - 1] = X[a - 1] a = a - 1 b = b - 1 l = l - 1 elif (L[a - 1][b] > L[a][b - 1]): a = a - 1 else: b = b - 1; print("LCS is ") print(LCS) return L[m][n] X = "ABSDHS" Y = "ABDHSP" m = len(X) n = len(Y) lc = lcs(X, Y, m, n) print("Length of LCS is ") print(lc)

### Output

LCS is ['A', 'B', 'D', 'H', 'S'] Length of LCS is 5

## Applications

The longest common subsequence problem is a classic computer science problem, the basis of data comparison programs such as the diff-utility, and has applications in bioinformatics. It is also widely used by revision control systems, such as SVN and Git, for reconciling multiple changes made to a revision-controlled collection of files.