Java Program to Implement Zhu-Takaoka String Matching Algorithm


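The Zhu-Takaoka algorithm is a pattern-matching algorithm derived from the Boyer-Moore string-matching algorithm. Like Boyer-Moore, it compares the pattern with the string from right to left and skips positions where no match is possible.

The Zhu-Takaoka algorithm combines the good-suffix shift with a bad-character shift that is based on two consecutive characters of the string instead of one. The program in this article uses only the two-character bad-character shift.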

Problem statement − We are given two strings, a text and a pattern. We need to implement the Zhu-Takaoka algorithm to find the positions at which the pattern occurs in the text.

Sample Examples

Input

str = "PQRDPQRSSE"; patt = "PQRS";

Output

5

Explanation

The ‘PQRS’ pattern starts at position 5 of the string, counting positions from 1. So, it prints 5.

Input

str = "PQRDPQRSSE"; patt = "PRQS";

Output

-1

Explanation

The pattern doesn’t exist in the given string.

Input

str = "WELWELWELCOME"; patt = "WEL";

Output

1, 4, 7

Explanation

The pattern exists at multiple positions in the given string.

Approach

In the Zhu-Takaoka algorithm, we pre-process the pattern string to prepare a ZTBC (Zhu-Takaoka bad character) table. The table tells us how many positions to move the pattern forward before the next comparison.

Let’s understand the working of the Zhu-Takaoka algorithm step-by-step.

First, we create a ZTBC table of dimensions 26 x 26. The rows and columns are indexed by the uppercase letters 'A' to 'Z': the row is the first character of a pair, and the column is the second character.

In the first stage, all table elements are equal to the pattern length, assuming we can move past the whole pattern when a mismatch occurs.

So, for the ‘PQRS’ pattern, the initial table values are as shown below.

   A  B  C  D  E  F  …
A  4  4  4  4  4  4  …
B  4  4  4  4  4  4  …
C  4  4  4  4  4  4  …
D  4  4  4  4  4  4  …
…

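In this algorithm, we compare the pattern with the string from right to left. When a mismatch occurs, we look at the last two characters of the string that lie under the pattern. If that pair of characters also occurs inside the pattern, we move the pattern so that the rightmost occurrence of the pair in the pattern lines up with the pair in the string. If it does not occur there, the table keeps its larger default shift.

So, for each index p from 1 to len - 2, where len is the pattern length, update the ZTBC table accordingly.

table[pattern[p - 1]][pattern[p]] = len - p - 1;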

After pre-processing the table, we need to start comparing the string and pattern.
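
The table construction described above can be sketched in isolation. The class name ZtbcTableSketch and the method buildTable are illustrative names, not part of the program in this article, and the sketch assumes the pattern contains only the letters 'A' to 'Z' and has at least two characters.

```java
import java.util.Arrays;

final class ZtbcTableSketch {
    // Builds the 26 x 26 shift table for a pattern of uppercase letters (assumed input).
    static int[][] buildTable(String pattern) {
        int len = pattern.length();
        int[][] table = new int[26][26];
        // Default shift: the whole pattern length.
        for (int[] row : table) {
            Arrays.fill(row, len);
        }
        // Pairs that end with the pattern's first character allow a shift of len - 1.
        int first = pattern.charAt(0) - 'A';
        for (int[] row : table) {
            row[first] = len - 1;
        }
        // Pairs that occur inside the pattern; the rightmost occurrence wins.
        for (int p = 1; p < len - 1; p++) {
            table[pattern.charAt(p - 1) - 'A'][pattern.charAt(p) - 'A'] = len - 1 - p;
        }
        return table;
    }

    public static void main(String[] args) {
        int[][] table = buildTable("PQRS");
        System.out.println(table['P' - 'A']['Q' - 'A']); // 2
        System.out.println(table['Q' - 'A']['R' - 'A']); // 1
        System.out.println(table['A' - 'A']['P' - 'A']); // 3
        System.out.println(table['R' - 'A']['D' - 'A']); // 4
    }
}
```

For the string "PQRDPQRSSE", the first window "PQRD" ends with the pair (R, D). That pair does not occur in ‘PQRS’, so the pattern moves 4 positions and lands directly on the match.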

Algorithm

  • Step 1 − Define the ZTBCTable[] array of dimension 26 x 26 globally to store the pre-processed moves of the pattern.

  • Step 2 − Execute the prepareZTBCTable() function to fill the ZTBCTable array after pre-processing the pattern.

  • Step 2.1 − In the prepareZTBCTable() function, initialize all array elements with the pattern length.

  • Step 2.2 − For every row p, set ZTBCTable[p][patt.charAt(0) - 'A'] to pattern length - 1. This is the move when only the last character of the window matches the first character of the pattern.

  • Step 2.3 − For each index p from 1 to pattern length - 2, set ZTBCTable[patt.charAt(p - 1) - 'A'][patt.charAt(p) - 'A'] to pattern length - 1 - p.

  • Step 3 − Next, initialize q with 0 and isPatPresent with a false value.

  • Step 4 − Iterate while q is less than or equal to the difference between the string length and the pattern length.

  • Step 5 − In a nested loop, start with p at the last index of the pattern and decrement p while the pattern character at p matches the string character at q + p.

  • Step 6 − If p is less than 0, we found a match. Print q + 1 as the starting position, set isPatPresent to true, and advance q by the pattern length.

  • Step 7 − Otherwise, add ZTBCTable[str.charAt(q + patt_len - 2) - 'A'][str.charAt(q + patt_len - 1) - 'A'] to the ‘q’ variable to move the pattern.

  • Step 8 − At last, if the value of isPatPresent is false, print that the pattern doesn’t exist in the string.

Example

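public class Main {
   // Text and pattern; both must contain only 'A'-'Z', and the pattern needs at least two characters
   public static String str = "PQRDPQRSSE";
   public static String patt = "PQRS";
   // And their lengths
   public static int str_len = str.length();
   public static int patt_len = patt.length();
   // ZTBCTable[a][b] is the shift to apply when the current window ends with the pair (a, b)
   public static int[][] ZTBCTable = new int[26][26];
   public static void prepareZTBCTable() {
      int p, q;
      // Default: shift by the whole pattern length
      for (p = 0; p < 26; ++p)
         for (q = 0; q < 26; ++q)
            ZTBCTable[p][q] = patt_len;
      // The window's last character equals the pattern's first character
      for (p = 0; p < 26; ++p)
         ZTBCTable[p][patt.charAt(0) - 'A'] = patt_len - 1;
      // Pairs inside the pattern; the rightmost occurrence overwrites earlier ones
      for (p = 1; p < patt_len - 1; ++p)
         ZTBCTable[patt.charAt(p - 1) - 'A'][patt.charAt(p) - 'A'] = patt_len - 1 - p;
   }
   public static void main(String[] args) {
      int p, q;
      // The shift lookup reads two characters, so a one-character pattern is not supported
      if (patt_len < 2) {
         System.out.println("Pattern must contain at least two characters");
         return;
      }
      // Preparing a ZTBC table
      prepareZTBCTable();
      q = 0;
      boolean isPatPresent = false;
      while (q <= str_len - patt_len) {
         // Compare the pattern with the current window from right to left
         p = patt_len - 1;
         while (p >= 0 && patt.charAt(p) == str.charAt(p + q))
            --p;
         if (p < 0) {
            // When we get the pattern, print its 1-based starting position
            System.out.println("Pattern exists at index " + (q + 1));
            // Skip past this match; overlapping occurrences are not reported
            q += patt_len;
            isPatPresent = true;
         } else {
            // Move in the string by the shift stored for the window's last two characters
            q += ZTBCTable[str.charAt(q + patt_len - 2) - 'A'][str.charAt(q + patt_len - 1) - 'A'];
         }
      }
      if (!isPatPresent) {
         System.out.println("Pattern doesn't exist in the given string");
      }
   }
}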

Output

Pattern exists at index 5
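
To try the other sample inputs without editing the static fields, the same logic can be wrapped in a method that returns the positions. This is a minimal sketch, not the article's program: the names ZhuTakaokaSketch and search are hypothetical, the input is assumed to contain only 'A' to 'Z', and it applies the table shift after a match as well (a variation on the program above) so that overlapping occurrences are reported.

```java
import java.util.ArrayList;
import java.util.List;

final class ZhuTakaokaSketch {
    // Returns the 1-based start positions of pattern in text.
    // Assumes uppercase 'A'-'Z' input and a pattern of at least two characters.
    static List<Integer> search(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m < 2) {
            throw new IllegalArgumentException("pattern needs at least two characters");
        }
        // Build the two-character shift table.
        int[][] table = new int[26][26];
        int first = pattern.charAt(0) - 'A';
        for (int a = 0; a < 26; a++) {
            for (int b = 0; b < 26; b++) {
                table[a][b] = (b == first) ? m - 1 : m;
            }
        }
        for (int p = 1; p < m - 1; p++) {
            table[pattern.charAt(p - 1) - 'A'][pattern.charAt(p) - 'A'] = m - 1 - p;
        }
        List<Integer> positions = new ArrayList<>();
        int q = 0;
        while (q <= n - m) {
            // Compare right to left within the current window.
            int p = m - 1;
            while (p >= 0 && pattern.charAt(p) == text.charAt(q + p)) {
                p--;
            }
            if (p < 0) {
                positions.add(q + 1);
            }
            // Same table shift after a match or a mismatch, so overlapping matches are kept.
            q += table[text.charAt(q + m - 2) - 'A'][text.charAt(q + m - 1) - 'A'];
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(search("PQRDPQRSSE", "PQRS"));   // [5]
        System.out.println(search("PQRDPQRSSE", "PRQS"));   // []
        System.out.println(search("WELWELWELCOME", "WEL")); // [1, 4, 7]
        System.out.println(search("AAAA", "AAA"));          // [1, 2]
    }
}
```

The last call shows the difference from the program above, which advances by the full pattern length after a match and would report only position 1 for that input.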

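Time complexity – O(N*M) in the worst case for the search, where N is the string length, and M is the pattern length. Building the table takes O(26*26 + M) time.

Space complexity – O(26*26) to store the pattern moves, which is constant for a fixed alphabet.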
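
The worst-case time bound can be observed by counting character comparisons on an unfavourable input. The sketch below is illustrative only: the names ZtWorstCaseSketch and countComparisons are hypothetical, and it repeats the table-only search from the program above with a counter added. For a string made only of 'A' and a pattern such as "BAAA", every window matches all characters except the first, and the table moves the pattern by just one position.

```java
import java.util.Arrays;

final class ZtWorstCaseSketch {
    // Counts character comparisons made by the table-only search.
    // Assumes uppercase 'A'-'Z' input and a pattern of at least two characters.
    static long countComparisons(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        int[][] table = new int[26][26];
        int first = pattern.charAt(0) - 'A';
        for (int a = 0; a < 26; a++) {
            for (int b = 0; b < 26; b++) {
                table[a][b] = (b == first) ? m - 1 : m;
            }
        }
        for (int p = 1; p < m - 1; p++) {
            table[pattern.charAt(p - 1) - 'A'][pattern.charAt(p) - 'A'] = m - 1 - p;
        }
        long comparisons = 0;
        int q = 0;
        while (q <= n - m) {
            int p = m - 1;
            while (p >= 0) {
                comparisons++;
                if (pattern.charAt(p) != text.charAt(q + p)) {
                    break;
                }
                p--;
            }
            // Full-length shift after a match, table shift after a mismatch.
            q += (p < 0) ? m : table[text.charAt(q + m - 2) - 'A'][text.charAt(q + m - 1) - 'A'];
        }
        return comparisons;
    }

    public static void main(String[] args) {
        char[] chars = new char[1000];
        Arrays.fill(chars, 'A');
        String text = new String(chars);
        System.out.println(countComparisons(text, "BAAA"));     // 3988 = 997 windows * 4
        System.out.println(countComparisons(text, "BAAAAAAA")); // 7944 = 993 windows * 8
    }
}
```

Doubling the pattern length roughly doubles the number of comparisons here, which matches the (N - M + 1) * M growth of the worst case.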

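Because the Zhu-Takaoka algorithm bases each shift on two characters of the string instead of one, it can often move the pattern farther than a single-character bad-character rule would, which reduces the number of comparisons in practice. The price is the two-dimensional table, which needs more memory than a one-dimensional shift table.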

Updated on: 04-Jul-2023
