Z Algorithm



Z-Algorithm for Pattern Matching

The Z-Algorithm is a linear-time string-matching algorithm that finds all occurrences of a given pattern in a string. It relies on an auxiliary Z-array: an array of integers, of the same length as the string it is built for, in which the entry at index i stores the length of the longest substring starting at index i that is also a prefix of that string.

How does the Z-Algorithm work?

The Z-algorithm works by constructing an auxiliary array, the Z-array, for a given string. The value at each index i is the number of characters that match when the substring starting at index i is compared with the string from its 0th index onward. In other words, it is the length of the longest common prefix of the string and its suffix starting at i.
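To make the definition concrete, the following minimal Python sketch computes the Z-array directly from its definition by comparing characters one at a time. It runs in quadratic time in the worst case, so it only illustrates what each entry means. The function name `naive_z_array` and the convention of storing 0 at index 0 are assumptions made for this illustration.

```python
def naive_z_array(s):
    """Z-array straight from the definition (quadratic worst case)."""
    n = len(s)
    z = [0] * n  # z[0] is left at 0 by convention in this sketch
    for i in range(1, n):
        # count how many characters of s[i:] match the prefix of s
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
    return z


print(naive_z_array("aabcaab"))  # [0, 1, 0, 0, 3, 1, 0]
```

For example, the entry 3 at index 4 says that the three characters starting there ("aab") repeat the first three characters of the string.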

The Z-algorithm requires the following steps −

  • First, concatenate the pattern, a special separator character, and the given string, in that order. The separator must not occur in either string. Let's say we use the dollar sign ($) as the separator.

  • Then, construct the Z-array for this newly created string.

  • Now, check every index of the Z-array to find where its value equals the length of the pattern being searched. Wherever the two are equal, an occurrence of the pattern starts at that index of the combined string.

  • In the last step, subtract (length of pattern + 1) from that index. The result is the position of the occurrence in the original string.
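The steps above can be sketched in a few lines of Python. This is a compact illustration under stated assumptions, not the implementation used in the examples that follow: the names `z_array` and `find_pattern` and the `sep` parameter are hypothetical, the pattern is assumed to be non-empty, and the separator is assumed to occur in neither string.

```python
def z_array(s):
    """Linear-time Z-array; z[0] is left at 0 by convention."""
    n = len(s)
    z = [0] * n
    left = right = 0  # s[left:right] is the rightmost prefix match found so far
    for i in range(1, n):
        if i < right:
            # reuse the value already computed inside the window
            z[i] = min(right - i, z[i - left])
        # extend the match by direct comparison
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def find_pattern(text, pattern, sep="$"):
    """Return the 0-based start positions of pattern in text.

    Assumes pattern is non-empty and sep occurs in neither string.
    """
    combined = pattern + sep + text  # step 1: pattern, separator, text
    z = z_array(combined)            # step 2: Z-array of the combined string
    m = len(pattern)
    # steps 3 and 4: a Z value of m at index i means a match at i - (m + 1)
    return [i - (m + 1) for i in range(m + 1, len(combined)) if z[i] == m]


print(z_array("ab$cabab"))          # [0, 0, 0, 0, 2, 0, 2, 0]
print(find_pattern("cabab", "ab"))  # [1, 3]
```

In the small run, the Z values equal to 2 (the pattern length) sit at indices 4 and 6 of the combined string "ab$cabab", and subtracting 3 gives the positions 1 and 3 in "cabab".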

The figure below illustrates the above approach −

[Figure: Z-Algorithm]

Let's understand the input-output scenario −

Input:
Main String: "ABAAABCDBBABCDDEBCABC" 
Pattern: "ABC"
Output:
Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18

In the above scenario, we are looking for the pattern "ABC" in the main string "ABAAABCDBBABCDDEBCABC". The pattern occurs at positions 4, 10 and 18, where each position is a 0-based index into the main string.
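As a quick sanity check of the positions listed above, the brute-force sketch below compares the pattern against the slice at every start index. It is not the Z-algorithm itself, and the helper name `naive_positions` is hypothetical.

```python
def naive_positions(text, pattern):
    """Brute-force search: test the pattern at every 0-based start index."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


for pos in naive_positions("ABAAABCDBBABCDDEBCABC", "ABC"):
    print("Pattern found at position:", pos)
```

This prints positions 4, 10 and 18, the same indices the Z-algorithm is expected to report, but it may take time proportional to the product of the two lengths.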

Example

The following examples demonstrate the Z-algorithm in C, C++, Java and Python −

C

#include <stdio.h>
#include <string.h>
// function to fill Z array 
void fillZArray(const char* conStr, int zArr[]) {
   int n = strlen(conStr);
   int windLeft, windRight, k;
   // Initialize the window size to 0
   windLeft = windRight = 0; 
   // iterating over the characters of the new string
   for (int i = 1; i < n; i++) {
      // checking if current index is greater than right bound of window
      if (i > windRight) {
         // reset the window size to 0 and position it at the current index
         windLeft = windRight = i; 
         // extend right bound of window as long as characters match
         while (windRight < n && conStr[windRight - windLeft] == conStr[windRight]) {
             windRight++; 
         }
         // setting the Z value for the current index
         zArr[i] = windRight - windLeft;
         // decrementing right bound 
         windRight--;
      } else {
         // calculating corresponding index in window
         k = i - windLeft;
         // if Z value at corresponding index is less than remaining interval
         if (zArr[k] < windRight - i + 1) {
             zArr[i] = zArr[k]; 
         } else {
            // reset left bound of window to current index
            windLeft = i;
            // extend right bound of window as long as characters match
            while (windRight < n && conStr[windRight - windLeft] == conStr[windRight]) {
               windRight++;
            }
            // Setting the Z value for the current index
            zArr[i] = windRight - windLeft;
            // Decrement the right bound of the window
            windRight--;
         }
      }
   }
}
// function to implement the Z algorithm for pattern searching
void zAlgorithm(const char* mainString, const char* pattern, int array[], int *index) {
   // concatenate the pattern, a special character, and the main string
   // (+2 leaves room for the '$' separator and the terminating '\0')
   char concatedStr[strlen(mainString) + strlen(pattern) + 2];
   strcpy(concatedStr, pattern);
   strcat(concatedStr, "$");
   strcat(concatedStr, mainString); 
   int patLen = strlen(pattern);
   int len = strlen(concatedStr);
   // Initialize the Z array
   int zArr[len];
   // Fill the Z array 
   fillZArray(concatedStr, zArr);
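   // iterate over the Z array, starting just after the separator
   // (zArr[0] is never set, and no match can start inside the pattern part)
   for (int i = patLen + 1; i < len; i++) {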
      // if Z value equals length of the pattern, the pattern is found
      if (zArr[i] == patLen) {
         (*index)++;
         array[(*index)] = i - patLen - 1;
      }
   }
}
int main() {
   const char* mainString = "ABAAABCDBBABCDDEBCABC";
   const char* pattern = "ABC";
   // Initialize the location array and the index
   int locArray[strlen(mainString)];
   int index = -1;
   // Calling the Z algorithm function
   zAlgorithm(mainString, pattern, locArray, &index);
   // to print the result
   for (int i = 0; i <= index; i++) {
      printf("Pattern found at position: %d\n", locArray[i]);
   }
   return 0;
}

Output

Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18

C++

#include <iostream>
#include <string>
using namespace std;
// function to fill Z array 
void fillZArray(string conStr, int zArr[]) {
   int n = conStr.size();
   int windLeft, windRight, k;
   // initially window size is 0
   windLeft = windRight = 0;    
   // iterating over the characters of the new string
   for(int i = 1; i < n; i++) {
      // checking if current index is greater than right bound of window
      if(i > windRight) {
         // reset the window size to 0 and position it at the current index
         windLeft = windRight = i;
         // extend right bound of window as long as characters match
         while(windRight < n && conStr[windRight-windLeft] == conStr[windRight]) {
            windRight++;
         }
         // setting the Z value for the current index
         zArr[i] = windRight-windLeft;
         // decrementing right bound
         windRight--;
      }else {
         // calculating corresponding index in window
         k = i-windLeft;
         // if Z value at corresponding index is less than remaining interval
         if(zArr[k] < windRight-i+1)
            zArr[i] = zArr[k];
         else {
            // reset left bound of window to current index
            windLeft = i;
            // extend right bound of window as long as characters match
            while(windRight < n && conStr[windRight - windLeft] == conStr[windRight]) {
               windRight++;
            }
            // Setting the Z value for the current index
            zArr[i] = windRight - windLeft;
            // Decrement the right bound of the window
            windRight--;
         }
      }
   }
}
// function to implement the Z algorithm for pattern searching
void zAlgorithm(string mainString, string pattern, int array[], int *index) {
   // concatenate the pattern, a special character, and the main string
   string concatedStr = pattern + "$" + mainString;    
   int patLen = pattern.size();
   int len = concatedStr.size();
   // Initialize the Z array
   int zArr[len];
   // Fill the Z array 
   fillZArray(concatedStr, zArr);
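   // iterate over the Z array, starting just after the separator
   // (zArr[0] is never set, and no match can start inside the pattern part)
   for(int i = patLen + 1; i < len; i++) {
      // if Z value equals length of the pattern, the pattern is found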
      if(zArr[i] == patLen) {
         (*index)++;
         array[(*index)] = i - patLen -1;
      }
   }
}
int main() {
   string mainString = "ABAAABCDBBABCDDEBCABC";
   string pattern = "ABC";
   // Initialize the location array and the index
   int locArray[mainString.size()];
   int index = -1;
   // Calling the Z algorithm function
   zAlgorithm(mainString, pattern, locArray, &index);
   // to print the result
   for(int i = 0; i <= index; i++) {
      cout << "Pattern found at position: " << locArray[i]<<endl;
   }
}

Output

Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18

Java

public class ZAlgorithm {
   // method to fill Z array    
   public static void fillZArray(String conStr, int[] zArr) {
      int n = conStr.length();
      int windLeft, windRight, k;
      // initially window size is 0
      windLeft = windRight = 0; 
      // iterating over the characters of the new string
      for (int i = 1; i < n; i++) {
         // checking if current index is greater than right bound of window
         if (i > windRight) {
            // reset the window size to 0 and position it at the current index
            windLeft = windRight = i;
            while (windRight < n && conStr.charAt(windRight - windLeft) == conStr.charAt(windRight)) {
               windRight++; 
            }
            // setting the Z value for the current index
            zArr[i] = windRight - windLeft;
            windRight--;
         } else {
            k = i - windLeft;
            if (zArr[k] < windRight - i + 1)
               zArr[i] = zArr[k]; 
            else {
               windLeft = i;
               while (windRight < n && conStr.charAt(windRight - windLeft) == conStr.charAt(windRight)) {
                  windRight++;
               }
               zArr[i] = windRight - windLeft;
               windRight--;
            }
         }
      }
   }
   // method to implement the Z algorithm for pattern searching
   public static void zAlgorithm(String mainString, String pattern, int[] array) {
      // concatenate the pattern, a special character, and the main string
      String concatedStr = pattern + "$" + mainString; 
      int patLen = pattern.length();
      int len = concatedStr.length();
      // Initialize the Z array
      int[] zArr = new int[len];
      // Fill the Z array 
      fillZArray(concatedStr, zArr);
      int index = -1;
      // iterate over the Z array
      for (int i = 0; i < len; i++) {
         // if Z value equals length of the pattern, the pattern is found
         if (zArr[i] == patLen) {
            index++;
            array[index] = i - patLen - 1;
         }
      }
      // Print the results
      for (int i = 0; i <= index; i++) {
         System.out.println("Pattern found at position: " + array[i]);
      }
   }
   public static void main(String[] args) {
      String mainString = "ABAAABCDBBABCDDEBCABC";
      String pattern = "ABC";
      // Initialize the location array and the index
      int[] locArray = new int[mainString.length()];
      // Calling the Z algorithm method
      zAlgorithm(mainString, pattern, locArray);
   }
}

Output

Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18

Python

# function to fill Z array 
def fillZArray(conStr, zArr):
    n = len(conStr)
    windLeft, windRight, k = 0, 0, 0  
    # iterating over the characters of the new string
    for i in range(1, n):
        if i > windRight:
            windLeft, windRight = i, i  
            while windRight < n and conStr[windRight - windLeft] == conStr[windRight]:
                windRight += 1  
            zArr[i] = windRight - windLeft
            windRight -= 1
        else:
            k = i - windLeft
            if zArr[k] < windRight - i + 1:
                zArr[i] = zArr[k] 
            else:
                windLeft = i
                while windRight < n and conStr[windRight - windLeft] == conStr[windRight]:
                    windRight += 1
                zArr[i] = windRight - windLeft
                windRight -= 1
# function to implement the Z algorithm for pattern searching
def zAlgorithm(mainString, pattern, array):
    concatedStr = pattern + "$" + mainString  
    patLen = len(pattern)
    length = len(concatedStr)
    zArr = [0] * length
    fillZArray(concatedStr, zArr)
    index = -1
    for i in range(length):
        if zArr[i] == patLen:
            index += 1
            array[index] = i - patLen - 1
    return index, array
def main():
    mainString = "ABAAABCDBBABCDDEBCABC"
    pattern = "ABC"
    locArray = [0] * len(mainString)
    index, locArray = zAlgorithm(mainString, pattern, locArray)
    for i in range(index + 1):
        print("Pattern found at position:", locArray[i])
if __name__ == "__main__":
    main()

Output

Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18

Complexity of Z-Algorithm

The Z-algorithm performs pattern searching in linear time. Its time complexity is O(m + n), where n is the length of the string being searched and m is the length of the pattern. It also needs O(m + n) auxiliary space for the concatenated string and its Z-array.
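One way to see the linear bound is to count character comparisons. The sketch below is an instrumented variant of a common Z-array formulation, not the exact code from the examples above, and names such as `z_array_counted` are assumptions. Every successful comparison pushes the right edge of the window forward and every failed comparison ends the work for one index, so the total should stay within about twice the length of the input.

```python
def z_array_counted(s):
    """Return (z, comparisons) for s, counting character comparisons."""
    n = len(s)
    z = [0] * n
    left = right = 0  # s[left:right] is the rightmost prefix match so far
    comparisons = 0
    for i in range(1, n):
        if i < right:
            # reuse the value already known from inside the window
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n:
            comparisons += 1
            if s[z[i]] != s[i + z[i]]:
                break  # a mismatch ends the work for this index
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z, comparisons


for text in ("a" * 1000, "ab" * 500, "ABC$ABAAABCDBBABCDDEBCABC"):
    _, count = z_array_counted(text)
    # prints the input length, the comparison count, and whether count <= 2n
    print(len(text), count, count <= 2 * len(text))
```

Highly repetitive inputs such as a long run of one character are the cases where a naive comparison at every index would become quadratic, which is why they are included here.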
