Bad Character Heuristic

Data StructureAlgorithmsPattern Searching Algorithms

The bad character heuristic method is one of the approaches of Boyer Moore Algorithm. Another approach is Good Suffix Heuristic. In this method we will try to find a bad character, that means a character of the main string, which is not matching with the pattern. When the mismatch has occurred, we will shift the entire pattern until the mismatch becomes a match, otherwise, pattern moves past the bad character.

Here the time complexity is O(m/n) for best case and O(mn)for the worst case, where n is the length of the text and m is the length of the pattern.

Input and Output

Input:
Main String: “ABAAABCDBBABCDDEBCABC”, Pattern “ABC”
Output:
Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18

Algorithm

badCharacterHeuristic(pattern, badCharacterArray)

Input − pattern, which will be searched, the bad character array to store location

Output: Fill the bad character array for future use

Begin
   n := pattern length
   for all entries of badCharacterArray, do
      set all entries to -1
   done

   for all characters of the pattern, do
      set last position of each character in badCharacterArray.
   done
End

searchPattern(pattern, text)

Input − pattern, which will be searched and the main text

Output − the locations where the pattern is found

Begin
   patLen := length of pattern
   strLen := length of text.
   call badCharacterHeuristic(pattern, badCharacterArray)
   shift := 0

   while shift <= (strLen - patLen), do
      j := patLen -1
      while j >= 0 and pattern[j] = text[shift + j], do
         decrease j by 1
      done
      if j < 0, then
         print the shift as, there is a match
         if shift + patLen < strLen, then
            shift:= shift + patLen – badCharacterArray[text[shift + patLen]]
         else
            increment shift by 1
      else
         shift := shift + max(1, j-badCharacterArray[text[shift+j]])
   done
End

Example

#include<iostream>
#define MAXCHAR 256
using namespace std;

int maximum(int data1, int data2) {
   if(data1 > data2)
      return data1;
   return data2;
}

void badCharacterHeuristic(string pattern, int badCharacter[MAXCHAR]) {
   int n = pattern.size();                   //find length of pattern
   for(int i = 0; i<MAXCHAR; i++)
      badCharacter[i] = -1;                 //set all character distance as -1

   for(int i = 0; i < n; i++) {
      badCharacter[(int)pattern[i]] = i;   //set position of character in the array.
   }  
}

void searchPattern(string mainString, string pattern, int *array, int *index) {
   int patLen = pattern.size();
   int strLen = mainString.size();
   int badCharacter[MAXCHAR];           //make array for bad character position
   badCharacterHeuristic(pattern, badCharacter);       //fill bad character array
   int shift = 0;

   while(shift <= (strLen - patLen)) {
      int j = patLen - 1;
      while(j >= 0 && pattern[j] == mainString[shift+j]) {
         j--;     //reduce j when pattern and main string character is matching
      }

      if(j < 0) {
         (*index)++;
         array[(*index)] = shift;

         if((shift + patLen) < strLen) {
            shift += patLen - badCharacter[mainString[shift + patLen]];
         }else {
            shift += 1;
         }
      }else {
         shift += maximum(1, j - badCharacter[mainString[shift+j]]);
      }
   }
}

int main() {
   string mainString = "ABAAABCDBBABCDDEBCABC";
   string pattern = "ABC";
   int locArray[mainString.size()];
   int index = -1;
   searchPattern(mainString, pattern, locArray, &index);

   for(int i = 0; i <= index; i++) {
      cout << "Pattern found at position: " << locArray[i]<<endl;
   }
}

Output

Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18
raja
Published on 09-Jul-2018 08:50:53
Advertisements