
The bad character heuristic is one of the two shift rules used by the Boyer-Moore algorithm; the other is the good suffix heuristic. The pattern is compared with the main string from right to left, and the bad character is the character of the main string at which the first mismatch occurs. When a mismatch occurs, the pattern is shifted so that its last occurrence of the bad character lines up with the bad character in the main string. If the bad character does not occur in the pattern at all, the pattern moves entirely past it. The shift is always at least one position.

The time complexity is O(n/m) in the best case and O(mn) in the worst case, where n is the length of the text and m is the length of the pattern.

## Input and Output

Input:
Main String: “ABAAABCDBBABCDDEBCABC”, Pattern: “ABC”
Output:
Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18

## Algorithm

badCharacterHeuristic(pattern, badCharacterArray)

Input − the pattern to be searched, and the bad character array in which positions are stored

Output − the bad character array, filled for future use

Begin
   n := pattern length
   for all entries of badCharacterArray, do
      set the entry to -1
   done

   for all characters of the pattern, do
      set last position of each character in badCharacterArray.
   done
End

searchPattern(pattern, text)

Input − pattern, which will be searched and the main text

Output − the locations where the pattern is found

Begin
   patLen := length of pattern
   strLen := length of text
   fill badCharacterArray using the pattern
   shift := 0

   while shift <= (strLen - patLen), do
      j := patLen - 1
      while j >= 0 and pattern[j] = text[shift + j], do
         decrease j by 1
      done
      if j < 0, then
         print the shift, as there is a match
         if shift + patLen < strLen, then
            shift := shift + patLen - badCharacterArray[text[shift + patLen]]
         else
            increment shift by 1
      else
         shift := shift + max(1, j - badCharacterArray[text[shift + j]])
   done
End

## Example

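#include <iostream>
#include <string>
#include <vector>
#define MAXCHAR 256
using namespace std;

int maximum(int data1, int data2) {
   if(data1 > data2)
      return data1;
   return data2;
}

// Fill badCharacter with the last position of each character in the pattern.
void badCharacterHeuristic(string pattern, int badCharacter[MAXCHAR]) {
   int n = pattern.size();                   //find length of pattern
   for(int i = 0; i < MAXCHAR; i++)
      badCharacter[i] = -1;                  //mark every character as absent

   for(int i = 0; i < n; i++) {
      badCharacter[(unsigned char)pattern[i]] = i;   //store the last position of the character
   }
}

// Store every match position in array; index ends at the last used slot (-1 if none).
void searchPattern(string mainString, string pattern, int *array, int *index) {
   int patLen = pattern.size();
   int strLen = mainString.size();
   int badCharacter[MAXCHAR];
   badCharacterHeuristic(pattern, badCharacter);   //build the table before searching
   int shift = 0;

   while(shift <= (strLen - patLen)) {
      int j = patLen - 1;
      while(j >= 0 && pattern[j] == mainString[shift+j]) {
         j--;     //reduce j while pattern and main string characters match
      }

      if(j < 0) {   //the whole pattern matched at this shift
         (*index)++;
         array[(*index)] = shift;

         if((shift + patLen) < strLen) {
            //align the character after the window with its last position in the pattern
            shift += patLen - badCharacter[(unsigned char)mainString[shift + patLen]];
         }else {
            shift += 1;
         }
      }else {
         //mismatch at j: apply the bad character rule, moving at least one position
         shift += maximum(1, j - badCharacter[(unsigned char)mainString[shift+j]]);
      }
   }
}

int main() {
   string mainString = "ABAAABCDBBABCDDEBCABC";
   string pattern = "ABC";
   vector<int> locArray(mainString.size());   //room for every possible match position
   int index = -1;
   searchPattern(mainString, pattern, locArray.data(), &index);

   for(int i = 0; i <= index; i++) {
      cout << "Pattern found at position: " << locArray[i] << endl;
   }
}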

## Output

Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18