Anagram Pattern Search


An anagram of a string is any rearrangement (permutation) of its characters. Anagram pattern search differs slightly from ordinary pattern search: instead of looking only for the exact pattern, it looks for every arrangement of the pattern's characters in the text.
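To make the definition concrete, here is a minimal sketch of an anagram test for two whole strings. The helper name isAnagram is hypothetical and not part of the algorithm in this article; it sorts both strings, which is a different technique from the count arrays used later.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not from the article): returns true when a and b
// contain exactly the same characters, possibly in a different order.
// The strings are taken by value so that sorting does not touch the caller's copies.
bool isAnagram(std::string a, std::string b) {
   if (a.size() != b.size())
      return false;                  // different lengths can never be anagrams
   std::sort(a.begin(), a.end());    // sorting puts both strings in a canonical order
   std::sort(b.begin(), b.end());
   return a == b;
}
```

For example, isAnagram("BAAC", "AABC") is true, while isAnagram("AABA", "AABC") is false because the second string contains a C that the first lacks.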

To solve this problem, we slide a window of the same length as the pattern across the text. The frequency of each character in the pattern is counted once and stored in an array. For each window, we build the same kind of count array and check whether the two arrays match.
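The check for a single window can be sketched as follows. The names countLetters and windowMatches are hypothetical, and the code assumes that both strings contain only the uppercase letters 'A' to 'Z'.

```cpp
#include <array>
#include <cstddef>
#include <string>

constexpr int LETTERS = 26;   // assumption: input uses only 'A'..'Z'

// Hypothetical helper: count the letters of s in the range [start, start + len).
std::array<int, LETTERS> countLetters(const std::string& s, std::size_t start, std::size_t len) {
   std::array<int, LETTERS> freq{};          // value-initialized, so every count starts at 0
   for (std::size_t i = start; i < start + len; i++)
      freq[s[i] - 'A']++;                    // map 'A'..'Z' to 0..25
   return freq;
}

// True when the window of text starting at pos is an anagram of pattern.
bool windowMatches(const std::string& text, std::size_t pos, const std::string& pattern) {
   if (pos + pattern.size() > text.size())
      return false;                          // the window would run past the end of the text
   return countLetters(text, pos, pattern.size())
       == countLetters(pattern, 0, pattern.size());
}
```

With the text "AABAACBABBCABAABBA" and the pattern "AABC", the window at position 2 is "BAAC", whose counts (A: 2, B: 1, C: 1) equal those of the pattern, so windowMatches returns true there. The window at position 0 is "AABA", which has no C, so it does not match.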

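For a text of length n and a pattern of length m, recounting every window from scratch, as the pseudocode and listing below do, takes O(n·m) time. If the window counts are instead updated as the window slides (add the character that enters, remove the character that leaves), the time complexity of the Anagram Pattern Search Algorithm is O(n) for a fixed alphabet.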
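One way to reach the O(n) bound is to update the window counts incrementally instead of recounting each window. The sketch below is an illustration rather than the listing from this article: the function name anagramSearchSliding is hypothetical, and it assumes that the text and the pattern contain only 'A' to 'Z'.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Sliding-window variant (an illustration, not the article's listing).
// Assumption: text and pattern contain only the letters 'A'..'Z'.
std::vector<std::size_t> anagramSearchSliding(const std::string& text, const std::string& pattern) {
   std::vector<std::size_t> positions;
   const std::size_t n = text.size(), m = pattern.size();
   if (m == 0 || m > n)
      return positions;                   // nothing to search for, or pattern longer than text

   std::array<int, 26> patternFreq{}, windowFreq{};
   for (std::size_t i = 0; i < m; i++) {
      patternFreq[pattern[i] - 'A']++;
      windowFreq[text[i] - 'A']++;        // counts for the first window
   }
   if (windowFreq == patternFreq)
      positions.push_back(0);

   for (std::size_t i = m; i < n; i++) {
      windowFreq[text[i] - 'A']++;        // character entering the window
      windowFreq[text[i - m] - 'A']--;    // character leaving the window
      if (windowFreq == patternFreq)      // compares 26 counters, a constant amount of work
         positions.push_back(i - m + 1);  // start index of the current window
   }
   return positions;
}
```

Each step does a constant amount of work (two counter updates and one comparison of 26 counters), so the total is proportional to the length of the text. Returning a std::vector is a design choice made here so that the caller does not have to size an output array in advance.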

Input and Output

Input:
The main String “AABAACBABBCABAABBA”. The pattern “AABC”.
Output:
Anagram found at position: 2
Anagram found at position: 3
Anagram found at position: 4
Anagram found at position: 10

Algorithm

anagramSearch(text, pattern)

Input − The main string and the pattern

Output − All locations where the pattern or any of its anagrams is found.

Begin
   define patternFreq array and stringFreq array
   patLen := length of pattern
   stringLen := length of the text
   set all entries of patternFreq array to 0

   for all characters present in pattern, do
      increase the frequency.
   done

   for i := 0 to stringLen – patLen, do
      set all entries of stringFreq to 0
      for all characters of each window, do
         increase the frequency
      done

      if the stringFreq and patternFreq are same, then
         display the value of i, as anagram found at that location
   done
End

Example

#include<iostream>
#include<string>
#include<vector>
#define LETTER 26   //alphabet size; the input is assumed to use only 'A' to 'Z'
using namespace std;

bool arrayCompare(const int *array1, const int *array2, int n) {
   for(int i = 0; i<n; i++) {
      if(array1[i] != array2[i])
         return false; //stop at the first mismatch
   }
   return true; //arrays are identical
}

void setArray(int *array, int n, int value) {
   for(int i = 0; i<n; i++)
      array[i] = value; //put value in every place of the array
}

vector<int> anagramSearch(const string &mainString, const string &patt) {
   vector<int> matchLocation;        //start positions of all anagrams found
   int strFreq[LETTER], pattFreq[LETTER];
   int patLen = patt.size();
   int stringLen = mainString.size();
   setArray(pattFreq, LETTER, 0);    //initialize all frequencies to 0

   for(int i = 0; i<patLen; i++) {
      int patIndex = patt[i] - 'A';   //map 'A'..'Z' to 0..25
      pattFreq[patIndex]++;           //increase frequency
   }

   for(int i = 0; i<=(stringLen - patLen); i++) {    //the range where the window will move
      setArray(strFreq, LETTER, 0);         //reset the frequencies for the current window
      for(int j = i; j<(i+patLen); j++){    //count the characters of the current window
         int strIndex = mainString[j] - 'A';
         strFreq[strIndex]++;               //increase frequency
      }

      if(arrayCompare(strFreq, pattFreq, LETTER))    //when both arrays are identical
         matchLocation.push_back(i);        //anagram found at ith position
   }
   return matchLocation;
}

int main() {
   string mainString = "AABAACBABBCABAABBA";
   string pattern = "AABC";
   vector<int> matchLocation = anagramSearch(mainString, pattern);

   for(int i = 0; i<(int)matchLocation.size(); i++) {
      cout << "Anagram found at position: " << matchLocation[i] << endl;
   }
   return 0;
}

Output

Anagram found at position: 2
Anagram found at position: 3
Anagram found at position: 4
Anagram found at position: 10
raja
Published on 09-Jul-2018 08:32:49