# Aho-Corasick Algorithm

The Aho-Corasick algorithm finds all occurrences of every keyword in a given set within a text. It is a dictionary-matching algorithm: it first builds a trie from all the keywords and then turns that trie into an automaton, so the text can be searched in a single linear-time pass. The automaton is defined by three functions, each built in its own phase.

These are Go-to, Failure, and Output. The go-to phase builds the trie from all the keywords. The failure phase gives each state a backward transition to the state for the longest proper suffix of its string that is also a prefix of some keyword. The output phase records, for every state ‘s’ of the automaton, all keywords that end at ‘s’.
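
To make the failure phase concrete, the sketch below spells out the definition of a failure target by brute force. It is not how the automaton computes the links (that is done with a breadth-first pass over the trie); the helper name `failureTarget` and the keyword set {he, she, his, hers} are assumptions chosen only for illustration.

```cpp
#include <string>
#include <vector>

// Brute-force definition of a failure link (illustrative helper, not part of
// the program in this article): the longest proper suffix of `state` that is
// also a prefix of at least one keyword. Returns "" when only the root fits.
std::string failureTarget(const std::vector<std::string>& keywords,
                          const std::string& state) {
    // Try proper suffixes from longest to shortest.
    for (std::size_t start = 1; start < state.size(); ++start) {
        std::string suffix = state.substr(start);
        for (const std::string& word : keywords) {
            // compare(0, n, suffix) == 0 means `word` begins with `suffix`.
            if (word.size() >= suffix.size() &&
                word.compare(0, suffix.size(), suffix) == 0)
                return suffix;
        }
    }
    return "";  // fall back to the root (the empty string)
}
```

With the keywords {he, she, his, hers}, the state reached by "she" fails to the state for "he", and the state reached by "hers" fails to the state for "s".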

The time complexity of this algorithm is O(N + L + Z), where N is the length of the text, L is the total length of all keywords, and Z is the number of matches.
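
For contrast, a naive baseline searches for each keyword separately, so the text is scanned once per keyword rather than once overall. The sketch below is only a reference point for checking results on small inputs; `naiveSearch` is a hypothetical helper, not part of the algorithm.

```cpp
#include <string>
#include <utility>
#include <vector>

// Naive baseline (hypothetical helper): search each keyword on its own with
// std::string::find. Returns (keyword index, start position) pairs, grouped
// by keyword.
std::vector<std::pair<int, std::size_t>> naiveSearch(
        const std::vector<std::string>& keywords, const std::string& text) {
    std::vector<std::pair<int, std::size_t>> matches;
    for (int k = 0; k < static_cast<int>(keywords.size()); ++k) {
        if (keywords[k].empty()) continue;  // skip empty keywords
        std::size_t pos = text.find(keywords[k]);
        while (pos != std::string::npos) {
            matches.emplace_back(k, pos);
            pos = text.find(keywords[k], pos + 1);  // allow overlapping matches
        }
    }
    return matches;
}
```

Calling `naiveSearch({"he", "she"}, "ushers")` reports "he" at position 2 and "she" at position 1, the same positions the automaton would find in one pass.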

## Input and Output

Input:
A set of patterns: {their, there, answer, any, bye}
Output:
Word there location: 2
Word any location: 7
Word bye location: 22

## Algorithm

buildTree(patternList, size)

Input − The list of all patterns, and the size of the list

Output − Generate the transition map to find the patterns

Begin
   set all elements of output array to 0
   set all elements of fail array to -1
   set all elements of goto matrix to -1
   state := 1       //at first there is only one state.

   for all patterns ‘i’ in the patternList, do
      word := patternList[i]
      present := 0
      for all character ‘ch’ of word, do
         if goto[present, ch] = -1 then
            goto[present, ch] := state
            state := state + 1
         present := goto[present, ch]
      done
      output[present] := output[present] OR (shift left 1 for i times)
   done

   for all type of characters ch, do
      if goto[0, ch] = -1 then
         goto[0, ch] := 0      //missing transitions from the root loop back to the root
   done

   for all type of characters ch, do
      if goto[0, ch] ≠ 0 then
         fail[goto[0, ch]] := 0
         insert goto[0, ch] into a Queue q.
   done

   while q is not empty, do
      newState := first element of q
      delete from q.
      for all possible character ch, do
         if goto[newState, ch] ≠ -1 then
            failure := fail[newState]
            while goto[failure, ch] = -1, do
               failure := fail[failure]
            done

            failure := goto[failure, ch]
            fail[goto[newState, ch]] := failure
            output[goto[newState, ch]] := output[goto[newState, ch]] OR output[failure]
            insert goto[newState, ch] into q.
      done
   done
   return state
End
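
The output array above stores a set of patterns as a bitmask: bit i is set when pattern i ends at the state. As a small illustration of that encoding, the sketch below turns a mask back into a list of pattern indices. `decodeMask` is a hypothetical helper, and it assumes `size` does not exceed the number of bits in an `unsigned`.

```cpp
#include <vector>

// Hypothetical helper: list the pattern indices whose bits are set in `mask`.
// Assumes `size` is no larger than the bit width of unsigned.
std::vector<int> decodeMask(unsigned mask, int size) {
    std::vector<int> indices;
    for (int i = 0; i < size; ++i)
        if (mask & (1u << i))      // bit i set: pattern i ends at this state
            indices.push_back(i);
    return indices;
}
```

For example, a mask with bits 1 and 3 set decodes to the indices 1 and 3.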

getNextState(presentState, nextChar)

Input − Present state and the next character to determine next state

Output − The next state

Begin
   answer := presentState
   ch := nextChar

   while goto[answer, ch] = -1, do
      answer := fail[answer]
   done
   return goto[answer, ch]
End

patternSearch(patternList, size, text)

Input − List of patterns, size of the list and the main text

Output − The indexes of the text where patterns are found

Begin
   call buildTree(patternList, size)
   presentState := 0

   for all indexes ‘i’ of the text, do
      presentState := getNextState(presentState, text[i])
      if output[presentState] = 0 then
         ignore next part and go for next iteration
      for all patterns ‘j’ in the patternList, do
         if the bit for pattern ‘j’ is set in output[presentState], then
            print the location where pattern is present
      done
   done
End

## Example

#include <iostream>
#include <queue>
#include <string>
#define MAXS 500    //maximum number of states; must exceed the sum of the lengths of all patterns
#define MAXC 26     //as 26 letters in alphabet
using namespace std;

int output[MAXS];
int fail[MAXS];
int gotoMat[MAXS][MAXC];

int buildTree(string array[], int size) {
    for (int i = 0; i < MAXS; i++)
        output[i] = 0;    //all elements of output are 0

    for (int i = 0; i < MAXS; i++)
        fail[i] = -1;    //all elements of failure array are -1

    for (int i = 0; i < MAXS; i++)
        for (int j = 0; j < MAXC; j++)
            gotoMat[i][j] = -1;    //all elements of goto matrix are -1

    int state = 1;    //at first there is only one state (the root)

    for (int i = 0; i < size; i++) {    //make trie structure for all patterns in array
        const string &word = array[i];
        int presentState = 0;

        for (size_t j = 0; j < word.size(); ++j) {    //add every character of the pattern
            int ch = word[j] - 'a';    //assumes lowercase letters 'a' to 'z' only
            if (gotoMat[presentState][ch] == -1)    //if ch is not present make new node
                gotoMat[presentState][ch] = state++;    //increase state
            presentState = gotoMat[presentState][ch];
        }
        output[presentState] |= (1 << i); //current word added in the output (size must stay below the bit width of int)
    }

    for (int ch = 0; ch < MAXC; ++ch)   //if ch is not directly connected to root, loop back to root
        if (gotoMat[0][ch] == -1)
            gotoMat[0][ch] = 0;

    queue<int> q;

    for (int ch = 0; ch < MAXC; ++ch) {    //depth-1 nodes go back to the root when they fail
        if (gotoMat[0][ch] != 0) {
            fail[gotoMat[0][ch]] = 0;
            q.push(gotoMat[0][ch]);
        }
    }

    while (!q.empty()) {
        int current = q.front();    //take the front node
        q.pop();    //and remove it from the queue

        for (int ch = 0; ch < MAXC; ++ch) {
            if (gotoMat[current][ch] != -1) {    //if goto state is present
                int failure = fail[current];
                while (gotoMat[failure][ch] == -1)    //find deepest node with proper suffix
                    failure = fail[failure];
                failure = gotoMat[failure][ch];
                fail[gotoMat[current][ch]] = failure;
                output[gotoMat[current][ch]] |= output[failure];   // Merge output values
                q.push(gotoMat[current][ch]);    //add next level node to the queue
            }
        }
    }
    return state;    //total number of states in the automaton
}

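int getNextState(int presentState, char nextChar) {
    int answer = presentState;
    int ch = nextChar - 'a'; //subtract ascii of 'a'

    while (gotoMat[answer][ch] == -1)    //if goto is not defined, follow the failure links
        answer = fail[answer];
    return gotoMat[answer][ch];
}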

void patternSearch(string arr[], int size, string text) {
    buildTree(arr, size);    //make the trie structure
    int presentState = 0;    //make current state as 0

    for (size_t i = 0; i < text.size(); i++) {    //find all occurrences of pattern
        presentState = getNextState(presentState, text[i]);
        if (output[presentState] == 0)    //if no match continue
            continue;
        for (int j = 0; j < size; ++j) {   //matching found and print words
            if (output[presentState] & (1 << j)) {
                cout << "Word " << arr[j] << " location: " << i - arr[j].size() + 1 << endl;
            }
        }
    }
}

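int main() {
    string arr[] = {"their", "there", "answer", "any", "bye"};
    //sample text: an assumption, chosen to be consistent with the output shown below
    string text = "isthereanyquestiongoodbye";
    int k = sizeof(arr)/sizeof(arr[0]);
    patternSearch(arr, k, text);
    return 0;
}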

## Output

Word there location: 2
Word any location: 7
Word bye location: 22
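
Because the program above records matches as bits of an `int`, the number of patterns it can handle is bounded by the width of `int`. One possible way around that, sketched below under stated assumptions, is to keep a list of keyword indices per state. The `Automaton` type and its member names are hypothetical; the sketch still assumes non-empty keywords and lowercase 'a' to 'z' input.

```cpp
#include <array>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Illustrative variant (names are hypothetical): each state keeps a list of
// keyword indices instead of a bitmask. Assumes lowercase 'a'-'z' input.
struct Automaton {
    struct Node {
        std::array<int, 26> next;  // goto transitions, -1 when absent
        int fail = 0;              // failure link
        std::vector<int> out;      // indices of keywords ending at this state
        Node() { next.fill(-1); }
    };

    std::vector<Node> nodes;
    std::vector<int> lengths;      // keyword lengths, used to report start positions

    explicit Automaton(const std::vector<std::string>& keywords) : nodes(1) {
        // Go-to phase: insert every keyword into the trie.
        for (int k = 0; k < static_cast<int>(keywords.size()); ++k) {
            int cur = 0;
            for (char c : keywords[k]) {
                int ch = c - 'a';
                if (nodes[cur].next[ch] == -1) {
                    nodes[cur].next[ch] = static_cast<int>(nodes.size());
                    nodes.emplace_back();
                }
                cur = nodes[cur].next[ch];
            }
            nodes[cur].out.push_back(k);
            lengths.push_back(static_cast<int>(keywords[k].size()));
        }
        // Failure and output phases: breadth-first traversal from the root.
        std::queue<int> q;
        for (int ch = 0; ch < 26; ++ch) {
            int v = nodes[0].next[ch];
            if (v == -1) nodes[0].next[ch] = 0;  // root loops on unused characters
            else q.push(v);                      // depth-1 states fail to the root
        }
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (int ch = 0; ch < 26; ++ch) {
                int v = nodes[u].next[ch];
                if (v == -1) continue;
                int f = nodes[u].fail;
                while (nodes[f].next[ch] == -1) f = nodes[f].fail;
                f = nodes[f].next[ch];
                nodes[v].fail = f;
                // Inherit the keywords recognised at the failure target.
                nodes[v].out.insert(nodes[v].out.end(),
                                    nodes[f].out.begin(), nodes[f].out.end());
                q.push(v);
            }
        }
    }

    // Returns (keyword index, start position) for every occurrence in text.
    std::vector<std::pair<int, int>> search(const std::string& text) const {
        std::vector<std::pair<int, int>> found;
        int cur = 0;
        for (int i = 0; i < static_cast<int>(text.size()); ++i) {
            int ch = text[i] - 'a';
            while (nodes[cur].next[ch] == -1) cur = nodes[cur].fail;
            cur = nodes[cur].next[ch];
            for (int k : nodes[cur].out)
                found.emplace_back(k, i - lengths[k] + 1);
        }
        return found;
    }
};
```

Built from the keywords {he, she, his, hers} and run on "ushers", `search` reports "she" at 1, "he" at 2, and "hers" at 2. Copying the inherited lists trades extra memory during construction for a simple reporting loop.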