# CYK Algorithm for Context-Free Grammars


CYK (also written CKY) stands for Cocke-Younger-Kasami. It is one of the earliest recognition and parsing algorithms for context-free languages. The standard version of CYK requires the context-free grammar to be in Chomsky Normal Form (CNF), where every production has the form A → BC or A → a.

It is also possible to extend the CYK algorithm to handle some grammars that are not in CNF, although the extended versions are harder to understand.
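To make the CNF requirement concrete, here is a minimal sketch of a check that every production has one of the two allowed shapes. The dictionary representation (variable mapped to a list of tuples) and the name `is_cnf` are assumptions made for illustration, and the sketch ignores the special case of an S → ε rule.

```python
def is_cnf(grammar, terminals):
    """Return True if every production is A -> BC or A -> a.

    grammar: dict mapping each variable to a list of bodies (tuples of symbols).
    terminals: set of terminal symbols.
    """
    variables = set(grammar)
    for bodies in grammar.values():
        for body in bodies:
            # A -> a: exactly one terminal
            if len(body) == 1 and body[0] in terminals:
                continue
            # A -> BC: exactly two variables
            if len(body) == 2 and all(sym in variables for sym in body):
                continue
            return False
    return True


# The grammar used in the example of this article, which is in CNF.
cnf_grammar = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}
# A hypothetical grammar that is not in CNF (S -> a S b mixes terminals and variables).
non_cnf_grammar = {"S": [("a", "S", "b"), ("a", "b")]}

print(is_cnf(cnf_grammar, {"a", "b"}))      # True
print(is_cnf(non_cnf_grammar, {"a", "b"}))  # False
```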

CYK is based on a "dynamic programming" approach:

• It builds solutions compositionally from sub-solutions.

• It uses the grammar directly.

## Algorithm

Here x is the input string of length n, and Vij is the set of variables that derive the substring of x of length j starting at position i.

    Begin
        for i = 1 to n do
            Vi1 = { A | A → a is a production and the i-th symbol of x is a }
        for j = 2 to n do
            for i = 1 to n - j + 1 do
            Begin
                Vij = ϕ
                for k = 1 to j - 1 do
                    Vij = Vij ∪ { A | A → BC is a production, B is in Vik and C is in V(i + k)(j - k) }
            End
    End
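The pseudocode above can be sketched in Python as follows. This is one possible transcription, not a reference implementation: the grammar representation (a dict from variable to a list of tuples), the function names `cyk_table` and `cyk_accepts`, and the small test grammar `ANBN` are all assumptions introduced for illustration. The table keeps the 1-based indices of the pseudocode, so `V[i][j]` corresponds to Vij.

```python
def cyk_table(grammar, word):
    """Fill the CYK table; V[i][j] holds the variables deriving the
    substring of length j that starts at position i (both 1-based)."""
    n = len(word)
    V = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Substrings of length 1: productions of the form A -> a.
    for i in range(1, n + 1):
        for head, bodies in grammar.items():
            if (word[i - 1],) in bodies:
                V[i][1].add(head)
    # Longer substrings: productions A -> BC, trying every split point k.
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            for k in range(1, j):
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in V[i][k]
                                and body[1] in V[i + k][j - k]):
                            V[i][j].add(head)
    return V


def cyk_accepts(grammar, word, start="S"):
    """Return True if the start symbol derives the whole word."""
    if not word:
        return False  # this sketch does not handle the empty string
    return start in cyk_table(grammar, word)[1][len(word)]


# Hypothetical CNF grammar for the language a^n b^n with n >= 1.
ANBN = {
    "S": [("A", "B"), ("A", "C")],
    "C": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}

print(cyk_accepts(ANBN, "aabb"))  # True
print(cyk_accepts(ANBN, "aab"))   # False
```

The three nested loops over j, i, and k give a running time proportional to n³ times the number of productions.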

## Example

The CYK algorithm decides whether a given context-free grammar generates a given string.

The given context-free grammar (CFG):

    S → AB | BC
    A → BA | a
    B → CC | b
    C → AB | a

The string to check is w = ababa.

The length of the string is |w| = 5.

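| Substring length (row) | a (column 1) | b (column 2) | a (column 3) | b (column 4) | a (column 5) |
|---|---|---|---|---|---|
| 1 | A → a, C → a | B → b | A → a, C → a | B → b | A → a, C → a |
| 2 | S → AB, C → AB | S → BC, A → BA | S → AB, C → AB | S → BC, A → BA | |
| 3 | B → CC | S → AB, C → AB | B → CC | | |
| 4 | B → CC | B → CC | | | |
| 5 | S, C, A | | | | |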

The start symbol S is present in the last cell, so the grammar generates the string.
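The table for this example can be reproduced with a short script. The following is a sketch under assumed names (`GRAMMAR`, `cyk_rows`, and the `producers` lookup are not from the original text). It stores the table row by row, so `rows[length - 1][start]` holds the variables for the substring of that length beginning at 0-based position `start`.

```python
from itertools import product

# The example grammar: variable -> list of production bodies.
GRAMMAR = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}


def cyk_rows(grammar, word):
    """Return the CYK table as a list of rows, one row per substring length."""
    n = len(word)
    # producers[body] = set of variables that have the production head -> body
    producers = {}
    for head, bodies in grammar.items():
        for body in bodies:
            producers.setdefault(body, set()).add(head)
    # Row 1: variables that derive each single terminal.
    rows = [[set(producers.get((ch,), ())) for ch in word]]
    for length in range(2, n + 1):
        row = []
        for start in range(n - length + 1):
            cell = set()
            # Split the substring into a left part of length k and the rest.
            for k in range(1, length):
                left = rows[k - 1][start]
                right = rows[length - k - 1][start + k]
                for pair in product(left, right):
                    cell |= producers.get(pair, set())
            row.append(cell)
        rows.append(row)
    return rows


rows = cyk_rows(GRAMMAR, "ababa")
for length, row in enumerate(rows, start=1):
    print(length, [sorted(cell) for cell in row])
print("accepted:", "S" in rows[-1][0])  # accepted: True
```

Each printed row lists only the variable sets, without the productions that justify them, so the last row shows A, C, and S for the full string.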

## Explanation

• The first letter a can be derived by the variables A and C, so A and C go in the first cell of row 1. The terminal b can be derived only by B, so B goes in the second cell of row 1. The remaining cells of row 1 are filled the same way.

• Row 2 covers substrings of two terminals, so it has one cell fewer than row 1. Pairing adjacent cells of row 1 gives the substrings ab, ba, ab, ba.

• The cell at row 2, column 1 holds the variables that derive ab. For a we have A and C, and for b we have B, so the candidate pairs, in order, are AB and CB.

• Now check whether AB or CB appears on the right-hand side of a production in the given grammar. AB is produced by S and C; CB is produced by no variable.

• So S and C are placed in that cell.

• Similarly, for ba we take B for b and A, C for a, which gives the pairs BA and BC. BA is produced by A and BC is produced by S, so S and A are placed at row 2, column 2. Row 2, column 3 is ab again, so it is the same as row 2, column 1, and row 2, column 4 is ba, so it is the same as row 2, column 2.

• Row 3 covers the three-terminal substrings aba, bab, aba. In order, the variables that derive them are B for aba, S and C for bab, and B for aba.

• Row 4 groups four terminals together: abab and baba. Both are derived by B. In the last row all five terminals, ababa, are grouped together, and the variables that derive them are S, C, and A.

• If the last cell contains S, the start symbol, then the given string is accepted, so membership is true for this string.

• Note that when three terminals are grouped together, the substring can be split as (ab a) or (a ba), and every split must be checked. Similarly, four terminals can be split as (a bab), (aba b), or (ab ab), and five as (ab aba), (aba ba), (abab a), and so on.
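The way a cell is filled from all possible splits can be illustrated with a short sketch. The names `splits`, `derive`, `PRODUCERS`, and `KNOWN` are hypothetical. `KNOWN` simply records the variable sets already found for the one- and two-terminal substrings in rows 1 and 2.

```python
from itertools import product

# Right-hand side pair -> variables producing it (from the example grammar).
PRODUCERS = {
    ("A", "B"): {"S", "C"},
    ("B", "C"): {"S"},
    ("B", "A"): {"A"},
    ("C", "C"): {"B"},
}

# Variable sets already computed for rows 1 and 2 of the table.
KNOWN = {
    "a": {"A", "C"},
    "b": {"B"},
    "ab": {"S", "C"},
    "ba": {"S", "A"},
}


def splits(s):
    """All ways to cut s into a non-empty left part and a non-empty right part."""
    return [(s[:k], s[k:]) for k in range(1, len(s))]


def derive(s):
    """Variables deriving s, assuming both parts of every split are in KNOWN."""
    result = set()
    for left, right in splits(s):
        for pair in product(KNOWN[left], KNOWN[right]):
            result |= PRODUCERS.get(pair, set())
    return result


print(splits("aba"))   # [('a', 'ba'), ('ab', 'a')]
print(splits("abab"))  # [('a', 'bab'), ('ab', 'ab'), ('aba', 'b')]
print(derive("aba"))   # {'B'}
print(sorted(derive("bab")))  # ['C', 'S']
```

For aba, only the split (ab a) contributes: the pair CC, taken from C for ab and C for a, is produced by B.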