What is Lexical Analysis?


Lexical analysis is the first phase of a compiler. It reads the source code one character at a time and groups the characters into a sequence of tokens.

Token − A token is a meaningful sequence of characters in a program. Tokens include keywords such as do, if, and while; identifiers such as x, num, and count; operator symbols such as >, >=, and +; and punctuation symbols such as parentheses and commas. The output of the lexical analysis phase is passed to the next phase, the syntax analyzer or parser.

Example − The statement a = b + 5 consists of five tokens: a, =, b, +, and 5.
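To make the idea concrete, here is a minimal sketch of a regular-expression tokenizer in Python. It is an illustration, not how any particular compiler does it: the token names (`NUMBER`, `ID`, `OP`, `PUNCT`) and the `tokenize` function are hypothetical, and only the few operators mentioned above are recognized.

```python
import re

# Hypothetical token specification: (token type, regular expression).
# Patterns are tried in order, so ">=" is listed before the single-character operators.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r">=|[=+>]"),
    ("PUNCT", r"[(),]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace is recognized but discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("a = b + 5"))
# [('ID', 'a'), ('OP', '='), ('ID', 'b'), ('OP', '+'), ('NUMBER', '5')]
```

Each character is consumed exactly once, and the five tokens listed in the example come out in source order.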

Role of Lexical Analysis

The main functions of lexical analysis are as follows −

  • It separates tokens from the program and returns them to the parser when the parser requests them.

  • It eliminates comments, whitespace, and newline characters from the source program.

  • It inserts tokens such as identifiers into the symbol table.

  • It returns an integer code for each token to the parser.

  • It correlates the error messages produced by the compiler with the source program, for example by keeping track of line numbers.

  • It can perform macro expansion when macro preprocessors are used in the source code.
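Several of the functions listed above can be seen together in a small hand-written scanner. This is a minimal sketch, not a production lexer: the `Lexer` class, its `next_token` method, the integer codes (`ID = 1`, `NUMBER = 2`, and so on) and the use of `#` as a comment marker are all hypothetical. It shows the parser requesting one token at a time, comments and whitespace being discarded, an integer code returned with every token, and the current line number kept so an error can point back into the source. Symbol-table insertion and macro expansion are left out.

```python
# Hypothetical integer codes the parser receives for each token type.
EOF, ID, NUMBER, ASSIGN, PLUS = 0, 1, 2, 3, 4


class Lexer:
    """Scanner that hands the parser one token each time it is asked."""

    def __init__(self, source):
        self.src = source
        self.pos = 0
        self.line = 1  # current line, kept so errors can point into the source

    def _peek(self):
        return self.src[self.pos] if self.pos < len(self.src) else ""

    def _skip_ignored(self):
        """Skip blanks, newlines and '#'-to-end-of-line comments."""
        while self.pos < len(self.src):
            ch = self.src[self.pos]
            if ch == "\n":
                self.line += 1
                self.pos += 1
            elif ch.isspace():
                self.pos += 1
            elif ch == "#":
                while self.pos < len(self.src) and self.src[self.pos] != "\n":
                    self.pos += 1
            else:
                break

    def next_token(self):
        """Return (code, lexeme, line) for the next token in the input."""
        self._skip_ignored()
        start = self.pos
        ch = self._peek()
        if ch == "":
            return (EOF, "", self.line)
        if ch.isalpha() or ch == "_":
            while self._peek().isalnum() or self._peek() == "_":
                self.pos += 1
            return (ID, self.src[start:self.pos], self.line)
        if ch.isdigit():
            while self._peek().isdigit():
                self.pos += 1
            return (NUMBER, self.src[start:self.pos], self.line)
        if ch in ("=", "+"):
            self.pos += 1
            return (ASSIGN if ch == "=" else PLUS, ch, self.line)
        raise ValueError(f"line {self.line}: unexpected character {ch!r}")


# The parser would drive the lexer like this, one token per request.
lexer = Lexer("# compute a\na = b + 5\n")
tok = lexer.next_token()
while tok[0] != EOF:
    print(tok)
    tok = lexer.next_token()
# (1, 'a', 2)
# (3, '=', 2)
# (1, 'b', 2)
# (4, '+', 2)
# (2, '5', 2)
```

Handing out tokens on demand, rather than building the whole list first, matches the "as requested by the parser" relationship described above.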

Example 1 − What operations does the lexical analysis phase perform on the input string a = b + 5?


  • Find the tokens and their types.

Token Type | Values
Identifiers | a, b
Operators | =, +
Constants | 5
  • Put information about the tokens into the symbol table.

Address | Token Type, Value
330 | id, integer, value = a
332 | id, integer, value = b
360 | constant, integer, value = 5
  • After the tokens are found and stored in the symbol table, the following token stream is generated −

[id, 330] = [id, 332] + [const, 360]

where each pair has the form [token-type, index]

token-type − indicates whether the token is a constant, an identifier, a label, etc.

index − gives the address of the token's entry in the symbol table.
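The symbol-table and token-stream steps of Example 1 can be sketched as follows. This is an illustration under stated assumptions: the function name `build_symbol_table_and_stream` is hypothetical, the input is assumed to be already split into (kind, lexeme) pairs, every entry is given the data type "integer" simply to mirror the table above, and indices are positions in a Python list (0, 1, 2) rather than the illustrative addresses 330, 332 and 360.

```python
def build_symbol_table_and_stream(tokens):
    """Enter identifiers and constants in a symbol table and replace them
    in the output stream by [token-type, index] pairs."""
    symbol_table = []  # each entry: (token type, data type, value)
    index_of = {}      # (token type, lexeme) -> index, so repeats share one entry
    stream = []
    for kind, lexeme in tokens:
        if kind in ("id", "const"):
            key = (kind, lexeme)
            if key not in index_of:
                index_of[key] = len(symbol_table)
                # "integer" is hard-coded here only to match the example table
                symbol_table.append((kind, "integer", lexeme))
            stream.append([kind, index_of[key]])
        else:
            stream.append(lexeme)  # operators pass through unchanged
    return symbol_table, stream


tokens = [("id", "a"), ("op", "="), ("id", "b"), ("op", "+"), ("const", "5")]
table, stream = build_symbol_table_and_stream(tokens)
print(table)   # [('id', 'integer', 'a'), ('id', 'integer', 'b'), ('const', 'integer', '5')]
print(stream)  # [['id', 0], '=', ['id', 1], '+', ['const', 2]]
```

The dictionary lookup means a name that appears twice, such as a in a = a + 1, reuses its existing symbol-table entry instead of creating a new one.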

Example 2 − What operations does lexical analysis perform on the statement If (A=10) then GOTO 200?


  • Tokens will be

Token Type | Values
Keywords | If, then, GOTO
  • Symbol Table

Address | Token Type, Value
236 | id, integer, value = A
238 | constant, integer, value = 10
288 | label, value = 200
  • Token Stream will be

If ([id, 236] = [constant, 238]) then GOTO [label, 288]
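Example 2 can be sketched in the same spirit. This minimal Python illustration (the function `lex_statement` is hypothetical) adds two things Example 1 did not need: a keyword set, so If, then and GOTO pass through unchanged instead of being entered in the symbol table, and a rule that treats a number appearing directly after GOTO as a label. That rule is an assumption made for this sketch; a real compiler may leave such decisions to the parser. Indices again start at 0 rather than using the addresses 236, 238 and 288.

```python
import re

KEYWORDS = {"If", "then", "GOTO"}             # keyword set taken from the example
LEXEME = re.compile(r"[A-Za-z_]\w*|\d+|\S")   # word, number, or one other non-space character


def lex_statement(text):
    """Return (symbol_table, token_stream) for one statement."""
    symbol_table = []  # entries: (token type, value)
    parts = []
    prev = None
    for lexeme in LEXEME.findall(text):  # findall skips the spaces between matches
        if lexeme in KEYWORDS or not (lexeme[0].isalnum() or lexeme[0] == "_"):
            parts.append(lexeme)  # keywords and punctuation stay as they are
        else:
            if lexeme.isdigit():
                # assumption: a number right after GOTO is a label, otherwise a constant
                kind = "label" if prev == "GOTO" else "constant"
            else:
                kind = "id"
            symbol_table.append((kind, lexeme))
            parts.append(f"[{kind}, {len(symbol_table) - 1}]")
        prev = lexeme
    return symbol_table, parts


table, parts = lex_statement("If (A=10) then GOTO 200")
print(table)            # [('id', 'A'), ('constant', '10'), ('label', '200')]
print(" ".join(parts))  # If ( [id, 0] = [constant, 1] ) then GOTO [label, 2]
```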
Published on 23-Oct-2021 11:38:29