What is the role of the lexical analyzer in compiler design?

Lexical analysis is the first phase of the compiler, where the lexical analyzer operates as an interface between the source code and the rest of the compiler's phases. It reads the input characters of the source program, groups them into lexemes, and produces a token for each lexeme. The tokens are then sent to the parser for syntax analysis.
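A minimal sketch of this grouping step in Python (the token names and patterns here are illustrative, not part of any particular compiler):

```python
import re

# Hypothetical token specification: each token name is paired with a pattern.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("SKIP",   r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Group input characters into lexemes and emit (token name, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":          # whitespace only separates tokens
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("x = 42 + y"))
# → [('ID', 'x'), ('ASSIGN', '='), ('NUMBER', '42'), ('PLUS', '+'), ('ID', 'y')]
```

The resulting token sequence is exactly what the parser consumes in the next phase.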

If the lexical analyzer runs as a separate pass in the compiler, it may need an intermediate file to hold its output, from which the parser would then take its input. To eliminate the need for this intermediate file, the lexical analyzer and the syntax analyzer (parser) are often grouped into the same pass, where the lexical analyzer operates under the control of the parser, typically as a subroutine that the parser calls whenever it needs the next token.
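One way to sketch this single-pass arrangement is with a Python generator: the parser pulls tokens one at a time, so no intermediate file is ever written. The toy lexer and parser below are hypothetical stand-ins, not a real front end:

```python
def lex(source):
    """Lexer as a generator: produces one token each time the parser asks."""
    for word in source.split():
        kind = "NUMBER" if word.isdigit() else "ID"
        yield (kind, word)

def parse(source):
    """Toy parser acting as the lexer's caller in one combined pass."""
    next_token = lex(source)        # the lexer runs under the parser's control
    return [tok for tok in next_token]

print(parse("count 42 total"))
# → [('ID', 'count'), ('NUMBER', '42'), ('ID', 'total')]
```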

The lexical analyzer also interacts with the symbol table while passing tokens to the parser. Whenever a token is discovered, the lexical analyzer returns a representation of that token to the parser. If the token is a simple construct such as a parenthesis, comma, or colon, it returns just an integer code. If the token is a more complex item such as an identifier or another token that carries a value, the attribute value is passed to the parser as well.
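A small sketch of this convention, with hypothetical integer codes and a list standing in for the symbol table:

```python
# Hypothetical integer codes for simple, single-purpose tokens.
CODES = {",": 1, "(": 2, ")": 3, ":": 4, "id": 5}

symbol_table = []          # the lexer installs identifiers here

def token_for(lexeme):
    """Return what the lexer hands the parser: a bare code for a simple
    construct, or a code plus an attribute value for an identifier."""
    if lexeme in CODES:
        return (CODES[lexeme], None)
    # Identifier: enter it into the symbol table and pass its index along.
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return (CODES["id"], symbol_table.index(lexeme))

print(token_for(","))      # → (1, None)
print(token_for("total"))  # → (5, 0)
```

The attribute value (here a symbol-table index) is what lets later phases look up the identifier's properties.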

The lexical analyzer separates the characters of the source language into groups that logically belong together, called tokens. A token consists of a token name, which is an abstract symbol that identifies a type of lexical unit, and an optional attribute value. Tokens can be identifiers, keywords, constants, operators, and punctuation symbols such as commas and parentheses. A rule that describes the set of input strings for which the same token is produced as output is called a pattern.

Regular expressions play an essential role in specifying patterns. If a keyword is treated as a token, its pattern is simply the sequence of characters that spells out the keyword. For identifiers and various other tokens, patterns have a more complex structure.
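The contrast can be seen in two regular expressions (the keyword set below is a made-up sample):

```python
import re

# A keyword's pattern is just its literal character sequence;
# an identifier's pattern describes an open-ended family of strings.
KEYWORD = re.compile(r"if|while|return")          # exact spellings only
IDENT   = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")   # letter/underscore, then any

print(bool(KEYWORD.fullmatch("while")))    # → True
print(bool(IDENT.fullmatch("total_2")))    # → True
print(bool(IDENT.fullmatch("2total")))     # → False (cannot start with a digit)
```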

The lexical analyzer also handles tasks such as stripping out comments and whitespace (tab, newline, blank, and other characters used to separate tokens in the input), and correlating the error messages generated by the compiler with positions in the source program.

For example, it can keep track of all newline characters so that it can associate a source line number with each error message. It may also implement the expansion of macros, in which case a macro preprocessor is applied to the source program.
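A sketch of comment stripping combined with newline counting, so each token remembers the line it came from (the `//` comment syntax and whitespace-splitting lexer are simplifying assumptions):

```python
def strip_and_track(source):
    """Remove //-style comments and whitespace while counting newlines,
    so later error messages can carry a source line number."""
    line = 1
    tokens = []
    for raw_line in source.split("\n"):
        code = raw_line.split("//", 1)[0]      # strip a trailing comment
        for lexeme in code.split():            # whitespace separates tokens
            tokens.append((lexeme, line))      # each token remembers its line
        line += 1
    return tokens

src = "x = 1  // init\ny = x + 2"
print(strip_and_track(src))
# → [('x', 1), ('=', 1), ('1', 1), ('y', 2), ('=', 2), ('x', 2), ('+', 2), ('2', 2)]
```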

Updated on: 29-Oct-2021
