What are the Different Phases of a Compiler?

A compiler works in several phases, which are as follows −

  • Lexical Analysis (Scanner)

The lexical analyzer is also known as a scanner. It is the first phase and acts as the interface between the compiler and the source code. It reads the source code one character at a time and groups the characters into a series of atomic units known as tokens.

Each token represents a sequence of characters that can be treated as a single logical entity. A token may be an identifier, keyword, constant, operator, or punctuation symbol such as , and ;.

The character sequence that forms a token is known as the lexeme of the token. The lexical analyzer also removes comments and unnecessary whitespace.
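The scanning step described above can be sketched as follows. This is a minimal illustration, not a production scanner: the token categories, the keyword list, and the `tokenize` function are assumptions made for this example, and regular expressions stand in for the character-by-character reading described in the text.

```python
import re

# Hypothetical token categories for a tiny C-like language (assumed for illustration).
# Order matters: comments must be tried before operators, keywords before identifiers.
TOKEN_SPEC = [
    ("COMMENT",     r"//[^\n]*"),
    ("WHITESPACE",  r"\s+"),
    ("KEYWORD",     r"\b(?:int|if|else|while|return)\b"),
    ("IDENTIFIER",  r"[A-Za-z_]\w*"),
    ("CONSTANT",    r"\d+"),
    ("OPERATOR",    r"[+\-*/=<>]"),
    ("PUNCTUATION", r"[,;(){}]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token kind, lexeme) pairs, dropping comments and whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind = match.lastgroup
        if kind not in ("COMMENT", "WHITESPACE"):
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens

tokens = tokenize("int count = 42; // initial value")
for kind, lexeme in tokens:
    print(kind, lexeme)
```

Each pair holds the token kind and its lexeme. The comment and the spaces do not appear in the result.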

  • Syntax Analysis (Parser)

The syntax analyzer is also known as the parser. It receives the tokens generated by the previous phase (lexical analysis) as its input and produces a hierarchical structure called a syntax tree or parse tree.

There are two types of parsers −

  • Bottom-up − It builds the parse tree starting from the leaves and works upwards towards the root of the tree.

  • Top-Down − It builds the parse tree starting from the root and works downwards towards the leaves.
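As an illustration of the top-down approach, here is a small recursive-descent parser for arithmetic expressions. It is a sketch under stated assumptions: the grammar, the nested-tuple tree format, and the name `parse_expression` are chosen for this example only, and the tokens are plain strings rather than the output of a real scanner.

```python
def parse_expression(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree.

    Assumed grammar (for illustration):
        expr   -> term   (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NAME | NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():
        # Start at the root rule and descend towards the leaves.
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token.isalnum():
            return advance()  # a leaf: identifier or number
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

tree = parse_expression(["a", "+", "b", "*", "2"])
print(tree)
```

Because `term` is called from inside `expr`, multiplication binds tighter than addition, so `b * 2` becomes a subtree under the `+` node.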

  • Semantic Analysis

This phase takes the syntax tree as input and checks the semantic correctness of the program. Even when the tokens are valid and the program is syntactically correct, it may still be semantically incorrect. Hence the semantic analyzer checks the semantics (meaning) of the statements that have been constructed.
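A common semantic check is type checking. The sketch below assumes expressions are stored as nested tuples of the form (operator, left, right) and that declared types are kept in a plain dictionary. Both choices, and the rule that both operands must have the same type, are simplifications made for this example.

```python
def check_types(node, declared_types):
    """Return the type of an expression tree, or raise an error if it is ill-formed."""
    if isinstance(node, tuple):
        op, left, right = node
        left_type = check_types(left, declared_types)
        right_type = check_types(right, declared_types)
        # Assumed rule for this sketch: both operands must have the same type.
        if left_type != right_type:
            raise TypeError(f"cannot apply {op!r} to {left_type} and {right_type}")
        return left_type
    if node.isdigit():
        return "int"  # integer literal
    if node not in declared_types:
        raise NameError(f"undeclared identifier {node!r}")
    return declared_types[node]

declared = {"count": "int", "name": "string"}

# Syntactically valid and semantically valid.
ok_type = check_types(("+", "count", "1"), declared)

# Syntactically valid, but semantically wrong: int + string.
try:
    check_types(("+", "count", "name"), declared)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok_type, error_message)
```

Both expressions would pass the parser, but only the first passes the semantic check.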

  • Intermediate Code Generation

This phase takes the syntactically and semantically correct form of the program as input and produces an equivalent intermediate representation of the source code.

The intermediate code should have two essential properties, which are as follows −

  • It must be easy to create.

  • It must be easy to translate into the target code.
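One widely used intermediate notation is three-address code, where each instruction has at most one operator. The following sketch shows how an expression tree might be flattened into that form. The tuple tree format, the temporary names t1, t2, ..., and the function `to_three_address` are assumptions for this example.

```python
def to_three_address(tree):
    """Flatten a nested-tuple expression tree into three-address instructions."""
    code = []
    temp_count = 0

    def emit(node):
        nonlocal temp_count
        if not isinstance(node, tuple):
            return node  # a leaf is already a usable operand
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp_count += 1
        temp = f"t{temp_count}"  # fresh temporary for this subexpression
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result

code, result = to_three_address(("+", "a", ("*", "b", "2")))
for line in code:
    print(line)
```

For the tree of a + b * 2 this yields `t1 = b * 2` followed by `t2 = a + t1`. The form is easy to create with one tree walk, and each line maps closely to a single machine instruction.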

  • Code Optimization

It is an optional phase. It transforms the intermediate representation of the source program into more efficient code, typically code that runs faster or uses less memory, without changing what the program does.
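Constant folding is one simple optimization: subexpressions whose operands are all known at compile time are evaluated once by the compiler. The sketch below is only one of many possible optimizations and assumes a nested-tuple tree in which constants are Python integers and variables are strings.

```python
import operator

# Operators this sketch is willing to evaluate at compile time (an assumption).
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(node):
    """Replace constant subexpressions with their computed value."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left = fold_constants(left)
    right = fold_constants(right)
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)
    return (op, left, right)

# x + 60 * 60: the multiplication can be done at compile time.
optimized = fold_constants(("+", "x", ("*", 60, 60)))
print(optimized)
```

The program still computes the same value, but the multiplication no longer happens at run time.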

  • Code Generation

This is the final phase of the compilation process. It converts the optimized intermediate code into machine or assembly code. It also allocates memory locations for the variables in the program.

Good code generation depends heavily on the efficient use of registers.

Example − A statement like A = B + C can be converted to assembly code that loads B into a register, adds C to it, and stores the result in A.
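A possible translation of A = B + C is sketched here with a small generator function. The instruction names LOAD, ADD, and STORE, the register R1, and the function `generate_assignment` are hypothetical and do not belong to any particular machine. Real instruction sets differ in mnemonics and operand order.

```python
def generate_assignment(target, left, op, right, register="R1"):
    """Emit assembly-like lines for `target = left op right` (hypothetical instruction set)."""
    mnemonics = {"+": "ADD", "-": "SUB", "*": "MUL"}
    if op not in mnemonics:
        raise ValueError(f"unsupported operator {op!r}")
    return [
        f"LOAD {register}, {left}",               # register <- left operand
        f"{mnemonics[op]} {register}, {right}",   # register <- register op right operand
        f"STORE {target}, {register}",            # target <- register
    ]

assembly = generate_assignment("A", "B", "+", "C")
for line in assembly:
    print(line)
```

The whole statement needs only one register here. With larger expressions, the choice of which values to keep in registers has a large effect on the quality of the generated code.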

  • Symbol Table

It is a data structure that holds a record for each identifier, with fields for the attributes of the identifier. The data structure allows the compiler to find the record for each identifier quickly and to store or retrieve information in that record quickly.
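A hash table is a common choice for this structure because lookups take constant time on average. The sketch below uses a Python dictionary. The class `SymbolTable`, its method names, and the attribute fields (type, line, address) are assumptions for this example.

```python
class SymbolTable:
    """One record per identifier, stored in a hash table for fast lookup."""

    def __init__(self):
        self._records = {}

    def declare(self, name, **attributes):
        if name in self._records:
            raise KeyError(f"identifier {name!r} is already declared")
        self._records[name] = dict(attributes)

    def lookup(self, name):
        # Returns None when the identifier has not been declared.
        return self._records.get(name)

    def set_attribute(self, name, key, value):
        self._records[name][key] = value

table = SymbolTable()
table.declare("count", type="int", line=3)        # e.g. filled in during analysis
table.set_attribute("count", "address", 0x1000)   # e.g. filled in during code generation
record = table.lookup("count")
print(record)
```

Different phases can add to the same record, so information gathered early (such as the type) remains available to later phases.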

  • Error Handling

One of the important tasks of the compiler is error detection and reporting. The error handler is invoked when an error in the source code is found. Any phase can encounter an error. After discovering an error, a phase should handle it in some way so that compilation can continue.
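The idea of reporting an error and then continuing can be sketched for the scanning phase. This example is deliberately simplified: it treats every character as its own token, and the function `scan_with_recovery` and the set of legal characters are assumptions made for illustration.

```python
def scan_with_recovery(source):
    """Collect valid characters as tokens and record errors instead of stopping."""
    tokens, errors = [], []
    for position, char in enumerate(source):
        if char.isspace():
            continue
        if char.isalnum() or char in "+-*/=;":
            tokens.append(char)
        else:
            # Report the problem, skip the offending character, and keep going.
            errors.append(f"position {position}: illegal character {char!r}")
    return tokens, errors

tokens, errors = scan_with_recovery("a = b $ c @ 2;")
print(tokens)
for message in errors:
    print(message)
```

Both illegal characters are reported in a single run, rather than the scan stopping at the first one.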