What is Input Buffering in Compiler Design?


If the lexical analyzer had to access secondary memory every time it read a character, identifying tokens would be time-consuming and costly. Instead, the input is first read into a buffer in main memory and then scanned by the lexical analyzer.

The lexical analyzer scans the input string from left to right, one character at a time, to identify tokens. It uses two pointers to scan tokens −

  • Begin Pointer (bptr) − It points to the beginning of the lexeme (token) currently being read.

  • Look Ahead Pointer (lptr) − It moves ahead to search for the end of the token.

Example − For the statement int a, b;

  • Both pointers start at the beginning of the string, which is stored in the buffer.

  • The look ahead pointer scans the buffer until the end of the token is found.

  • The character beyond the token (the blank space after "int") has to be examined before the token "int" can be recognized.

  • After the token "int" is processed, both pointers are set to the next token ('a'), and this process is repeated for the whole program.
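The walk-through above can be sketched in Python. This is a minimal illustration, not a production lexer: the function name `scan_tokens` is hypothetical, and it assumes that a token is either a run of letters and digits or a single punctuation character, with blanks acting only as separators.

```python
def scan_tokens(buffer: str) -> list[str]:
    """Split `buffer` into lexemes using a begin pointer and a look ahead pointer.

    Hypothetical simplification: a token is a run of letters/digits or a single
    punctuation character, and blanks only separate tokens.
    """
    tokens = []
    n = len(buffer)
    bptr = 0  # begin pointer: start of the current lexeme
    while bptr < n:
        # Blank space between tokens is skipped; it never starts a lexeme.
        if buffer[bptr].isspace():
            bptr += 1
            continue
        lptr = bptr  # look ahead pointer starts where the lexeme begins
        if buffer[lptr].isalnum():
            # Advance lptr until a character that cannot extend the token is seen.
            # That character (e.g. the blank after "int") must be examined
            # before we know the token is complete.
            while lptr < n and buffer[lptr].isalnum():
                lptr += 1
        else:
            lptr += 1  # single-character token such as ',' or ';'
        tokens.append(buffer[bptr:lptr])
        bptr = lptr  # the next lexeme starts where this one ended
    return tokens


print(scan_tokens("int a, b;"))  # prints ['int', 'a', ',', 'b', ';']
```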

A buffer can be divided into two halves. When the look ahead pointer reaches the end of the first half, the second half is filled with new characters to be read. When the look ahead pointer reaches the right end of the second half, the first half is refilled with new characters and the pointer wraps around to the beginning of the buffer, and so on.
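A minimal Python sketch of the two-halves scheme follows. The helper `scan_with_two_halves` is hypothetical: it reads from any object with a `read(n)` method (here `io.StringIO` stands in for the source file) and simply yields characters instead of forming tokens. Without sentinels, every advance of `forward` needs a test for the end of the input plus tests for the end of each half.

```python
import io


def scan_with_two_halves(source, n):
    """Yield every character of `source`, reading it through a 2*N buffer.

    When `forward` runs off the end of one half, the other half is reloaded
    with the next N characters (hypothetical sketch, no sentinels).
    """
    buf = [""] * (2 * n)
    valid = [0, 0]  # number of real characters currently stored in each half

    def load(half):
        """Refill one half with the next N characters from the source."""
        chunk = source.read(n)
        buf[half * n: half * n + len(chunk)] = chunk
        valid[half] = len(chunk)

    load(0)  # fill the first half before scanning starts
    forward = 0
    while True:
        half, offset = divmod(forward, n)
        if offset >= valid[half]:  # test: end of the input
            return
        yield buf[forward]
        forward += 1
        if forward == n:           # test: end of the first half
            load(1)
        elif forward == 2 * n:     # test: end of the second half
            load(0)
            forward = 0            # wrap around to the start of the buffer


print("".join(scan_with_two_halves(io.StringIO("int a, b;"), 4)))  # prints: int a, b;
```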

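Sentinels − Without sentinels, each time the forward pointer is advanced, a check must be made to ensure that it has not moved off one half of the buffer; if it has, the other half must be reloaded. A sentinel is a special character that cannot be part of the source program (typically eof) placed at the end of each half. With sentinels, the common case needs only one test per character (is this character the sentinel?), and the end-of-half and end-of-input checks are made only when a sentinel is read.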
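The sentinel idea can be sketched the same way. In this hypothetical `scan_with_sentinels`, each half holds N characters plus one sentinel slot, and the sentinel is assumed to be the character `\0`, which must never appear in the source program. Only when the sentinel is read does the code decide whether it marks the end of a half or the true end of the input.

```python
import io

EOF = "\0"  # assumed sentinel: a character that never occurs in the source program


def scan_with_sentinels(source, n):
    """Yield every character of `source` using a buffer pair with sentinels.

    Each half holds N characters followed by one sentinel slot (hypothetical sketch).
    """
    size = n + 1                 # N characters plus one sentinel per half
    buf = [EOF] * (2 * size)

    def load(start):
        """Fill the half beginning at `start` and terminate it with a sentinel."""
        chunk = source.read(n)
        buf[start:start + len(chunk)] = chunk
        buf[start + len(chunk)] = EOF  # sentinel right after the last real character

    load(0)
    forward = 0
    while True:
        ch = buf[forward]
        forward += 1
        if ch != EOF:            # the only test in the common case
            yield ch
            continue
        if forward == size:      # sentinel at the end of the first half
            load(size)
        elif forward == 2 * size:  # sentinel at the end of the second half
            load(0)
            forward = 0
        else:                    # sentinel inside a half: real end of input
            return


print("".join(scan_with_sentinels(io.StringIO("int a, b;"), 4)))  # prints: int a, b;
```

The ordinary path performs a single comparison per character; the rarer end-of-half and end-of-input cases are sorted out only after a sentinel has been seen.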

Buffer Pairs − A specialized buffering technique can reduce the overhead required to process each input character, because characters are moved into memory in large blocks rather than one at a time. It uses two buffers, each of N characters (usually the size of a disk block), which are reloaded alternately.

Two pointers, lexemeBegin and forward, are maintained. lexemeBegin points to the start of the current lexeme, whose extent is being determined. forward scans ahead until a match for a pattern is found. Once the next lexeme has been determined, forward is set to the character at its right end; after that lexeme has been processed, lexemeBegin is set to the character immediately after the lexeme just found.

Preliminary Scanning − Certain processes are best performed as characters are moved from the source file into the buffer. For example, comments can be deleted at this stage. Languages such as FORTRAN, which ignore blanks, can have blanks deleted from the character stream, and strings of several blanks can be collapsed into one blank. Pre-processing the character stream in this way saves the lexical analyzer the trouble of moving the look ahead pointer back and forth over strings of blanks.
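As an illustration of preliminary scanning, the hypothetical `preliminary_scan` below deletes comments and collapses runs of blanks and tabs into a single blank. It assumes C-style `/* ... */` comments and, for brevity, ignores string literals, so a `/*` inside a string would wrongly be treated as a comment.

```python
def preliminary_scan(text: str) -> str:
    """Clean the character stream before it reaches the lexer's buffer.

    Hypothetical sketch: comments use /* ... */ delimiters and string
    literals are not treated specially.
    """
    out: list[str] = []

    def blank() -> None:
        # Emit one blank unless the previous output character is already a blank.
        if not out or out[-1] != " ":
            out.append(" ")

    i, n = 0, len(text)
    while i < n:
        if text.startswith("/*", i):
            end = text.find("*/", i + 2)
            # Skip the comment; an unterminated comment swallows the rest of the input.
            i = n if end == -1 else end + 2
            blank()  # a comment still separates tokens
        elif text[i] in " \t":
            i += 1
            blank()  # runs of blanks and tabs collapse into one blank
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(preliminary_scan("int  /* type */  a;"))  # prints: int a;
```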

Updated on: 01-Nov-2023
