Parser-Lexer Communication – LEX and YACC
In this article, we will understand how the parser (syntax analyzer) and the lexer (lexical analyzer) communicate with each other.
Introduction to LEX:
LEX and YACC are tools designed for writers of compilers and interpreters. They help us write programs that transform structured input. In programs with structured input, two tasks occur again and again:
- Dividing the input into meaningful units (tokens).
- Establishing or discovering the relationship among the tokens.
Two rules to remember about Lex:
- Lex always matches the longest possible token (the greatest number of characters).
  Example: for the input abc, the pattern [a-z]+ matches abc rather than a, ab, or bc.
- If two or more patterns match tokens of the same length, the pattern defined first in the Lex specification is favored (a complete specification is sketched below).
  Example:
  [a-z]+ { printf("Hello"); }
  "hit"  { printf("World"); }
  Input: hit    Output: Hello
  Both patterns match the full three-character token hit, so the rule that appears first in the specification produces the output.
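To see both rules in action, here is a minimal, self-contained Lex specification; it is only a sketch, and the printed messages and the standalone main() are illustrative additions, not part of the original examples:

%{
#include <stdio.h>
%}
%%
[a-z]+    { printf("Hello\n"); /* listed first: wins any same-length tie */ }
"hit"     { printf("World\n"); /* also matches hit, but is defined later */ }
.|\n      { /* silently discard any other character */ }
%%
int main(void)  { yylex(); return 0; }   /* scan standard input until end of file */
int yywrap(void) { return 1; }           /* tell the scanner there is no more input */

Building this with lex (or flex) and a C compiler, for example lex rules.l && cc lex.yy.c -o rules, and then typing hit prints Hello, because both patterns match the same three characters and the earlier rule is preferred.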
Parser-Lexer Communication:
In the process of compilation, the lexical analyzer and the parser work together: whenever the parser needs the next token, it invokes the lexical analyzer, which in turn supplies that token to the parser.
When we use LEX and YACC together, the YACC-generated parser becomes the higher-level routine. It calls the lexer's yylex() function whenever it needs the next token. The parser collects enough of these tokens to build the parse tree according to the grammar. Not every token is useful to the parser; some tokens (whitespace and comments, for example) can be discarded in the lexer for a more efficient compilation process.
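A minimal sketch of this arrangement is shown below; the grammar file, the token names NAME and NUMBER, and the rule itself are illustrative assumptions, not taken from the article:

/* grammar.y -- illustrative only */
%{
#include <stdio.h>
int yylex(void);                          /* supplied by the Lex-generated scanner */
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
%}
%token NAME NUMBER
%%
statement: NAME '=' NUMBER   { printf("assignment recognized\n"); }
         ;
%%
int main(void) { return yyparse(); }      /* yyparse() pulls tokens by calling yylex() */

Here yyparse() is the high-level routine: each time it needs the next token it calls yylex(), and it keeps doing so until it has collected enough tokens to reduce them according to the grammar.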
LEX and YACC agree on each kind of token through a token code, an integer constant defined with a #define macro. When run with the -d option, YACC writes these token codes into a separate header file called y.tab.h, with entries such as:
#define NAME 259
The token code 0 represents the logical end of the input.
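On the Lex side, the rules then return these codes to the parser. The sketch below assumes the same illustrative NAME and NUMBER tokens and a y.tab.h produced by yacc -d:

%{
#include "y.tab.h"              /* token codes such as NAME, written by yacc -d */
%}
%%
[a-zA-Z][a-zA-Z0-9]*   { return NAME;      /* identifier: hand a NAME token to yyparse() */ }
[0-9]+                 { return NUMBER;    /* digits: hand over a NUMBER token */ }
[ \t\n]+               { /* whitespace never reaches the parser */ }
.                      { return yytext[0]; /* single characters such as '=' */ }
%%
int yywrap(void) { return 1; }

When the scanner reaches the end of the input, yylex() returns 0, which is exactly the code the parser interprets as the logical end of the input.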
Summary:
This article discussed how LEX and YACC communicate with each other. If you liked the article, do share it with your friends.