Regular Expressions in LEX and YACC

This article covers the regular expression syntax used in LEX and YACC, along with the predefined actions, variables, and functions commonly used when writing LEX programs.

Definition of Regular Expressions:

A regular expression is a pattern description written in a "meta" language: a compact notation that you use to describe the particular patterns of text you are interested in.

Characters that form the regular expressions:

Regular Expression    Description

.       Matches any single character except \n.
*       Matches zero or more occurrences of the preceding pattern.
        Example: [0-9]*
+       Matches one or more occurrences of the preceding pattern.
        Example: [a-z]+
?       Matches zero or one occurrence of the preceding pattern.
        Example: -?[0-9]+ matches a number with an optional leading minus sign.
^       1. As the first character of a pattern, matches the beginning of a line.
           Example: ^verb matches the text verb only at the start of a line.
        2. As the first character inside a character class, negates the class.
           Example: [^0-9]+ matches one or more characters other than 0-9.
[ ]     A character class. Matches any one character listed in the brackets. A hyphen (-) denotes a range.
        Example: [A-Z] matches any character from A to Z.
$       As the last character of a pattern, matches the end of a line.
        Example: a+b$
{ }     Indicates how many times a pattern can be present.
        Example: A{1,3} matches one to three occurrences of A.
|       Logical OR between expressions, i.e. either alternative may match.
        Example: cow|horse
\       Escapes metacharacters, removing the special meaning of the characters defined in this table.
        Example: \"[a-z]+\"
" "     The string written in quotes is matched literally.
        Example: "hello"
/       Lookahead (trailing context). Matches the preceding pattern only if it is followed by the succeeding expression.
        Example: A0/1 matches A0 only when it is immediately followed by 1; the 1 is not part of the match.
( )     Groups a series of regular expressions.
        Example: ([0-9]+)|([0-9]*\.[0-9]+)
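
The operators in the table can be tried out quickly outside LEX. The sketch below uses Python's re module purely as a stand-in, on the assumption that these basic operators behave the same way there as in LEX. The helper names matches and lookahead_match and the EXAMPLES list are made up for illustration. LEX's / trailing context has no identical syntax in Python, so it is emulated here with a (?=...) lookahead.

```python
import re

def matches(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

def lookahead_match(pattern, following, text):
    """Emulate LEX trailing context `pattern/following` (an assumption:
    Python spells this as a lookahead, LEX spells it with a slash)."""
    m = re.match(f"(?:{pattern})(?={following})", text)
    return m.group(0) if m else None

# (pattern, input, should the whole input match?)
EXAMPLES = [
    (r".", "a", True),                            # any single character...
    (r".", "\n", False),                          # ...except newline
    (r"[0-9]*", "", True),                        # zero or more digits
    (r"[a-z]+", "", False),                       # one or more needs at least one
    (r"-?[0-9]+", "-42", True),                   # optional leading minus
    (r"[^0-9]+", "a1c", False),                   # negated class rejects the digit
    (r"A{1,3}", "AA", True),                      # one to three A's
    (r"A{1,3}", "AAAA", False),
    (r"cow|horse", "horse", True),                # alternation
    (r"([0-9]+)|([0-9]*\.[0-9]+)", ".5", True),   # grouping plus alternation
    (r"([0-9]+)|([0-9]*\.[0-9]+)", "3.", False),  # a digit must follow the dot
]

for pattern, text, expected in EXAMPLES:
    result = matches(pattern, text)
    print(f"{pattern!r:32} {text!r:8} -> {result}")
    assert result == expected

# Trailing context: A0 is matched only when a 1 follows, and the 1 is not consumed.
print(lookahead_match("A0", "1", "A01"))
print(lookahead_match("A0", "1", "A02"))
```

Using fullmatch keeps each check strict: the pattern must account for the entire input, which makes it easy to see exactly which strings a pattern accepts.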

LEX Actions:

LEX provides several predefined actions, variables, and functions that make scanners easier to write.

BEGIN           Switches the scanner to the named start state. The lexical analyzer starts in state 0.
ECHO            Copies the matched input to the output unchanged.
char *yytext    When the lexer recognizes a token in the input, the matched lexeme is stored in a null-terminated string called yytext.
FILE *yyin      The file the scanner reads its input from. By default it is the standard input.
FILE *yyout     The file the scanner writes its output to. By default it is the standard output.
int yyleng      Stores the length, in characters, of the matched lexeme held in yytext.
yylex( )        The scanner function itself. As soon as yylex( ) is called, the scanner starts scanning the source program.
yywrap( )       Called when the scanner encounters the end of file. It returns 1 if there is no more input, or 0 if scanning should continue with further input.
yylval          Holds the value associated with the token, which the YACC parser reads together with the token.
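
To make the roles of yylex( ), yytext, yyleng, and ECHO more concrete, here is a toy hand-written scanner in Python. It is not what LEX generates: the class TinyScanner, the RULES list, and the token names are hypothetical, and the sketch only mimics the usual behaviour of picking the longest match, breaking ties by rule order, and passing unmatched characters through.

```python
import re

# Hypothetical rules: (token name, pattern). Earlier rules win ties.
RULES = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[a-z]+"),
    ("SPACE", r"[ \t\n]+"),
]

class TinyScanner:
    """Toy stand-in for a generated scanner; names mirror the LEX ones."""

    def __init__(self, text):
        self.text = text      # plays the role of the input read from yyin
        self.pos = 0
        self.yytext = ""      # most recently matched lexeme
        self.yyleng = 0       # length of yytext
        self.echoed = []      # characters passed through, like ECHO to yyout

    def yylex(self):
        """Return the next token name, or None at end of input
        (the point where a real scanner would consult yywrap())."""
        while self.pos < len(self.text):
            best_name, best_text = None, ""
            for name, pattern in RULES:
                m = re.compile(pattern).match(self.text, self.pos)
                # Strictly longer matches win, so ties keep the earlier rule.
                if m and len(m.group(0)) > len(best_text):
                    best_name, best_text = name, m.group(0)
            if best_name is None:
                # No rule matched: copy one character through and move on.
                self.echoed.append(self.text[self.pos])
                self.pos += 1
                continue
            self.yytext, self.yyleng = best_text, len(best_text)
            self.pos += self.yyleng
            if best_name != "SPACE":   # whitespace is matched but not returned
                return best_name
        return None

scanner = TinyScanner("count 42 + x")
tokens = []
token = scanner.yylex()
while token is not None:
    tokens.append((token, scanner.yytext, scanner.yyleng))
    token = scanner.yylex()

print(tokens)          # token name, lexeme, and lexeme length for each token
print(scanner.echoed)  # characters no rule matched
```

Each call to yylex() returns one token and leaves its lexeme in yytext and its length in yyleng, which is the same contract a caller such as a YACC parser relies on.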


This article discussed regular expressions in LEX and YACC and the common LEX actions.
