当前位置:网站首页>Self made compiler learning 3: introduction to flex and bison

Self made compiler learning 3: introduction to flex and bison

2022-06-09 08:26:00 I_ belong_ to_ jesus

Flex and Bison install

Ubuntu Directly install in the environment :

apt-get install flex bison

first Flex Program

First, give the code (fb1-1.l)

/* Companion source code for "flex & bison", published by O'Reilly
 * Media, ISBN 978-0-596-15597-1
 * Copyright (c) 2009, Taughannock Networks. All rights reserved.
 * See the README file for license conditions and contact info.
 * $Header: /home/johnl/flnb/code/RCS/fb1-1.l,v 2.1 2009/11/08 02:53:18 johnl Exp $
 */

/* fb1-1 just like unix wc */
%{
int chars = 0;
int words = 0;
int lines = 0;
%}

%%

[a-zA-Z]+	{ words++; chars += strlen(yytext); }
\n		{ chars++; lines++; }
.		{ chars++; }

%%

main()
{
  yylex();
  printf("%8d%8d%8d\n", lines, words, chars);
}

flex The program is divided into 3 part , Each part is represented by a symbol %% To segment , The first part contains the declaration and option settings , The second part is a series of patterns and actions , The third part will be copied into the lexical generator C Code .

The first part ,%{%} The code between will be copied to the generation C The head of the document , As you can see, in this example, only 3 An integer variable .

The second part , Each pattern match is at the beginning of a line , Then there is what to do when matching C Code ,C The code is {} One or more lines of code between .[a-zA-Z]+ To match a word , Braces indicate that you can match any case of letters ,+ Means to match one or more preceding characters , A string of letters or a word ,flex in ,yytext Always specified as the input text for this match ,. To match any character .

The third part , The main program , call flex Lexical analysis routines , And output the result .

perform flex

flex -o fb1-1.c fb1-1.l

take Lexical generator Translate into c Program ,c File for :fb1-1.c

Use gcc Compile to generate executable :

gcc fb1-1.c -lfl -o fb1-1

Note that link libraries are required at compile time (-lfl), Execute the program , Input 3 Press... After a string ctrl+d sign out , The output is as follows :

 flex and bison Working together

Look at the second code :

/* Companion source code for "flex & bison", published by O'Reilly
 * Media, ISBN 978-0-596-15597-1
 * Copyright (c) 2009, Taughannock Networks. All rights reserved.
 * See the README file for license conditions and contact info.
 * $Header: /home/johnl/flnb/code/RCS/fb1-3.l,v 2.1 2009/11/08 02:53:18 johnl Exp $
 */

/* recognize tokens for the calculator and print them out */

%%
"+"	{ printf("PLUS\n"); }
"-"	{ printf("MINUS\n"); }
"*"	{ printf("TIMES\n"); }
"/"	{ printf("DIVIDE\n"); }
"|"     { printf("ABS\n"); }
[0-9]+	{ printf("NUMBER %s\n", yytext); }
\n      { printf("NEWLINE\n"); }
[ \t] { }
.	{ printf("Mystery character %s\n", yytext); }
%%

Only the pattern matching is given here ,flex Of lfl The library provides a tiny main program to call this method analyzer , This is enough , The generation and compilation of lexical analyzer are the same as before , Generate an executable lexical analyzer and input after execution 12+34 and 13/34 Do lexical analysis , The results are as follows :

Consider a more complete calculator word Analyzer :

/* Companion source code for "flex & bison", published by O'Reilly
 * Media, ISBN 978-0-596-15597-1
 * Copyright (c) 2009, Taughannock Networks. All rights reserved.
 * See the README file for license conditions and contact info.
 * $Header: /home/johnl/flnb/code/RCS/fb1-4.l,v 2.1 2009/11/08 02:53:18 johnl Exp $
 */

/* recognize tokens for the calculator and print them out */

%{
   enum yytokentype {
     NUMBER = 258,
     ADD = 259,
     SUB = 260,
     MUL = 261,
     DIV = 262,
     ABS = 263,
     EOL = 264 /* end of line */
   };

   int yylval;

%}

%%
"+"	{ return ADD; }
"-"	{ return SUB; }
"*"	{ return MUL; }
"/"	{ return DIV; }
"|"     { return ABS; }
[0-9]+	{ yylval = atoi(yytext); return NUMBER; }
\n      { return EOL; }
[ \t]   { /* ignore white space */ }
.	{ printf("Mystery character %c\n", *yytext); }
%%
main()
{
  int tok;

  while(tok = yylex()) {
    printf("%d", tok);
    if(tok == NUMBER) printf(" = %d\n", yylval);
    else printf("\n");
  }
}

The output is as follows :

  Let's analyze the code ,flex When word analysis returns the token stream , Every mark from Mark No (token number) and Token value (token's value) form , Where the token number is a small integer ,bison Automatically from 258 Numbered starting , And create a number defined .h file , Here, for the convenience of understanding, we pass enum Manually defined . Main program call yylex() Returns the token value , about NUMBER Convert to int.

parsers

Based on flex and bison To design a basic parser ,bison follow BNF(BackusNaur Form) grammar ,

stay BNF in ,::=(:) yes Defined as ,| Indicates its Choose either one on the left or right , The name on the left is Grammatical symbols (symbol), Effective BNF Yes. Recursion Of , For example, dealing with 1×5+2×3 This simple arithmetic expression :

<exp> ::= <factor>
    | <exp> + <factor>
<factor> ::= NUMBER
    | <factor> * NUMBER

exp Is defined as a factor perhaps factor + exp, and facotr Is defined as NUMBER perhaps factor×NUMBER, The leftmost part of each rule is a non terminator , To the right of the colon is the derivation rule of the non terminal character , If the derivation rule is more than one, then pass | separate .

We give bison Examples of code (fb1-5.y):

/* Companion source code for "flex & bison", published by O'Reilly
 * Media, ISBN 978-0-596-15597-1
 * Copyright (c) 2009, Taughannock Networks. All rights reserved.
 * See the README file for license conditions and contact info.
 * $Header: /home/johnl/flnb/code/RCS/fb1-5.y,v 2.1 2009/11/08 02:53:18 johnl Exp $
 */

/* simplest version of calculator */

%{
#  include <stdio.h>
%}

/* declare tokens */
%token NUMBER
%token ADD SUB MUL DIV ABS
%token OP CP
%token EOL

%%

calclist: /* nothing  Empty rules   Match from the beginning of the input */  
 | calclist exp EOL { printf("= %d\n> ", $2); } /*EOL Represents the end of an expression */
 | calclist EOL { printf("> "); } /* blank line or a comment */
 ;

exp: factor
 | exp ADD exp { $$ = $1 + $3; }
 | exp SUB factor { $$ = $1 - $3; }
 | exp ABS factor { $$ = $1 | $3; }
 ;

factor: term
 | factor MUL term { $$ = $1 * $3; }
 | factor DIV term { $$ = $1 / $3; }
 ;

term: NUMBER
 | ABS term { $$ = $2 >= 0? $2 : - $2; }
 | OP exp CP { $$ = $2; }
 ;
%%
main()
{
  printf("> "); 
  yyparse();
}

yyerror(char *s)
{
  fprintf(stderr, "error: %s\n", s);
}

bison The program also contains three parts : Declaration part , Rule matching part and C Code section .

The declaration part also uses %{%} Include and copy to as is C Code , And then %token Token declaration , Easy to tell bison The name of the token in the parser , Tokens are usually capitalized , Syntax symbols that are not declared as tokens must appear to the left of at least one rule .

The rule part is through a simple BNF Define the rules ,bison Use a single colon instead of ::=, At the same time, because the line spacing is not obvious , use Semicolon to indicate the end of the rule ,C The action code of is enclosed in curly brackets after each rule .

Let's look at the matching rules in detail , The first grammatical symbol calclist, The first match is null , The second is the expression (exp)+ Terminator (EOL), The third is the case where there is only a terminator .

The next grammatical symbol is term->factor->exp, Matching rules :

term: NUMBER perhaps Absolute value term perhaps Bracketed exp;

factor: term perhaps factor and term Multiplication of ( Left recursion ) perhaps factor and term( Left recursion ) Division perhaps The absolute value ;

exp: factor perhaps exp and exp Addition of perhaps exp and factor Multiplication of ( Left recursion ) perhaps exp and factor( Left recursion ) Division .

Obviously, it is composed of small to large ( Not absolutely ,term It will match exp), And the higher the priority, the higher the priority , because BNF The syntax is recursive , This order can ensure the correctness of the operation priority .

Target symbol ( The grammatical symbol to the left of the colon ) The value of is used in the action code $$ Instead of , The semantic values of the grammatical symbols on the right are $1 $2 $3 ....... When the lexical analyzer returns a token , Token values are always stored in yyval in .

Co compilation flex and bison Program

flex Code (fb1-5.l) as follows :

/* Companion source code for "flex & bison", published by O'Reilly
 * Media, ISBN 978-0-596-15597-1
 * Copyright (c) 2009, Taughannock Networks. All rights reserved.
 * See the README file for license conditions and contact info.
 * $Header: /home/johnl/flnb/code/RCS/fb1-5.l,v 2.1 2009/11/08 02:53:18 johnl Exp $
 */

/* recognize tokens for the calculator and print them out */

%{
# include "fb1-5.tab.h"
%}

%%
"+"	{ return ADD; }
"-"	{ return SUB; }
"*"	{ return MUL; }
"/"	{ return DIV; }
"|"     { return ABS; }
"("     { return OP; }
")"     { return CP; }
[0-9]+	{ yylval = atoi(yytext); return NUMBER; }

\n      { return EOL; }
"//".*  
[ \t]   { /* ignore white space */ }
.	{ yyerror("Mystery character %c\n", *yytext); }
%%

We wrote a special Makefile file :

fb1-5:	fb1-5.l fb1-5.y
	bison -d fb1-5.y
	flex -o fb1-5.c fb1-5.l
	gcc -o [email protected] fb1-5.tab.c fb1-5.c -lfl

clean:
	rm -f fb1-5 \
	fb1-5.c fb1-5.tab.c fb1-5.tab.h

perform

bison -d fb1-5.y

Will generate files :fb1-5.tab.h and fb1-5.tab.c, stay flex Calling this header file in the file can realize joint compilation , Finally, the executable program is generated fb1-5, The test results are as follows :

You can see , Due to lack of support for negative numbers , Therefore, entering a negative number will result in an error . That's all , This section describes how to design the most basic lexical analyzer and parser , I'll be right back flex and bison Make a more detailed introduction .

end

原网站

版权声明
本文为[I_ belong_ to_ jesus]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/160/202206090812499672.html