当前位置:网站首页>Self made compiler learning 3: introduction to flex and bison
Self made compiler learning 3: introduction to flex and bison
2022-06-09 08:26:00 【I_ belong_ to_ jesus】
Flex and Bison install
Ubuntu Directly install in the environment :
apt-get install flex bisonfirst Flex Program
First, give the code (fb1-1.l):
/* Companion source code for "flex & bison", published by O'Reilly
* Media, ISBN 978-0-596-15597-1
* Copyright (c) 2009, Taughannock Networks. All rights reserved.
* See the README file for license conditions and contact info.
* $Header: /home/johnl/flnb/code/RCS/fb1-1.l,v 2.1 2009/11/08 02:53:18 johnl Exp $
*/
/* fb1-1 just like unix wc */
%{
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
. { chars++; }
%%
main()
{
yylex();
printf("%8d%8d%8d\n", lines, words, chars);
}flex The program is divided into 3 part , Each part is represented by a symbol %% To segment , The first part contains the declaration and option settings , The second part is a series of patterns and actions , The third part will be copied into the lexical generator C Code .
The first part ,%{%} The code between will be copied to the generation C The head of the document , As you can see, in this example, only 3 An integer variable .
The second part , Each pattern match is at the beginning of a line , Then there is what to do when matching C Code ,C The code is {} One or more lines of code between .[a-zA-Z]+ To match a word , Braces indicate that you can match any case of letters ,+ Means to match one or more preceding characters , A string of letters or a word ,flex in ,yytext Always specified as the input text for this match ,. To match any character .
The third part , The main program , call flex Lexical analysis routines , And output the result .
perform flex
flex -o fb1-1.c fb1-1.ltake Lexical generator Translate into c Program ,c File for :fb1-1.c
Use gcc Compile to generate executable :
gcc fb1-1.c -lfl -o fb1-1Note that link libraries are required at compile time (-lfl), Execute the program , Input 3 Press... After a string ctrl+d sign out , The output is as follows :

flex and bison Working together
Look at the second code :
/* Companion source code for "flex & bison", published by O'Reilly
* Media, ISBN 978-0-596-15597-1
* Copyright (c) 2009, Taughannock Networks. All rights reserved.
* See the README file for license conditions and contact info.
* $Header: /home/johnl/flnb/code/RCS/fb1-3.l,v 2.1 2009/11/08 02:53:18 johnl Exp $
*/
/* recognize tokens for the calculator and print them out */
%%
"+" { printf("PLUS\n"); }
"-" { printf("MINUS\n"); }
"*" { printf("TIMES\n"); }
"/" { printf("DIVIDE\n"); }
"|" { printf("ABS\n"); }
[0-9]+ { printf("NUMBER %s\n", yytext); }
\n { printf("NEWLINE\n"); }
[ \t] { }
. { printf("Mystery character %s\n", yytext); }
%%Only the pattern matching is given here ,flex Of lfl The library provides a tiny main program to call this method analyzer , This is enough , The generation and compilation of lexical analyzer are the same as before , Generate an executable lexical analyzer and input after execution 12+34 and 13/34 Do lexical analysis , The results are as follows :

Consider a more complete calculator word Analyzer :
/* Companion source code for "flex & bison", published by O'Reilly
* Media, ISBN 978-0-596-15597-1
* Copyright (c) 2009, Taughannock Networks. All rights reserved.
* See the README file for license conditions and contact info.
* $Header: /home/johnl/flnb/code/RCS/fb1-4.l,v 2.1 2009/11/08 02:53:18 johnl Exp $
*/
/* recognize tokens for the calculator and print them out */
%{
enum yytokentype {
NUMBER = 258,
ADD = 259,
SUB = 260,
MUL = 261,
DIV = 262,
ABS = 263,
EOL = 264 /* end of line */
};
int yylval;
%}
%%
"+" { return ADD; }
"-" { return SUB; }
"*" { return MUL; }
"/" { return DIV; }
"|" { return ABS; }
[0-9]+ { yylval = atoi(yytext); return NUMBER; }
\n { return EOL; }
[ \t] { /* ignore white space */ }
. { printf("Mystery character %c\n", *yytext); }
%%
main()
{
int tok;
while(tok = yylex()) {
printf("%d", tok);
if(tok == NUMBER) printf(" = %d\n", yylval);
else printf("\n");
}
}The output is as follows :

Let's analyze the code ,flex When word analysis returns the token stream , Every mark from Mark No (token number) and Token value (token's value) form , Where the token number is a small integer ,bison Automatically from 258 Numbered starting , And create a number defined .h file , Here, for the convenience of understanding, we pass enum Manually defined . Main program call yylex() Returns the token value , about NUMBER Convert to int.
parsers
Based on flex and bison To design a basic parser ,bison follow BNF(BackusNaur Form) grammar ,
stay BNF in ,::=(:) yes Defined as ,| Indicates its Choose either one on the left or right , The name on the left is Grammatical symbols (symbol), Effective BNF Yes. Recursion Of , For example, dealing with 1×5+2×3 This simple arithmetic expression :
<exp> ::= <factor>
| <exp> + <factor>
<factor> ::= NUMBER
| <factor> * NUMBERexp Is defined as a factor perhaps factor + exp, and facotr Is defined as NUMBER perhaps factor×NUMBER, The leftmost part of each rule is a non terminator , To the right of the colon is the derivation rule of the non terminal character , If the derivation rule is more than one, then pass | separate .
We give bison Examples of code (fb1-5.y):
/* Companion source code for "flex & bison", published by O'Reilly
* Media, ISBN 978-0-596-15597-1
* Copyright (c) 2009, Taughannock Networks. All rights reserved.
* See the README file for license conditions and contact info.
* $Header: /home/johnl/flnb/code/RCS/fb1-5.y,v 2.1 2009/11/08 02:53:18 johnl Exp $
*/
/* simplest version of calculator */
%{
# include <stdio.h>
%}
/* declare tokens */
%token NUMBER
%token ADD SUB MUL DIV ABS
%token OP CP
%token EOL
%%
calclist: /* nothing Empty rules Match from the beginning of the input */
| calclist exp EOL { printf("= %d\n> ", $2); } /*EOL Represents the end of an expression */
| calclist EOL { printf("> "); } /* blank line or a comment */
;
exp: factor
| exp ADD exp { $$ = $1 + $3; }
| exp SUB factor { $$ = $1 - $3; }
| exp ABS factor { $$ = $1 | $3; }
;
factor: term
| factor MUL term { $$ = $1 * $3; }
| factor DIV term { $$ = $1 / $3; }
;
term: NUMBER
| ABS term { $$ = $2 >= 0? $2 : - $2; }
| OP exp CP { $$ = $2; }
;
%%
main()
{
printf("> ");
yyparse();
}
yyerror(char *s)
{
fprintf(stderr, "error: %s\n", s);
}
bison The program also contains three parts : Declaration part , Rule matching part and C Code section .
The declaration part also uses %{%} Include and copy to as is C Code , And then %token Token declaration , Easy to tell bison The name of the token in the parser , Tokens are usually capitalized , Syntax symbols that are not declared as tokens must appear to the left of at least one rule .
The rule part is through a simple BNF Define the rules ,bison Use a single colon instead of ::=, At the same time, because the line spacing is not obvious , use Semicolon to indicate the end of the rule ,C The action code of is enclosed in curly brackets after each rule .
Let's look at the matching rules in detail , The first grammatical symbol calclist, The first match is null , The second is the expression (exp)+ Terminator (EOL), The third is the case where there is only a terminator .
The next grammatical symbol is term->factor->exp, Matching rules :
term: NUMBER perhaps Absolute value term perhaps Bracketed exp;
factor: term perhaps factor and term Multiplication of ( Left recursion ) perhaps factor and term( Left recursion ) Division perhaps The absolute value ;
exp: factor perhaps exp and exp Addition of perhaps exp and factor Multiplication of ( Left recursion ) perhaps exp and factor( Left recursion ) Division .
Obviously, it is composed of small to large ( Not absolutely ,term It will match exp), And the higher the priority, the higher the priority , because BNF The syntax is recursive , This order can ensure the correctness of the operation priority .
Target symbol ( The grammatical symbol to the left of the colon ) The value of is used in the action code $$ Instead of , The semantic values of the grammatical symbols on the right are $1 $2 $3 ....... When the lexical analyzer returns a token , Token values are always stored in yyval in .
Co compilation flex and bison Program
flex Code (fb1-5.l) as follows :
/* Companion source code for "flex & bison", published by O'Reilly
* Media, ISBN 978-0-596-15597-1
* Copyright (c) 2009, Taughannock Networks. All rights reserved.
* See the README file for license conditions and contact info.
* $Header: /home/johnl/flnb/code/RCS/fb1-5.l,v 2.1 2009/11/08 02:53:18 johnl Exp $
*/
/* recognize tokens for the calculator and print them out */
%{
# include "fb1-5.tab.h"
%}
%%
"+" { return ADD; }
"-" { return SUB; }
"*" { return MUL; }
"/" { return DIV; }
"|" { return ABS; }
"(" { return OP; }
")" { return CP; }
[0-9]+ { yylval = atoi(yytext); return NUMBER; }
\n { return EOL; }
"//".*
[ \t] { /* ignore white space */ }
. { yyerror("Mystery character %c\n", *yytext); }
%%We wrote a special Makefile file :
fb1-5: fb1-5.l fb1-5.y
bison -d fb1-5.y
flex -o fb1-5.c fb1-5.l
gcc -o [email protected] fb1-5.tab.c fb1-5.c -lfl
clean:
rm -f fb1-5 \
fb1-5.c fb1-5.tab.c fb1-5.tab.hperform
bison -d fb1-5.yWill generate files :fb1-5.tab.h and fb1-5.tab.c, stay flex Calling this header file in the file can realize joint compilation , Finally, the executable program is generated fb1-5, The test results are as follows :

You can see , Due to lack of support for negative numbers , Therefore, entering a negative number will result in an error . That's all , This section describes how to design the most basic lexical analyzer and parser , I'll be right back flex and bison Make a more detailed introduction .
end
边栏推荐
- 【读点论文】GhostNet: More Features from Cheap Operations 卷积操作还是比较昂贵,特征图冗余可以线性变换获得
- Quarkus实战学习一
- RMAN备份概念_关于映像拷贝(Image Copy)
- Laravel cross database query
- Self made compiler learning 1: use of CB compiler
- Leetcode basic programming: Search
- OpenInfra Summit 2022 | 安超云用户脱颖而出 入围超级用户大奖
- English语法_时间副词
- Self made compiler learning 4: using flex
- 苹果胜诉 法官驳回iPhone安全欺诈集体诉讼
猜你喜欢

苹果胜诉 法官驳回iPhone安全欺诈集体诉讼

Leetcode basic programming: Search

Apple wins judge dismisses iPhone Security Fraud Class Action

Blow up the idea artifact in use recently

Self made compiler learning 2: compilation process

自制编译器学习3:Flex和Bison简介

English语法_副词

【TeXstudio】【2】一般的图片和表格的表现形式

RMAN备份概念_关于备份集(Backup Set)

Alibaba cloud ack pull free enterprise ACR image
随机推荐
Market Research - current market situation and future development trend of aloe leaf powder in the world and China
2022-2028 investigation and trend analysis report on global anti UAV netting gun industry
Excerpt of a verse of compassion and gladness
No, you still can't read the log with requestid?
Elk+filebeat deployment and installation
配置RMAN备份的环境_配置备份优化(BACKUP OPTIMIZATION)
2022-2028 global nonlinear node detector industry survey and trend analysis report
3D programming mode: dependent isolation mode
Self made compiler learning 2: compilation process
Kibana:Kibana 入门 (一)
Market Research - current situation and future development trend of global and Chinese floating fish feed market
Leetcode: find the number of recent palindromes
Could not find artifact com. retail. stock:retail-stock-center:pom:1.0-SNAPSHOT in snapshots
Understand the difference between left join, right join and join
[pat (basic level) practice] - [sort] 1077 mutual evaluation score calculation
English语法_地点副词
Redlock red lock security debate (Part 1)
Self made compiler learning 4: using flex
JS 实现三级联动
RMAN备份数据库_指定备份输出选项