Compiler Principles Study Notes 1 (Overview of Compiler Principles and Lexical Analysis)
2022-07-28 17:55:00 【jsBeSelf】
1 Overview
1.1 The compilation process
The compilation process has five stages, as shown in Figure 1 (image source: Principles of Compilation, 2nd edition):
lexical analysis, syntax analysis, semantic analysis + intermediate code generation, optimization, and object code generation. These notes cover only the first two stages; the last three are not studied in depth.
The five stages are very similar to translating English, which makes them easy to memorize by analogy: lexical analysis -> recognizing English words, syntax analysis -> analyzing sentence structure, semantic analysis + intermediate code generation -> preliminary translation, optimization -> polishing the translation, object code generation -> writing the final translation.
The output of each stage is the input of the next. The whole compilation process can also be divided into two parts, namely the analysis part (stages 1-3) and the synthesis part (stages 4-5), also called the front end and the back end; these notes cover only the front end.
1.2 Tools and other processing
Besides the five stages, there are also some analysis tools:
1) Lexical analysis: e.g. finite automata, used to describe word-formation rules;
2) Syntax analysis: e.g. context-free grammars, used to describe grammar rules.
In addition, symbol-table management and error handling run through the whole compilation process.
2 Lexical analysis
2.1 Main work
Lexical analysis here has two meanings:
1) specifying the rules by which words are formed, i.e. the word-formation rules;
2) recognizing the input sequence according to those rules, i.e. lexical analysis proper.
The lexical analyzer is the only part of the compiler that deals directly with the source program.
Its main tasks are:
1) filtering out comments, spaces and other useless parts of the source program;
2) recognizing tokens and handing them to the parser;
3) calling the symbol-table manager or the error handler (to handle lexical errors).
In short, the lexical analyzer recognizes each token in the source program, forms a token stream, and passes it to the parser.
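As an illustration, here is a minimal sketch of such a lexical analyzer in Python, assuming a toy language with only identifiers, integers, a few operators and `#` line comments (the token categories and names are my own choices for the example, not from the text):

```python
import re

# Token patterns for a toy language (assumed for illustration only).
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),            # integer literals
    ("IDENT",   r"[A-Za-z_]\w*"),   # identifiers
    ("OP",      r"[+\-*/=]"),       # operators
    ("SKIP",    r"[ \t]+|#[^\n]*"), # spaces and '#' comments: filtered out
    ("NEWLINE", r"\n"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan the source text and yield (kind, lexeme) pairs: the token stream."""
    for m in TOKEN_RE.finditer(source):
        kind = m.lastgroup
        if kind in ("SKIP", "NEWLINE"):   # comments and whitespace are dropped
            continue
        yield kind, m.group()

# Example: produces IDENT/OP/NUMBER tokens and drops the comment.
print(list(tokenize("count = count + 1  # increment")))
```

In a real compiler the token stream produced by `tokenize` would be consumed by the parser rather than printed.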
2.2 Strings
The operations on sets of strings are: union, intersection, concatenation, difference, closure, positive closure, etc.
Sometimes a set of strings cannot be enumerated but follows a certain pattern; it can then be described by a regular expression, which denotes a whole class of strings (a regular set).
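A small worked illustration of these set operations in Python (the sets and the bound on the closure are made up for the example; a true closure is infinite, so only strings built from at most three pieces are generated):

```python
A = {"a", "ab"}
B = {"ab", "b"}

print(A | B)   # union:        {'a', 'ab', 'b'}
print(A & B)   # intersection: {'ab'}
print(A - B)   # difference:   {'a'}

# Concatenation AB = every string of A followed by every string of B.
concat = {x + y for x in A for y in B}
print(concat)  # {'aab', 'ab', 'abab', 'abb'}

# Closure A* is infinite; here we only enumerate A^0 .. A^3.
closure, layer = {""}, {""}
for _ in range(3):
    layer = {x + y for x in layer for y in A}
    closure |= layer
# Positive closure A+ is the same without the empty string.
positive = closure - {""}
```

A regular expression such as letter(letter|digit)* describes exactly this kind of non-enumerable set: here, the infinite set of all identifiers.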
2.3 Token recognition
Tokens can be recognized by finite automata.
An NFA (nondeterministic finite automaton) is a quintuple M = (S, Σ, move, s0, F), whose components are: the finite set of states, the finite input alphabet, the state-transition function, the unique initial state, and the set of final states.
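As a concrete (purely illustrative) representation, the quintuple can be written down directly in Python; the NFA below, which recognizes strings over {a, b} ending in "ab", is my own example and is not taken from the book's figure:

```python
# M = (S, Sigma, move, s0, F) written out explicitly.
S     = {0, 1, 2}                 # finite set of states
Sigma = {"a", "b"}                # finite input alphabet
move  = {                         # transition function: (state, char) -> set of next states
    (0, "a"): {0, 1},             # nondeterministic: two possible next states
    (0, "b"): {0},
    (1, "b"): {2},
}
s0 = 0                            # unique initial state
F  = {2}                          # set of final states
```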
A finite automaton can be described in three ways: a textual (formal) definition, a transition diagram, or a transition matrix, as shown in the figure below (source: Principles of Compilation, 2nd edition).


An NFA is characterized by nondeterminism: for the same state and input character there may be more than one possible next state, i.e. the move function is multi-valued.
The corresponding DFA (deterministic finite automaton) is a special case of the NFA in which the transition diagram has no ε-labeled edges, and for each state s and each character a there is at most one next state, i.e. the move function is single-valued.
Recognizing an input sequence with a DFA requires no backtracking, which improves recognition efficiency, so a DFA is preferred whenever possible.
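A minimal sketch of running a DFA, assuming the single-valued move function is stored as a dictionary keyed by (state, character); the particular DFA below, again recognizing strings ending in "ab", is my own example:

```python
# Deterministic transition function: each (state, char) has at most one next state.
dfa_move = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
start, finals = 0, {2}

def accepts(word):
    """Run the DFA over the word; no backtracking is ever needed."""
    state = start
    for ch in word:
        state = dfa_move.get((state, ch))
        if state is None:          # no transition defined: reject immediately
            return False
    return state in finals

print(accepts("aab"))   # True  ('aab' ends in 'ab')
print(accepts("aba"))   # False
```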
If two finite automata recognize the same regular set, they are equivalent; this lays the groundwork for converting an NFA into a DFA.
2.4 From regular expressions to a lexical analyzer (key point)
The whole process is:
1) describe the patterns with regular expressions;
2) construct an NFA for each regular expression;
3) convert the constructed NFA into an equivalent DFA;
4) optimize the DFA, i.e. minimize it;
5) build the lexical analyzer from the minimal DFA (write the code).
Step 1) has no fixed method; it is a matter of finding the patterns.
Step 2), from a regular expression to an NFA, can be done in two ways: the Thompson algorithm, or decomposition.
Thompson algorithm (source: Principles of Compilation, 2nd edition):
The decomposition method is relatively simple, as shown below (drawn by myself):
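Here is a minimal sketch of Thompson's construction in Python, assuming the regular expression has already been converted to postfix form with '.' for concatenation, '|' for union and '*' for closure (the representation and helper names are my own, not the book's):

```python
# Label used for epsilon edges.
EPSILON = None

class State:
    """An NFA state: a list of (label, target) edges."""
    def __init__(self):
        self.edges = []
    def add_edge(self, label, target):
        self.edges.append((label, target))

def thompson(postfix):
    """Build an NFA fragment (start, accept) for a postfix regular expression."""
    stack = []
    for ch in postfix:
        if ch == ".":                      # concatenation: accept1 --eps--> start2
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            a1.add_edge(EPSILON, s2)
            stack.append((s1, a2))
        elif ch == "|":                    # union: new start/accept around both fragments
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            start, accept = State(), State()
            start.add_edge(EPSILON, s1)
            start.add_edge(EPSILON, s2)
            a1.add_edge(EPSILON, accept)
            a2.add_edge(EPSILON, accept)
            stack.append((start, accept))
        elif ch == "*":                    # closure: skip edge plus loop back
            s1, a1 = stack.pop()
            start, accept = State(), State()
            start.add_edge(EPSILON, s1)
            start.add_edge(EPSILON, accept)
            a1.add_edge(EPSILON, s1)
            a1.add_edge(EPSILON, accept)
            stack.append((start, accept))
        else:                              # single character: two states, one labelled edge
            start, accept = State(), State()
            start.add_edge(ch, accept)
            stack.append((start, accept))
    return stack.pop()

# Example: (a|b)*abb written in this postfix notation is "ab|*a.b.b."
start, accept = thompson("ab|*a.b.b.")
```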
Step 3), from NFA to DFA.
The basic idea is to make the nondeterministic next states deterministic. The method is the subset construction, shown in the figure below (source: Principles of Compilation, 2nd edition):
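A minimal sketch of the subset construction, assuming the NFA is stored as a dictionary nfa[state][char] -> set of next states, with the empty string used as the label of ε-edges (this representation is my own choice for illustration):

```python
EPS = ""  # label used for epsilon edges in this sketch

def eps_closure(nfa, states):
    """All NFA states reachable from `states` using only epsilon edges."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(nfa, states, a):
    """All NFA states reachable from `states` on one input character a."""
    result = set()
    for s in states:
        result |= nfa.get(s, {}).get(a, set())
    return result

def subset_construction(nfa, start, alphabet):
    """Each DFA state is a frozenset of NFA states; returns (start state, transition table)."""
    d_start = eps_closure(nfa, {start})
    dfa, worklist = {}, [d_start]
    while worklist:
        T = worklist.pop()
        if T in dfa:
            continue
        dfa[T] = {}
        for a in alphabet:
            U = eps_closure(nfa, move(nfa, T, a))
            if U:
                dfa[T][a] = U
                worklist.append(U)
    return d_start, dfa
```

A DFA state built this way is final exactly when its frozenset contains at least one NFA final state.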
Step 4), DFA minimization.
Intuition: if a DFA is reversed, the result may be an NFA, which indicates that some of its states are redundant; if that reversed NFA is then determinized into a DFA again (and the same is done in the forward direction), what remains is the minimal DFA.
In practice: first split the states into two sets, all non-final states and all final states; then, for each set, check where each of its elements transitions to on every input character. If every element of a set transitions into the same set, there is no need to split it further; otherwise, separate out the elements that behave differently. Repeat until no set can be split any more.
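A minimal sketch of this partition-refinement procedure, assuming the DFA is stored as a dictionary dfa[state][char] -> next state and `finals` is the set of final states (names and representation are my own):

```python
def minimize(dfa, states, finals, alphabet):
    """Return the groups of equivalent DFA states (Moore-style partition refinement)."""
    # Initial split: all non-final states in one group, all final states in another.
    partition = [g for g in (set(states) - set(finals), set(finals)) if g]

    changed = True
    while changed:
        changed = False
        new_partition = []
        for group in partition:
            # States stay together only if, for every character, they move
            # into the same group of the current partition.
            buckets = {}
            for s in group:
                key = tuple(
                    next((i for i, g in enumerate(partition)
                          if dfa.get(s, {}).get(a) in g), None)
                    for a in alphabet
                )
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(buckets.values())
            if len(buckets) > 1:
                changed = True
        partition = new_partition
    return partition
```

Each resulting group then becomes a single state of the minimal DFA.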
Miscellaneous
1) How to judge whether two sets are equal: show that every element of set A belongs to set B, and at the same time every element of set B belongs to set A.
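In Python this mutual-inclusion check is just two subset tests (the sets below are made up for the example):

```python
A = {"a", "ab", "abb"}
B = {"abb", "ab", "a"}

# A == B exactly when A is a subset of B and B is a subset of A.
assert A <= B and B <= A
assert A == B  # equivalent built-in check
```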