当前位置:网站首页>Key points of compilation principle examination in 2021-2022 academic year [overseas Chinese University]
Key points of compilation principle examination in 2021-2022 academic year [overseas Chinese University]
2022-07-02 15:01:00 【Sylvan Ding】
2021-2022 Key points of compilation principle examination in the academic year
notes : Please refer to Compile the principle experiment report
A term is used to explain
compiler : Compile phase , User input source program , The target program is translated by the compiler , The target program accepts input data at run time , Get data output
Interpreter : Combine the translation of the source program with the operation of the target program , Translate a source program , Then execute it , This way is called interpretation
memory : The function of compiler and interpreter is to translate and run the target program , The compiler separates the two , The interpreter performs translation and operation at the same time .
Lexical analysis : According to lexical rules , Identify each mark in the source program (token), Every token Represents a class of words (lexeme)
Syntax analysis : According to the rules of grammar , Identify the grammatical structure in the token stream , And construct a parsing tree that reflects the structure
Semantic analysis : According to semantic rules , Static semantic check of syntax tree , Ensure that the sentences with correct structure are also semantically correct
Intermediate code generation : According to the results of semantic analysis, intermediate code independent of specific machine is generated , Then the intermediate code is interpreted and executed according to the specific machine
Intermediate code optimization : The generation of intermediate code is mechanical and fixed , Equivalent transformation of intermediate code , Improve the time and space efficiency of code
Target code generation : Transform intermediate code into instruction code running on a specific machine
The symbol table : Record information about symbols in the source program , After reasonable organization , Fast at all stages of the compiler 、 Accurate addition, deletion, modification, query and other operations
Error handling : It is irresponsible to make the compiler stop working when encountering an error , Therefore, the compiler should have a wide range of error checking capabilities , Accurately report the type and location of errors
Language : Language L L L Is a finite alphabet Σ \Sigma Σ Set of finite length symbol strings on
Normal form : Limited alphabet Σ \Sigma Σ Normal forms and normal sets on L L L For the definition of
- ϵ \epsilon ϵ It's normal , L ( ϵ ) = { ϵ } L(\epsilon)=\{\epsilon\} L(ϵ)={ ϵ}
- a ∈ Σ a\in \Sigma a∈Σ, a a a It's normal , L ( a ) = { a } L(a)=\{a\} L(a)={ a}
- r , s r,s r,s It's normal , Corresponding to normal set L ( r ) , L ( s ) L(r),L(s) L(r),L(s), be
- r ∣ s r|s r∣s It's normal , L ( r ∣ s ) = L ( r ) ∪ L ( s ) L(r|s)=L(r)\cup L(s) L(r∣s)=L(r)∪L(s)
- r s rs rs It's normal , L ( r s ) = L ( r ) L ( s ) L(rs)=L(r)L(s) L(rs)=L(r)L(s)
- r ∗ r^* r∗ It's normal , L ( r ∗ ) = ( L ( r ) ) ∗ L(r^*)=(L(r))^* L(r∗)=(L(r))∗
- ( r ) (r) (r) It's normal , L ( ( r ) ) = L ( r ) L((r))=L(r) L((r))=L(r)
memory : Define the normal form on the finite alphabet and its corresponding normal set , First ϵ \epsilon ϵ It's normal , Its normal set is { ϵ } \{\epsilon\} { ϵ}. secondly , Each letter in the alphabet forms a normal form , The corresponding normal set is the set of this letter . Then define the basic operation of normal form : Or operations 、 Join operation 、 Closure operations 、 Brackets specify priority .
Pattern : Rules for generating and recognizing words
mark : According to some rules ( Pattern ) Identified elements
NFA:Nondeterministic Finite Automata( Indefinite finite state machine ) It's a quintuple ( Q , Σ , δ , S , F ) (Q,\Sigma,\delta,S,F) (Q,Σ,δ,S,F), among
- Q Q Q Is a set of finite states
- Σ \Sigma Σ Is a finite alphabet , Include ϵ \epsilon ϵ
- δ \delta δ Is the state transition function , δ \delta δ Is a multivalued function , The definition is as follows :
δ : Q × Σ → 2 Q \delta:Q\times \Sigma \to 2^Q δ:Q×Σ→2Q
among , 2 Q 2^Q 2Q yes Q Q Q Power set of . - S S S Is the initial state set , S ⊆ Q S\subseteq Q S⊆Q
- F F F Is the final state set , F ⊆ Q F\subseteq Q F⊆Q
DFA:Deterministic Finite Automata( Deterministic finite state machine ) It's a quintuple ( Q , Σ , δ , S , F ) (Q,\Sigma,\delta,S,F) (Q,Σ,δ,S,F), among
- Q Q Q Is a set of finite states
- Σ \Sigma Σ Is a finite alphabet
- δ \delta δ Is the state transition function , δ \delta δ Is a single valued function , The definition is as follows :
δ : Q × Σ → Q \delta:Q\times \Sigma \to Q δ:Q×Σ→Q - S S S Is the only initial state , S ∈ Q S\in Q S∈Q
- F F F Is the final state set , F ⊆ Q F\subseteq Q F⊆Q
memory : And NFA comparison ,DFA With certainty , therefore DFA Of Σ \Sigma Σ It doesn't contain ϵ \epsilon ϵ、 δ \delta δ Is a single valued mapping function 、 S S S Is the only initial state .
Context-free grammar : Four tuple ( V N , V T , P , S ) (V_N,V_T,P,S) (VN,VT,P,S), among
- V N V_N VN Is a finite set of non terminators
- V T V_T VT Is a finite set of terminators
- P P P Is a finite set of production , Each production type is like A → α A\to \alpha A→α, α ∈ ( V N ∪ V T ) ∗ \alpha \in (V_N\cup V_T)^* α∈(VN∪VT)∗
- S S S Is the starting sign of grammar
Terminator : The inseparable basic symbols that make up a language
non-terminal : The left part of the production can derive the symbol or symbol string
Grammar production : Recursively describe the sentence pattern composed of terminator and non terminator . The general form is : α → β \alpha \to \beta α→β. among , α , β ∈ ( V N ∪ V T ) ∗ \alpha, \beta \in (V_N\cup V_T)^* α,β∈(VN∪VT)∗, And α \alpha α Contains at least one non terminator
deduction : Start with grammar symbols S S S Start , Use production repeatedly , Replace the left nonterminal character of the production with the right grammatical symbol sequence , Until you get a terminator sequence
reduction : Is the inverse process of derivation , Replace the right part with the left part of production repeatedly , Until the input string analysis is completed
Sentence patterns and sentences : Constructive law G [ S ] G[S] G[S], if S ⇒ ∗ x , x ∈ ( V N ∪ V T ) ∗ S\stackrel{*}{\Rightarrow}x,x \in (V_N\cup V_T)^* S⇒∗x,x∈(VN∪VT)∗, be x x x A sentence pattern called this grammar . if S ⇒ ∗ x , x ∈ V T ∗ S\stackrel{*}{\Rightarrow}x,x \in V_T^* S⇒∗x,x∈VT∗, be x x x A sentence called this grammar
Two senses : If a grammar has a sentence corresponding to two different grammar trees , That is, it has different leftmost ( The most right ) deduction , This grammar is ambiguous
Pushdown automaton : Expand DFA Make it possible to access a stack , It is called push down stack . Enter a string of token streams , The pushdown automaton will be based on “ Finite state transition control ”, Decide whether to accept the token stream
First Set : F i r s t ( α ) = { a ∣ α ⇒ ∗ a … a n d a ∈ V T } First(\alpha)=\{a|\alpha\stackrel{*}{\Rightarrow} a\dots\ and\ a\in V_T \} First(α)={ a∣α⇒∗a… and a∈VT}, if α ⇒ ∗ ϵ \alpha \stackrel{*}{\Rightarrow} \epsilon α⇒∗ϵ, be ϵ ∈ F i r s t ( α ) \epsilon \in First(\alpha) ϵ∈First(α)
Follow Set : F o l l o w ( A ) = { a ∣ S ⇒ ∗ … A a … , a ∈ V T } Follow(A)=\{a|S\stackrel{*}{\Rightarrow}\dots Aa\dots,a\in V_T \} Follow(A)={ a∣S⇒∗…Aa…,a∈VT}, if S ⇒ ∗ … A S\stackrel{*}{\Rightarrow}\dots A S⇒∗…A, be # ∈ F o l l o w ( A ) \#\in Follow(A) #∈Follow(A)
Live prefix : Standardize sentence pattern prefix , Without any symbols after the handle . if S ⇒ ∗ δ A ω S\stackrel{*}{\Rightarrow}\delta A\omega S⇒∗δAω, And we can continue to deduce S ⇒ ∗ δ α β ω S\stackrel{*}{\Rightarrow}\delta \alpha \beta \omega S⇒∗δαβω, among , δ ∈ V ∗ , A ∈ V N , ω ∈ V N ∗ , α ∈ V + \delta \in V^*,A\in V_N,\omega\in V_N^*,\alpha\in V^+ δ∈V∗,A∈VN,ω∈VN∗,α∈V+, be α β \alpha\beta αβ yes δ α β ω \delta\alpha\beta\omega δαβω The handle of , δ α β \delta\alpha\beta δαβ Any prefix of is δ α β ω \delta\alpha\beta\omega δαβω Live prefix of
Move into statute conflict : A project set , Existing move in project , There are also statute projects , At this time, it cannot be decided whether to generate the move in action or the Convention action
Grammar guided translation : Match each production with corresponding semantic rules , In the process of derivation or specification, the corresponding semantic actions are executed according to the semantic rules of production
Comprehensive attributes : if A → α , b : = f ( c 1 , c 2 , … , c k ) A\to \alpha,b:=f(c_1,c_2,\dots,c_k) A→α,b:=f(c1,c2,…,ck), among , b b b yes A A A Properties of , c 1 , c 2 , … , c k c_1,c_2,\dots,c_k c1,c2,…,ck yes α \alpha α The attribute of some grammatical symbols in or A A A Other properties of , be b b b It is a comprehensive attribute
Inheritance attribute : if A → α , b : = f ( c 1 , c 2 , … , c k ) A\to \alpha,b:=f(c_1,c_2,\dots,c_k) A→α,b:=f(c1,c2,…,ck), among , b b b yes α \alpha α The attribute of a grammatical symbol in , c 1 , c 2 , … , c k c_1,c_2,\dots,c_k c1,c2,…,ck yes A A A The property of or α \alpha α Properties of other grammatical symbols in , be b b b Is an inherited property
memory : Comprehensive attributes “ Bottom up , Include yourself ”; Inheritance attribute “ From top to bottom , Including brothers ”.
Three address codes : Each code contains an operation and three addresses , for example x = y o p z x=y\ op\ z x=y op z, operation op, Two addresses y , z y,z y,z Used to store operands , Address x x x Used to store operation results
Quaternion : ( o p , a r g 1 , a r g 2 , r e s u l t ) (op,arg1,arg2,result) (op,arg1,arg2,result), Quaternion is a record structure with four fields , Its meaning is arg1 and arg2 Conduct op Specified operation , The results are stored in result in
Zipper and backfill : When the steering in the three address code is uncertain , Pull all three address codes that turn to the same address into a chain , Once the turn address is determined , Backfill the steering address along this chain
Short answer
Comparison between compiler and interpreter :
The function of compiler and interpreter is to translate and run the target program , From the perspective of translation , The two are similar , But the compiler separates the translation of the source program from the operation of the object code , The interpreter performs translation and operation at the same time .
The advantages of an interpreter :- It has good dynamic characteristics . When the target program runs , Control is in the interpreter , Users can dynamically modify the source program ;
- It has good portability . Recompile the interpreter , You can make it run in different environments .
Disadvantages of the interpreter :
- Time lost . The interpreter needs to translate 、 Running at the same time , It takes time to check the source program at runtime ;
- Large space loss . The operation of the interpreter also needs to occupy memory space .
memory :( identical ) The translation process is similar .( difference ) The compiler translates and runs separately , The interpreter performs at the same time .( Advantages and disadvantages of interpreter )“ Mobile spacetime ”——( optimal ) Portability 、 Dynamic characteristics ,( Lack of ) Great loss of time and space 、 Low operating efficiency .
The general process of compilation :
- Input : Source program
- Output : Target program
- The process —— Lexical analysis 、 Syntax analysis 、 Semantic analysis 、 Intermediate code generation 、 Code optimization 、 Target code generation
- rely on —— Symbol table management 、 Error handling
Comparison between multi pass scanning and one pass scanning :
- Definition —— logically , Every stage of compiler work , The program should be completely scanned and analyzed , It is called multi pass scanning . But actually , Compilers often combine several stages of work , Do a scan . In principle, , I hope the fewer times you scan, the better .
- Advantages of one pass scanning —— Avoid repeated scanning , Improve compilation speed ;
- Disadvantages of one pass scanning —— Algorithm logic is not clear 、 It is not convenient for code optimization 、 When grammatical and semantic errors occur, the preliminary work is invalid
memory :( Definition ) Multi pass scanning refers to the complete scanning and analysis of the source program for multiple times during the compiler compilation process , One pass scanning refers to the compiler combining several stages of work , Only scan and analyze once .( Advantages and disadvantages of one-time scanning )—— optimal : Efficient , Reduce repetitive work ; shortcoming :( Algorithm 、 Code optimization 、 Error handling ) Algorithm logic is not clear 、 Not conducive to code optimization 、 Errors in the grammatical and semantic analysis process lead to the invalidation of the results of the previous analysis process .
The operation of language :
memory : hand over 、 and 、 Bad 、 Connect 、 Closure 、 Positive closure
- X = L ∩ M , X = { s ∣ s ∈ L a n d s ∈ M } X=L\cap M,X=\{s|s\in L\ and\ s\in M \} X=L∩M,X={ s∣s∈L and s∈M}
- X = L ∪ M , X = { s ∣ s ∈ L o r s ∈ M } X=L\cup M,X=\{s|s\in L\ or\ s\in M \} X=L∪M,X={ s∣s∈L or s∈M}
- X = L − M , X = { s ∣ s ∈ L a n d s ∉ M } X=L- M,X=\{s|s\in L\ and\ s\notin M \} X=L−M,X={ s∣s∈L and s∈/M}
- X = L M , X = { s t ∣ s ∈ L a n d t ∈ M } X=LM,X=\{st|s\in L\ and\ t\in M \} X=LM,X={ st∣s∈L and t∈M}
- X = L ∗ , X = { s ∣ s ∈ L 0 ∪ L 1 ∪ … } X=L^*,X=\{s|s\in L^0\cup L^1\cup\dots \} X=L∗,X={ s∣s∈L0∪L1∪…}
- X = L + , X = { s ∣ s ∈ L 1 ∪ L 2 ∪ … } X=L^+,X=\{s|s\in L^1\cup L^2\cup\dots \} X=L+,X={ s∣s∈L1∪L2∪…}
Construct normal form :
- Construction of identifier :
d i g i t → [ 0 − 9 ] digit\to [0-9] digit→[0−9]
l e t t e r → [ a − z A − Z _ ] letter\to [a-zA-Z\_] letter→[a−zA−Z_]
i d → l e t t e r ( l e t t e r ∣ d i g i t ) ∗ id\to letter(letter|digit)^* id→letter(letter∣digit)∗ - Construction of floating point numbers :
z e r o → 0 zero\to 0 zero→0
n o n z e r o → [ 1 − 9 ] nonzero\to [1-9] nonzero→[1−9]
d i g i t → z e r o ∣ n o n z e r o digit\to zero|nonzero digit→zero∣nonzero
d i g i t s → d i g i t d i g i t ∗ digits\to digit\ digit^* digits→digit digit∗
i n t e g e r P a r t → z e r o ∣ n o n z e r o d i g i t ∗ integerPart\to zero|nonzero\ digit^* integerPart→zero∣nonzero digit∗
o p t i o n a l F r a c t i o n → . d i g i t s ∣ ϵ optionalFraction\to .digits|\epsilon optionalFraction→.digits∣ϵ
F l o a t N u m = i n t e g e r P a r t o p t i n a l F r a c t i o n FloatNum=integerPart\ optinalFraction FloatNum=integerPart optinalFraction
- Construction of identifier :
Proof of normal equivalence :
If normal P、Q Represents the same normal set , said P、Q Equivalent , Write it down as P=Q. The algebraic properties of identity operations of normal formulas can be used , Simplify complex normal form , So as to judge whether the normal formula is equivalent .memory :“ or ” Operation has commutative law 、 Associative law , Connection operation has the law of Association , Positive closure operation r + = r r ∗ = r ∗ r r^+=rr^*=r^*r r+=rr∗=r∗r, Closure operations r ∗ = r + ∣ ϵ r^*=r^+|\epsilon r∗=r+∣ϵ, Default operation r ? = r ∣ ϵ r?=r|\epsilon r?=r∣ϵ.
DFA Three forms of transformation :
- State transition functions s j = δ ( s i , a ) s_j=\delta(s_i,a) sj=δ(si,a)
- State transition diagram ( notes : The start state is pointed to by a double arrow , The end state is marked with double circles )
- State transition table ( notes : Each table element represents the transition state of a state when an input symbol )
Thompson Algorithm :
effect —— Construct recognition normal NFA.
form —— r 1 r 2 , r 1 ∣ r 2 , r 1 ∗ r_1r_2,r_1|r_2,r_1^* r1r2,r1∣r2,r1∗memory :Thompson Constructed by algorithm NFA in , non-existent “ Self return ” And a certain state can only lead to one non ϵ \epsilon ϵ Transfer .
improvement : In accordance with the normal structure NFA In practice , Always use “ The improved Thompson Algorithm ”, Simplify NFA The state of .
Eliminate ambiguity :
- Introduce a new non Terminator , Add a substructure and raise the priority level
- The position of recursive non terminator in production formula can reflect the associativity of grammatical symbols
else Suspension problem :
S → i f b t h e n S S\to if\ b\ then\ S S→if b then S
∣ i f b t h e n S e l s e S | if\ b\ then\ S\ else\ S ∣if b then S else S
∣ A |A ∣Aif b then if b then A else A
It has two meanings . The reason for this ambiguity is else and then The numbers are different , Undetermined else And the first then Corresponding to the second then Corresponding . therefore , take S It is divided into exact match (MS) And do not exactly match (UMS), Matching perfectly else-then The same number , Every else Can be found then With the corresponding , and UMS in else-then The numbers are different .MS and UMS It should be reflected in else The right combination of , That is, closest to else Sinister then matching .The result of eliminating ambiguity :
S → M S ∣ U M S S\to MS|UMS S→MS∣UMS
M S → i f b t h e n M S e l s e M S MS\to if\ b\ then\ MS\ else\ MS MS→if b then MS else MS
U M S → i f b t h e n S ∣ i f b t h e n M S e l s e U M S UMS\to if\ b\ then\ S\ |\ if\ b\ then\ MS\ else\ UMS UMS→if b then S ∣ if b then MS else UMSpredictive analyzer :
By the drive 、 Symbol stack and prediction analysis table . The input buffer stores the string to be analyzed , With # Mark the end of the input string , Finally output the result of parsing . Drive control read mark , According to the stack top symbol and the current read symbol , The operation of symbol stack is determined by the prediction analysis table .Move into the protocol analyzer :
By the drive 、 Symbol state stack and move in reduction analysis table , Its working mode is exactly the same as that of the prediction analyzer . The input buffer stores the string to be analyzed , With # Mark the end of the input string , Finally output the result of parsing . Drive control read mark , According to the top state of the stack and the current read symbol , Move into the reduction analysis table to decide what operation to do with the symbol state stack .Conversion between infix and suffix :
- Infix to suffix : The order of operands in the suffix formula is consistent with that in the infix formula , Operators are placed after the corresponding operands in the order of operation ;
- Suffix to infix : Sequence traversal suffix , If the current input is an operand , Then enter the stack ; If the current input is an operator , Then pop the operand from the stack for operation , Push the result back on the stack .
memory : The introduction of suffix eliminates brackets , Easy to calculate .
reference
- Liu Jian . Fundamentals of compiler principles . Xi'an : Xi'an University of Electronic Science and Technology Press ,2008.
- Liu Ming . Compiler principle [M]. Beijing : Electronic industry press ,2018.
- Huang Xianying . Compilation principle and practice tutorial [M]. Beijing : tsinghua university press ,2019.
- Keith Cooper. Compiler design [M]. Beijing : People's post and Telecommunications Press ,2012.
- Huang Xianying . Compiler principle : Analysis of key and difficult points · Problem solving · Experimental instruction [M]. Beijing : Mechanical industry press ,2008.
边栏推荐
- OpenCV调用USB摄像头的点滴
- 一张图彻底掌握prototype、__proto__、constructor之前的关系(JS原型、原型链)
- 4、数组指针和指针数组
- LeetCode - 搜索二维矩阵
- Bit by bit of OpenCV calling USB camera
- LeetCode 2310. The number of digits is the sum of integers of K
- JMeter script parameterization
- MFC CString 转 char*
- MFC CString to char*
- Huawei interview question: no palindrome string
猜你喜欢
使用mathtype编辑公式,复制粘贴时设置成仅包含mathjax语法的公式
[email protected] : The platform “win32“ is incompatible with this module."/>
info [email protected] : The platform “win32“ is incompatible with this module.
复用和分用
Btrace- (bytecode) dynamic tracking tool
LeetCode 2320. 统计放置房子的方式数
LeetCode 2310. The number of digits is the sum of integers of K
[development environment] install the visual studio community 2013 development environment (download the installation package of visual studio community 2013 with update 5 version)
taobao. trade. memo. Add (add remarks to a transaction) interface, Taobao store flag insertion interface, Taobao order flag insertion API interface, oauth2.0 interface
Reuse and distribution
电脑怎么设置扬声器播放麦克风的声音
随机推荐
fatal: unsafe repository is owned by someone else 的解决方法
fatal: unsafe repository is owned by someone else 的解决方法
LeetCode 2310. The number of digits is the sum of integers of K
C#延时、在线程中开启定时器、获取系统时间
Reuse and distribution
为什么只会编程的程序员无法成为优秀的开发者?
taobao.trade.memo.add( 对一笔交易添加备注 )接口,淘宝店铺插旗接口,淘宝订单插旗API接口,oAuth2.0接口
STM32 standard firmware library function name memory (II)
C语言习题---(数组)
Li Chuang EDA learning notes 15: draw border or import border (DXF file)
传感器数据怎么写入电脑数据库
[untitled] leetcode 2321 Maximum score of concatenated array
taobao.logistics.dummy.send( 无需物流发货处理 )接口,淘宝店铺发货API接口,淘宝订单发货接口,淘宝r2接口,淘宝oAu2.0接口
报错:npm WARN config global `--global`, `--local` are deprecated. Use `--location=global` instead.
Check password
[Space & single cellomics] phase 1: single cell binding space transcriptome research PDAC tumor microenvironment
LeetCode_滑动窗口_中等_395.至少有 K 个重复字符的最长子串
【空间&单细胞组学】第1期:单细胞结合空间转录组研究PDAC肿瘤微环境
C language exercises - (array)
C# richTextBox控制显示最大行数