当前位置:网站首页>How fast does it take to implement a super simple language
How fast does it take to implement a super simple language
2022-07-29 03:36:00 【Luo Zhaocheng CSDN】
This is the first lesson of teacher Gong Wenxue , After reading the content of this course , I don't know exactly what I want to do , What to do , After reading the supporting original code , Finally understand what needs to be done , This is for record only , Comb your mind . Here is the main body :
0x00 Code snippet definition
What we need to do in this course is to do a simple grammar analysis , Will input Token Array , Into a syntax tree , Then deal with the relationship between functions and function calls , Let the function connect with the function call , Finally, run the generated syntax tree .( I used in the article Java Language to achieve )
First , Take a look at the content of the code segment we need to parse :
function sayHello() {
println("Hello World!");
}
sayHello();Above code , It can be easily read , Defines a function sayHello, In this function , The built-in function was called println , Used to output Hello World! This string , At the end of the code , Call the defined sayHello function .
0x01 Lexical disassembly
First , We manually split this code segment into Token Array . about Token Come on , There are several types :
keyword TokenKind.KeyWord , for example function
identifier TokenKind.Identifier , for example sayHello, println
character string TokenKind.StringLiteral, for example "Hello World!"
Separator TokenKind.Separator , for example Space ,{}, ;
Terminator TokenKind.EOF , stay Token At the end of the array , Indicates the end of the code
Split the above code segment into Token After the array, it will become like this :
static Token[] tokenArray = new Token[]{
new Token(TokenKind.KeyWord, "function"),
new Token(TokenKind.Identifier, "sayHello"),
new Token(TokenKind.Separator, "("),
new Token(TokenKind.Separator, ")"),
new Token(TokenKind.Separator, "{"),
new Token(TokenKind.Identifier, "println"),
new Token(TokenKind.Separator, "("),
new Token(TokenKind.StringLiteral, "Hello World!"),
new Token(TokenKind.Separator, ")"),
new Token(TokenKind.Separator, ";"),
new Token(TokenKind.Separator, "}"),
new Token(TokenKind.Identifier, "sayHello"),
new Token(TokenKind.Separator, "("),
new Token(TokenKind.Separator, ")"),
new Token(TokenKind.Separator, ";"),
new Token(TokenKind.EOF, "")
};0x02 Grammar explanation
Back to parsing , Need to traverse the whole Token list , Afterlife idiom method tree . Syntax tree is to put the whole program code , Deduce according to certain rules , Form a tree structure .
Back to this example , To simplify the process of understanding , In our code , Contains only simple functions 、 Code for function calls , And system print function println Call to . In the code snippet I parsed , Use ; To indicate the end of a line of code .
A simple function represents , With keywords function start , With function name , And the function does not contain parameters . In the body of a function , Can only contain calls to any number of functions ( Functions defined in the code , And system print function println Call to ). The function body of the function is contained in the delimiter
{}middle . The function name needs to be followed by a separator().System print function println For the definition of
println(String... args), That is, when we call this function , You can pass any number of strings to it , Used for printout on the console . An example of the call is as follows :
println(); // Do not output any value
println("Hello World !"); // Pass a parameter , The result is :Hello World !
println("Hello", "World", "!")// Passing multiple parameters , The result is :Hello World ! Be careful , In this case , "Hello World!" Turning into Token When , hold "" Get rid of , The results are as follows
new Token(TokenKind.StringLiteral, "Hello World!"),0x03 Grammar definition
Go through the above analysis , Be able to know clearly , What does the grammar parser need to do .
Get Token After array , You need to traverse each from scratch Token , For a piece of code , You need to know that the current code should follow Function definition To analyze or Function call Parsing , The solution is simple , That is one kind of trial , First give the code to the parser , First according to Function definition Parsing , If in the process of parsing , Found that he was not Function definition , Just continue to the next possibility , Press Function call Parsing . If it is still not , In this case , You need to report an error , The prompt code is wrong .
Function definition
With keywords function start , It will be followed by an identifier as the function name , In this case , Parameters are not supported in defined functions , Therefore, the function name will be followed by
(), In the back is by{}Enclosed function body content , Of course, the function body can be empty , At this time, only{}.Function call
Start with an identifier , This identifier is the name of the function to be called , Followed by the
(, It may be followed by any number of strings ( Only used in system print functions ), Finally, there will be a), At the end of the code , There needs to be;To end with .The body of the function
As described in Section 2 above , The function body contains multiple function calls , Traverse directly backwards Token, Continue to parse by function call , Until the next Token by
}end .
If in the above parsing process , There is an error , You can throw an exception directly .
From the description above , have access to EBNF Format , Make more professional expression :
prog = (functionDecl | functionCall)*
functionDecl: "function" Identifier "(" ")" functionBody
functionBody: "{" functionCall* "}"
functionCall: Identifier "(" parameterList? ")"
parameterList: StringLiteral ("," StringLiteral)*0x04 Code implementation
According to the description above , The program code will loop through all the code , And try to define according to the function 、 Function calls two rules to parse code , Build a grammar tree , So the top code is as follows :
while (true) { // 1. First analyze according to the function definition
Statement statement = this.parseFunctionDecl();
if (statement != null) {
// After parsing to the function definition , Store it , Do the following Token analysis
statements.add(statement);
continue;
}
// 2. After function definition parsing fails , Try parsing by function call
statement = this.parseFunctionCall();
if (statement != null) {
// After parsing to function call , Store it ,
statements.add(statement);
continue;
}
// 3. Not a function definition , Nor is it a function call , Will resolve unsuccessfully , Terminate this parsing
if (statement == null) {
break;
}
}Next , Analyze the function definition , The code is as follows :
if (token.kind == TokenKind.KeyWord && "function".equals(token.text)) {
// Get function name
token = this.tokenizer.next();
// here token Should be function name , In this example , It should be followed by left and right parentheses
if (token.kind == TokenKind.Identifier) {
Token leftBracketsToken = this.tokenizer.next(); // Left parenthesis
Token rightBracketsToken = this.tokenizer.next(); // Right bracket
if ("(".equals(leftBracketsToken.text) && ")".equals(rightBracketsToken.text)) {
// Analytic function body
FunctionBody body = this.parseFunctionBody();
return new FunctionDecl(token.text, body);
}
}
}The next step is to implement the parsing of function calls , The code is as follows :
if (token.kind == TokenKind.Identifier) {
Token leftBracketsToken = this.tokenizer.next(); // Left parenthesis
if (leftBracketsToken.text.equals("(")) {
Token nextToken = this.tokenizer.next();
// Not for ) when , There may be String Multiple parameters of , Define in the system print function .
while (!")".equals(nextToken.text)) {
if (nextToken.kind == TokenKind.StringLiteral) {
params.add(nextToken.text);
}
nextToken = this.tokenizer.next();
if (!")".equals(nextToken.text)) {
if (",".equals(nextToken.text)) {
nextToken = this.tokenizer.next();
}
}
}
// Consume... At the end of the statement ; Number
nextToken = this.tokenizer.next();
if (";".equals(nextToken.text)) {
return new FunctionCall(token.text, params);
}
}
}Finally, the analysis of the function body , The code is as follows :
// The function body is contained in {} Inside
if ("{".equals(token.text)) {
// The function body contains only function calls , So the loop keeps calling the function and calling the parsing method , To parse function calls , Only until the function call statement is not found .
FunctionCall functionCall = this.parseFunctionCall();
while (functionCall != null) {
calls.add(functionCall);
functionCall = this.parseFunctionCall();
}
token = this.tokenizer.next();
// Judge the last token Is it }, If not , Description code syntax error .
if (!"}".equals(token.text)) {
System.err.println("Expecting '}' in FunctionBody, while we got a " + token.text);
} else {
return new FunctionBody(calls);
}
}0x05 Reference resolution
In the first article , It contains many function definitions and function calls , Is the function called at the function call defined ? therefore , You need to associate the function call with the tree node of the function definition , On the one hand, it can judge whether the current function call is legal , On the other hand , It can also speed up the efficiency of code runtime , Avoid repeatedly querying function definitions .
For the syntax number in this example , You need to traverse the method calls under the root node , And method calls in the function body . The code example is as follows :
public void visitProg(Prog prog) {
this.prog = prog;
// Traverse the current prog All nodes below
for (Statement stmt : this.prog.getStmts()) {
if (stmt instanceof FunctionCall) {
this.resolveFunctionCall((FunctionCall) stmt);
} else {
this.visitFunctionDecl(((FunctionDecl) stmt).getBody());
}
}
}
public void visitFunctionDecl(FunctionBody body) {
// Traverse all function calls in the function body
for (FunctionCall functionCall : body.getFunctionCalls()) {
this.resolveFunctionCall(functionCall);
}
}
private void resolveFunctionCall(FunctionCall call) {
FunctionDecl decl = this.findFunctionDecl(call.getName());
if (decl != null) {
call.setDefinition(decl);
} else {
if (!"println".equals(call.getName())) {
// If it's not a built-in function , Direct error
}
}
}
private FunctionDecl findFunctionDecl(String name) {
// The first version , In custom functions , No parameters , So match the name directly .
for (Statement stmt : this.prog.getStmts()) {
if (stmt instanceof FunctionDecl && ((FunctionDecl) stmt).getName().equals(name)) {
return (FunctionDecl) stmt;
}
}
return null;
}0x06
Last, last , Is to make the parsed code run , How to run ? The function call executes the corresponding function . How does the function we have defined work ? The defined function is defined by N Formed by function calls , So back and forth , And he began to run . The code example is as follows :
public void visitProg(Prog prog) {
for (Statement stmt : prog.getStmts()) {
if (stmt instanceof FunctionCall) { // Find function call , To execute this function call
this.runFunction((FunctionCall) stmt);
}
}
}
private void visitFunctionBody(FunctionBody body) {
// Execute the method call in the function body
for (FunctionCall functionCall : body.getFunctionCalls()) {
this.runFunction(functionCall);
}
}
private void runFunction(FunctionCall stmt) {
// If it is a system print function , Directly traverse the parameters for printing
if ("println".equals(stmt.getName())) {
for (String parameter : stmt.getParameters()) {
System.out.print(parameter + " ");
}
System.out.println();
} else {
// Judge whether the current function call has a declared function , If there is , Then execute the function call in the function body .
if (stmt.getDefinition() != null) {
this.visitFunctionBody(stmt.getDefinition().getBody());
}
}
}above , Record all the contents for this study .
边栏推荐
- 3.1 common neural network layer (I) image correlation layer
- ShardingSphere之水平分表实战(三)
- 暴力递归到动态规划 01 (机器人移动)
- Military product development process - transition phase
- Anti vulnerability · benefit from uncertainty --- management?
- "The programming is not standardized, and my colleagues are in tears!"
- C obtains JSON format data asynchronously from the web address
- Understanding of p-type problems, NP problems, NPC problems, and NP hard problems in natural computing
- Simple use of eventbus
- three.js 第五十四用如何给shader传递结构体数组
猜你喜欢

Producer consumer model of concurrent model

Ten thousand words detailed Google play online application standard package format AAB

(2022 Hangdian multi school III) 1002 boss rush (pressure dp+ dichotomy)

Vs code must know and know 20 shortcut keys!

Simple code implementation of decision tree

Violence recursion to dynamic programming 01 (robot movement)

暴力递归到动态规划 01 (机器人移动)

Rongyun IM & RTC capabilities on new sites

Inclusion exclusion principle

(2022 Hangdian multi school III) 1011 link is as bear (thinking + linear basis)
随机推荐
The latest second edition of comic novels, listening to books, three in one, complete source code / integrated visa free interface / building tutorials / with acquisition interface
Easy to use remote sensing data set download website~~~
mysql的timestamp存在的时区问题怎么解决
今晚7:30 | 连界、将门、百度、碧桂园创投四位大佬眼中的AI世界,是继续高深还是回归商业本质?...
Rdkit I: using rdkit to screen the structural characteristics of chemical small molecules
安装抓包证书
Environment configuration stepping pit during colab use
一种简单通用的获取函数栈空间大小的方法
Excel splicing database statement
Summary of basic knowledge points of C language
1.4 nn. Module neural network (II)
C language programming | exchange binary odd and even bits (macro Implementation)
for_each用法示例
实例搭建Flask服务(简易版)
Functions and comparison of repeaters, hubs, bridges, switches and routers
Ten thousand words detailed Google play online application standard package format AAB
无法一次粘贴多张图片
The difference between /g /m /i of JS regular expressions
Swing V2: towards a larger model with larger capacity and higher resolution
深入C语言(3)—— C的输入输出流