当前位置:网站首页>Write a jison parser from scratch (4/10): detailed explanation of the syntax format of the jison parser generator

Write a jison parser from scratch (4/10): detailed explanation of the syntax format of the jison parser generator

2022-07-04 09:28:00 Hu Zhenghui

Write one from scratch Jison Parser (4/10):Jison Parser generator syntax format details

  1. Write one from scratch Jison Parser (1/10):Jison, No Json

  2. Write one from scratch Jison Parser (2/10): Learn parser generator parser generator The right posture

  3. Write one from scratch Jison Parser (3/10): A good beginning is half the battle ——《 political science 》 ( Aristotle )

  4. Write one from scratch Jison Parser (4/10):Jison Parser generator syntax format details

Parser parsed by line

The parser parsed by line has been successfully run previously , This article takes this as an example to explain in detail Jison Syntax format of parser generator .

%lex

%%

[^\n\r]+    {
     return 'LINE'; }
[\n\r]+     {
     return 'EOL'; }
<<EOF>>     {
     return 'EOF'; }

/lex

%%

p
    : ll EOF
        {
     console.log($1); }
    ;

ll
    : ll l
        {
     $$ = $1 + $2; }
    | l
        {
     $$ = $1; }
    ;

l
    : LINE EOL
        {
     $$ = $1 + $2; }
    ;

Jison Syntax structure of parser generator

Jison Is a widely used parser generator Bison Of JavaScript transplant , except Bison The semantics of static types in and are specific to C Outside the concept of language ,Jison and Bison The concepts in are basically the same .

although Jison Its name is Bison The first letter of B Change it to JavaScript Of J, However, it is not a simple transplant , It also includes Bison And used together Flex The function of .

Use Flex and Bison The process of parsing includes two parts : Marker interpretation tokenizing And parsing parsing, Marker interpretation tokenizing Use Flex, This process is also called lexical analysis lexical analysis, and Bison Responsible for parsing parsing, This process is also called semantic analysis semantic analyse.

Describe parsing parsing The way is called formal grammar formal grammar, Follow formal grammar formal grammar The language of is called formal language formal language, Formal grammar formal grammar According to Chomsky pedigree, there are four kinds :

  • Unrestricted grammar unrestricted grammar
  • Context sensitive grammar context-sensitive grammar
  • Context-free grammar context-free grammar
  • Regular grammar regular grammar

Jison Syntactic analysis of parsing And Bison Context free grammar is also used context-free grammar.

Jison The grammar file of can analyze the morphology lexical analysis And semantic analysis semantic analyse Divided into two files , You can also include these two parts in one file . When using a file , Use %lex and /lex Mark distinguishing lexical analysis lexical analysis Part and semantic analysis semantic analyse part , %lex and /lex In the tag lex That is to say lexical Abbreviation .

Refer to the previous grammar , from %lex Start to /lex It's lexical analysis lexical analysis,/lex To the end is parsing parsing.

Jison Lexical analysis in grammar lexical analysis part

Lexical analysis lexical analysis Use in %% In two parts ,%% Before that, lexical analysis lexical analysis The definition of definitions part ,%% Then lexical analysis lexical analysis The rules of rules part , In this example, only rules rules part , Definition definitions There is no content in this part . The rules rules The section contains a series of rules in the following form rules

pattern action

among pattern For this rule rules Matching patterns ,action Action performed when matching patterns .pattern Based on regular expressions , And add some extended functions . Lexical analysis in this example lexical analysis The first two rules of rules The pattern in [^\n\r]+ and [\n\r]+ Regular expression .

[^\n\r]+    { return 'LINE'; }
[\n\r]+     { return 'EOL'; }

Lexical analysis in this example lexical analysis The third rule of rules Medium <<EOF>> That is, the pattern at the end of the matching file .

<<EOF>>     { return 'EOF'; }

Three rules in this example rules Action in action Are all direct return tags token The name of , Respectively LINEEOLEOF.Jison Lexical analysis in lexical analysis The action of action Use JavaScript grammar .

Jison Semantic analysis in grammar semantic analyse part

Semantic analysis semantic analyse Also used in %% In two parts ,%% Before that, semantic analysis semantic analyse Statement of declarations part ,%% Then semantic analysis semantic analyse Grammatical rules grammar rules part , In this example, only syntax rules are included grammar rules part , Statement declarations There is no content in this part .

Semantic analysis semantic analyse Grammar rules in grammar rules Compare lexical analysis lexical analysis The rules in the rules More complicated , The form of foundation is

result	: components...
				;

among components For grammar rules grammar rules Component part ,result To match grammar rules grammar rules Result . The third syntax rule in this example grammar rules This is the form .

l
    : LINE EOL
    ;

among LINE EOL Indicates the syntax rule grammar rules You need to match the two components LINE and EOL, When matching, you can get the result l, Or say LINE and EOL Combine into l.

LINE and EOL The space in the middle is used to split LINE and EOL, Does not participate in matching , Therefore, you can add spaces as needed .

Rule of grammar grammar rules You can also set the action to be performed when matching action, action action Put it in the component component after , The third syntax rule in this example grammar rules The action of action by { $$ = $1 + $2; }.

l
    : LINE EOL
        {
     $$ = $1 + $2; }
    ;

Jison Grammar rules in grammar rules The action of action Use JavaScript grammar . In this case $1 The value of the variable is match LINE Value , That is, lexical analysis lexical analysis Back in LINE The pattern in the rule pattern Defined [^\n\r]+ Matching string . In this case $2 The value of the variable is match EOL Value , That is, lexical analysis lexical analysis Back in EOL The pattern in the rule pattern Defined [\n\r]+ Matching string . Assign a value to a variable $$ Namely LINE and EOL Combine into l Value , In this case $1 and $2 String simple connection , In practice, it can be handled on demand , There are more complex examples in the following articles .

Contains multiple rules rule Grammatical rules grammar rules

Rule of grammar grammar rules You can also match multiple situations to get the same result , Pipe symbols are used between multiple rules | Connect , In the form of

result	: rule1-components...
        | rule2-components...
        ...
        ;

The second syntax rule in this example grammar rules This is the form .

ll
    : ll l
    | l
    ;

The first rule ll l Indicates the syntax rule grammar rules Need to match two parts ll and l, The second rule l Indicates the syntax rule grammar rules Need to match a part l. among l The third grammatical rule explained above grammar rules Definition , From this grammatical rule grammar rules It can be seen that , The components can not only be lexical analysis lexical analysis Tags in token, It can also be grammatical rules grammar rules Rules defined in .

Recursive rule recursive rule

The components are grammatical rules grammar rules Rules defined in , It can not only be other grammatical rules grammar rules Result , For example, the second rule l, It can also be itself , For example, the first rule is ll and ·l, This rule contains its own grammatical rules grammar rules There is a special name ———— Recursive rule recursive rule.

The recursive rule in this example recursive rule It's the simplest form , First match the second rule l, The result is ll, then ll And the next l Match the first rule ll l, The result is still ll, At this time ll It's equivalent to two l, And so on .

When grammar rules grammar rules When there are multiple rules in , Each rule can set actions action, action action Then the corresponding rules , In this example, the two rules set actions { $$ = $1 + $2; } and { $$ = $1; }.

ll
    : ll l
        {
     $$ = $1 + $2; }
    | l
        {
     $$ = $1; }
    ;

These two actions action The syntax of is the same as before , Because it is a recursive rule recursive rule, action action Also corresponding recursive execution , For example, match the second rule at the beginning l, Executive action { $$ = $1; }, then ll And the next l Match the first rule ll l, Executive action { $$ = $1 + $2; }, Among them $1 That is to match the second rule at the beginning l Value , It is equivalent to two matching l The corresponding string connection , And so on . The result of the two actions in this example is to combine strings in the matching order .

action action The code in can be written freely , For example, the action of the first rule { $$ = $1 + $2; } It is amended as follows { $$ = $2 + $1; } That is, connect the strings in reverse order , There are more complex examples in the later articles .

Because of recursive rules recursive rule The result can match itself , So we are using recursive rules recursive rule when , To avoid ambiguity in matching , It's best to use recursion rules recursive rule The results of are combined into mismatched recursive rules recursive rule Reuse later , For example, the first syntax rule in this example grammar rules Lieutenant general ll and EOF Combine into p, ll and EOF The combination of does not match ll, Use this way p You can avoid using it directly ll Resulting ambiguity .

p
    : ll EOF
    ;

because p It's a grammar rule grammar rules The last combination in , In this case, directly in p Output in the action of .

p
    : ll EOF
        {
     console.log($1); }
    ;

action action The grammar of is javascript, Although it matches here EOF, But in action action Not used $2, Nor is it assigned to $$. action action The program in can be written freely , The following articles will explain the design points in detail when explaining more complex examples .

Jison Summary of grammatical structure

Combine the above ,Jison The structure of the file is

/*  Lexical analysis  lexical analysis  Start  */

%lex

/*  Lexical analysis  lexical analysis  Defined in the  definition */

%%
  
/*  Lexical analysis  lexical analysis  The rules in the  rule */
  
/*  Every rule  rule  Include patterns  pattern  And the action  action */
  
pattern1 action1
pattern2 action2
pattern3 action3

/*  Lexical analysis  lexical analysis  end , Semantic analysis  semantic analyse  Start  */

/lex

/*  Semantic analysis  semantic analyse  In a statement  declarations */

%%
  
/*  Semantic analysis  semantic analyse  Grammar rules in  grammar rules */
  
/*  Every grammatical rule  grammar rule  You can define one or more rules , Each rule has several components , Each rule can define corresponding actions  action */
  
result1	: rule11-components...
        | rule12-components...
        ...
        ;

result2	: rule21-components...
        | rule22-components...
        ...
        ;
        
result3	: rule31-components...
        | rule32-components...
        ...
        ;
原网站

版权声明
本文为[Hu Zhenghui]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/02/202202141426162375.html

随机推荐