![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3m5tef1rzj218a0u07ju.jpg)# Before the preface [gscript](https://crossoverjie.top/2022/05/30/gscript/gscript01/) At that time, I was wondering whether there was a more practical tool based on the compilation principle ? After all, the difficulty of writing a language is not low , And it's hard to really apply it . Once I saw someone mention `JSON` Parser , This kind of tool is full of our daily development , It's very widely used . I have thought about how it is realized before , In the process, once it has something to do with the compilation principle, it can't help persuading it to retreat ; But after this period of practice, I found that the implementation of `JSON` The parser doesn't seem difficult either , Only some knowledge of the front end of the compilation principle is fully sufficient . Thanks to the `JSON` Lightweight of , And the grammar is simple , So the core code probably uses only 800 Line then realizes a grammatically perfect `JSON` Parser .<!--more--> First, let's see the effect :```goimport "github.com/crossoverJie/xjson"func TestJson(t *testing.T) { str := `{ "glossary": { "title": "example glossary", "age":1, "long":99.99, "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML", true, null] }, "GlossSee": "markup" } } } }}` decode, err := xjson.Decode(str) assert.Nil(t, err) fmt.Println(decode) v := decode.(map[string]interface{}) glossary := v["glossary"].(map[string]interface{}) assert.Equal(t, glossary["title"], "example glossary") assert.Equal(t, glossary["age"], 1) assert.Equal(t, glossary["long"], 99.99) glossDiv := glossary["GlossDiv"].(map[string]interface{}) assert.Equal(t, glossDiv["title"], "S") glossList := glossDiv["GlossList"].(map[string]interface{}) glossEntry := glossList["GlossEntry"].(map[string]interface{}) assert.Equal(t, glossEntry["ID"], "SGML") assert.Equal(t, glossEntry["SortAs"], "SGML") assert.Equal(t, glossEntry["GlossTerm"], "Standard Generalized Markup Language") assert.Equal(t, glossEntry["Acronym"], "SGML") assert.Equal(t, glossEntry["Abbrev"], "ISO 8879:1986") glossDef := glossEntry["GlossDef"].(map[string]interface{}) assert.Equal(t, glossDef["para"], "A meta-markup language, used to create markup languages such as DocBook.") glossSeeAlso := glossDef["GlossSeeAlso"].(*[]interface{}) assert.Equal(t, (*glossSeeAlso)[0], "GML") assert.Equal(t, (*glossSeeAlso)[1], "XML") assert.Equal(t, (*glossSeeAlso)[2], true) assert.Equal(t, (*glossSeeAlso)[3], "") assert.Equal(t, glossEntry["GlossSee"], "markup")}``` From this use case, you can see the support string 、 Boolean value 、 floating-point 、 plastic 、 Arrays and various nested relationships . # Realization principle ![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3mz1xo687j20iw0evt9f.jpg) Here is a brief description of the implementation principle , It's essentially two steps :1. ** lexing **: According to the original input `JSON` The string resolves to token, Which is similar to `"{" "obj" "age" "1" "[" "]"` Such identifiers , Just sort these identifiers .2. According to the generated set `token` aggregate , Read as a stream , Finally, the tree structure in the diagram can be generated , That's one `JSONObject` . Let's focus on what these two steps have done .## Lexical analysis ```shellBeginObject {String "name"SepColon :String "cj"SepComma ,String "object"SepColon :BeginObject {String "age"SepColon :Number 10SepComma ,String "sex"SepColon :String "girl"EndObject }SepComma ,String "list"SepColon :BeginArray [``` In fact, lexical parsing is the process of constructing a finite automaton (`DFA`), The goal is to generate such a collection (token), We just need to put these token It is classified so that it can be processed during subsequent parsing . such as `"{"` Such a left curly bracket is a `BeginObject` Represents the beginning of an object declaration , and `"}"` It is `EndObject` Represents the end of an object . among `"name"` Such is considered to be `String` character string , And so on `"["` representative `BeginArray` Here, I define the following categories token type :```gotype Token stringconst ( Init Token = "Init" BeginObject = "BeginObject" EndObject = "EndObject" BeginArray = "BeginArray" EndArray = "EndArray" Null = "Null" Null1 = "Null1" Null2 = "Null2" Null3 = "Null3" Number = "Number" Float = "Float" BeginString = "BeginString" EndString = "EndString" String = "String" True = "True" True1 = "True1" True2 = "True2" True3 = "True3" False = "False" False1 = "False1" False2 = "False2" False3 = "False3" False4 = "False4" // SepColon : SepColon = "SepColon" // SepComma , SepComma = "SepComma" EndJson = "EndJson")```> You can see true/false/null There will be multiple types , Ignore this first , I'll explain later . With this paragraph `JSON` For example :`{"age":1}`, Its state is reversed as shown in the figure below :![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3n83xkv82j21360i0t9r.jpg) In general, it is to traverse the string in turn , Then update a global status , Different operations are performed according to the value of this state . Part of the code is as follows :![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3n866cfqfj20u01jyjxt.jpg)![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3n86kgy9qj20u012kadx.jpg) Interested friends can run alone debug It's easy to understand at once :[https://github.com/crossoverJie/xjson/blob/main/token_test.go](https://github.com/crossoverJie/xjson/blob/main/token_test.go) With this paragraph JSON For example :```gofunc TestInitStatus(t *testing.T) { str := `{"name":"cj", "age":10}` tokenize, err := Tokenize(str) assert.Nil(t, err) for _, tokenType := range tokenize { fmt.Printf("%s %s\n", tokenType.T, tokenType.Value) }}``` Finally generated `token` Set it up as follows :```shellBeginObject {String "name"SepColon :String "cj"SepComma ,String "age"SepColon :Number 10EndObject }```### Check in advance due to `JSON` The grammar of is simple , Some rules can even be checked in lexical rules . for instance :`JSON` It is allowed that `null` value , When we have... In the string `nu nul` Such mismatches `null` The value of , You can throw an exception in advance .![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3n8dt7d1aj211w0tigpa.jpg) For example, when the first string is detected as n when , The following must be `u->l->l` Or throw an exception . The same goes for floating point numbers , When there are multiple... In a value . a.m. , You still need to throw an exception .![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3n8fk2xsbj20u010hgp2.jpg) This is also mentioned above `true/false/null` These types need to have multiple reasons for intermediate states .## Generate JSONObject Tree is discussing generation `JSONObject` Before the tree, let's look at this problem , Given a set of parentheses , Judge whether it is legal .- `[<()>]` It's legal .- `[<()>)` And this is illegal . How to achieve it ? It's also very simple , Just use the stack to complete , As shown in the figure below :![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3n8u1brbbj216u0rsgp1.jpg) Take advantage of stack features , Traverse the data in turn , If it is the symbol on the left, it will be put on the stack , When the right symbol is encountered, it matches the data at the top of the stack , If it can match, it will be out of the stack . If there is no match, the format is wrong , After data traversal, if the stack is empty, the data is legal . Actually, observe carefully `JSON` The grammar of is similar :```{ "name": "cj", "object": { "age": 10, "sex": "girl" }, "list": [ { "1": "a" }, { "2": "b" } ]}````BeginObject:{` And `EndObject:}` It must be in pairs , The middle is also paired no matter how nested . And for `"age":10` Data like this ,: There must also be data after the colon to match , Otherwise, it is an illegal format . So based on the bracket matching principle just now , We can also use a similar method to parse `token` aggregate . We also need to create a stack , When you meet `BeginObject` When you are in the stack Map, When I meet a `String` The value is also put on the stack . When you meet `value` when , There will be a stack `key`, At the same time, write the data to the top of the current stack `map` in . Of course it's traversing `token` A global state is also required in the process of , So here is also a ** Finite state machine **.--- for instance : When we traverse to `Token` The type is `String`, The value is `"name"` when , Expect the next `token` Should be : The colon ; So we have to put the current status Record as `StatusColon`, Once it is subsequently resolved to token by `SepColon` when , You need to judge the current status Is it `StatusColon` , If not, it means that the syntax is wrong , You can throw an exception .![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3n9ilzooqj20u00u442z.jpg)![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3n9j1iex0j21bc09q0uj.jpg) At the same time, it is worth noting that `status` It's actually a ` aggregate `, Because the next state may be multiple .`{"e":[1,[2,3],{"d":{"f":"f"}}]}` For example, when we parse to a `SepColon` When colon , Subsequent states may be `value` or `BeginObject {` or `BeginArray [`![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3n9pwpspuj21hg09q76e.jpg) So here we have to take these three situations into account , And so on . The specific parsing process can refer to the source code :[https://github.com/crossoverJie/xjson/blob/main/parse.go](https://github.com/crossoverJie/xjson/blob/main/parse.go)--- Although we can use a stack structure to `JSON` End of analysis , I don't know if you find a problem : It's very easy to miss the rules , For example, there are three cases after a colon just mentioned , And one `BeginArray` There are even four cases (`StatusArrayValue, StatusBeginArray, StatusBeginObject, StatusEndArray`) Such code is not very intuitive to read , At the same time, it is easy to miss grammar , Only problems can be fixed . Since the problem is mentioned, there are corresponding solutions , In fact, it is a common recursive descent algorithm in syntax analysis .![](https://tva1.sinaimg.cn/large/e6c9d24ely1h3n9vzwepij20u01ecwgy.jpg) We just need to `JSON` The grammatical definition of , Write the algorithm recursively , This makes the code very clear to read , At the same time, the rules will not be missed . complete `JSON` Syntax check here :[https://github.com/antlr/grammars-v4/blob/master/json/JSON.g4](https://github.com/antlr/grammars-v4/blob/master/json/JSON.g4) I also expect to change the next version to the recursive descent algorithm .# In conclusion, so far, it has only achieved a very basic `JSON` analysis , There is no performance optimization , And the official `JSON` The package performance is not a bit poor .```shellcpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHzBenchmarkJsonDecode-12 372298 15506 ns/op 512 B/op 12 allocs/opBenchmarkDecode-12 141482 43516 ns/op 30589 B/op 962 allocs/opPASS``` At the same time, some basic functions have not been realized , For example, the parsed `JSONObject` Can reflect and generate custom `Struct`, And the support I ultimately want to achieve `JSON` Four operations of :```jsonxjson.Get("glossary.age+long*(a.b+a.c)")``` At present, it seems that I haven't found any similar library to implement this function , It should be very interesting after it is really finished , Interested friends, please continue to pay attention to . Source code :[https://github.com/crossoverJie/xjson](https://github.com/crossoverJie/xjson)
原网站版权声明
本文为[Golang Chinese community]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/187/202207060035360975.html