当前位置:网站首页>Hundreds of lines of code to implement a JSON parser
Hundreds of lines of code to implement a JSON parser
2022-06-30 16:06:00 【crossoverJie】

Preface
I was writing before gscript At that time, I was wondering whether there was a more practical tool based on the compilation principle ? After all, the difficulty of writing a language is not low , And it's hard to really apply it .
Once I saw someone mention JSON Parser , This kind of tool is full of our daily development , It's very widely used .
I have thought about how it is realized before , In the process, once it has something to do with the compilation principle, it can't help persuading it to retreat ; But after this period of practice, I found that the implementation of JSON The parser doesn't seem difficult either , Only some knowledge of the front end of the compilation principle is fully sufficient .
Thanks to the JSON Lightweight of , And the grammar is simple , So the core code probably uses only 800 Line then realizes a grammatically perfect JSON Parser .
<!--more-->
First, let's see the effect :
import "github.com/crossoverJie/gjson"func TestJson(t *testing.T) { str := `{ "glossary": { "title": "example glossary", "age":1, "long":99.99, "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML", true, null] }, "GlossSee": "markup" } } } }}` decode, err := gjson.Decode(str) assert.Nil(t, err) fmt.Println(decode) v := decode.(map[string]interface{}) glossary := v["glossary"].(map[string]interface{}) assert.Equal(t, glossary["title"], "example glossary") assert.Equal(t, glossary["age"], 1) assert.Equal(t, glossary["long"], 99.99) glossDiv := glossary["GlossDiv"].(map[string]interface{}) assert.Equal(t, glossDiv["title"], "S") glossList := glossDiv["GlossList"].(map[string]interface{}) glossEntry := glossList["GlossEntry"].(map[string]interface{}) assert.Equal(t, glossEntry["ID"], "SGML") assert.Equal(t, glossEntry["SortAs"], "SGML") assert.Equal(t, glossEntry["GlossTerm"], "Standard Generalized Markup Language") assert.Equal(t, glossEntry["Acronym"], "SGML") assert.Equal(t, glossEntry["Abbrev"], "ISO 8879:1986") glossDef := glossEntry["GlossDef"].(map[string]interface{}) assert.Equal(t, glossDef["para"], "A meta-markup language, used to create markup languages such as DocBook.") glossSeeAlso := glossDef["GlossSeeAlso"].(*[]interface{}) assert.Equal(t, (*glossSeeAlso)[0], "GML") assert.Equal(t, (*glossSeeAlso)[1], "XML") assert.Equal(t, (*glossSeeAlso)[2], true) assert.Equal(t, (*glossSeeAlso)[3], "") assert.Equal(t, glossEntry["GlossSee"], "markup")}From this use case, you can see the support string 、 Boolean value 、 floating-point 、 plastic 、 Arrays and various nested relationships .
Realization principle

Here is a brief description of the implementation principle , It's essentially two steps :
- lexing : According to the original input
JSONThe string resolves to token, Which is similar to"{" "obj" "age" "1" "[" "]"Such identifiers , Just sort these identifiers . - According to the generated set
tokenaggregate , Read as a stream , Finally, the tree structure in the diagram can be generated , That's oneJSONObject.
Let's focus on what these two steps have done .
Lexical analysis
BeginObject {String "name"SepColon :String "cj"SepComma ,String "object"SepColon :BeginObject {String "age"SepColon :Number 10SepComma ,String "sex"SepColon :String "girl"EndObject }SepComma ,String "list"SepColon :BeginArray [ In fact, lexical parsing is the process of constructing a finite automaton (DFA), The goal is to generate such a collection (token), We just need to put these token It is classified so that it can be processed during subsequent parsing .
such as "{" Such a left curly bracket is a BeginObject Represents the beginning of an object declaration , and "}" It is EndObject Represents the end of an object .
among "name" Such is considered to be String character string , And so on "[" representative BeginArray
Here, I define the following categories token type :
type Token stringconst ( Init Token = "Init" BeginObject = "BeginObject" EndObject = "EndObject" BeginArray = "BeginArray" EndArray = "EndArray" Null = "Null" Null1 = "Null1" Null2 = "Null2" Null3 = "Null3" Number = "Number" Float = "Float" BeginString = "BeginString" EndString = "EndString" String = "String" True = "True" True1 = "True1" True2 = "True2" True3 = "True3" False = "False" False1 = "False1" False2 = "False2" False3 = "False3" False4 = "False4" // SepColon : SepColon = "SepColon" // SepComma , SepComma = "SepComma" EndJson = "EndJson")> You can see true/false/null There will be multiple types , Ignore this first , I'll explain later .
With this paragraph JSON For example :{"age":1}, Its state is reversed as shown in the figure below : 
In general, it is to traverse the string in turn , Then update a global status , Different operations are performed according to the value of this state .
Part of the code is as follows : 

Interested friends can run alone debug It's easy to understand at once :
https://github.com/crossoverJie/gjson/blob/main/token_test.go
With this paragraph JSON For example :
func TestInitStatus(t *testing.T) { str := `{"name":"cj", "age":10}` tokenize, err := Tokenize(str) assert.Nil(t, err) for _, tokenType := range tokenize { fmt.Printf("%s %s\n", tokenType.T, tokenType.Value) }} Finally generated token Set it up as follows :
BeginObject {String "name"SepColon :String "cj"SepComma ,String "age"SepColon :Number 10EndObject }Check in advance
because JSON The grammar of is simple , Some rules can even be checked in lexical rules .
for instance : JSON It is allowed that null value , When we have... In the string nu nul Such mismatches null The value of , You can throw an exception in advance .
For example, when the first string is detected as n when , The following must be u->l->l Or throw an exception .
The same goes for floating point numbers , When there are multiple... In a value . a.m. , You still need to throw an exception . 
This is also mentioned above true/false/null These types need to have multiple reasons for intermediate states .
Generate JSONObject Trees
In the discussion of generating JSONObject Before the tree, let's look at this problem , Given a set of parentheses , Judge whether it is legal .
[<()>]It's legal .[<()>)And this is illegal .
How to achieve it ? It's also very simple , Just use the stack to complete , As shown in the figure below :
Take advantage of stack features , Traverse the data in turn , If it is the symbol on the left, it will be put on the stack , When the right symbol is encountered, it matches the data at the top of the stack , If it can match, it will be out of the stack .
If there is no match, the format is wrong , After data traversal, if the stack is empty, the data is legal .
Actually, observe carefully JSON The grammar of is similar :
{ "name": "cj", "object": { "age": 10, "sex": "girl" }, "list": [ { "1": "a" }, { "2": "b" } ]}BeginObject:{ And EndObject:} It must be in pairs , The middle is also paired no matter how nested . And for "age":10 Data like this ,: There must also be data after the colon to match , Otherwise, it is an illegal format .
So based on the bracket matching principle just now , We can also use a similar method to parse token aggregate .
We also need to create a stack , When you meet BeginObject When you are in the stack Map, When I meet a String The value is also put on the stack .
When you meet value when , There will be a stack key, At the same time, write the data to the top of the current stack map in .
Of course it's traversing token A global state is also required in the process of , So here is also a Finite state machine .
for instance : When we traverse to Token The type is String, The value is "name" when , Expect the next token Should be : The colon ;
So we have to put the current status Record as StatusColon, Once it is subsequently resolved to token by SepColon when , You need to judge the current status Is it StatusColon , If not, it means that the syntax is wrong , You can throw an exception . 

At the same time, it is worth noting that status It's actually a aggregate , Because the next state may be multiple .
{"e":[1,[2,3],{"d":{"f":"f"}}]} For example, when we parse to a SepColon When colon , Subsequent states may be value or BeginObject { or BeginArray [
So here we have to take these three situations into account , And so on .
The specific parsing process can refer to the source code : https://github.com/crossoverJie/gjson/blob/main/parse.go
Although we can use a stack structure to JSON End of analysis , I don't know if you find a problem : It's very easy to miss the rules , For example, there are three cases after a colon just mentioned , And one BeginArray There are even four cases (StatusArrayValue, StatusBeginArray, StatusBeginObject, StatusEndArray)
Such code is not very intuitive to read , At the same time, it is easy to miss grammar , Only problems can be fixed .
Since the problem is mentioned, there are corresponding solutions , In fact, it is a common recursive descent algorithm in syntax analysis .
We just need to JSON The grammatical definition of , Write the algorithm recursively , This makes the code very clear to read , At the same time, the rules will not be missed .
complete JSON Syntax check here : https://github.com/antlr/grammars-v4/blob/master/json/JSON.g4
I also expect to change the next version to the recursive descent algorithm .
summary
So far, it has only realized a very basic JSON analysis , There is no performance optimization , And the official JSON The package performance is not a bit poor .
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHzBenchmarkJsonDecode-12 372298 15506 ns/op 512 B/op 12 allocs/opBenchmarkDecode-12 141482 43516 ns/op 30589 B/op 962 allocs/opPASS At the same time, some basic functions have not been realized , For example, the parsed JSONObject Can reflect and generate custom Struct, And the support I ultimately want to achieve JSON Four operations of :
gjson.Get("glossary.age+long*(a.b+a.c)")At present, it seems that I haven't found any similar library to implement this function , It should be very interesting after it is really finished , Interested friends, please continue to pay attention to .
Source code : https://github.com/crossoverJie/gjson
边栏推荐
- Oracle 导出视图的创建语句
- CVPR 2022 - Tesla AI proposed: generalized pedestrian re recognition based on graph sampling depth metric learning
- Mysql事务/锁/日志总结
- Visualization of provincial GDP with CSV metabase processing
- 【Leetcode】链表排序(逐步提高时空复杂度)
- String common API
- What role does "low code" play in enterprise digital transformation?
- halcon变量窗口的图像变量不显示,重启软件和电脑都没用
- LeCun指明下一代AI方向:自主机器智能
- 360数科、蚂蚁集团等入选中国信通院“业务安全推进计划”成员单位
猜你喜欢

Policy Center > Malware > Malware
![[algorithm] after summarizing the four linked lists, I brushed two interview questions](/img/04/1843e01cc91cdf10ae3d3d6a4f1e05.png)
[algorithm] after summarizing the four linked lists, I brushed two interview questions

Model system: Sword (1)

ASP. Net core signalr tutorial

Summary of gradient descent optimizer (rmsprop, momentum, Adam)

Cesium-1.72 learning (deploy offline resources)

Management joint examination - mathematical formula

技不压身,快速入门ETH智能合约开发,带你进入ETH世界

Lecun points out the direction of next generation AI: autonomous machine intelligence

今晚19:00知识赋能第2期直播丨OpenHarmony智能家居项目之控制面板界面设计
随机推荐
Unsupported major.minor version 52.0
Policy Center-Permissions and APIs that Access Sensitive Information
【Unity UGUI】ScrollRect 动态缩放格子大小,自动定位到中间的格子
Go micro installation
《你的灯亮着吗》开始解决问题前,得先知道“真问题”是什么
[sub matrix quantity statistics] cf1181c flag sub matrix quantity statistics
Message queue ten questions
Simulate user login function
互联网研发效能之去哪儿网(Qunar)核心领域DevOps落地实践
Flask Sqlalchemy - automatically export the table model (flask sqlacodegen) & no single primary key problem ---orm (8)
数数据可视化实战案例(timeline轮播图,streamlit 控件年份 metabase可视化使用教程)2.0
大学生研究生毕业找工作,该选择哪个方向?
ASP. Send information in sinalr controller of net core
How to browse mobile web pages on your computer
Specific steps for installing mysql8.0 on Windows system
How the edge computing platform helps the development of the Internet of things
Three development trends of enterprise application viewed from the third technological revolution
[leetcode] linked list sorting (gradually increasing the space-time complexity)
How to get the preferential activities for stock account opening? Is online account opening safe?
Interview experience of service end test engineer