当前位置:网站首页>JSON Library Tutorial from scratch (III): parsing strings, learning and sorting notes
JSON Library Tutorial from scratch (III): parsing strings, learning and sorting notes
2022-06-25 05:17:00 【zhongh58】
Original website : From scratch JSON Library Tutorial - You know
1. JSON String Syntax
JSON String Syntax and C The language is very similar , The characters are enclosed in double quotation marks , Such as "Hello". But the string is separated by double quotation marks , So how can I insert a double quotation mark into a string ? hold a"b It's written in "a"b" Definitely not , I don't know where the string ends .
therefore , We need to introduce escape characters (escape character),C Language and JSON All use \( Backslash ) As escape character , that " In the string, it is expressed as \",a"b Of JSON The string is written as "a\"b". As shown in the following String Syntax ,JSON In support of 9 Escape sequence :
string = quotation-mark *char quotation-mark
char = unescaped /
escape (
%x22 / ; " quotation mark U+0022
%x5C / ; \ reverse solidus U+005C
%x2F / ; / solidus U+002F
%x62 / ; b backspace U+0008
%x66 / ; f form feed U+000C
%x6E / ; n line feed U+000A
%x72 / ; r carriage return U+000D
%x74 / ; t tab U+0009
%x75 4HEXDIG ) ; uXXXX U+XXXX
escape = %x5C ; \
quotation-mark = %x22 ; "
unescaped = %x20-21 / %x23-5B / %x5D-10FFFFA simple translation ,JSON A string is composed of two double quotation marks with zero or more characters . Character separation no escape character or escape sequence . The escape sequence has 9 Kind of , It all starts with a backslash , As is common \n Represents a newline character . What's more special is \uXXXX, among XXXX by 16 Carry on UTF-16 code , This unit will not handle this escape sequence , Let's break it down later .
Non escape characters are ordinary characters , The syntax lists the legal code point range ( The code point is introduced in the next unit ). It should be noted that , This scope does not include 0 to 31、 Double quotes and backslashes , These code points must be expressed by escape .
2. String representation
stay C In language , Strings are generally represented as empty terminated strings (null-terminated string), Empty character ('\0') Represents the end of the string . However ,JSON Strings are allowed to contain empty characters , For example, this JSON "Hello\u0000World" Is a single string , After parsing, it is 11 Characters . If you simply use a null terminated character to indicate JSON Results after analysis , Can't handle empty characters .
therefore , We can allocate memory to store the parsed characters , And the number of recording characters ( The length of the string ). Because most C Programs assume that a string is an empty terminated string , Let's add an empty character at the end , Then there is no need to deal with \u0000 The application of this character can simply regard it as a null terminated string .
lept_value In fact, it is a variant type (variant type), We go through type To determine what type it is now , This also determines which members are valid . First, we simply add two members to the structure :
typedef struct {
char* s;
size_t len;
double n;
lept_type type;
}lept_value;We know , A value cannot be both a number and a string , So we can use C Linguistic union To save memory :
typedef struct {
union {
struct { char* s; size_t len; }s; /* string */
double n; /* number */
}u;
lept_type type;
}lept_value;These two designs are in 32 The memory layout of bit platform is as follows , It can be seen that... Is used on the right union Can save memory .

We're going to take the previous v->n Change to v->u.n. To access the string data , Use v->u.s.s and v->u.s.len. This kind of writing is troublesome , Actually C11 New anonymous struct/union grammar , You can use v->n、v->s、v->len To make an interview .
3. memory management
Because the length of the string is not fixed , We need to allocate memory dynamically . For the sake of simplicity , We use the standard library <stdlib.h> Medium malloc()、realloc() and free() To allocate / Free memory .
When setting a value as a string , We need to make a copy of the string in the parameter :
void lept_set_string(lept_value* v, const char* s, size_t len) {
assert(v != NULL && (s != NULL || len == 0));
lept_free(v);
v->u.s.s = (char*)malloc(len + 1);
memcpy(v->u.s.s, s, len);
v->u.s.s[len] = '\0';
v->u.s.len = len;
v->type = LEPT_STRING;
}Setting this v Before , We need to call lept_free(v) Go empty v Memory that may be allocated . For example, there is already a string , We have to release it first . Then simply use malloc() Distribution and use memcpy() Copy , And fill in the blank character at the end .malloc(len + 1) Medium 1 Because the end is empty .
void lept_free(lept_value* v) {
assert(v != NULL);
if (v->type == LEPT_STRING)
free(v->u.s.s);
v->type = LEPT_NULL;
}Because we will check v The type of , Before calling all access functions , We must initialize this type . So we joined in lept_init(v), Because it is very simple, we use macros to realize :
#define lept_init(v) do { (v)->type = LEPT_NULL; } while(0)4. Buffer and stack
We parse the string ( And then the array 、 object ) when , You need to store the parsing results in a temporary buffer , Last but not least lept_set_string() Set the result of the buffer into the value . Before you finish parsing a string , The size of this buffer is unpredictable . therefore , We can use dynamic arrays (dynamic array) This data structure , That is, when the array space is insufficient , Can automatically expand .C++ Standard library std::vector It is also a dynamic array .
If every time you parse a string , Re create a dynamic array , It is time-consuming . We can reuse this dynamic array , Every time you parse JSON You just need to create one . And we will find , Whether it's parsing strings 、 Array or object , We only need to access this dynamic array in a first in, last out way . let me put it another way , We need a dynamic stack data structure .
We put data from a dynamic stack into lept_context in :
typedef struct {
const char* json;
char* stack;
size_t size, top;
}lept_context;among size Is the current stack capacity ,top Is the top of the stack ( Because we will expand stack, So don't put top Store in pointer form ).
We're creating lept_context Initialization when stack And finally free memory :
int lept_parse(lept_value* v, const char* json) {
lept_context c;
int ret;
assert(v != NULL);
c.json = json;
c.stack = NULL; /* <- */
c.size = c.top = 0; /* <- */
lept_init(v);
lept_parse_whitespace(&c);
if ((ret = lept_parse_value(&c, v)) == LEPT_PARSE_OK) {
/* ... */
}
assert(c.top == 0); /* <- */
free(c.stack); /* <- */
return ret;
}On release , Add assertions to ensure that all data is ejected .
then , We implement stack push and pop operations . It is different from the normal stack , Our stack is stored in bytes . Data of any size can be pressed every time , It returns a pointer to the beginning of the data ( Meeting C++ You can refer to [1]):
#ifndef LEPT_PARSE_STACK_INIT_SIZE
#define LEPT_PARSE_STACK_INIT_SIZE 256
#endif
static void* lept_context_push(lept_context* c, size_t size) {
void* ret;
assert(size > 0);
if (c->top + size >= c->size) {
if (c->size == 0)
c->size = LEPT_PARSE_STACK_INIT_SIZE;
while (c->top + size >= c->size)
c->size += c->size >> 1; /* c->size * 1.5 */
c->stack = (char*)realloc(c->stack, c->size);
}
ret = c->stack + c->top;
c->top += size;
return ret;
}
static void* lept_context_pop(lept_context* c, size_t size) {
assert(c->top >= size);
return c->stack + (c->top -= size);
}5. Parse string
With the above tools , The task of parsing strings becomes very simple . We just need to back up the top of the stack first , Then put the parsed characters on the stack , Finally, calculate the length and pop up all the characters at once , Then set it to the value . The following is a partial implementation , The escape and some illegal character verification are not handled .
#define PUTC(c, ch) do { *(char*)lept_context_push(c, sizeof(char)) = (ch); } while(0)
static int lept_parse_string(lept_context* c, lept_value* v) {
size_t head = c->top, len;
const char* p;
EXPECT(c, '\"');
p = c->json;
for (;;) {
char ch = *p++;
switch (ch) {
case '\"':
len = c->top - head;
lept_set_string(v, (const char*)lept_context_pop(c, len), len);
c->json = p;
return LEPT_PARSE_OK;
case '\0':
c->top = head;
return LEPT_PARSE_MISS_QUOTATION_MARK;
default:
PUTC(c, ch);
}
}
}about *p++, to *p, After the operation, proceed to ++ operation .
边栏推荐
- Svg code snippet of loading animation
- How to download and use Xiaobai one click reload on the official website
- Create an environment for new projects
- How micro engine uploads remote attachments
- Deeply understand the characteristics of standard flow and off standard elements
- Customize the console plot result style
- Two dimensional array and function call cases of C language
- Various pits encountered in the configuration of yolov3 on win10
- [relax's law of life lying on the square] those poisonous chicken soup that seem to be too light and too heavy, but think carefully and fear
- Working principle of asemi three-phase rectifier bridge
猜你喜欢

Enhanced paste quill editor

Use serialize in egg to read and write split tables

Specific operations for uploading pictures in PHP

Small sample learning data set

CTFHUB SSRF

Summary of SQL injection (I)

Dynamic programming example 2 leetcode62 unique paths

buuctf web

How to use the Magic pig system reinstallation master

API interface management setup -eolinker4.0
随机推荐
Penetration information collection steps (simplified version)
Vue uses keep alive to cache page optimization projects
Database overview
Database query optimization method
CTFHub-rce
XSS (cross site script attack) summary (II)
Critical dependency: require function is used in a way in which dependencies
Compatible with Internet Explorer
Specific operations for uploading pictures in PHP
For in JS Of and for in
hr竟主动给这位测试小姐姐涨工资,她是怎么做到的?
Visual studio 2022 interface beautification tutorial
A review of small sample learning
JS function to realize simple calculator
Array: force deduction dichotomy
Understand JS high-order function and write a high-order function
Activereportsjs V3.0 comes on stage
2021-10-24
[relax's law of life lying on the square] those poisonous chicken soup that seem to be too light and too heavy, but think carefully and fear
Precise delay based on Cortex-M3 and M4 (systick delay of system timer can be used for STM32, aducm4050, etc.)