当前位置:网站首页>Opengauss kernel: analysis of SQL parsing process
Opengauss kernel: analysis of SQL parsing process
2022-06-28 18:50:00 【Huawei cloud developer Alliance】
Abstract : In a traditional database SQL The engine generally refers to the SQL Statement parsing 、 Optimized software modules .SQL The parsing process of is mainly divided into : morphology 、 Grammar and semantic analysis .
This article is shared from Huawei cloud community 《 openGauss Kernel analysis ( 3、 ... and ):SQL analysis 》, author :Gauss Squirrel Club .
In a traditional database SQL The engine generally refers to the SQL Statement parsing 、 Optimized software modules .
SQL The parsing process of is mainly divided into :
• Lexical analysis : Enter the SQL Statements are broken down into words (Token) Sequence , And identify the key words 、 identification 、 Constant etc. .
• Syntax analysis : The word parsed by the lexical analyzer (Token) Whether the sequence satisfies the syntax SQL Rule of grammar .
• Semantic analysis : Semantic analysis is SQL A logical phase of the parsing process , The main task is to examine the nature of context on the basis of correct grammar , stay SQL This stage of the parsing process completes the table name 、 The operator 、 Type and other elements , At the same time, semantic ambiguity is detected .
openGauss stay pg_parse_query Call in raw_parser Function on user input SQL Command for lexical analysis and syntax analysis , Generate a syntax tree and add it to the linked list parsetree_list in . After parsing , about parsetree_list Every syntax tree in parsetree, Would call parse_**yze Function for semantic analysis , according to SQL Different orders , Execute the corresponding entry function , Finally, the query tree is generated .

Lexical analysis
openGauss Use flex Tools for lexical analysis .flex The tool compiles the defined lexical file , Generate lexical analysis code . The lexical file is scan.l, It is based on SQL The language standard is right SQL Keywords in language 、 identifier 、 The operator 、 Constant 、 The terminator is defined and identified . stay kwlist.h A large number of keywords are defined in , In alphabetical order , It is convenient to find keywords by dichotomy . stay scan.l In dealing with “ identifier ” when , Will be matched in the keyword list , If an identifier matches a keyword , It is considered to be a keyword , Otherwise, it is the identifier , Keyword first . With “select a, b from item” As an example to illustrate the results of lexical analysis .

Syntax analysis
openGauss It defines bison Syntax files recognized by the tool gram.y, according to SQL Different languages define a series of expressions Statement The structure of the body ( These structures are usually Stmt As a naming suffix ), Used to save parsing results . With SELECT For example, query , Its corresponding Statement The structure is as follows .
typedef struct SelectStmt
{
NodeTag type;
List *distinctClause; /* NULL, list of DISTINCT ON exprs, or
* lcons(NIL,NIL) for all (SELECT DISTINCT) */
IntoClause *intoClause; /* target for SELECT INTO */
List *targetList; /* the target list (of ResTarget) */
List *fromClause; /* the FROM clause */
Node *whereClause; /* WHERE qualification */
List *groupClause; /* GROUP BY clauses */
Node *havingClause; /* HAVING conditional-expression */
List *windowClause; /* WINDOW window_name AS (...), ... */
WithClause *withClause; /* WITH clause */
List *valuesLists; /* untransformed list of expression lists */
List *sortClause; /* sort clause (a list of SortBy's) */
Node *limitOffset; /* # of result tuples to skip */
Node *limitCount; /* # of result tuples to return */
……
} SelectStmt;This structure can be regarded as a multi fork tree , Each leaf node expresses SELECT A syntax structure in a query statement , Corresponding to gram.y in , It will have a SelectStmt. The code is as follows :

from simple_select The grammatical structure shows that , A simple query statement consists of the following clauses : Remove line duplicates distinctClause、 Target properties targetList、SELECT INTO Clause intoClause、FROM Clause fromClause、WHERE Clause whereClause、GROUP BY Clause groupClause、HAVING Clause havingClause、 Window clause windowClause and plan_hint Clause . After successful matching simple_select After the grammatical structure , Will create a Statement Structure , Assign corresponding values to each clause . Yes simple_select for , Target properties 、FROM Clause 、WHERE Clause is the most important part .SelectStmt The relationship with other structures is as follows :

Let's say “select a, b from item” As an example, the description is simple select Statement parsing process , function exec_simple_query call pg_parse_query Execution Analysis , There is only one element in the parse tree .

(gdb) p *parsetree_list
$47 = {type = T_List, length = 1, head = 0x7f5ff986c8f0, tail = 0x7f5ff986c8f0}List The node type in is T_SelectStmt.
(gdb) p *(Node *)(parsetree_list->head.data->ptr_value)
$45 = {type = T_SelectStmt}see SelectStmt Structure ,targetList and fromClause Non empty .
(gdb) set $stmt = (SelectStmt *)(parsetree_list->head.data->ptr_value)
(gdb) p *$stmt
$50 = {type = T_SelectStmt, distinctClause = 0x0, intoClause = 0x0, targetList = 0x7f5ffa43d588, fromClause = 0x7f5ff986c888, startWithClause = 0x0, whereClause = 0x0, groupClause = 0x0,
havingClause = 0x0, windowClause = 0x0, withClause = 0x0, valuesLists = 0x0, sortClause = 0x0, limitOffset = 0x0, limitCount = 0x0, lockingClause = 0x0, hintState = 0x0, op = SETOP_NONE, all = false,
larg = 0x0, rarg = 0x0, hasPlus = false}see SelectStmt Of targetlist, There are two ResTarget.
(gdb) p *($stmt->targetList)
$55 = {type = T_List, length = 2, head = 0x7f5ffa43d540, tail = 0x7f5ffa43d800}
(gdb) p *(Node *)($stmt->targetList->head.data->ptr_value)
$57 = {type = T_ResTarget}
(gdb) set $restarget1=(ResTarget *)($stmt->targetList->head.data->ptr_value)
(gdb) p *$restarget1
$60 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d378, location = 7}
(gdb) p *$restarget1->val
$63 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget1->val
$64 = {type = T_ColumnRef, fields = 0x7f5ffa43d470, prior = false, indnum = 0, location = 7}
(gdb) p *((ColumnRef *)$restarget1->val)->fields
$66 = {type = T_List, length = 1, head = 0x7f5ffa43d428, tail = 0x7f5ffa43d428}
(gdb) p *(Node *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$67 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$77 = {type = T_String, val = {ival = 140050197369648, str = 0x7f5ffa43d330 "a"}}
(gdb) set $restarget2=(ResTarget *)($stmt->targetList->tail.data->ptr_value)
(gdb) p *$restarget2
$89 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d638, location = 10}
(gdb) p *$restarget2->val
$90 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget2->val
$91 = {type = T_ColumnRef, fields = 0x7f5ffa43d730, prior = false, indnum = 0, location = 10}
(gdb) p *((ColumnRef *)$restarget2->val)->fields
$92 = {type = T_List, length = 1, head = 0x7f5ffa43d6e8, tail = 0x7f5ffa43d6e8}
(gdb) p *(Node *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$93 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$94 = {type = T_String, val = {ival = 140050197370352, str = 0x7f5ffa43d5f0 "b"}}see SelectStmt Of fromClause, There is one RangeVar.
(gdb) p *$stmt->fromClause
$102 = {type = T_List, length = 1, head = 0x7f5ffa43dfe0, tail = 0x7f5ffa43dfe0}
(gdb) set $fromclause=(RangeVar*)($stmt->fromClause->head.data->ptr_value)
(gdb) p *$fromclause
$103 = {type = T_RangeVar, catalogname = 0x0, schemaname = 0x0, relname = 0x7f5ffa43d848 "item", partitionname = 0x0, subpartitionname = 0x0, inhOpt = INH_DEFAULT, relpersistence = 112 'p', alias = 0x0,
location = 17, ispartition = false, issubpartition = false, partitionKeyValuesList = 0x0, isbucket = false, buckets = 0x0, length = 0, foreignOid = 0, withVerExpr = false}From the above analysis, we can get the syntax tree structure .

Semantic analysis
After lexical analysis and grammar analysis ,parse_Ana lyze The function will be based on the type of syntax tree , call transformSelectStmt take parseTree Rewrite to query tree .

(gdb) p *result
$3 = {type = T_Query, commandType = CMD_SELECT, querySource = QSRC_ORIGINAL, queryId = 0, canSetTag = false, utilityStmt = 0x0, resultRelation = 0, hasAggs = false, hasWindowFuncs = false,
hasSubLinks = false, hasDistinctOn = false, hasRecursive = false, hasModifyingCTE = false, hasForUpdate = false, hasRowSecurity = false, hasSynonyms = false, cteList = 0x0, rtable = 0x7f5ff5eb8c88,
jointree = 0x7f5ff5eb9310, targetList = 0x7f5ff5eb9110,…}
(gdb) p *result->targetList
$13 = {type = T_List, length = 2, head = 0x7f5ff5eb90c8, tail = 0x7f5ff5eb92c8}
(gdb) p *(Node *)(result->targetList->head.data->ptr_value)
$8 = {type = T_TargetEntry}
(gdb) p *(TargetEntry*)(result->targetList->head.data->ptr_value)
$9 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff636ff48, resno = 1, resname = 0x7f5ff5caf330 "a", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 1, resjunk = false}
(gdb) p *(TargetEntry*)(result->targetList->tail.data->ptr_value)
$10 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff5eb9178, resno = 2, resname = 0x7f5ff5caf5f0 "b", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 2, resjunk = false}
(gdb)
(gdb) p *result->rtable
$14 = {type = T_List, length = 1, head = 0x7f5ff5eb8c40, tail = 0x7f5ff5eb8c40}
(gdb) p *(Node *)(result->rtable->head.data->ptr_value)
$15 = {type = T_RangeTblEntry}
(gdb) p *(RangeTblEntry*)(result->rtable->head.data->ptr_value)
$16 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relname = 0x7f5ff636efb0 "item", partAttrNum = 0x0, relid = 24576, partitionOid = 0, isContainPartition = false, subpartitionOid = 0……}The resulting query tree structure is as follows :

Complete morphology 、 After grammatical and semantic analysis ,SQL The parsing process is complete ,SQL The engine starts to perform query optimization , It will be analyzed in the next issue .
Click to follow , The first time to learn about Huawei's new cloud technology ~
边栏推荐
- Native implementation Net5.0+ custom log
- Huawei cloud AOM released version 2.0, and three features appeared
- Can I open an account today and buy shares today? Is it safe to open an account online?
- Seata implementation of sharing JDBC distributed transaction
- Konva series tutorial 3: Customizing drawings
- Anonymous function this pointing and variable promotion
- C语言文件操作
- postgresql数据库docker
- PMP怎么补考?补考费用是多少?
- 原生实现.NET 5.0+ 自定义日志
猜你喜欢
随机推荐
模块化操作
Memory leak
What are the design requirements for PCB layout and wiring?
use. NETCORE's own background job, which simply simulates producers and consumers' processing of request response data in and out of the queue
An in-depth analysis of the election mechanism in kubernetes
About Statistical Distributions
用户网络模型与QoE
内存抖动
抗兔Dylight 488丨Abbkine通用型免疫荧光(IF)工具箱
use. NETCORE's own background job, which simply simulates producers and consumers' processing of request response data in and out of the queue
OOM out of memory 内存溢出
PHP使用栈解决迷宫问题
postgresql数据库docker
C# 41. int与string互转
几行代码就能实现复杂的 Excel 导入导出,这个工具类真心强大!
id门禁卡复制到手机_怎么把手机变成门禁卡 手机NFC复制门禁卡图文教程
leetcode 1689. Partitioning Into Minimum Number Of Deci-Binary Numbers(最少的“二进制数“个数)
Advanced technology management - how managers communicate performance and control risks
【Unity3D】发射(RayCast)物理射线(Ray)
PCB线路板布局和布线都有哪些设计要求?









