当前位置:网站首页>Opengauss kernel: analysis of SQL parsing process
Opengauss kernel: analysis of SQL parsing process
2022-06-28 18:50:00 【Huawei cloud developer Alliance】
Abstract : In a traditional database SQL The engine generally refers to the SQL Statement parsing 、 Optimized software modules .SQL The parsing process of is mainly divided into : morphology 、 Grammar and semantic analysis .
This article is shared from Huawei cloud community 《 openGauss Kernel analysis ( 3、 ... and ):SQL analysis 》, author :Gauss Squirrel Club .
In a traditional database SQL The engine generally refers to the SQL Statement parsing 、 Optimized software modules .
SQL The parsing process of is mainly divided into :
• Lexical analysis : Enter the SQL Statements are broken down into words (Token) Sequence , And identify the key words 、 identification 、 Constant etc. .
• Syntax analysis : The word parsed by the lexical analyzer (Token) Whether the sequence satisfies the syntax SQL Rule of grammar .
• Semantic analysis : Semantic analysis is SQL A logical phase of the parsing process , The main task is to examine the nature of context on the basis of correct grammar , stay SQL This stage of the parsing process completes the table name 、 The operator 、 Type and other elements , At the same time, semantic ambiguity is detected .
openGauss stay pg_parse_query Call in raw_parser Function on user input SQL Command for lexical analysis and syntax analysis , Generate a syntax tree and add it to the linked list parsetree_list in . After parsing , about parsetree_list Every syntax tree in parsetree, Would call parse_**yze Function for semantic analysis , according to SQL Different orders , Execute the corresponding entry function , Finally, the query tree is generated .

Lexical analysis
openGauss Use flex Tools for lexical analysis .flex The tool compiles the defined lexical file , Generate lexical analysis code . The lexical file is scan.l, It is based on SQL The language standard is right SQL Keywords in language 、 identifier 、 The operator 、 Constant 、 The terminator is defined and identified . stay kwlist.h A large number of keywords are defined in , In alphabetical order , It is convenient to find keywords by dichotomy . stay scan.l In dealing with “ identifier ” when , Will be matched in the keyword list , If an identifier matches a keyword , It is considered to be a keyword , Otherwise, it is the identifier , Keyword first . With “select a, b from item” As an example to illustrate the results of lexical analysis .

Syntax analysis
openGauss It defines bison Syntax files recognized by the tool gram.y, according to SQL Different languages define a series of expressions Statement The structure of the body ( These structures are usually Stmt As a naming suffix ), Used to save parsing results . With SELECT For example, query , Its corresponding Statement The structure is as follows .
typedef struct SelectStmt
{
NodeTag type;
List *distinctClause; /* NULL, list of DISTINCT ON exprs, or
* lcons(NIL,NIL) for all (SELECT DISTINCT) */
IntoClause *intoClause; /* target for SELECT INTO */
List *targetList; /* the target list (of ResTarget) */
List *fromClause; /* the FROM clause */
Node *whereClause; /* WHERE qualification */
List *groupClause; /* GROUP BY clauses */
Node *havingClause; /* HAVING conditional-expression */
List *windowClause; /* WINDOW window_name AS (...), ... */
WithClause *withClause; /* WITH clause */
List *valuesLists; /* untransformed list of expression lists */
List *sortClause; /* sort clause (a list of SortBy's) */
Node *limitOffset; /* # of result tuples to skip */
Node *limitCount; /* # of result tuples to return */
……
} SelectStmt;This structure can be regarded as a multi fork tree , Each leaf node expresses SELECT A syntax structure in a query statement , Corresponding to gram.y in , It will have a SelectStmt. The code is as follows :

from simple_select The grammatical structure shows that , A simple query statement consists of the following clauses : Remove line duplicates distinctClause、 Target properties targetList、SELECT INTO Clause intoClause、FROM Clause fromClause、WHERE Clause whereClause、GROUP BY Clause groupClause、HAVING Clause havingClause、 Window clause windowClause and plan_hint Clause . After successful matching simple_select After the grammatical structure , Will create a Statement Structure , Assign corresponding values to each clause . Yes simple_select for , Target properties 、FROM Clause 、WHERE Clause is the most important part .SelectStmt The relationship with other structures is as follows :

Let's say “select a, b from item” As an example, the description is simple select Statement parsing process , function exec_simple_query call pg_parse_query Execution Analysis , There is only one element in the parse tree .

(gdb) p *parsetree_list
$47 = {type = T_List, length = 1, head = 0x7f5ff986c8f0, tail = 0x7f5ff986c8f0}List The node type in is T_SelectStmt.
(gdb) p *(Node *)(parsetree_list->head.data->ptr_value)
$45 = {type = T_SelectStmt}see SelectStmt Structure ,targetList and fromClause Non empty .
(gdb) set $stmt = (SelectStmt *)(parsetree_list->head.data->ptr_value)
(gdb) p *$stmt
$50 = {type = T_SelectStmt, distinctClause = 0x0, intoClause = 0x0, targetList = 0x7f5ffa43d588, fromClause = 0x7f5ff986c888, startWithClause = 0x0, whereClause = 0x0, groupClause = 0x0,
havingClause = 0x0, windowClause = 0x0, withClause = 0x0, valuesLists = 0x0, sortClause = 0x0, limitOffset = 0x0, limitCount = 0x0, lockingClause = 0x0, hintState = 0x0, op = SETOP_NONE, all = false,
larg = 0x0, rarg = 0x0, hasPlus = false}see SelectStmt Of targetlist, There are two ResTarget.
(gdb) p *($stmt->targetList)
$55 = {type = T_List, length = 2, head = 0x7f5ffa43d540, tail = 0x7f5ffa43d800}
(gdb) p *(Node *)($stmt->targetList->head.data->ptr_value)
$57 = {type = T_ResTarget}
(gdb) set $restarget1=(ResTarget *)($stmt->targetList->head.data->ptr_value)
(gdb) p *$restarget1
$60 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d378, location = 7}
(gdb) p *$restarget1->val
$63 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget1->val
$64 = {type = T_ColumnRef, fields = 0x7f5ffa43d470, prior = false, indnum = 0, location = 7}
(gdb) p *((ColumnRef *)$restarget1->val)->fields
$66 = {type = T_List, length = 1, head = 0x7f5ffa43d428, tail = 0x7f5ffa43d428}
(gdb) p *(Node *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$67 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$77 = {type = T_String, val = {ival = 140050197369648, str = 0x7f5ffa43d330 "a"}}
(gdb) set $restarget2=(ResTarget *)($stmt->targetList->tail.data->ptr_value)
(gdb) p *$restarget2
$89 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d638, location = 10}
(gdb) p *$restarget2->val
$90 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget2->val
$91 = {type = T_ColumnRef, fields = 0x7f5ffa43d730, prior = false, indnum = 0, location = 10}
(gdb) p *((ColumnRef *)$restarget2->val)->fields
$92 = {type = T_List, length = 1, head = 0x7f5ffa43d6e8, tail = 0x7f5ffa43d6e8}
(gdb) p *(Node *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$93 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$94 = {type = T_String, val = {ival = 140050197370352, str = 0x7f5ffa43d5f0 "b"}}see SelectStmt Of fromClause, There is one RangeVar.
(gdb) p *$stmt->fromClause
$102 = {type = T_List, length = 1, head = 0x7f5ffa43dfe0, tail = 0x7f5ffa43dfe0}
(gdb) set $fromclause=(RangeVar*)($stmt->fromClause->head.data->ptr_value)
(gdb) p *$fromclause
$103 = {type = T_RangeVar, catalogname = 0x0, schemaname = 0x0, relname = 0x7f5ffa43d848 "item", partitionname = 0x0, subpartitionname = 0x0, inhOpt = INH_DEFAULT, relpersistence = 112 'p', alias = 0x0,
location = 17, ispartition = false, issubpartition = false, partitionKeyValuesList = 0x0, isbucket = false, buckets = 0x0, length = 0, foreignOid = 0, withVerExpr = false}From the above analysis, we can get the syntax tree structure .

Semantic analysis
After lexical analysis and grammar analysis ,parse_Ana lyze The function will be based on the type of syntax tree , call transformSelectStmt take parseTree Rewrite to query tree .

(gdb) p *result
$3 = {type = T_Query, commandType = CMD_SELECT, querySource = QSRC_ORIGINAL, queryId = 0, canSetTag = false, utilityStmt = 0x0, resultRelation = 0, hasAggs = false, hasWindowFuncs = false,
hasSubLinks = false, hasDistinctOn = false, hasRecursive = false, hasModifyingCTE = false, hasForUpdate = false, hasRowSecurity = false, hasSynonyms = false, cteList = 0x0, rtable = 0x7f5ff5eb8c88,
jointree = 0x7f5ff5eb9310, targetList = 0x7f5ff5eb9110,…}
(gdb) p *result->targetList
$13 = {type = T_List, length = 2, head = 0x7f5ff5eb90c8, tail = 0x7f5ff5eb92c8}
(gdb) p *(Node *)(result->targetList->head.data->ptr_value)
$8 = {type = T_TargetEntry}
(gdb) p *(TargetEntry*)(result->targetList->head.data->ptr_value)
$9 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff636ff48, resno = 1, resname = 0x7f5ff5caf330 "a", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 1, resjunk = false}
(gdb) p *(TargetEntry*)(result->targetList->tail.data->ptr_value)
$10 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff5eb9178, resno = 2, resname = 0x7f5ff5caf5f0 "b", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 2, resjunk = false}
(gdb)
(gdb) p *result->rtable
$14 = {type = T_List, length = 1, head = 0x7f5ff5eb8c40, tail = 0x7f5ff5eb8c40}
(gdb) p *(Node *)(result->rtable->head.data->ptr_value)
$15 = {type = T_RangeTblEntry}
(gdb) p *(RangeTblEntry*)(result->rtable->head.data->ptr_value)
$16 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relname = 0x7f5ff636efb0 "item", partAttrNum = 0x0, relid = 24576, partitionOid = 0, isContainPartition = false, subpartitionOid = 0……}The resulting query tree structure is as follows :

Complete morphology 、 After grammatical and semantic analysis ,SQL The parsing process is complete ,SQL The engine starts to perform query optimization , It will be analyzed in the next issue .
Click to follow , The first time to learn about Huawei's new cloud technology ~
边栏推荐
- 浅谈软件研发的复杂性与效能提升之道
- CVPR2022 | 浙大、蚂蚁集团提出基于标签关系树的层级残差多粒度分类网络,建模多粒度标签间的层级知识
- Lumiprobe ProteOrange 蛋白质凝胶染料说明书
- 1 goal, 3 fields, 6 factors and 9 links of digital transformation
- Yixin Huachen: real estate enterprises want to grasp the opportunity of the times for digital transformation
- SqlTransaction
- 百度时间因子添加
- idea其他分支合并到dev分支
- C语言文件操作
- An in-depth analysis of the election mechanism in kubernetes
猜你喜欢

About Statistical Distributions

Concept and code implementation of heap

ONEFLOW source code parsing: automatic inference of operator signature

Voice network VQA: make the user's subjective experience of unknown video quality in real-time interaction known

刷题分析工具

数据资产为王,如何解析企业数字化转型与数据资产管理的关系?

Michael Wooldridge, professeur à l'Université d'Oxford: comment la communauté de l'IA voit les réseaux neuronaux depuis près de 40 ans

idea其他分支合并到dev分支

被315点名的流氓下载器,又回来了…

About covariance and correlation
随机推荐
Analysis of the core components of mybayis
模块化操作
Openfire用户以及群组关系移植
亿信华辰:地产企业数字化转型想要把握时代机遇
数字化转型的1个目标,3大领域,6大因素和9个环节
An in-depth analysis of the election mechanism in kubernetes
Analyzing the practical development of robot teaching
BioVendor游离轻链(κ和λ)Elisa 试剂盒检测步骤
实时Transformer:美团在单图像深度估计上的研究
19.2 容器分类、array、vector容器精解
注意!PMP紧急缓考今天就截止了!
从理论到实践增强STEAM和工程教育
Chapter 2 processing files, cameras and GUI Cameo applications
业务层修改--根据现有框架的反推修改
Enhancing steam and engineering education from theory to practice
Lumiprobe丨Lumizol RNA 提取试剂解决方案
深入解析kubernetes中的选举机制
使用Karmada实现Helm应用的跨集群部署
Object tracking using tracker in opencv
原生实现.NET 5.0+ 自定义日志