当前位置:网站首页>从内核代码了解SQL如何解析
从内核代码了解SQL如何解析
2022-06-11 15:36:00 【Gauss松鼠会】
目录
在传统数据库中SQL引擎一般指对用户输入的SQL语句进行解析、优化的软件模块。
SQL的解析过程主要分为:
词法分析Lexical Analysis:将用户输入的SQL语句拆解成单词(Token)序列,并识别出关键字、标识、常量等。
语法分析Syntax Analysis:分析器对词法分析器解析出来的单词(Token)序列在语法上是否满足SQL语法规则。
语义分析Semantic Analysis:语义分析是SQL解析过程的一个逻辑阶段,主要任务是在语法正确的基础上进行上下文有关性质的审查,在SQL解析过程中该阶段完成表名、操作符、类型等元素的合法性判断,同时检测语义上的二义性。
openGauss在pg_parse_query中调用raw_parser函数对用户输入的SQL命令进行词法分析和语法分析,生成语法树添加到链表parsetree_list中。完成语法分析后,对于parsetree_list中的每一颗语法树parsetree,会调用parse_analyze函数进行语义分析,根据SQL命令的不同,执行对应的入口函数,最终生成查询树。

词法分析Lexical Analysis
openGauss使用flex工具进行词法分析。flex工具通过对已经定义好的词法文件进行编译,生成词法分析的代码。词法文件是scan.l,它根据SQL语言标准对SQL语言中的关键字、标识符、操作符、常量、终结符进行了定义和识别。在kwlist.h中定义了大量的关键字,按照字母的顺序排列,方便在查找关键字时通过二分法进行查找。在scan.l中处理“标识符”时,会到关键字列表中进行匹配,如果一个标识符匹配到关键字,则认为是关键字,否则才是标识符,即关键字优先. 以“select a, b from item”为例说明词法分析结果。
名称 | 词性 | 内容 | 说明 |
关键字 | keyword | SELECT,FROM | 如SELECT/FROM/WHERE等,对大小写不敏感 |
标识符 | IDENT | a,b,item | 用户自己定义的名字、常量名、变量名和过程名,若无括号修饰则对大小写不敏感 |
语法分析Syntax Analysis
openGauss中定义了bison工具能够识别的语法文件gram.y,根据SQL语言的不同定义了一系列表达Statement的结构体(这些结构体通常以Stmt作为命名后缀),用来保存语法分析结果。以SELECT查询为例,它对应的Statement结构体如下。
typedef struct SelectStmt
{
NodeTag type;
List *distinctClause; /* NULL, list of DISTINCT ON exprs, or
* lcons(NIL,NIL) for all (SELECT DISTINCT) */
IntoClause *intoClause; /* target for SELECT INTO */
List *targetList; /* the target list (of ResTarget) */
List *fromClause; /* the FROM clause */
Node *whereClause; /* WHERE qualification */
List *groupClause; /* GROUP BY clauses */
Node *havingClause; /* HAVING conditional-expression */
List *windowClause; /* WINDOW window_name AS (...), ... */
WithClause *withClause; /* WITH clause */
List *valuesLists; /* untransformed list of expression lists */
List *sortClause; /* sort clause (a list of SortBy's) */
Node *limitOffset; /* # of result tuples to skip */
Node *limitCount; /* # of result tuples to return */
……
} SelectStmt;这个结构体可以看作一个多叉树,每个叶子节点都表达了SELECT查询语句中的一个语法结构,对应到gram.y中,它会有一个SelectStmt。代码如下:

从simple_select语法分析结构可以看出,一条简单的查询语句由以下子句组成:去除行重复的distinctClause、目标属性targetList、SELECT INTO子句intoClause、FROM子句fromClause、WHERE子句whereClause、GROUP BY子句groupClause、HAVING子句havingClause、窗口子句windowClause和plan_hint子句。在成功匹配simple_select语法结构后,将会创建一个Statement结构体,将各个子句进行相应的赋值。对simple_select而言,目标属性、FROM子句、WHERE子句是最重要的组成部分。SelectStmt与其他结构体的关系如下:

下面以“select a, b from item”为例说明简单select语句的解析过程,函数exec_simple_query调用pg_parse_query执行解析,解析树中只有一个元素。

(gdb) p *parsetree_list
$47 = {type = T_List, length = 1, head = 0x7f5ff986c8f0, tail = 0x7f5ff986c8f0}List中的节点类型为T_SelectStmt
(gdb) p *(Node *)(parsetree_list->head.data->ptr_value)
$45 = {type = T_SelectStmt}查看SelectStmt结构体,targetList 和fromClause非空
(gdb) set $stmt = (SelectStmt *)(parsetree_list->head.data->ptr_value)
(gdb) p *$stmt
$50 = {type = T_SelectStmt, distinctClause = 0x0, intoClause = 0x0, targetList = 0x7f5ffa43d588, fromClause = 0x7f5ff986c888, startWithClause = 0x0, whereClause = 0x0, groupClause = 0x0,
havingClause = 0x0, windowClause = 0x0, withClause = 0x0, valuesLists = 0x0, sortClause = 0x0, limitOffset = 0x0, limitCount = 0x0, lockingClause = 0x0, hintState = 0x0, op = SETOP_NONE, all = false,
larg = 0x0, rarg = 0x0, hasPlus = false}查看SelectStmt的targetlist,有两个ResTarget
(gdb) p *($stmt->targetList)
$55 = {type = T_List, length = 2, head = 0x7f5ffa43d540, tail = 0x7f5ffa43d800}
(gdb) p *(Node *)($stmt->targetList->head.data->ptr_value)
$57 = {type = T_ResTarget}(gdb) set $restarget1=(ResTarget *)($stmt->targetList->head.data->ptr_value)
(gdb) p *$restarget1
$60 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d378, location = 7}
(gdb) p *$restarget1->val
$63 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget1->val
$64 = {type = T_ColumnRef, fields = 0x7f5ffa43d470, prior = false, indnum = 0, location = 7}
(gdb) p *((ColumnRef *)$restarget1->val)->fields
$66 = {type = T_List, length = 1, head = 0x7f5ffa43d428, tail = 0x7f5ffa43d428}
(gdb) p *(Node *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$67 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$77 = {type = T_String, val = {ival = 140050197369648, str = 0x7f5ffa43d330 "a"}}(gdb) set $restarget2=(ResTarget *)($stmt->targetList->tail.data->ptr_value)
(gdb) p *$restarget2
$89 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d638, location = 10}
(gdb) p *$restarget2->val
$90 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget2->val
$91 = {type = T_ColumnRef, fields = 0x7f5ffa43d730, prior = false, indnum = 0, location = 10}
(gdb) p *((ColumnRef *)$restarget2->val)->fields
$92 = {type = T_List, length = 1, head = 0x7f5ffa43d6e8, tail = 0x7f5ffa43d6e8}
(gdb) p *(Node *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$93 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$94 = {type = T_String, val = {ival = 140050197370352, str = 0x7f5ffa43d5f0 "b"}查看SelectStmt的fromClause,有一个RangeVar
(gdb) p *$stmt->fromClause
$102 = {type = T_List, length = 1, head = 0x7f5ffa43dfe0, tail = 0x7f5ffa43dfe0}
(gdb) set $fromclause=(RangeVar*)($stmt->fromClause->head.data->ptr_value)
(gdb) p *$fromclause
$103 = {type = T_RangeVar, catalogname = 0x0, schemaname = 0x0, relname = 0x7f5ffa43d848 "item", partitionname = 0x0, subpartitionname = 0x0, inhOpt = INH_DEFAULT, relpersistence = 112 'p', alias = 0x0,
location = 17, ispartition = false, issubpartition = false, partitionKeyValuesList = 0x0, isbucket = false, buckets = 0x0, length = 0, foreignOid = 0, withVerExpr = false}综合以上分析可以得到语法树结构

语义分析Semantic Analysis
在完成词法分析和语法分析后,parse_analyze函数会根据语法树的类型,调用transformSelectStmt将parseTree改写为查询树。

(gdb) p *result
$3 = {type = T_Query, commandType = CMD_SELECT, querySource = QSRC_ORIGINAL, queryId = 0, canSetTag = false, utilityStmt = 0x0, resultRelation = 0, hasAggs = false, hasWindowFuncs = false,
hasSubLinks = false, hasDistinctOn = false, hasRecursive = false, hasModifyingCTE = false, hasForUpdate = false, hasRowSecurity = false, hasSynonyms = false, cteList = 0x0, rtable = 0x7f5ff5eb8c88,
jointree = 0x7f5ff5eb9310, targetList = 0x7f5ff5eb9110,…}
(gdb) p *result->targetList
$13 = {type = T_List, length = 2, head = 0x7f5ff5eb90c8, tail = 0x7f5ff5eb92c8}
(gdb) p *(Node *)(result->targetList->head.data->ptr_value)
$8 = {type = T_TargetEntry}
(gdb) p *(TargetEntry*)(result->targetList->head.data->ptr_value)
$9 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff636ff48, resno = 1, resname = 0x7f5ff5caf330 "a", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 1, resjunk = false}
(gdb) p *(TargetEntry*)(result->targetList->tail.data->ptr_value)
$10 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff5eb9178, resno = 2, resname = 0x7f5ff5caf5f0 "b", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 2, resjunk = false}
(gdb)
(gdb) p *result->rtable
$14 = {type = T_List, length = 1, head = 0x7f5ff5eb8c40, tail = 0x7f5ff5eb8c40}
(gdb) p *(Node *)(result->rtable->head.data->ptr_value)
$15 = {type = T_RangeTblEntry}
(gdb) p *(RangeTblEntry*)(result->rtable->head.data->ptr_value)
$16 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relname = 0x7f5ff636efb0 "item", partAttrNum = 0x0, relid = 24576, partitionOid = 0, isContainPartition = false, subpartitionOid = 0……}得到的查询树结构如下:

完成词法、语法和语义分析后,SQL解析过程完成,SQL引擎开始执行查询优化。
边栏推荐
- 2022.02.28
- Dynamically set the layoutinflater of the layout
- Flower shop window (linear DP)
- 2022年软件测试的前景如何?需不需要懂代码?
- Microservices - use of Nacos
- Analysis on the architecture of distributed systems - transaction and isolation level (multi object, multi operation) Part 2
- Qcustomplot 1.0.1 learning (3) - plotting quadratic functions
- Learnopongl notes (IV) - Advanced OpenGL II
- 从0到1稳稳掌握大厂主流技术,年后涨薪不是必须的吗?
- How to play seek tiger, which has attracted much attention in the market?
猜你喜欢

从0到1稳稳掌握大厂主流技术,年后涨薪不是必须的吗?

每日一博 - 微服务权限一二事

Hebei huangjinzhai scenic spot adds "AED automatic defibrillator" to ensure the life safety of tourists!

Uniapp développe des applets Wechat, de la construction à la mise en ligne

03 _ Transaction isolation: why can't I see it after you change it?

uniapp开发微信小程序,从构建到上线

2022 Tibet's latest eight major construction personnel (labor workers) simulation test question bank and answers

数据分析系统的设计与实现

Tencent interviewers share their interview experience, how to evaluate the interviewers' technical and personal comprehensive quality, and give you some suggestions on the interview

19. insertion, deletion and pruning of binary search tree
随机推荐
Everything about JS functions
Cisco Rui submitted the registration of sci tech Innovation Board: proposed to raise 600million yuan, with annual revenue of 222million yuan
Find combination number (function)
[creation mode] prototype mode
Managing technology debt in a microservice architecture
你还不懂线程池的设计及原理吗?掰开揉碎了教你设计线程池
A brief talk on the feelings after working at home | community essay solicitation
高数_第6章无穷级数__马克劳林级数
对于事务的认识
腾讯面试官分享面试经验,如何考察面试者技术及个人综合素质,给正在面试的你一点建议
With a loss of 13.6 billion yuan in three years, can listing revive Weima?
[multi thread performance tuning] what operations cause context switching?
02 _ Log system: how does an SQL UPDATE statement execute?
2022 Tibet's latest junior firefighter simulation test question bank and answers
见微知著,细节上雕花:SVG生成矢量格式网站图标(Favicon)探究
Illustration of tiger international quarterly report: revenue of USD 52.63 million continued to be internationalized
Art plus online school: Sketch common sitting posture test questions, these three angles must be mastered~
基于 GateWay 和 Nacos 实现微服务架构灰度发布方案
Microservices - use of Nacos
04 _ 深入浅出索引(上)