当前位置:网站首页>从内核代码了解SQL如何解析
从内核代码了解SQL如何解析
2022-06-11 15:36:00 【Gauss松鼠会】
目录
在传统数据库中SQL引擎一般指对用户输入的SQL语句进行解析、优化的软件模块。
SQL的解析过程主要分为:
词法分析Lexical Analysis:将用户输入的SQL语句拆解成单词(Token)序列,并识别出关键字、标识、常量等。
语法分析Syntax Analysis:分析器对词法分析器解析出来的单词(Token)序列在语法上是否满足SQL语法规则。
语义分析Semantic Analysis:语义分析是SQL解析过程的一个逻辑阶段,主要任务是在语法正确的基础上进行上下文有关性质的审查,在SQL解析过程中该阶段完成表名、操作符、类型等元素的合法性判断,同时检测语义上的二义性。
openGauss在pg_parse_query中调用raw_parser函数对用户输入的SQL命令进行词法分析和语法分析,生成语法树添加到链表parsetree_list中。完成语法分析后,对于parsetree_list中的每一颗语法树parsetree,会调用parse_analyze函数进行语义分析,根据SQL命令的不同,执行对应的入口函数,最终生成查询树。

词法分析Lexical Analysis
openGauss使用flex工具进行词法分析。flex工具通过对已经定义好的词法文件进行编译,生成词法分析的代码。词法文件是scan.l,它根据SQL语言标准对SQL语言中的关键字、标识符、操作符、常量、终结符进行了定义和识别。在kwlist.h中定义了大量的关键字,按照字母的顺序排列,方便在查找关键字时通过二分法进行查找。在scan.l中处理“标识符”时,会到关键字列表中进行匹配,如果一个标识符匹配到关键字,则认为是关键字,否则才是标识符,即关键字优先. 以“select a, b from item”为例说明词法分析结果。
名称 | 词性 | 内容 | 说明 |
关键字 | keyword | SELECT,FROM | 如SELECT/FROM/WHERE等,对大小写不敏感 |
标识符 | IDENT | a,b,item | 用户自己定义的名字、常量名、变量名和过程名,若无括号修饰则对大小写不敏感 |
语法分析Syntax Analysis
openGauss中定义了bison工具能够识别的语法文件gram.y,根据SQL语言的不同定义了一系列表达Statement的结构体(这些结构体通常以Stmt作为命名后缀),用来保存语法分析结果。以SELECT查询为例,它对应的Statement结构体如下。
typedef struct SelectStmt
{
NodeTag type;
List *distinctClause; /* NULL, list of DISTINCT ON exprs, or
* lcons(NIL,NIL) for all (SELECT DISTINCT) */
IntoClause *intoClause; /* target for SELECT INTO */
List *targetList; /* the target list (of ResTarget) */
List *fromClause; /* the FROM clause */
Node *whereClause; /* WHERE qualification */
List *groupClause; /* GROUP BY clauses */
Node *havingClause; /* HAVING conditional-expression */
List *windowClause; /* WINDOW window_name AS (...), ... */
WithClause *withClause; /* WITH clause */
List *valuesLists; /* untransformed list of expression lists */
List *sortClause; /* sort clause (a list of SortBy's) */
Node *limitOffset; /* # of result tuples to skip */
Node *limitCount; /* # of result tuples to return */
……
} SelectStmt;这个结构体可以看作一个多叉树,每个叶子节点都表达了SELECT查询语句中的一个语法结构,对应到gram.y中,它会有一个SelectStmt。代码如下:

从simple_select语法分析结构可以看出,一条简单的查询语句由以下子句组成:去除行重复的distinctClause、目标属性targetList、SELECT INTO子句intoClause、FROM子句fromClause、WHERE子句whereClause、GROUP BY子句groupClause、HAVING子句havingClause、窗口子句windowClause和plan_hint子句。在成功匹配simple_select语法结构后,将会创建一个Statement结构体,将各个子句进行相应的赋值。对simple_select而言,目标属性、FROM子句、WHERE子句是最重要的组成部分。SelectStmt与其他结构体的关系如下:

下面以“select a, b from item”为例说明简单select语句的解析过程,函数exec_simple_query调用pg_parse_query执行解析,解析树中只有一个元素。

(gdb) p *parsetree_list
$47 = {type = T_List, length = 1, head = 0x7f5ff986c8f0, tail = 0x7f5ff986c8f0}List中的节点类型为T_SelectStmt
(gdb) p *(Node *)(parsetree_list->head.data->ptr_value)
$45 = {type = T_SelectStmt}查看SelectStmt结构体,targetList 和fromClause非空
(gdb) set $stmt = (SelectStmt *)(parsetree_list->head.data->ptr_value)
(gdb) p *$stmt
$50 = {type = T_SelectStmt, distinctClause = 0x0, intoClause = 0x0, targetList = 0x7f5ffa43d588, fromClause = 0x7f5ff986c888, startWithClause = 0x0, whereClause = 0x0, groupClause = 0x0,
havingClause = 0x0, windowClause = 0x0, withClause = 0x0, valuesLists = 0x0, sortClause = 0x0, limitOffset = 0x0, limitCount = 0x0, lockingClause = 0x0, hintState = 0x0, op = SETOP_NONE, all = false,
larg = 0x0, rarg = 0x0, hasPlus = false}查看SelectStmt的targetlist,有两个ResTarget
(gdb) p *($stmt->targetList)
$55 = {type = T_List, length = 2, head = 0x7f5ffa43d540, tail = 0x7f5ffa43d800}
(gdb) p *(Node *)($stmt->targetList->head.data->ptr_value)
$57 = {type = T_ResTarget}(gdb) set $restarget1=(ResTarget *)($stmt->targetList->head.data->ptr_value)
(gdb) p *$restarget1
$60 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d378, location = 7}
(gdb) p *$restarget1->val
$63 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget1->val
$64 = {type = T_ColumnRef, fields = 0x7f5ffa43d470, prior = false, indnum = 0, location = 7}
(gdb) p *((ColumnRef *)$restarget1->val)->fields
$66 = {type = T_List, length = 1, head = 0x7f5ffa43d428, tail = 0x7f5ffa43d428}
(gdb) p *(Node *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$67 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$77 = {type = T_String, val = {ival = 140050197369648, str = 0x7f5ffa43d330 "a"}}(gdb) set $restarget2=(ResTarget *)($stmt->targetList->tail.data->ptr_value)
(gdb) p *$restarget2
$89 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d638, location = 10}
(gdb) p *$restarget2->val
$90 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget2->val
$91 = {type = T_ColumnRef, fields = 0x7f5ffa43d730, prior = false, indnum = 0, location = 10}
(gdb) p *((ColumnRef *)$restarget2->val)->fields
$92 = {type = T_List, length = 1, head = 0x7f5ffa43d6e8, tail = 0x7f5ffa43d6e8}
(gdb) p *(Node *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$93 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$94 = {type = T_String, val = {ival = 140050197370352, str = 0x7f5ffa43d5f0 "b"}查看SelectStmt的fromClause,有一个RangeVar
(gdb) p *$stmt->fromClause
$102 = {type = T_List, length = 1, head = 0x7f5ffa43dfe0, tail = 0x7f5ffa43dfe0}
(gdb) set $fromclause=(RangeVar*)($stmt->fromClause->head.data->ptr_value)
(gdb) p *$fromclause
$103 = {type = T_RangeVar, catalogname = 0x0, schemaname = 0x0, relname = 0x7f5ffa43d848 "item", partitionname = 0x0, subpartitionname = 0x0, inhOpt = INH_DEFAULT, relpersistence = 112 'p', alias = 0x0,
location = 17, ispartition = false, issubpartition = false, partitionKeyValuesList = 0x0, isbucket = false, buckets = 0x0, length = 0, foreignOid = 0, withVerExpr = false}综合以上分析可以得到语法树结构

语义分析Semantic Analysis
在完成词法分析和语法分析后,parse_analyze函数会根据语法树的类型,调用transformSelectStmt将parseTree改写为查询树。

(gdb) p *result
$3 = {type = T_Query, commandType = CMD_SELECT, querySource = QSRC_ORIGINAL, queryId = 0, canSetTag = false, utilityStmt = 0x0, resultRelation = 0, hasAggs = false, hasWindowFuncs = false,
hasSubLinks = false, hasDistinctOn = false, hasRecursive = false, hasModifyingCTE = false, hasForUpdate = false, hasRowSecurity = false, hasSynonyms = false, cteList = 0x0, rtable = 0x7f5ff5eb8c88,
jointree = 0x7f5ff5eb9310, targetList = 0x7f5ff5eb9110,…}
(gdb) p *result->targetList
$13 = {type = T_List, length = 2, head = 0x7f5ff5eb90c8, tail = 0x7f5ff5eb92c8}
(gdb) p *(Node *)(result->targetList->head.data->ptr_value)
$8 = {type = T_TargetEntry}
(gdb) p *(TargetEntry*)(result->targetList->head.data->ptr_value)
$9 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff636ff48, resno = 1, resname = 0x7f5ff5caf330 "a", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 1, resjunk = false}
(gdb) p *(TargetEntry*)(result->targetList->tail.data->ptr_value)
$10 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff5eb9178, resno = 2, resname = 0x7f5ff5caf5f0 "b", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 2, resjunk = false}
(gdb)
(gdb) p *result->rtable
$14 = {type = T_List, length = 1, head = 0x7f5ff5eb8c40, tail = 0x7f5ff5eb8c40}
(gdb) p *(Node *)(result->rtable->head.data->ptr_value)
$15 = {type = T_RangeTblEntry}
(gdb) p *(RangeTblEntry*)(result->rtable->head.data->ptr_value)
$16 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relname = 0x7f5ff636efb0 "item", partAttrNum = 0x0, relid = 24576, partitionOid = 0, isContainPartition = false, subpartitionOid = 0……}得到的查询树结构如下:

完成词法、语法和语义分析后,SQL解析过程完成,SQL引擎开始执行查询优化。
边栏推荐
- 简单的C语言版本通讯录
- CF662B Graph Coloring题解--zhengjun
- LoveLive! Published an AI paper: generating models to write music scores automatically
- [creation mode] factory method mode
- [azure application service] nodejs express + msal realizes the authentication experiment of API Application token authentication (AAD oauth2 idtoken) -- passport authenticate()
- IDEA2021.1版本安装教程
- Google Earth engine (GEE) - create a simple panel demo to display the map
- Don't you understand the design and principle of thread pool? Break it up and crush it. I'll teach you how to design the thread pool
- 02 _ Log system: how does an SQL UPDATE statement execute?
- Backtracking / activity scheduling maximum compatible activities
猜你喜欢

Illustration of tiger international quarterly report: revenue of USD 52.63 million continued to be internationalized

06 _ Global lock and table lock: Why are there so many obstacles to adding a field to a table?

Design and implementation of data analysis system

从0到1稳稳掌握大厂主流技术,年后涨薪不是必须的吗?

每日一博 - 微服务权限一二事

Interview shock 26: how to stop threads correctly?

02 _ 日志系统:一条SQL更新语句是如何执行的?

Backtracking / solution space tree permutation tree

Arthas实践操作文档记录
![[creation mode] abstract factory mode](/img/16/d0086ba4cceb1c174d3f88ef5448e6.png)
[creation mode] abstract factory mode
随机推荐
见微知著,细节上雕花:SVG生成矢量格式网站图标(Favicon)探究
[creation mode] prototype mode
Small open source projects based on stm32f1
Don't you understand the design and principle of thread pool? Break it up and crush it. I'll teach you how to design the thread pool
Backtracking / solution space tree permutation tree
Analysis on the architecture of distributed systems - transaction and isolation level (multi object, multi operation) Part 2
Safepoint explanation and analysis of its placement ideas
Mysql database optimization details
02 _ 日志系统:一条SQL更新语句是如何执行的?
【创建型模式】单例模式
Arthas practice documentation
A brief talk on the feelings after working at home | community essay solicitation
leetcode 120. Triangle minimum path sum
回溯法/活动安排 最大兼容活动
从0到1稳稳掌握大厂主流技术,年后涨薪不是必须的吗?
Qcustomplot 1.0.1 learning (3) - plotting quadratic functions
How can local retail release the "imprisoned value" and make physical stores grow again?
The research results of Professor xuweixin from the school of atmosphere of Sun Yat sen University on extreme precipitation caused by weak convection were reported by science highlights
Basic configuration command of Xinhua 3 switch system
Square and storage box (linear DP)