Learn How SQL Is Parsed from the Kernel Code
2022-06-11 15:50:00 【Gauss squirrel Club】
Contents
Lexical Analysis
Syntax Analysis
Semantic Analysis
In a traditional database, the SQL engine generally refers to the software modules that parse and optimize SQL statements.
SQL parsing is divided into three main phases:
Lexical analysis: the input SQL statement is broken down into a sequence of words (tokens), and keywords, identifiers, constants, and so on are recognized.
Syntax analysis: checks whether the token sequence produced by the lexical analyzer conforms to the SQL grammar rules.
Semantic analysis: a logical phase of SQL parsing. Its main task is to check context-dependent properties once the syntax is known to be correct. This phase validates elements such as table names, operators, and types, and detects semantic ambiguity.
In pg_parse_query, openGauss calls the raw_parser function to perform lexical and syntax analysis on the SQL command entered by the user. Each resulting syntax tree is added to the linked list parsetree_list. After parsing, parse_analyze is called on every syntax tree (parsetree) in parsetree_list for semantic analysis. Depending on the type of SQL command, the corresponding entry function is executed, and a query tree is finally generated.
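The dispatch step described above can be pictured with a small sketch. Everything here is hypothetical and heavily simplified: the `Toy*` types, the enum values, and the function names are illustrative stand-ins, not the kernel's definitions. It only shows the shape of the loop: walk the list of raw parse trees and choose an analysis entry from each tree's node tag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node tags; the kernel's NodeTag enum is far larger. */
typedef enum ToyNodeTag { TOY_SelectStmt, TOY_InsertStmt, TOY_OtherStmt } ToyNodeTag;

/* Every parse tree node starts with its tag, as in the gdb output (type = T_...). */
typedef struct ToyNode { ToyNodeTag type; } ToyNode;

/* Which analysis entry would handle a given statement (illustrative names). */
typedef enum ToyEntry { ENTRY_TRANSFORM_SELECT, ENTRY_TRANSFORM_INSERT, ENTRY_OTHER } ToyEntry;

/* Choose the semantic-analysis entry from the type of the syntax tree. */
static ToyEntry toy_pick_entry(const ToyNode *parsetree)
{
    assert(parsetree != NULL);
    switch (parsetree->type) {
        case TOY_SelectStmt: return ENTRY_TRANSFORM_SELECT;
        case TOY_InsertStmt: return ENTRY_TRANSFORM_INSERT;
        default:             return ENTRY_OTHER;
    }
}

/* Walk every raw parse tree and record the entry chosen for it.
 * Returns how many SELECT statements were seen. */
static size_t toy_analyze_list(const ToyNode *trees, size_t length, ToyEntry *entries_out)
{
    size_t selects = 0;
    for (size_t i = 0; i < length; i++) {
        entries_out[i] = toy_pick_entry(&trees[i]);
        if (entries_out[i] == ENTRY_TRANSFORM_SELECT)
            selects++;
    }
    return selects;
}
```

For a list holding one SELECT tree, `toy_analyze_list` records `ENTRY_TRANSFORM_SELECT` for it, mirroring how a SelectStmt is routed to transformSelectStmt.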

Lexical Analysis
openGauss uses the flex tool for lexical analysis. flex compiles a predefined lexical file and generates the lexical analysis code. The lexical file is scan.l, which follows the SQL language standard to define and recognize the keywords, identifiers, operators, constants, and terminators of SQL. A large number of keywords are defined in kwlist.h in alphabetical order, so that a keyword can be found by binary search. When scan.l processes an "identifier", it is matched against the keyword list: if it matches a keyword it is treated as a keyword, otherwise it is an identifier. Keywords take precedence. The result of lexical analysis is illustrated below with "select a, b from item".
Name | Token type | Content | Description |
Keyword | keyword | SELECT, FROM | For example SELECT/FROM/WHERE; case insensitive |
Identifier | IDENT | a, b, item | User-defined names, constant names, variable names, and procedure names; case insensitive unless enclosed in double quotes |
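The keyword-first rule and the binary search over a sorted keyword list can be sketched as follows. This is a minimal illustration, not the code flex generates from scan.l: the four-entry keyword list, the `toy_*` names, and the simple word splitting are all assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Tiny illustrative keyword list. It must stay sorted for binary search;
 * the real list in kwlist.h is far longer. */
static const char *const toy_keywords[] = { "from", "group", "select", "where" };
#define TOY_KEYWORD_COUNT (sizeof(toy_keywords) / sizeof(toy_keywords[0]))

typedef enum ToyTokenKind { TOY_KEYWORD, TOY_IDENT } ToyTokenKind;

/* Binary search over the sorted list; returns 1 if the word is a keyword. */
static int toy_is_keyword(const char *lower_word)
{
    size_t lo = 0, hi = TOY_KEYWORD_COUNT;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(lower_word, toy_keywords[mid]);
        if (cmp == 0)
            return 1;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}

/* Classify one word: fold to lower case, then let keywords win over identifiers. */
static ToyTokenKind toy_classify(const char *word)
{
    char lower[64];
    size_t n = strlen(word);
    assert(n < sizeof(lower));
    for (size_t i = 0; i < n; i++)
        lower[i] = (char)tolower((unsigned char)word[i]);
    lower[n] = '\0';
    return toy_is_keyword(lower) ? TOY_KEYWORD : TOY_IDENT;
}

/* Split a statement on every character that is not alphanumeric or '_'
 * and count how many of the resulting words are keywords. */
static size_t toy_count_keywords(const char *sql)
{
    size_t count = 0;
    while (*sql) {
        char word[64];
        size_t n = 0;
        while (*sql && !(isalnum((unsigned char)*sql) || *sql == '_'))
            sql++;
        while (*sql && (isalnum((unsigned char)*sql) || *sql == '_')) {
            if (n + 1 < sizeof(word))
                word[n++] = *sql;
            sql++;
        }
        word[n] = '\0';
        if (n > 0 && toy_classify(word) == TOY_KEYWORD)
            count++;
    }
    return count;
}
```

With this sketch, `toy_classify("SELECT")` yields `TOY_KEYWORD` and `toy_classify("item")` yields `TOY_IDENT`, which matches the two rows of the table. Quoted identifiers, operators, and constants are deliberately left out.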
Syntax Analysis
openGauss defines gram.y, a grammar file recognized by the bison tool. For the different kinds of SQL statements it defines a series of Statement structures (usually named with the suffix Stmt) that hold the parsing results. Taking a SELECT query as an example, the corresponding Statement structure is as follows.
typedef struct SelectStmt
{
    NodeTag type;
    List *distinctClause; /* NULL, list of DISTINCT ON exprs, or
                           * lcons(NIL,NIL) for all (SELECT DISTINCT) */
    IntoClause *intoClause; /* target for SELECT INTO */
    List *targetList; /* the target list (of ResTarget) */
    List *fromClause; /* the FROM clause */
    Node *whereClause; /* WHERE qualification */
    List *groupClause; /* GROUP BY clauses */
    Node *havingClause; /* HAVING conditional-expression */
    List *windowClause; /* WINDOW window_name AS (...), ... */
    WithClause *withClause; /* WITH clause */
    List *valuesLists; /* untransformed list of expression lists */
    List *sortClause; /* sort clause (a list of SortBy's) */
    Node *limitOffset; /* # of result tuples to skip */
    Node *limitCount; /* # of result tuples to return */
    ……
} SelectStmt;

This structure can be regarded as a multiway tree in which each leaf node represents one syntactic construct of the SELECT statement. gram.y contains a corresponding SelectStmt rule. The code is as follows:

The simple_select grammar rule shows that a simple query statement consists of the following clauses: duplicate-row removal distinctClause, target attributes targetList, the SELECT INTO clause intoClause, the FROM clause fromClause, the WHERE clause whereClause, the GROUP BY clause groupClause, the HAVING clause havingClause, the window clause windowClause, and the plan_hint clause. Once the simple_select rule has been matched, a Statement structure is created and each clause is assigned its value. For simple_select, the target attributes, the FROM clause, and the WHERE clause are the most important parts. The relationship between SelectStmt and other structures is as follows:

Let us use "select a, b from item" to illustrate how a simple SELECT statement is parsed. The function exec_simple_query calls pg_parse_query to perform the parsing, and the resulting parse tree list contains only one element.

(gdb) p *parsetree_list
$47 = {type = T_List, length = 1, head = 0x7f5ff986c8f0, tail = 0x7f5ff986c8f0}

The node in the List is of type T_SelectStmt:
(gdb) p *(Node *)(parsetree_list->head.data->ptr_value)
$45 = {type = T_SelectStmt}

Inspect the SelectStmt structure; targetList and fromClause are non-empty:
(gdb) set $stmt = (SelectStmt *)(parsetree_list->head.data->ptr_value)
(gdb) p *$stmt
$50 = {type = T_SelectStmt, distinctClause = 0x0, intoClause = 0x0, targetList = 0x7f5ffa43d588, fromClause = 0x7f5ff986c888, startWithClause = 0x0, whereClause = 0x0, groupClause = 0x0,
havingClause = 0x0, windowClause = 0x0, withClause = 0x0, valuesLists = 0x0, sortClause = 0x0, limitOffset = 0x0, limitCount = 0x0, lockingClause = 0x0, hintState = 0x0, op = SETOP_NONE, all = false,
larg = 0x0, rarg = 0x0, hasPlus = false}

Inspect the targetList of SelectStmt; it contains two ResTarget nodes:
(gdb) p *($stmt->targetList)
$55 = {type = T_List, length = 2, head = 0x7f5ffa43d540, tail = 0x7f5ffa43d800}
(gdb) p *(Node *)($stmt->targetList->head.data->ptr_value)
$57 = {type = T_ResTarget}
(gdb) set $restarget1=(ResTarget *)($stmt->targetList->head.data->ptr_value)
(gdb) p *$restarget1
$60 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d378, location = 7}
(gdb) p *$restarget1->val
$63 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget1->val
$64 = {type = T_ColumnRef, fields = 0x7f5ffa43d470, prior = false, indnum = 0, location = 7}
(gdb) p *((ColumnRef *)$restarget1->val)->fields
$66 = {type = T_List, length = 1, head = 0x7f5ffa43d428, tail = 0x7f5ffa43d428}
(gdb) p *(Node *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$67 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$77 = {type = T_String, val = {ival = 140050197369648, str = 0x7f5ffa43d330 "a"}}
(gdb) set $restarget2=(ResTarget *)($stmt->targetList->tail.data->ptr_value)
(gdb) p *$restarget2
$89 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d638, location = 10}
(gdb) p *$restarget2->val
$90 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget2->val
$91 = {type = T_ColumnRef, fields = 0x7f5ffa43d730, prior = false, indnum = 0, location = 10}
(gdb) p *((ColumnRef *)$restarget2->val)->fields
$92 = {type = T_List, length = 1, head = 0x7f5ffa43d6e8, tail = 0x7f5ffa43d6e8}
(gdb) p *(Node *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$93 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$94 = {type = T_String, val = {ival = 140050197370352, str = 0x7f5ffa43d5f0 "b"}}

Inspect the fromClause of SelectStmt; it contains one RangeVar:
(gdb) p *$stmt->fromClause
$102 = {type = T_List, length = 1, head = 0x7f5ffa43dfe0, tail = 0x7f5ffa43dfe0}
(gdb) set $fromclause=(RangeVar*)($stmt->fromClause->head.data->ptr_value)
(gdb) p *$fromclause
$103 = {type = T_RangeVar, catalogname = 0x0, schemaname = 0x0, relname = 0x7f5ffa43d848 "item", partitionname = 0x0, subpartitionname = 0x0, inhOpt = INH_DEFAULT, relpersistence = 112 'p', alias = 0x0,
location = 17, ispartition = false, issubpartition = false, partitionKeyValuesList = 0x0, isbucket = false, buckets = 0x0, length = 0, foreignOid = 0, withVerExpr = false}

From the above analysis, we can derive the syntax tree structure:
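The tree recovered from the gdb session can also be written down as plain data. The sketch below is an assumption-laden simplification: the `Toy*` structures keep only a few fields, arrays replace the kernel's List type, and ColumnRef.fields is flattened to a single name. The location values (7, 10, 17) are taken from the gdb output, and the helper checks that they are character offsets into the SQL text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the parser nodes seen in the gdb session. */
typedef struct ToyColumnRef { const char *field; int location; } ToyColumnRef;
typedef struct ToyResTarget { const char *name; ToyColumnRef val; } ToyResTarget;
typedef struct ToyRangeVar  { const char *schemaname; const char *relname; int location; } ToyRangeVar;

typedef struct ToySelectStmt {
    const ToyResTarget *targetList; size_t targetCount;
    const ToyRangeVar  *fromClause; size_t fromCount;
    const void *whereClause;        /* NULL: the example query has no WHERE */
} ToySelectStmt;

/* Raw parse tree for: select a, b from item */
static const ToyResTarget toy_targets[] = {
    { NULL, { "a", 7 } },           /* first ResTarget -> ColumnRef "a" */
    { NULL, { "b", 10 } },          /* second ResTarget -> ColumnRef "b" */
};
static const ToyRangeVar toy_from[] = { { NULL, "item", 17 } };
static const ToySelectStmt toy_stmt = { toy_targets, 2, toy_from, 1, NULL };

/* Return 1 if the given token starts at the given offset of the SQL text. */
static int toy_token_at(const char *sql, int location, const char *token)
{
    size_t len = strlen(sql);
    if (location < 0 || (size_t)location > len)
        return 0;
    return strncmp(sql + location, token, strlen(token)) == 0;
}

/* Check that every recorded location points at the token it names. */
static int toy_locations_match(const char *sql, const ToySelectStmt *stmt)
{
    assert(sql != NULL && stmt != NULL);
    for (size_t i = 0; i < stmt->targetCount; i++) {
        const ToyColumnRef *ref = &stmt->targetList[i].val;
        if (!toy_token_at(sql, ref->location, ref->field))
            return 0;
    }
    for (size_t i = 0; i < stmt->fromCount; i++) {
        const ToyRangeVar *rv = &stmt->fromClause[i];
        if (!toy_token_at(sql, rv->location, rv->relname))
            return 0;
    }
    return 1;
}
```

Calling `toy_locations_match("select a, b from item", &toy_stmt)` succeeds because `a`, `b`, and `item` start at offsets 7, 10, and 17 of the statement. At this stage the tree holds only names and positions; nothing has been checked against the catalog yet.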

Semantic Analysis
After lexical and syntax analysis, the parse_analyze function, depending on the type of the syntax tree, calls transformSelectStmt to rewrite the parseTree into a query tree.

(gdb) p *result
$3 = {type = T_Query, commandType = CMD_SELECT, querySource = QSRC_ORIGINAL, queryId = 0, canSetTag = false, utilityStmt = 0x0, resultRelation = 0, hasAggs = false, hasWindowFuncs = false,
hasSubLinks = false, hasDistinctOn = false, hasRecursive = false, hasModifyingCTE = false, hasForUpdate = false, hasRowSecurity = false, hasSynonyms = false, cteList = 0x0, rtable = 0x7f5ff5eb8c88,
jointree = 0x7f5ff5eb9310, targetList = 0x7f5ff5eb9110,…}
(gdb) p *result->targetList
$13 = {type = T_List, length = 2, head = 0x7f5ff5eb90c8, tail = 0x7f5ff5eb92c8}
(gdb) p *(Node *)(result->targetList->head.data->ptr_value)
$8 = {type = T_TargetEntry}
(gdb) p *(TargetEntry*)(result->targetList->head.data->ptr_value)
$9 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff636ff48, resno = 1, resname = 0x7f5ff5caf330 "a", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 1, resjunk = false}
(gdb) p *(TargetEntry*)(result->targetList->tail.data->ptr_value)
$10 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff5eb9178, resno = 2, resname = 0x7f5ff5caf5f0 "b", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 2, resjunk = false}
(gdb)
(gdb) p *result->rtable
$14 = {type = T_List, length = 1, head = 0x7f5ff5eb8c40, tail = 0x7f5ff5eb8c40}
(gdb) p *(Node *)(result->rtable->head.data->ptr_value)
$15 = {type = T_RangeTblEntry}
(gdb) p *(RangeTblEntry*)(result->rtable->head.data->ptr_value)
$16 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relname = 0x7f5ff636efb0 "item", partAttrNum = 0x0, relid = 24576, partitionOid = 0, isContainPartition = false, subpartitionOid = 0……}

The resulting query tree structure is as follows:
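The key difference from the raw syntax tree is that names have been resolved: the table name became relid = 24576, and the columns became resorigcol = 1 and 2. A rough sketch of that resolution step follows. The one-table catalog, the `Toy*` types, and the function names are hypothetical. The kernel looks names up in the system catalogs and reports errors through its own mechanisms, which this example replaces with simple return codes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: one relation and its column names in order. */
typedef struct ToyRelation {
    const char *relname;
    unsigned int relid;             /* object identifier of the table */
    const char *const *columns;     /* column i has attribute number i + 1 */
    size_t column_count;
} ToyRelation;

/* Cut-down counterpart of the TargetEntry fields printed by gdb. */
typedef struct ToyTargetEntry {
    const char *resname;
    int resno;
    unsigned int resorigtbl;
    int resorigcol;
} ToyTargetEntry;

/* Assumed catalog contents: table item (relid 24576) with columns a and b. */
static const char *const toy_item_columns[] = { "a", "b" };
static const ToyRelation toy_catalog[] = {
    { "item", 24576u, toy_item_columns, 2 },
};

/* Resolve a table name from the FROM clause; NULL means the relation is unknown. */
static const ToyRelation *toy_lookup_relation(const char *relname)
{
    assert(relname != NULL);
    for (size_t i = 0; i < sizeof(toy_catalog) / sizeof(toy_catalog[0]); i++) {
        if (strcmp(toy_catalog[i].relname, relname) == 0)
            return &toy_catalog[i];
    }
    return NULL;
}

/* Resolve one column reference against a relation and fill a target entry.
 * Returns 0 on success, -1 if the column does not exist. */
static int toy_transform_target(const ToyRelation *rel, const char *colname,
                                int resno, ToyTargetEntry *out)
{
    assert(rel != NULL && colname != NULL && out != NULL);
    for (size_t i = 0; i < rel->column_count; i++) {
        if (strcmp(rel->columns[i], colname) == 0) {
            out->resname = colname;
            out->resno = resno;
            out->resorigtbl = rel->relid;
            out->resorigcol = (int)i + 1;
            return 0;
        }
    }
    return -1;
}
```

Resolving column `b` of `item` as the second target fills in resno = 2, resorigtbl = 24576, and resorigcol = 2, the same values as in the second TargetEntry above. A query that names an unknown table or column would be rejected at this point, even though it passed syntax analysis.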

Once lexical, syntax, and semantic analysis are complete, SQL parsing is finished and the SQL engine begins query optimization.