当前位置:网站首页>Learn how to parse SQL from kernel code

Learn how to parse SQL from kernel code

2022-06-11 15:50:00 Gauss squirrel Club

Catalog

Lexical analysis Lexical Analysis

Syntax analysis Syntax Analysis

Semantic analysis Semantic Analysis


In a traditional database SQL The engine generally refers to the SQL Statement parsing 、 Optimized software modules .

SQL The parsing process of is mainly divided into :

  • Lexical analysis Lexical Analysis: Enter the SQL Statements are broken down into words (Token) Sequence , And identify the key words 、 identification 、 Constant etc. .

  • Syntax analysis Syntax Analysis: The word parsed by the lexical analyzer (Token) Whether the sequence satisfies the syntax SQL Rule of grammar .

  • Semantic analysis Semantic Analysis: Semantic analysis is SQL A logical phase of the parsing process , The main task is to examine the nature of context on the basis of correct grammar , stay SQL This stage of the parsing process completes the table name 、 The operator 、 Type and other elements , At the same time, semantic ambiguity is detected .

openGauss stay pg_parse_query Call in raw_parser Function on user input SQL Command for lexical analysis and syntax analysis , Generate a syntax tree and add it to the linked list parsetree_list in . After parsing , about parsetree_list Every syntax tree in parsetree, Would call parse_analyze Function for semantic analysis , according to SQL Different orders , Execute the corresponding entry function , Finally, the query tree is generated .

Lexical analysis Lexical Analysis

openGauss Use flex Tools for lexical analysis .flex The tool compiles the defined lexical file , Generate lexical analysis code . The lexical file is scan.l, It is based on SQL The language standard is right SQL Keywords in language 、 identifier 、 The operator 、 Constant 、 The terminator is defined and identified . stay kwlist.h A large number of keywords are defined in , In alphabetical order , It is convenient to find keywords by dichotomy . stay scan.l In dealing with “ identifier ” when , Will be matched in the keyword list , If an identifier matches a keyword , It is considered to be a keyword , Otherwise, it is the identifier , Keyword first . With “select a, b from item” As an example to illustrate the results of lexical analysis .

name

The part of speech

Content

explain

keyword

keyword

SELECT,FROM

Such as SELECT/FROM/WHERE etc. , Case insensitive

identifier

IDENT

a,b,item

User defined name 、 Constant names 、 Variable name and procedure name , If there is no parenthesis modifier, it is not case sensitive

Syntax analysis Syntax Analysis

openGauss It defines bison Syntax files recognized by the tool gram.y, according to SQL Different languages define a series of expressions Statement The structure of the body ( These structures are usually Stmt As a naming suffix ), Used to save parsing results . With SELECT For example, query , Its corresponding Statement The structure is as follows .


typedef struct SelectStmt
{
NodeTag  type;
List     *distinctClause; /* NULL, list of DISTINCT ON exprs, or
 * lcons(NIL,NIL) for all (SELECT DISTINCT) */
IntoClause *intoClause;  /* target for SELECT INTO */
List     *targetList;  /* the target list (of ResTarget) */
List     *fromClause;  /* the FROM clause */
Node     *whereClause;  /* WHERE qualification */
List     *groupClause;  /* GROUP BY clauses */
Node     *havingClause;  /* HAVING conditional-expression */
List     *windowClause;  /* WINDOW window_name AS (...), ... */
WithClause *withClause;  /* WITH clause */
List     *valuesLists;  /* untransformed list of expression lists */
List     *sortClause;  /* sort clause (a list of SortBy's) */
Node     *limitOffset;  /* # of result tuples to skip */
Node     *limitCount;  /* # of result tuples to return */
    ……
} SelectStmt;

This structure can be regarded as a multi fork tree , Each leaf node expresses SELECT A syntax structure in a query statement , Corresponding to gram.y in , It will have a SelectStmt. The code is as follows :

from simple_select The grammatical structure shows that , A simple query statement consists of the following clauses : Remove line duplicates distinctClause、 Target properties targetList、SELECT INTO Clause intoClause、FROM Clause fromClause、WHERE Clause whereClause、GROUP BY Clause groupClause、HAVING Clause havingClause、 Window clause windowClause and plan_hint Clause . After successful matching simple_select After the grammatical structure , Will create a Statement Structure , Assign corresponding values to each clause . Yes simple_select for , Target properties 、FROM Clause 、WHERE Clause is the most important part .SelectStmt The relationship with other structures is as follows :

 

Let's say “select a, b from item” As an example, the description is simple select Statement parsing process , function exec_simple_query call pg_parse_query Execution Analysis , There is only one element in the parse tree .

 

(gdb) p *parsetree_list
$47 = {type = T_List, length = 1, head = 0x7f5ff986c8f0, tail = 0x7f5ff986c8f0}

List The node type in is T_SelectStmt

(gdb) p *(Node *)(parsetree_list->head.data->ptr_value)
$45 = {type = T_SelectStmt}

see SelectStmt Structure ,targetList and fromClause Non empty

(gdb) set $stmt = (SelectStmt *)(parsetree_list->head.data->ptr_value)
(gdb) p *$stmt
$50 = {type = T_SelectStmt, distinctClause = 0x0, intoClause = 0x0, targetList = 0x7f5ffa43d588, fromClause = 0x7f5ff986c888, startWithClause = 0x0, whereClause = 0x0, groupClause = 0x0,
  havingClause = 0x0, windowClause = 0x0, withClause = 0x0, valuesLists = 0x0, sortClause = 0x0, limitOffset = 0x0, limitCount = 0x0, lockingClause = 0x0, hintState = 0x0, op = SETOP_NONE, all = false,
  larg = 0x0, rarg = 0x0, hasPlus = false}

see SelectStmt Of targetlist, There are two ResTarget

(gdb) p *($stmt->targetList)
$55 = {type = T_List, length = 2, head = 0x7f5ffa43d540, tail = 0x7f5ffa43d800}
(gdb) p *(Node *)($stmt->targetList->head.data->ptr_value)
$57 = {type = T_ResTarget}
(gdb) set $restarget1=(ResTarget *)($stmt->targetList->head.data->ptr_value)
(gdb) p *$restarget1
$60 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d378, location = 7}
(gdb) p *$restarget1->val
$63 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget1->val
$64 = {type = T_ColumnRef, fields = 0x7f5ffa43d470, prior = false, indnum = 0, location = 7}
(gdb) p *((ColumnRef *)$restarget1->val)->fields
$66 = {type = T_List, length = 1, head = 0x7f5ffa43d428, tail = 0x7f5ffa43d428}
(gdb) p *(Node *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$67 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value
$77 = {type = T_String, val = {ival = 140050197369648, str = 0x7f5ffa43d330 "a"}}
(gdb) set $restarget2=(ResTarget *)($stmt->targetList->tail.data->ptr_value)
(gdb) p *$restarget2
$89 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d638, location = 10}
(gdb) p *$restarget2->val
$90 = {type = T_ColumnRef}
(gdb) p *(ColumnRef *)$restarget2->val
$91 = {type = T_ColumnRef, fields = 0x7f5ffa43d730, prior = false, indnum = 0, location = 10}
(gdb) p *((ColumnRef *)$restarget2->val)->fields
$92 = {type = T_List, length = 1, head = 0x7f5ffa43d6e8, tail = 0x7f5ffa43d6e8}
(gdb) p *(Node *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$93 = {type = T_String}
(gdb) p *(Value *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value
$94 = {type = T_String, val = {ival = 140050197370352, str = 0x7f5ffa43d5f0 "b"}

see SelectStmt Of fromClause, There is one RangeVar

(gdb) p *$stmt->fromClause
$102 = {type = T_List, length = 1, head = 0x7f5ffa43dfe0, tail = 0x7f5ffa43dfe0}
(gdb) set $fromclause=(RangeVar*)($stmt->fromClause->head.data->ptr_value)
(gdb) p *$fromclause
$103 = {type = T_RangeVar, catalogname = 0x0, schemaname = 0x0, relname = 0x7f5ffa43d848 "item", partitionname = 0x0, subpartitionname = 0x0, inhOpt = INH_DEFAULT, relpersistence = 112 'p', alias = 0x0,
  location = 17, ispartition = false, issubpartition = false, partitionKeyValuesList = 0x0, isbucket = false, buckets = 0x0, length = 0, foreignOid = 0, withVerExpr = false}

From the above analysis, we can get the syntax tree structure

 

Semantic analysis Semantic Analysis

After lexical analysis and grammar analysis ,parse_analyze The function will be based on the type of syntax tree , call transformSelectStmt take parseTree Rewrite to query tree .

 

(gdb) p *result
$3 = {type = T_Query, commandType = CMD_SELECT, querySource = QSRC_ORIGINAL, queryId = 0, canSetTag = false, utilityStmt = 0x0, resultRelation = 0, hasAggs = false, hasWindowFuncs = false,
  hasSubLinks = false, hasDistinctOn = false, hasRecursive = false, hasModifyingCTE = false, hasForUpdate = false, hasRowSecurity = false, hasSynonyms = false, cteList = 0x0, rtable = 0x7f5ff5eb8c88,
  jointree = 0x7f5ff5eb9310, targetList = 0x7f5ff5eb9110,…}
 
(gdb) p *result->targetList
$13 = {type = T_List, length = 2, head = 0x7f5ff5eb90c8, tail = 0x7f5ff5eb92c8}
 
(gdb) p *(Node *)(result->targetList->head.data->ptr_value)
$8 = {type = T_TargetEntry}
(gdb) p *(TargetEntry*)(result->targetList->head.data->ptr_value)
$9 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff636ff48, resno = 1, resname = 0x7f5ff5caf330 "a", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 1, resjunk = false}
(gdb) p *(TargetEntry*)(result->targetList->tail.data->ptr_value)
$10 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff5eb9178, resno = 2, resname = 0x7f5ff5caf5f0 "b", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 2, resjunk = false}
(gdb)


(gdb) p *result->rtable
$14 = {type = T_List, length = 1, head = 0x7f5ff5eb8c40, tail = 0x7f5ff5eb8c40}
(gdb)  p *(Node *)(result->rtable->head.data->ptr_value)
$15 = {type = T_RangeTblEntry}
(gdb) p *(RangeTblEntry*)(result->rtable->head.data->ptr_value)
$16 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relname = 0x7f5ff636efb0 "item", partAttrNum = 0x0, relid = 24576, partitionOid = 0, isContainPartition = false, subpartitionOid = 0……}

The resulting query tree structure is as follows :

 

Complete morphology 、 After grammatical and semantic analysis ,SQL The parsing process is complete ,SQL The engine starts to perform query optimization .

原网站

版权声明
本文为[Gauss squirrel Club]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/162/202206111535102189.html