Postgresql Source Code (53): Key Flow and Functions in PL/pgSQL Syntax Parsing
2022-06-12 16:18:00 【mingjie】
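Related: "Postgresql Source Code (41): Analysis of the PL/pgSQL Function Compile and Execute Flow", "Postgresql Source Code (46): Variable Types in PL/pgSQL and Their Mappings", "Postgresql Source Code (49): Summary of the PL/pgSQL Function Compile and Execute Flow", "Postgresql Source Code (53): Key Flow and Functions in PL/pgSQL Syntax Parsing"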
0-0 Summary
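plpgsql_yylex is the PL/pgSQL counterpart of the server-side base_yylex: both wrap the lexer and return one token per call.
(For server-side parsing, see "Postgresql Source Code (44): Analysis of the Server-Side Syntax Parsing Flow".)
The difference is that plpgsql_yylex adds two layers of wrapping, while base_yylex adds one:
- plpgsql_yylex calls internal_yylex, which calls core_yylex (internal_yylex mainly hands back tokens that were pushed back after a lookahead, and also recognizes <<, >> and #)
- base_yylex calls core_yylex directly
While parsing, base_yylex sometimes looks one token ahead, which is enough for the existing server grammar. For the more complex PL/pgSQL grammar one token of lookahead is not enough, so plpgsql_yylex calls internal_yylex several times to fetch the following tokens, reading up to 5 tokens in a row (IDENT . IDENT . IDENT). For example, in the declaration i3 public.tf1.c1%TYPE; the variable's type has to be parsed as a whole; each token on its own means nothing.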
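A minimal sketch of the push-back layer described above, assuming a toy token array in place of core_yylex. The names core_next, internal_next, push_back and scanner_init and the limit of 4 are made up for illustration; only the behaviour (tokens that were given back are handed out again, last-pushed first, before the core scanner is asked for a new one) is meant to mirror internal_yylex and push_back_token in pl_scanner.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy token codes (hypothetical, not PostgreSQL's real token numbers). */
enum { TOK_EOF = 0, TOK_IDENT = 258, TOK_DOT = '.', TOK_SEMI = ';' };

#define SKETCH_MAX_PUSHBACKS 4  /* illustrative cap for the sketch */

/* Stand-in for core_yylex: hands out tokens from a fixed array. */
static const int *core_tokens;
static int core_pos;

static int core_next(void)
{
    int tok = core_tokens[core_pos];
    if (tok != TOK_EOF)
        core_pos++;             /* stay on EOF once it is reached */
    return tok;
}

/* Push-back stack: the last token pushed is the first one returned. */
static int pushback[SKETCH_MAX_PUSHBACKS];
static int num_pushbacks;

static void push_back(int tok)
{
    if (num_pushbacks >= SKETCH_MAX_PUSHBACKS) {
        fprintf(stderr, "too many pushbacks\n");
        exit(1);
    }
    pushback[num_pushbacks++] = tok;
}

/* Stand-in for internal_yylex: drain the stack before calling the core scanner. */
static int internal_next(void)
{
    if (num_pushbacks > 0)
        return pushback[--num_pushbacks];
    return core_next();
}

static void scanner_init(const int *tokens)
{
    core_tokens = tokens;
    core_pos = 0;
    num_pushbacks = 0;
}
```

Because the stack is LIFO, a caller that read t1 and then t2 and wants to give both back must push t2 first and t1 second, which is the order plpgsql_yylex uses when it pushes back tok3 before tok2.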
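The overall PL/pgSQL parsing flow is similar to the server's:
- get the string to be compiled
- plpgsql_yylex scans the characters and returns tokens, sometimes looking several tokens ahead before it knows which token to return
- the grammar in pl_gram.y matches the tokens against its rules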
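0-1 Function Quick Reference
plpgsql_yylex
Five cases:
0. Not IDENT: returned as-is
1. IDENT: e.g. int in i1 int;
2. IDENT . : e.g. i1 int.; (a syntax error in this declaration; plpgsql_yylex pushes back the extra tokens and handles the IDENT as case 1)
3. IDENT . IDENT: e.g. tf1.c2 in i2 tf1.c2%TYPE;
4. IDENT . IDENT . : e.g. i1 int.int.; (a syntax error in this declaration; plpgsql_yylex pushes back the extra tokens and handles IDENT . IDENT as case 3)
5. IDENT . IDENT . IDENT: e.g. public.tf1.c1 in i3 public.tf1.c1%TYPE;
Apart from case 0, which is returned as-is, and the error cases 2 and 4, which fall back to 1 and 3, the remaining cases 1, 3 and 5 are each handled by a dedicated function:
IDENT: plpgsql_parse_word
IDENT . IDENT: plpgsql_parse_dblword
IDENT . IDENT . IDENT: plpgsql_parse_tripword
Note that every lookahead token that is not used has to be given back with push_back_token. The next internal_yylex call takes tokens from this push-back stack first and only calls the lexer when it is empty.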
plpgsql_parse_word/plpgsql_parse_dblword/plpgsql_parse_tripword
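When called: for the one-, two- and three-word cases respectively. Inside a DECLARE section no variable lookup is done, so they never produce T_DATUM there: the token is always T_WORD (one word) or T_CWORD (two or three words).
Purpose: decide whether the current word is in the namespace (plpgsql_ns_lookup is analyzed below)
- if it is, the token is T_DATUM: the word is a variable, and the PLwdatum *wdatum output is filled in
- if it is not, the token is T_WORD (T_CWORD for two or three words) with no special meaning, and the PLword *word (PLcword *cword) output is filled in
T_DATUM example: in i3 = -1; from the test case, i3 has already been declared and is in the namespace, so when i3 appears later it resolves to a variable.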
plpgsql_ns_lookup
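Summary: the function only matches variables (VAR/REC items), either by name alone or as a label.variable pair
Return values:
- names_used set to 1: name1 matched a variable directly
- names_used set to 2: name1 matched a label and name2 matched a variable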
plpgsql_ns_lookup_label
Unlike plpgsql_ns_lookup, this function only looks at labels.
read_datatype
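Summary:
- Type names come out of plpgsql_yylex as T_WORD (e.g. int), T_CWORD (e.g. public.tf1.c1), or a keyword
- Building the type:
  - a single word such as int: look it up in pg_type, then build the type with build_datatype;
  - a table column such as public.tf1.c1%TYPE: first check what kind of object the relation is, then find the column's type, then build the type with build_datatype;
  - xxx%TYPE: first search the namespace for the named datum and take its type, then build the type with build_datatype; if the namespace has no such entry, handle it like a single word
- Return the constructed PLpgSQL_type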
0-3 Test Case
drop table if exists tf1;
create table tf1(c1 int, c2 int, c3 varchar(32), c4 varchar(32), c5 int);
insert into tf1 values(1,1000, 'China','Dalian', 23000);
insert into tf1 values(2,4000, 'Japan', 'Tokyo', 45000);
insert into tf1 values(3,1500, 'China', 'Xian', 25000);
insert into tf1 values(4,300, 'China', 'Changsha', 24000);
insert into tf1 values(5,400,'USA','New York', 35000);
insert into tf1 values(6,5000, 'USA', 'Boston', 15000);
CREATE OR REPLACE FUNCTION tfun1() RETURNS int AS $$
DECLARE
i3 public.tf1.c1%TYPE;
i2 tf1.c2%TYPE;
i1 int;
row1 tf1%ROWTYPE;
BEGIN
i3 = -1;
i2 = pg_catalog.abs(i3);
i1 = pg_catalog.abs(i3);
SELECT * INTO row1 FROM tf1 WHERE c1 = 2;
i1 = row1.c2;
return i1;
END;
$$ LANGUAGE plpgsql;
Main text:
1 Example
In the function below, follow how the declaration i3 public.tf1.c1%TYPE; is matched by decl_statement:
-- sql
CREATE OR REPLACE FUNCTION tfun1() RETURNS int AS $$
DECLARE
i3 public.tf1.c1%TYPE;
...
...
-- grammar rule being matched
decl_statement : decl_varname decl_const decl_datatype decl_collate decl_notnull decl_defval
                {
                    PLpgSQL_variable *var;
                    ...
                }
            ;
Step 1: i3 matches decl_varname
The lexer recognizes i3 as IDENT; plpgsql_yylex turns it into T_WORD and returns it to yacc, where it matches decl_varname.
decl_varname : T_WORD
                {
                    ...
                }
            ;
Step 2: decl_const matches the empty alternative
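With i3 done, parsing moves on to the type public.tf1.c1%TYPE. plpgsql_yylex turns public.tf1.c1 into T_CWORD (a composite, i.e. dotted, identifier).
This variable has no CONSTANT modifier, but the parser still goes through decl_const. Note that it arrives here with T_CWORD as the lookahead token: K_CONSTANT does not match, so the empty alternative is used, the token is not consumed, and matching continues with the following rules.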
decl_const :
                { $$ = false; }
            | K_CONSTANT
                { $$ = true; }
            ;
Step 3: decl_datatype matches the type
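Note that this rule contains no tokens at all, i.e. it matches anything. Type declarations come in many forms, and writing a rule for every format would take a lot of grammar, so all of the actual recognition is done inside a function instead.
read_datatype starts from the lookahead token the parser already holds (yychar) and consumes it. After the function returns, that token must not be matched against the following rules again, so yyclearin discards the parser's lookahead.
See section 7 below for how read_datatype works.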
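The lookahead handover in decl_datatype can be modelled with a one-slot lookahead variable standing in for bison's yychar and a one-slot push-back. Everything here (the token codes, read_type, reduce_decl_datatype, the YYEMPTY-like sentinel) is hypothetical; in the sketch the terminating ';' is pushed back so that the parser can still see it, and the point is what happens if the stale lookahead is not cleared.

```c
#include <stdbool.h>

#define SKETCH_YYEMPTY (-2)           /* "no lookahead", playing the role of YYEMPTY */

enum { TOK_END = 0, TOK_CWORD = 300, TOK_TYPE = 301 };

static const int *stream;
static int spos;
static int pushed = SKETCH_YYEMPTY;   /* one-slot push-back */

static int scan(void)
{
    if (pushed != SKETCH_YYEMPTY) {
        int t = pushed;
        pushed = SKETCH_YYEMPTY;
        return t;
    }
    return stream[spos] == TOK_END ? TOK_END : stream[spos++];
}

/* The parser's single lookahead slot, like yychar. */
static int yychar_sk = SKETCH_YYEMPTY;

static int parser_lookahead(void)
{
    if (yychar_sk == SKETCH_YYEMPTY)
        yychar_sk = scan();
    return yychar_sk;
}

/*
 * Hypothetical read_type(): like read_datatype(yychar), it starts from the
 * token the parser already holds, consumes the rest of the type and pushes
 * the terminating ';' back. Returns how many tokens belong to the type.
 */
static int read_type(int tok)
{
    int n = 0;
    while (tok != ';' && tok != TOK_END) {
        n++;
        tok = scan();
    }
    pushed = tok;                     /* give the terminator back */
    return n;
}

static void reset(const int *s)
{
    stream = s;
    spos = 0;
    pushed = SKETCH_YYEMPTY;
    yychar_sk = SKETCH_YYEMPTY;
}

/* Reduce the empty decl_datatype rule after the parser has peeked ahead. */
static int reduce_decl_datatype(bool clear_lookahead)
{
    int n = read_type(parser_lookahead());
    if (clear_lookahead)
        yychar_sk = SKETCH_YYEMPTY;   /* what yyclearin does */
    return n;
}
```

With clear_lookahead set to false the parser still sees the T_CWORD-like token that read_type has already consumed, which is exactly the situation yyclearin prevents.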
decl_datatype :
                {
                    /*
                     * If there's a lookahead token, read_datatype
                     * should consume it.
                     */
                    $$ = read_datatype(yychar);
                    yyclearin;
                }
            ;
Step 4: decl_defval
Some rule has to consume the semicolon; otherwise decl_statement : ... can never be completed.
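The four steps can be condensed into a hand-written recursive-descent sketch of decl_statement. It is only an analogy for what the bison rules do: decl_collate and decl_notnull are omitted, the type is ended only by ';' or ':=' instead of the full set of terminators read_datatype checks, and the token kinds and function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical token kinds; the real grammar lives in pl_gram.y. */
typedef enum { K_END, K_WORD, K_CWORD, K_CONSTANT, K_ASSIGN, K_SEMI, K_OTHER } Kind;

typedef struct { Kind kind; const char *text; } Token;

typedef struct {
    const char *varname;
    bool        isconst;
    char        datatype[128];   /* raw type text, e.g. "public.tf1.c1 % TYPE" */
    bool        has_default;
} Decl;

static const Token *toks;
static int pos;

static const Token *peek(void)    { return &toks[pos]; }
static const Token *advance(void) { return &toks[pos++]; }

/* Returns false on a syntax error. */
static bool parse_decl(const Token *input, Decl *out)
{
    toks = input;
    pos = 0;
    memset(out, 0, sizeof(*out));

    /* Step 1: decl_varname : T_WORD */
    if (peek()->kind != K_WORD)
        return false;
    out->varname = advance()->text;

    /* Step 2: decl_const may be empty; the lookahead is consumed only for CONSTANT */
    if (peek()->kind == K_CONSTANT) {
        advance();
        out->isconst = true;
    }

    /* Step 3: decl_datatype takes "anything" up to the terminator */
    while (peek()->kind != K_SEMI && peek()->kind != K_ASSIGN && peek()->kind != K_END) {
        if (out->datatype[0] != '\0')
            strncat(out->datatype, " ", sizeof(out->datatype) - strlen(out->datatype) - 1);
        strncat(out->datatype, advance()->text,
                sizeof(out->datatype) - strlen(out->datatype) - 1);
    }
    if (out->datatype[0] == '\0')
        return false;

    /* Step 4: decl_defval : ';' | decl_defkey <expression up to ';'> */
    if (peek()->kind == K_ASSIGN) {
        advance();
        out->has_default = true;
        while (peek()->kind != K_SEMI && peek()->kind != K_END)
            advance();           /* cf. read_sql_expression(';', ";") */
    }
    if (peek()->kind != K_SEMI)
        return false;            /* somebody has to consume the ';' */
    advance();
    return true;
}
```

For i3 public.tf1.c1%TYPE; the sketch yields varname "i3", no CONSTANT, and the type text "public.tf1.c1 % TYPE", and it rejects a declaration whose ';' is missing.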
decl_defval : ';'
                { $$ = NULL; }
            | decl_defkey
                {
                    $$ = read_sql_expression(';', ";");
                }
            ;
2 plpgsql_yylex
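Five cases:
0. Not IDENT: returned as-is
1. IDENT: e.g. int in i1 int;
2. IDENT . : e.g. i1 int.; (a syntax error in this declaration; plpgsql_yylex pushes back the extra tokens and handles the IDENT as case 1)
3. IDENT . IDENT: e.g. tf1.c2 in i2 tf1.c2%TYPE;
4. IDENT . IDENT . : e.g. i1 int.int.; (a syntax error in this declaration; plpgsql_yylex pushes back the extra tokens and handles IDENT . IDENT as case 3)
5. IDENT . IDENT . IDENT: e.g. public.tf1.c1 in i3 public.tf1.c1%TYPE;
Apart from case 0, which is returned as-is, and the error cases 2 and 4, which fall back to 1 and 3, the remaining cases 1, 3 and 5 are each handled by a dedicated function:
IDENT: plpgsql_parse_word
IDENT . IDENT: plpgsql_parse_dblword
IDENT . IDENT . IDENT: plpgsql_parse_tripword
Note that every lookahead token that is not used has to be given back with push_back_token. The next internal_yylex call takes tokens from this push-back stack first and only calls the lexer when it is empty.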
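The five cases can be sketched as the nested lookahead below. This is a simplified model, not the pl_scanner.c code: the token codes and function names are hypothetical, and both the namespace lookup (T_DATUM) and the unreserved-keyword check for a single word are left out, as if we were inside a DECLARE section. It shows where unused lookahead tokens go back onto the stack; in cases 2 and 4 the scanner falls back to the shorter form and leaves the stray '.' for the grammar to reject.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token codes; single characters stand for themselves. */
enum { TK_EOF = 0, TK_IDENT = 258, TK_WORD, TK_CWORD };

/* Toy scanner with a LIFO push-back stack, in the spirit of internal_yylex. */
static const int *src;
static int pos;
static int stack[4];
static int depth;

static int next_tok(void)
{
    if (depth > 0)
        return stack[--depth];
    return src[pos] == TK_EOF ? TK_EOF : src[pos++];
}

static void push_back(int tok)
{
    if (depth >= 4) {
        fprintf(stderr, "push-back stack overflow\n");
        exit(1);
    }
    stack[depth++] = tok;
}

static void scan_init(const int *tokens)
{
    src = tokens;
    pos = 0;
    depth = 0;
}

/*
 * Return the next token as plpgsql_yylex would in a DECLARE section
 * (no namespace lookup, so never T_DATUM). *nwords receives how many
 * identifiers were folded into the returned token (0 if none).
 */
static int classify(int *nwords)
{
    int t1 = next_tok();

    *nwords = 0;
    if (t1 != TK_IDENT)
        return t1;                       /* case 0: pass through */

    int t2 = next_tok();
    if (t2 == '.') {
        int t3 = next_tok();
        if (t3 == TK_IDENT) {
            int t4 = next_tok();
            if (t4 == '.') {
                int t5 = next_tok();
                if (t5 == TK_IDENT) {
                    *nwords = 3;         /* case 5: A.B.C */
                    return TK_CWORD;
                }
                push_back(t5);           /* case 4: keep A.B, give back ". x" */
                push_back(t4);
            } else {
                push_back(t4);
            }
            *nwords = 2;                 /* case 3: A.B */
            return TK_CWORD;
        }
        push_back(t3);                   /* case 2: keep A, give back ". x" */
        push_back(t2);
    } else {
        push_back(t2);
    }
    *nwords = 1;                         /* case 1: A */
    return TK_WORD;
}
```

The pushes always happen in reverse reading order, so the next call sees the tokens again in their original order.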
3 plpgsql_parse_word/plpgsql_parse_dblword/plpgsql_parse_tripword
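When called: for the one-, two- and three-word cases respectively. Inside a DECLARE section no variable lookup is done, so they never produce T_DATUM there: the token is always T_WORD (one word) or T_CWORD (two or three words).
Purpose: decide whether the current word is in the namespace (plpgsql_ns_lookup is analyzed below)
- if it is, the token is T_DATUM: the word is a variable, and the PLwdatum *wdatum output is filled in
- if it is not, the token is T_WORD (T_CWORD for two or three words) with no special meaning, and the PLword *word (PLcword *cword) output is filled in
T_DATUM example: in i3 = -1; from the test case, i3 has already been declared and is in the namespace, so when i3 appears later it resolves to a variable.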
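Before the source, a self-contained model of the decision plpgsql_parse_word makes. The namespace is reduced to a flat array whose indexes match the itemno values in the gdb dump of section 4, and the names LookupMode, WordResult and parse_word_sketch are hypothetical; the three modes only mirror the idea of the IDENTIFIER_LOOKUP_* setting.

```c
#include <stdbool.h>
#include <string.h>

/* Mirrors the idea of the IDENTIFIER_LOOKUP_* modes (names are hypothetical). */
typedef enum { LOOKUP_NORMAL, LOOKUP_DECLARE, LOOKUP_EXPR } LookupMode;

typedef struct {
    const char *ident;     /* like PLword.ident */
    bool        quoted;    /* like PLword.quoted */
    int         datum_no;  /* like ns->itemno when a variable is found, else -1 */
} WordResult;

/* Hypothetical namespace contents: index == datum number, as in the gdb dump */
static const char *declared[] = { "found", "i3", "i2", "i1", "row1" };

static int lookup_var(const char *name)
{
    /* search newest first, like walking the namespace from its top */
    for (int i = (int) (sizeof(declared) / sizeof(declared[0])) - 1; i >= 0; i--)
        if (strcmp(declared[i], name) == 0)
            return i;
    return -1;
}

static bool parse_word_sketch(const char *word1, const char *yytxt, bool lookup,
                              LookupMode mode, WordResult *out)
{
    out->ident = word1;
    out->quoted = (yytxt[0] == '"');
    out->datum_no = -1;

    /* no lookup in DECLARE sections or SQL expressions */
    if (lookup && mode == LOOKUP_NORMAL) {
        int no = lookup_var(word1);
        if (no >= 0) {
            out->datum_no = no;
            return true;   /* plpgsql_yylex would return T_DATUM */
        }
    }
    return false;          /* plpgsql_yylex would return T_WORD */
}
```

The same word i3 therefore yields T_DATUM in the function body but T_WORD in the DECLARE section.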
bool
plpgsql_parse_word(char *word1, const char *yytxt, bool lookup,
                   PLwdatum *wdatum, PLword *word)
{
    PLpgSQL_nsitem *ns;
    /*
     * We should not lookup variables in DECLARE sections. In SQL
     * expressions, there's no need to do so either --- lookup will happen
     * when the expression is compiled.
     */
    if (lookup && plpgsql_IdentifierLookup == IDENTIFIER_LOOKUP_NORMAL)
    {
        /*
         * Do a lookup in the current namespace stack
         */
        ns = plpgsql_ns_lookup(plpgsql_ns_top(), false,
                               word1, NULL, NULL,
                               NULL);
        if (ns != NULL)
        {
            switch (ns->itemtype)
            {
                case PLPGSQL_NSTYPE_VAR:
                case PLPGSQL_NSTYPE_REC:
                    wdatum->datum = plpgsql_Datums[ns->itemno];
                    wdatum->ident = word1;
                    wdatum->quoted = (yytxt[0] == '"');
                    wdatum->idents = NIL;
                    return true;
                default:
                    /* plpgsql_ns_lookup should never return anything else */
                    elog(ERROR, "unrecognized plpgsql itemtype: %d",
                         ns->itemtype);
            }
        }
    }
    /*
     * Nothing found - up to now it's a word without any special meaning for
     * us.
     */
    word->ident = word1;
    word->quoted = (yytxt[0] == '"');
    return false;
}
4 plpgsql_ns_lookup
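Summary: the function only matches variables (VAR/REC items), either by name alone or as a label.variable pair
Return values:
- names_used set to 1: name1 matched a variable directly
- names_used set to 2: name1 matched a label and name2 matched a variable
Analysis:
The function can search with up to three names: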
PLpgSQL_nsitem *
plpgsql_ns_lookup(PLpgSQL_nsitem *ns_cur, bool localmode,
                  const char *name1, const char *name2, const char *name3,
                  int *names_used)
{
    ...
}
A single-name lookup is simple: it is a direct match. Below is a two-name (A.B) lookup traced in gdb.
The state of ns_top at that point:
(gdb) p *ns_top
$36 = {itemtype = PLPGSQL_NSTYPE_REC, itemno = 4, prev = 0x2a81c98, name = 0x2a81e68 "row1"}
(gdb) p *ns_top->prev
$37 = {itemtype = PLPGSQL_NSTYPE_VAR, itemno = 3, prev = 0x2a9ea90, name = 0x2a81ca8 "i1"}
(gdb) p *ns_top->prev->prev
$38 = {itemtype = PLPGSQL_NSTYPE_VAR, itemno = 2, prev = 0x2a9e800, name = 0x2a9eaa0 "i2"}
(gdb) p *ns_top->prev->prev->prev
$39 = {itemtype = PLPGSQL_NSTYPE_VAR, itemno = 1, prev = 0x2a9e5e8, name = 0x2a9e810 "i3"}
(gdb) p *ns_top->prev->prev->prev->prev
$40 = {itemtype = PLPGSQL_NSTYPE_LABEL, itemno = 0, prev = 0x2a9e5b0, name = 0x2a9e5f8 ""}
(gdb) p *ns_top->prev->prev->prev->prev->prev
$41 = {itemtype = PLPGSQL_NSTYPE_VAR, itemno = 0, prev = 0x2a9e4e0, name = 0x2a9e5c0 "found"}
(gdb) p *ns_top->prev->prev->prev->prev->prev->prev
$42 = {itemtype = PLPGSQL_NSTYPE_LABEL, itemno = 0, prev = 0x0, name = 0x2a9e4f0 "tfun1"}
Return values:
- names_used set to 1: name1 matched a variable directly
- names_used set to 2: name1 matched a label and name2 matched a variable
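So when the call is:
plpgsql_ns_lookup (ns_cur=0x2a69848, localmode=false, name1=0x2a69f80 "row1", name2=0x2a69fa0 "c2", name3=0x0, names_used=0x7ffddfdbbf0c)
the first (unqualified) loop matches directly: row1 is a REC item rather than a VAR, so it is accepted even though name2 is not NULL, and names_used is set to 1: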
{itemtype = PLPGSQL_NSTYPE_REC, itemno = 4, prev = 0x2a81c98, name = 0x2a81e68 "row1"}
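To check the two-pass logic against the gdb dump above, here is the same algorithm as the source that follows, on self-contained types. NsItem, ns_lookup and build_example are hypothetical stand-ins for PLpgSQL_nsitem and plpgsql_ns_lookup; the chain is rebuilt from the printed items, top first: row1, i1, i2, i3, the unnamed block label, found, and the function label tfun1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { NSTYPE_LABEL, NSTYPE_VAR, NSTYPE_REC } NsType;

typedef struct NsItem {
    NsType          itemtype;
    int             itemno;
    struct NsItem  *prev;
    const char     *name;
} NsItem;

static NsItem *ns_lookup(NsItem *ns_cur, bool localmode,
                         const char *name1, const char *name2,
                         const char *name3, int *names_used)
{
    while (ns_cur != NULL) {
        NsItem *nsitem;

        /* unqualified: name1 is the variable itself */
        for (nsitem = ns_cur; nsitem->itemtype != NSTYPE_LABEL; nsitem = nsitem->prev) {
            if (strcmp(nsitem->name, name1) == 0 &&
                (name2 == NULL || nsitem->itemtype != NSTYPE_VAR)) {
                if (names_used) *names_used = 1;
                return nsitem;
            }
        }
        /* qualified: name1 is this block's label, name2 the variable */
        if (name2 != NULL && strcmp(nsitem->name, name1) == 0) {
            for (nsitem = ns_cur; nsitem->itemtype != NSTYPE_LABEL; nsitem = nsitem->prev) {
                if (strcmp(nsitem->name, name2) == 0 &&
                    (name3 == NULL || nsitem->itemtype != NSTYPE_VAR)) {
                    if (names_used) *names_used = 2;
                    return nsitem;
                }
            }
        }
        if (localmode)
            break;                 /* do not look into outer blocks */
        ns_cur = nsitem->prev;     /* nsitem is the label that closes this block */
    }
    if (names_used) *names_used = 0;
    return NULL;
}

/* Rebuild the chain printed by gdb; returns the top (row1). */
static NsItem *build_example(NsItem it[7])
{
    it[0] = (NsItem){ NSTYPE_LABEL, 0, NULL,   "tfun1" };
    it[1] = (NsItem){ NSTYPE_VAR,   0, &it[0], "found" };
    it[2] = (NsItem){ NSTYPE_LABEL, 0, &it[1], "" };
    it[3] = (NsItem){ NSTYPE_VAR,   1, &it[2], "i3" };
    it[4] = (NsItem){ NSTYPE_VAR,   2, &it[3], "i2" };
    it[5] = (NsItem){ NSTYPE_VAR,   3, &it[4], "i1" };
    it[6] = (NsItem){ NSTYPE_REC,   4, &it[5], "row1" };
    return &it[6];
}
```

The i1.x case in the checks shows a detail of the rule: a VAR is only accepted as an unqualified match when no further name follows, so i1.x does not resolve to i1, while row1.c2 does resolve to the record row1 and tfun1.found resolves through the label.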
PLpgSQL_nsitem *
plpgsql_ns_lookup(PLpgSQL_nsitem *ns_cur, bool localmode,
                  const char *name1, const char *name2, const char *name3,
                  int *names_used)
{
    /* Outer loop iterates once per block level in the namespace chain */
    while (ns_cur != NULL)
    {
        PLpgSQL_nsitem *nsitem;
        /* Check this level for unqualified match to variable name */
        for (nsitem = ns_cur;
             nsitem->itemtype != PLPGSQL_NSTYPE_LABEL;
             nsitem = nsitem->prev)
        {
            if (strcmp(nsitem->name, name1) == 0)
            {
                if (name2 == NULL ||
                    nsitem->itemtype != PLPGSQL_NSTYPE_VAR)
                {
                    if (names_used)
                        *names_used = 1;
                    return nsitem;
                }
            }
        }
        /* Check this level for qualified match to variable name */
        if (name2 != NULL &&
            strcmp(nsitem->name, name1) == 0)
        {
            for (nsitem = ns_cur;
                 nsitem->itemtype != PLPGSQL_NSTYPE_LABEL;
                 nsitem = nsitem->prev)
            {
                if (strcmp(nsitem->name, name2) == 0)
                {
                    if (name3 == NULL ||
                        nsitem->itemtype != PLPGSQL_NSTYPE_VAR)
                    {
                        if (names_used)
                            *names_used = 2;
                        return nsitem;
                    }
                }
            }
        }
        if (localmode)
            break;              /* do not look into upper levels */
        ns_cur = nsitem->prev;
    }
    /* This is just to suppress possibly-uninitialized-variable warnings */
    if (names_used)
        *names_used = 0;
    return NULL;                /* No match found */
}
5 plpgsql_ns_lookup_label
Unlike plpgsql_ns_lookup, this function only looks at labels.
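The difference from plpgsql_ns_lookup is easiest to see when a variable and a block label share a name. In this sketch (the types and names are hypothetical, simplified from the function below) the walk crosses block boundaries and simply skips every item that is not a label:

```c
#include <stddef.h>
#include <string.h>

typedef enum { NSTYPE_LABEL, NSTYPE_VAR } NsType;

typedef struct NsItem {
    NsType         itemtype;
    struct NsItem *prev;
    const char    *name;
} NsItem;

/* Visit every item up the chain; only labels can match. */
static NsItem *ns_lookup_label(NsItem *ns_cur, const char *name)
{
    for (; ns_cur != NULL; ns_cur = ns_cur->prev)
        if (ns_cur->itemtype == NSTYPE_LABEL && strcmp(ns_cur->name, name) == 0)
            return ns_cur;
    return NULL;
}
```

In the checks, a variable named outer declared in an inner block is ignored and the outer block's label <<outer>> is returned.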
/* ----------
 * plpgsql_ns_lookup_label      Lookup a label in the given namespace chain
 * ----------
 */
PLpgSQL_nsitem *
plpgsql_ns_lookup_label(PLpgSQL_nsitem *ns_cur, const char *name)
{
    while (ns_cur != NULL)
    {
        if (ns_cur->itemtype == PLPGSQL_NSTYPE_LABEL &&
            strcmp(ns_cur->name, name) == 0)
            return ns_cur;
        ns_cur = ns_cur->prev;
    }
    return NULL;                /* label not found */
}
7 read_datatype
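Summary:
- Type names come out of plpgsql_yylex as T_WORD (e.g. int), T_CWORD (e.g. public.tf1.c1), or a keyword
- Building the type:
  - a single word such as int: look it up in pg_type, then build the type with build_datatype;
  - a table column such as public.tf1.c1%TYPE: first check what kind of object the relation is, then find the column's type, then build the type with build_datatype;
  - xxx%TYPE: first search the namespace for the named datum and take its type, then build the type with build_datatype; if the namespace has no such entry, handle it like a single word
- Return the constructed PLpgSQL_type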
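The three paths in the summary above can be sketched as a small dispatcher. This is a simplification under stated assumptions: the token kinds and function names are hypothetical, only %TYPE is handled (not %ROWTYPE), and the set of visible variables is a fixed array instead of a real namespace lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { TK_WORD, TK_CWORD, TK_PERCENT, TK_TYPE_KW, TK_SEMI } TokKind;
typedef enum { PATH_TYPENAME, PATH_VAR_TYPE, PATH_COLUMN_TYPE } TypePath;

/* Hypothetical variables visible in the namespace */
static const char *visible_vars[] = { "found", "i3", "i2", "i1" };

static bool is_visible_var(const char *name)
{
    for (size_t i = 0; i < sizeof(visible_vars) / sizeof(visible_vars[0]); i++)
        if (strcmp(visible_vars[i], name) == 0)
            return true;
    return false;
}

/* Decide which of the three resolution paths a type declaration takes. */
static TypePath choose_path(const TokKind *k, int n, const char *first_word)
{
    bool pct_type = n >= 3 && k[1] == TK_PERCENT && k[2] == TK_TYPE_KW;

    if (k[0] == TK_CWORD && pct_type)
        return PATH_COLUMN_TYPE;   /* public.tf1.c1%TYPE: column type from the catalogs */
    if (k[0] == TK_WORD && pct_type && is_visible_var(first_word))
        return PATH_VAR_TYPE;      /* i3%TYPE: type of the datum found in the namespace */
    return PATH_TYPENAME;          /* int, or a word%TYPE with no matching variable */
}
```

Checking the namespace only for the single-word form matches the summary: a dotted name followed by %TYPE goes to the catalog lookup, a lone word goes to the namespace first.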
How i3 public.tf1.c1%TYPE; is parsed:
static PLpgSQL_type *
read_datatype(int tok)
{
    ...
    if (tok == T_WORD)
    {
        ...
    }
    else if (plpgsql_token_is_unreserved_keyword(tok))
    {
        ...
    }
    // before we get here, public.tf1.c1 has already been scanned and turned into T_CWORD
    else if (tok == T_CWORD)
    {
        // the three names public, tf1, c1 are kept in yylval.cword.idents
        List *dtnames = yylval.cword.idents;
        tok = yylex();          // look one token further: '%'
        if (tok == '%')
        {
            tok = yylex();      // look one more: the keyword TYPE (K_TYPE)
            if (tok_is_keyword(tok, &yylval,
                               K_TYPE, "type"))
            {
                // pass the three names public, tf1, c1 to plpgsql_parse_cwordtype
                result = plpgsql_parse_cwordtype(dtnames);
                /*
                if (list_length(idents) == 3)
                1. makeRangeVar builds the RangeVar: {schemaname = 0x2a9e640 "public", relname = 0x2a9e660 "tf1"}
                2. RangeVarGetRelid gets the relation's OID
                3. look the OID up in pg_class to see what kind of object it is: SearchSysCache1(RELOID, ObjectIdGetDatum(classOid))
                4. look the column up in pg_attribute: SearchSysCacheAttName(classOid, fldname)
                5. finally build_datatype:
                   {typname = 0x2a9e7c0 "int4",
                    typoid = 23,
                    ttype = PLPGSQL_TTYPE_SCALAR,
                    typlen = 4, typbyval = true,
                    typtype = 98 'b',
                    collation = 0,
                    typisarray = false,
                    atttypmod = -1,
                    origtypname = 0x0,
                    tcache = 0x0,
                    tupdesc_id = 0}
                */
                if (result)
                    return result;
                ...
...
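Steps 1 to 5 in the comment above can be imitated with a toy catalog. The ToyAttr rows are a hypothetical stand-in for pg_class and pg_attribute (23 and 1043 are the real built-in OIDs of int4 and varchar); the relation check and the column lookup are collapsed into a single table scan, and search_path is reduced to public. This sketch does not model what the real code does when nothing is found.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the pg_class/pg_attribute rows of tf1. */
typedef struct {
    const char *schema, *relname, *attname, *typname;
    unsigned    typoid;
} ToyAttr;

static const ToyAttr toy_catalog[] = {
    { "public", "tf1", "c1", "int4",    23 },
    { "public", "tf1", "c2", "int4",    23 },
    { "public", "tf1", "c3", "varchar", 1043 },
};

/* Resolve "[schema.]table.column" to the column's type, or NULL if unknown. */
static const ToyAttr *cwordtype_sketch(const char *const *idents, int nidents)
{
    const char *schema = "public", *rel, *col;

    if (nidents == 3) {          /* public.tf1.c1 */
        schema = idents[0];
        rel = idents[1];
        col = idents[2];
    } else if (nidents == 2) {   /* tf1.c1, schema taken from the simplified search_path */
        rel = idents[0];
        col = idents[1];
    } else {
        return NULL;
    }

    for (size_t i = 0; i < sizeof(toy_catalog) / sizeof(toy_catalog[0]); i++) {
        const ToyAttr *a = &toy_catalog[i];
        if (strcmp(a->schema, schema) == 0 && strcmp(a->relname, rel) == 0 &&
            strcmp(a->attname, col) == 0)
            return a;
    }
    return NULL;
}
```

For public.tf1.c1 the sketch returns int4 with OID 23, matching the typname and typoid that build_datatype produced in the trace.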