SQL Parsing in Practice with Pisa-Proxy
2022-06-27 14:06:00 【InfoQ】
1. Background
About parsing
- LL (top-down)
- LR (bottom-up)
- LALR
About the survey
- antlr_rust
- sqlparser-rs
- nom-sql
- grmtools

2. Using Grmtools
- Write the Lex and Yacc files
calc.l:
%%
[0-9]+ "INT"
\+ "+"
\* "*"
\( "("
\) ")"
[\t ]+ ;
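calc.y:
%start Expr
%avoid_insert "INT"
%%
Expr -> Result<u64, ()>:
      Expr '+' Term { Ok($1? + $3?) }
    | Term { $1 }
    ;
Term -> Result<u64, ()>:
      Term '*' Factor { Ok($1? * $3?) }
    | Factor { $1 }
    ;
Factor -> Result<u64, ()>:
      '(' Expr ')' { $2 }
    | 'INT'
      {
          let v = $1.map_err(|_| ())?;
          parse_int($lexer.span_str(v.span()))
      }
    ;
%%
// Functions defined here are in scope for all the grammar actions above.
fn parse_int(s: &str) -> Result<u64, ()> {
    match s.parse::<u64>() {
        Ok(val) => Ok(val),
        Err(_) => {
            eprintln!("{} cannot be represented as a u64", s);
            Err(())
        }
    }
}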
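To make the layering of Expr, Term, and Factor concrete, the following is a minimal hand-written evaluator for the same calculator language. It is a sketch only: it uses recursive descent (an LL-style technique) rather than the LR parser that grmtools generates, so the left-recursive rules become loops. The names `Calc` and `eval` are hypothetical and do not come from grmtools, and unlike the grammar actions it reports overflow as an error through checked arithmetic.

```rust
// Illustrative recursive-descent evaluator; not generated by grmtools.
struct Calc<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Calc<'a> {
    fn new(src: &'a str) -> Self {
        Calc { src: src.as_bytes(), pos: 0 }
    }

    // Skip spaces and tabs, mirroring the `[\t ]+ ;` lexer rule,
    // then return the next byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        while self.pos < self.src.len()
            && (self.src[self.pos] == b' ' || self.src[self.pos] == b'\t')
        {
            self.pos += 1;
        }
        self.src.get(self.pos).copied()
    }

    // Expr: Term ('+' Term)*  (loop form of the left-recursive rule)
    fn expr(&mut self) -> Result<u64, ()> {
        let mut acc = self.term()?;
        while self.peek() == Some(b'+') {
            self.pos += 1;
            acc = acc.checked_add(self.term()?).ok_or(())?;
        }
        Ok(acc)
    }

    // Term: Factor ('*' Factor)*  (binds tighter than '+')
    fn term(&mut self) -> Result<u64, ()> {
        let mut acc = self.factor()?;
        while self.peek() == Some(b'*') {
            self.pos += 1;
            acc = acc.checked_mul(self.factor()?).ok_or(())?;
        }
        Ok(acc)
    }

    // Factor: '(' Expr ')' | INT
    fn factor(&mut self) -> Result<u64, ()> {
        match self.peek() {
            Some(b'(') => {
                self.pos += 1;
                let v = self.expr()?;
                if self.peek() == Some(b')') {
                    self.pos += 1;
                    Ok(v)
                } else {
                    Err(())
                }
            }
            Some(c) if c.is_ascii_digit() => {
                let start = self.pos;
                while self.pos < self.src.len() && self.src[self.pos].is_ascii_digit() {
                    self.pos += 1;
                }
                std::str::from_utf8(&self.src[start..self.pos])
                    .map_err(|_| ())?
                    .parse::<u64>()
                    .map_err(|_| ())
            }
            _ => Err(()),
        }
    }
}

// Evaluate a whole input string, rejecting trailing garbage.
fn eval(input: &str) -> Result<u64, ()> {
    let mut calc = Calc::new(input);
    let value = calc.expr()?;
    if calc.peek().is_none() { Ok(value) } else { Err(()) }
}
```

With this sketch, `eval("2+3*4")` returns `Ok(14)` and `eval("(2+3)*4")` returns `Ok(20)`, which is the precedence the grammar encodes by nesting Term inside Expr.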
- Build the lexer and parser
// build.rs
use cfgrammar::yacc::YaccKind;
use lrlex::CTLexerBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    CTLexerBuilder::new()
        .lrpar_config(|ctp| {
            ctp.yacckind(YaccKind::Grmtools)
                .grammar_in_src_dir("calc.y")
                .unwrap()
        })
        .lexer_in_src_dir("calc.l")?
        .build()?;
    Ok(())
}
- Integrate the parser into the application
use std::env;
use lrlex::lrlex_mod;
use lrpar::lrpar_mod;

// Using `lrlex_mod!` brings the lexer for `calc.l` into scope. By default the
// module name will be `calc_l` (i.e. the file name, minus any extensions,
// with a suffix of `_l`).
lrlex_mod!("calc.l");
// Using `lrpar_mod!` brings the parser for `calc.y` into scope. By default the
// module name will be `calc_y` (i.e. the file name, minus any extensions,
// with a suffix of `_y`).
lrpar_mod!("calc.y");

fn main() {
    // Get the `LexerDef` for the `calc` language.
    let lexerdef = calc_l::lexerdef();
    let args: Vec<String> = env::args().collect();
    // Now we create a lexer with the `lexer` method with which we can lex an
    // input.
    let lexer = lexerdef.lexer(&args[1]);
    // Pass the lexer to the parser and lex and parse the input.
    let (res, errs) = calc_y::parse(&lexer);
    for e in errs {
        println!("{}", e.pp(&lexer, &calc_y::token_epp));
    }
    match res {
        Some(r) => println!("Result: {:?}", r),
        _ => eprintln!("Unable to evaluate expression.")
    }
}
lrpar::NonStreamingLexer
lrlex::LRNonStreamingLexer::new()
3. Problems encountered
- Shift/Reduce conflicts
Shift/Reduce conflicts:
State 619: Shift("TEXT_STRING") / Reduce(literal: "text_literal")
%nonassoc LOWER_THEN_ELSE
%nonassoc ELSE
stmt:
      IF expr stmt %prec LOWER_THEN_ELSE
    | IF expr stmt ELSE stmt
literal -> String:
      text_literal
      { }
    | NUM_literal
      { }
    ...
text_literal -> String:
      'TEXT_STRING' {}
    | 'NCHAR_STRING' {}
    | text_literal 'TEXT_STRING' {}
    ...

%nonassoc 'LOWER_THEN_TEXT_STRING'
%nonassoc 'TEXT_STRING'
literal -> String:
      text_literal %prec 'LOWER_THEN_TEXT_STRING'
      { }
    | NUM_literal
      { }
    ...
text_literal -> String:
      'TEXT_STRING' {}
    | 'NCHAR_STRING' {}
    | text_literal 'TEXT_STRING' {}
    ...
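The precedence declarations above settle the conflict in favor of shifting. A minimal sketch of the classic Yacc resolution rule follows, on the assumption that grmtools applies the same rule as Yacc here. The types `Prec`, `Assoc`, and `Resolution` and the function `resolve` are hypothetical names for illustration and are not part of the grmtools API.

```rust
use std::cmp::Ordering;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Assoc {
    Left,
    Right,
    NonAssoc,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Resolution {
    Shift,
    Reduce,
    Error,      // equal precedence with %nonassoc: a syntax error
    Unresolved, // no precedence information: reported as a conflict
}

// A precedence level. A higher `level` means the declaration appears
// later in the grammar header and therefore binds tighter.
#[derive(Clone, Copy, Debug)]
struct Prec {
    level: u32,
    assoc: Assoc,
}

// Classic Yacc rule: compare the precedence of the rule that could be
// reduced with the precedence of the lookahead token.
fn resolve(rule: Option<Prec>, token: Option<Prec>) -> Resolution {
    match (rule, token) {
        (Some(r), Some(t)) => match t.level.cmp(&r.level) {
            Ordering::Greater => Resolution::Shift,
            Ordering::Less => Resolution::Reduce,
            Ordering::Equal => match t.assoc {
                Assoc::Left => Resolution::Reduce,
                Assoc::Right => Resolution::Shift,
                Assoc::NonAssoc => Resolution::Error,
            },
        },
        _ => Resolution::Unresolved,
    }
}
```

Applied to the grammar above, the rule `literal: text_literal` carries the lower level of 'LOWER_THEN_TEXT_STRING' and the lookahead 'TEXT_STRING' carries the higher one, so `resolve` yields `Shift` and the parser keeps extending `text_literal`. Without the `%prec` annotation the rule has no precedence and the conflict stays unresolved.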
- SQL containing Chinese characters
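The details of this problem are not preserved here. One common cause in hand-written lexers, assumed for this sketch, is mixing character counts with byte offsets: a Rust `&str` is indexed by bytes, and each Chinese character occupies three bytes in UTF-8, so slicing at a character count can land inside a character and panic. The hypothetical helper `token_spans` below records spans as byte offsets taken from `char_indices`, which always fall on character boundaries.

```rust
// Return (byte_start, byte_len) spans for whitespace-separated tokens.
// Offsets come from `char_indices`, so they are valid slice boundaries
// even when the input contains multi-byte characters.
fn token_spans(sql: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start: Option<usize> = None;
    for (i, ch) in sql.char_indices() {
        if ch.is_whitespace() {
            // A token ends at the first whitespace after its start.
            if let Some(s) = start.take() {
                spans.push((s, i - s));
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    // Flush the final token if the input does not end in whitespace.
    if let Some(s) = start {
        spans.push((s, sql.len() - s));
    }
    spans
}
```

For the input `select '中文' from t`, the second span is `(7, 8)`: the quoted literal is four characters long but eight bytes, and slicing the input with that span returns the literal intact.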
4. Optimization
- With a dry-run parse (test code in the appendix) that executes no actions, performance is as follows:
$ time ./parser
real 0m4.788s
user 0m4.781s
sys 0m0.002s


__GRM_DATA  __STABLE_DATA  grm  stable
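- Further analysis: every parse initializes an actions array. As the number of grammar rules grows, the array grows with it, and its elements are references to dyn trait objects, which carries a runtime cost.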
::std::vec![&__gt_wrapper_0,
            &__gt_wrapper_1,
            &__gt_wrapper_2,
            ...
]
match idx {
    0 => __gt_wrapper_0(),
    1 => __gt_wrapper_1(),
    2 => __gt_wrapper_2(),
    ....
}
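The two dispatch styles above can be compared in a small self-contained sketch. The `wrapper_*` functions are hypothetical stand-ins for the generated `__gt_wrapper_*` actions, whose real signatures are not shown here. The first style allocates a vector of trait-object references on every call and invokes the action through dynamic dispatch. The second selects the action with a `match`, which needs no allocation and calls each function directly.

```rust
// Hypothetical stand-ins for generated action wrappers.
fn wrapper_0() -> u64 { 10 }
fn wrapper_1() -> u64 { 20 }
fn wrapper_2() -> u64 { 30 }

// Style 1: build a table of `&dyn Fn` references on each call,
// then index into it (heap allocation plus dynamic dispatch).
fn run_via_table(idx: usize) -> u64 {
    let actions: Vec<&dyn Fn() -> u64> = vec![&wrapper_0, &wrapper_1, &wrapper_2];
    actions[idx]()
}

// Style 2: pick the action with `match` (no table, static dispatch).
fn run_via_match(idx: usize) -> u64 {
    match idx {
        0 => wrapper_0(),
        1 => wrapper_1(),
        2 => wrapper_2(),
        _ => unreachable!("no action with index {}", idx),
    }
}
```

Both functions return the same value for every valid index, so the change only affects how the action is reached. The per-call cost of the table grows with the number of rules, while the `match` form does not build anything at run time.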



$ time ./parser
real 0m2.677s
user 0m2.667s
sys 0m0.007s
5. Summary
Appendix
let input = "select id, name from t where id = ?;";
let p = parser::Parser::new();
// Parse the same statement one million times and discard the result.
for _ in 0..1_000_000 {
    let _ = p.parse(input);
}