SQL Parsing in Practice with Pisa-Proxy
2022-06-27 14:06:00 【InfoQ】
1. Background
About syntax analysis
- LL (top-down)
- LR (bottom-up)
- LALR
Libraries evaluated
- antlr_rust
- sqlparser-rs
- nom-sql
- grmtools

2. Using Grmtools
- Write the Lex and Yacc files
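calc.l:
%%
[0-9]+ "INT"
\+ "+"
\* "*"
\( "("
\) ")"
[\t ]+ ;

calc.y:
%start Expr
%avoid_insert "INT"
%%
Expr -> Result<u64, ()>:
      Expr '+' Term { Ok($1? + $3?) }
    | Term { $1 }
    ;
Term -> Result<u64, ()>:
      Term '*' Factor { Ok($1? * $3?) }
    | Factor { $1 }
    ;
Factor -> Result<u64, ()>:
      '(' Expr ')' { $2 }
    | 'INT'
      {
          let v = $1.map_err(|_| ())?;
          parse_int($lexer.span_str(v.span()))
      }
    ;
%%
// Functions defined here are in scope for all the grammar actions above.
fn parse_int(s: &str) -> Result<u64, ()> {
    s.parse::<u64>()
        .map_err(|_| eprintln!("{} cannot be represented as a u64", s))
}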
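For comparison with the LL approach listed in the background section: the grammar above is left-recursive (`Expr: Expr '+' Term`), which LR-family parsers accept directly, whereas a top-down parser has to rewrite the recursion as a loop. The following is a minimal hand-written sketch of that top-down style for the same language. It is not generated by grmtools and is not Pisa-Proxy code; `Calc` and `eval` are hypothetical names used only for illustration.

```rust
// Hand-written top-down (LL-style) evaluator for the calc language.
// Illustrative only: `Calc` and `eval` are hypothetical names.
struct Calc<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Calc<'a> {
    fn new(src: &'a str) -> Self {
        Calc { src: src.as_bytes(), pos: 0 }
    }

    // Skip spaces and tabs, then return the next byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        while let Some(&b) = self.src.get(self.pos) {
            if b == b' ' || b == b'\t' {
                self.pos += 1;
            } else {
                return Some(b);
            }
        }
        None
    }

    // Expr: Term ('+' Term)*   (left recursion rewritten as a loop)
    fn expr(&mut self) -> Result<u64, ()> {
        let mut v = self.term()?;
        while self.peek() == Some(b'+') {
            self.pos += 1;
            v = v.checked_add(self.term()?).ok_or(())?;
        }
        Ok(v)
    }

    // Term: Factor ('*' Factor)*
    fn term(&mut self) -> Result<u64, ()> {
        let mut v = self.factor()?;
        while self.peek() == Some(b'*') {
            self.pos += 1;
            v = v.checked_mul(self.factor()?).ok_or(())?;
        }
        Ok(v)
    }

    // Factor: '(' Expr ')' | INT
    fn factor(&mut self) -> Result<u64, ()> {
        match self.peek() {
            Some(b'(') => {
                self.pos += 1;
                let v = self.expr()?;
                if self.peek() == Some(b')') {
                    self.pos += 1;
                    Ok(v)
                } else {
                    Err(())
                }
            }
            Some(b) if b.is_ascii_digit() => {
                let start = self.pos;
                while self.src.get(self.pos).map_or(false, |b| b.is_ascii_digit()) {
                    self.pos += 1;
                }
                std::str::from_utf8(&self.src[start..self.pos])
                    .map_err(|_| ())?
                    .parse::<u64>()
                    .map_err(|_| ())
            }
            _ => Err(()),
        }
    }
}

// Evaluate a whole input string, rejecting trailing input.
fn eval(src: &str) -> Result<u64, ()> {
    let mut c = Calc::new(src);
    let v = c.expr()?;
    if c.peek().is_none() { Ok(v) } else { Err(()) }
}
```

Calling `eval("2 + 3 * 4")` walks `expr`, `term` and `factor` in the same precedence order the Yacc rules encode, but every rule has to be written and maintained by hand, which is the cost a parser generator removes.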
- Build the lexer and parser
// build.rs
use cfgrammar::yacc::YaccKind;
use lrlex::CTLexerBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Generate the lexer and parser from calc.l and calc.y at compile time.
    CTLexerBuilder::new()
        .lrpar_config(|ctp| {
            ctp.yacckind(YaccKind::Grmtools)
                .grammar_in_src_dir("calc.y")
                .unwrap()
        })
        .lexer_in_src_dir("calc.l")?
        .build()?;
    Ok(())
}
- Integrate the parser into the application
// src/main.rs
use std::env;

use lrlex::lrlex_mod;
use lrpar::lrpar_mod;

// Using `lrlex_mod!` brings the lexer for `calc.l` into scope. By default the
// module name will be `calc_l` (i.e. the file name, minus any extensions,
// with a suffix of `_l`).
lrlex_mod!("calc.l");
// Using `lrpar_mod!` brings the parser for `calc.y` into scope. By default the
// module name will be `calc_y` (i.e. the file name, minus any extensions,
// with a suffix of `_y`).
lrpar_mod!("calc.y");

fn main() {
    // Get the `LexerDef` for the `calc` language.
    let lexerdef = calc_l::lexerdef();
    let args: Vec<String> = env::args().collect();
    // Now we create a lexer with the `lexer` method with which we can lex an
    // input.
    let lexer = lexerdef.lexer(&args[1]);
    // Pass the lexer to the parser and lex and parse the input.
    let (res, errs) = calc_y::parse(&lexer);
    for e in errs {
        println!("{}", e.pp(&lexer, &calc_y::token_epp));
    }
    match res {
        Some(r) => println!("Result: {:?}", r),
        _ => eprintln!("Unable to evaluate expression."),
    }
}
lrpar::NonStreamingLexer
lrlex::LRNonStreamingLexer::new()
3. Problems Encountered
- Shift/Reduce errors
Shift/Reduce conflicts:
State 619: Shift("TEXT_STRING") / Reduce(literal: "text_literal")
%nonassoc LOWER_THEN_ELSE
%nonassoc ELSE
stmt:
      IF expr stmt %prec LOWER_THEN_ELSE
    | IF expr stmt ELSE stmt

literal -> String:
      text_literal
      { }
    | NUM_literal
      { }
...
text_literal -> String:
      'TEXT_STRING' {}
    | 'NCHAR_STRING' {}
    | text_literal 'TEXT_STRING' {}
...

%nonassoc 'LOWER_THEN_TEXT_STRING'
%nonassoc 'TEXT_STRING'
literal -> String:
      text_literal %prec 'LOWER_THEN_TEXT_STRING'
      { }
    | NUM_literal
      { }
...
text_literal -> String:
      'TEXT_STRING' {}
    | 'NCHAR_STRING' {}
    | text_literal 'TEXT_STRING' {}
...
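The `%nonassoc` and `%prec` declarations above work through the classic Yacc conflict-resolution rule: the precedence of the rule that could be reduced is compared with the precedence of the lookahead token, and associativity breaks ties. The sketch below models only that decision. It is not grmtools' internal code, and `Assoc`, `Action` and `resolve` are hypothetical names; precedence levels are assumed to be numbered in declaration order, so a later `%nonassoc` line gets a larger number.

```rust
// Sketch of the classic Yacc rule for resolving a shift/reduce conflict.
// All names are illustrative; this is not grmtools' internal code.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Assoc {
    Left,
    Right,
    NonAssoc,
}

#[derive(Debug, PartialEq)]
enum Action {
    Shift,
    Reduce,
    Error,
}

// `rule_level` is the precedence of the production that could be reduced
// (set by %prec, or by the rule's last terminal); `tok_level` is the
// precedence of the lookahead token. A larger level was declared later.
fn resolve(rule_level: u32, tok_level: u32, tok_assoc: Assoc) -> Action {
    if tok_level > rule_level {
        // The lookahead token binds tighter: keep reading input.
        Action::Shift
    } else if tok_level < rule_level {
        // The rule binds tighter: reduce now.
        Action::Reduce
    } else {
        // Same level: associativity decides.
        match tok_assoc {
            Assoc::Left => Action::Reduce,
            Assoc::Right => Action::Shift,
            Assoc::NonAssoc => Action::Error,
        }
    }
}
```

With `LOWER_THEN_TEXT_STRING` at level 1 and `TEXT_STRING` at level 2, `resolve(1, 2, Assoc::NonAssoc)` yields `Action::Shift`, so a `TEXT_STRING` that follows a `text_literal` is shifted and absorbed into the same literal instead of triggering the reduction to `literal`.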
- SQL containing Chinese characters
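The text does not describe the failure in detail. One thing worth checking in any hand-written Rust lexer, offered here as an assumption rather than as the cause the author hit, is that `&str` slicing uses byte offsets, while Chinese characters occupy three bytes each in UTF-8. A lexer that counts characters and then slices by that count will produce wrong spans or panic. The helper below, `char_spans`, is a hypothetical name used for illustration; it records a byte-based span per character.

```rust
// Collect a (byte_start, byte_end) span for every character of `sql`.
// Byte offsets are what `&str` slicing expects, so these spans are safe
// to use with `&sql[start..end]` even for multi-byte characters.
fn char_spans(sql: &str) -> Vec<(usize, usize)> {
    sql.char_indices()
        .map(|(start, ch)| (start, start + ch.len_utf8()))
        .collect()
}
```

For an input such as `select '中文'`, the character count and the byte length differ, so token spans must be tracked in bytes from the start rather than derived from a character index.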
4. Optimization
- With a dry-run parse (test code in the appendix) that executes no actions, performance is as follows:
[examples]$ time ./parser
real 0m4.788s
user 0m4.781s
sys 0m0.002s


__GRM_DATA, __STABLE_DATA, grm, stable
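- Further analysis: every parse initializes an actions array. As the number of grammar rules grows, the array grows with it, and because its elements are references to dyn trait objects, calling through them carries a runtime cost.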
::std::vec![&__gt_wrapper_0,
            &__gt_wrapper_1,
            &__gt_wrapper_2,
            ...
]

match idx {
    0 => __gt_wrapper_0(),
    1 => __gt_wrapper_1(),
    2 => __gt_wrapper_2(),
    ....
}
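The two fragments above can be put side by side in a self-contained sketch. It assumes three trivial stand-in functions in place of the generated `__gt_wrapper_N` actions, and `run_with_table` / `run_with_match` are hypothetical names. It shows the shape of the change, not the actual generated code.

```rust
// Illustrative stand-ins for generated action wrappers (names hypothetical).
fn wrapper_0() -> u64 { 10 }
fn wrapper_1() -> u64 { 11 }
fn wrapper_2() -> u64 { 12 }

// Before: a table of trait-object references is rebuilt on every call, and
// the selected action is invoked through dynamic dispatch.
fn run_with_table(idx: usize) -> Option<u64> {
    let actions: Vec<&dyn Fn() -> u64> = vec![&wrapper_0, &wrapper_1, &wrapper_2];
    actions.get(idx).map(|f| f())
}

// After: a `match` on the index calls the function directly, with no
// per-call allocation and no indirection through a trait object.
fn run_with_match(idx: usize) -> Option<u64> {
    match idx {
        0 => Some(wrapper_0()),
        1 => Some(wrapper_1()),
        2 => Some(wrapper_2()),
        _ => None,
    }
}
```

Both functions return the same result for every index. The difference is that the table version pays for a `Vec` allocation proportional to the number of rules on each parse, which becomes significant for a grammar as large as SQL.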



[examples]$ time ./parser
real 0m2.677s
user 0m2.667s
sys 0m0.007s
5. Summary
Appendix
let input = "select id, name from t where id = ?;";
let p = parser::Parser::new();
// Parse the same statement one million times.
for _ in 0..1_000_000 {
    let _ = p.parse(input);
}