当前位置:网站首页>HQL statement execution process
HQL statement execution process
2022-08-05 05:25:00 【value growth】
HQL statement execution process:
- Syntax parsing: Antlr defines the grammatical rules of SQL, completes SQL lexical, grammatical parsing, and converts SQL into an abstract syntax tree AST Tree;
- Semantic parsing: traverse the AST Tree and abstract the basic unit of query QueryBlock;
- Generate a logical execution plan: traverse QueryBlock and translate it into an execution operation tree OperatorTree;
- Optimize the logic execution plan: The logic layer optimizer performs OperatorTree transformation, merges unnecessary ReduceSinkOperators, and reduces the amount of shuffle data;
- Generate physical execution plan: traverse OperatorTree and translate into MapReduce tasks;
- Optimize the physical execution plan: The physical layer optimizer transforms MapReduce tasks to generate the final execution plan.
How is HQL resolved into MR job?
Hive uses Antlr to implement syntax parsing. According to the SQL parsing rules formulated by Antlr, complete the lexical/syntax parsing of SQL statements, and convert SQL into abstract syntax tree AST.
Traverse the AST to generate the basic query unit QueryBlock. QueryBlock is the most basic unit of SQL, including three parts: input source, calculation process, and output.
Traverse QueryBlock to generate OperatorTree. The MapReduce task finally generated by Hive consists of OperatorTree in both Map and Reduce phases.Operator is to complete a single specific operation in the Map phase or the Reduce phase.QueryBlock generates Operator Tree by traversing the attributes of the saved syntax of the QB and QBParseInfo objects generated in the previous process.
**Optimize OperatorTree.**Most logic layer optimizers achieve the purpose of reducing MapReduce Job and shuffle data volume by transforming OperatorTree and merging operators
OperatorTree generates MapReduce Job. Traverse OperatorTree and translate into MR tasks.
- Generate MoveTask for output table
- Depth-first traversal from one of the root nodes of OperatorTree down
- ReduceSinkOperator marks the boundaries of Map/Reduce, the boundaries between multiple jobs
- Traverse other root nodes and encounter JoinOperator to merge MapReduceTask
- Generate StatTask update metadata
- Sever the operator relationship between Map and Reduce
Optimize the task. Use the physical optimizer to optimize the MR task to generate the final execution task
[HIVE] sqlStatement converted into mapreduce - Mr.Ming2 - Blog Park
边栏推荐
猜你喜欢
随机推荐
Community Sharing|Tencent Overseas Games builds game security operation capabilities based on JumpServer
RL reinforcement learning summary (1)
The difference between span tag and p
day8字典作业
【练一下1】糖尿病遗传风险检测挑战赛 【讯飞开放平台】
Redux
第四讲 反向传播随笔
shell函数
pycharm中调用Matlab配置:No module named ‘matlab.engine‘; ‘matlab‘ is not a package
Basic properties of binary tree + oj problem analysis
重新审视分布式系统:永远不会有完美的一致性方案……
Distributed systems revisited: there will never be a perfect consistency scheme...
npm搭建本地服务器,直接运行build后的目录
Pycharm中使用pip安装第三方库安装失败:“Non-zero exit code (2)“的解决方法
Matplotlib(二)—— 子图
UVA10827
【过一下3】卷积&图像噪音&边缘&纹理
【技能】长期更新
【过一下14】自习室的一天
vscode+pytorch使用经验记录(个人记录+不定时更新)




![coppercam入门手册[6]](/img/d3/a7d44aa19acfb18c5a8cacdc8176e9.png)



