HQL statement execution process
2022-08-05 05:25:00 【value growth】
Hive executes an HQL statement in six stages:
- Syntax parsing: an ANTLR grammar defines the SQL syntax rules. Hive uses it to perform lexical and syntactic analysis and to convert the SQL text into an abstract syntax tree (AST);
- Semantic analysis: traverse the AST and abstract it into QueryBlocks, the basic units of a query;
- Generate the logical execution plan: traverse the QueryBlocks and translate them into an operator tree (OperatorTree);
- Optimize the logical execution plan: the logical optimizer transforms the OperatorTree, merging away unnecessary ReduceSinkOperators to reduce the amount of shuffled data;
- Generate the physical execution plan: traverse the OperatorTree and translate it into MapReduce tasks;
- Optimize the physical execution plan: the physical optimizer transforms the MapReduce tasks to produce the final execution plan.
How is HQL compiled into MapReduce (MR) jobs?
Hive uses ANTLR for syntax parsing. Following the SQL grammar rules written for ANTLR, it performs lexical and syntactic analysis of the SQL statement and converts it into an abstract syntax tree (AST).
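To make the lexing and parsing step concrete, here is a minimal sketch in Python. It is a hand-written stand-in, not ANTLR and not Hive's grammar: it only accepts `SELECT col[, col] FROM table [WHERE col op number]`. The node labels (`TOK_QUERY`, `TOK_FROM`, `TOK_SELECT`, and so on) are modeled loosely on the `TOK_*` naming style of Hive's AST, and the functions `tokenize` and `parse` are hypothetical names chosen for this illustration.

```python
import re

# One token per match: a number, a word (keyword or identifier), or a symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\w+)|(>=|<=|!=|=|<|>|,))")
KEYWORDS = {"SELECT", "FROM", "WHERE"}


def tokenize(sql):
    """Lexical analysis: turn SQL text into a list of (kind, text) tokens."""
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise SyntaxError(f"cannot tokenize input at offset {pos}")
        number, word, symbol = match.groups()
        if number is not None:
            tokens.append(("NUMBER", number))
        elif word is not None:
            if word.upper() in KEYWORDS:
                tokens.append(("KEYWORD", word.upper()))
            else:
                tokens.append(("IDENT", word))
        elif symbol == ",":
            tokens.append(("COMMA", symbol))
        else:
            tokens.append(("OP", symbol))
        pos = match.end()
    return tokens


def parse(tokens):
    """Syntax analysis: build a nested-tuple AST from the token list."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def expect(kind, text=None):
        nonlocal pos
        actual_kind, actual_text = peek()
        if actual_kind != kind or (text is not None and actual_text != text):
            raise SyntaxError(f"expected {text or kind}, got {actual_text!r}")
        pos += 1
        return actual_text

    expect("KEYWORD", "SELECT")
    columns = [expect("IDENT")]
    while peek() == ("COMMA", ","):
        expect("COMMA")
        columns.append(expect("IDENT"))
    expect("KEYWORD", "FROM")
    table = expect("IDENT")

    select = ("TOK_SELECT",) + tuple(
        ("TOK_SELEXPR", ("TOK_TABLE_OR_COL", c)) for c in columns
    )
    insert = ("TOK_INSERT", select)
    if peek() == ("KEYWORD", "WHERE"):
        expect("KEYWORD", "WHERE")
        column = expect("IDENT")
        operator = expect("OP")
        value = expect("NUMBER")
        insert += (("TOK_WHERE", (operator, ("TOK_TABLE_OR_COL", column), value)),)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected trailing token {peek()[1]!r}")
    return ("TOK_QUERY", ("TOK_FROM", ("TOK_TABREF", table)), insert)


ast = parse(tokenize("SELECT name, age FROM users WHERE age > 18"))
```

The resulting `ast` is a tree rooted at `TOK_QUERY` with a `TOK_FROM` branch for the input table and a `TOK_INSERT` branch holding the select list and the filter. A real ANTLR-generated parser covers the full grammar, but the output has the same general shape: a tree of labeled nodes that later stages traverse.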
Traverse the AST to generate QueryBlocks, the basic query units. A QueryBlock is the most basic unit of a SQL query and has three parts: the input source, the computation, and the output.
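The three parts of a QueryBlock can be sketched as a small data structure filled in by walking the AST. This is an illustration only: the `QueryBlock` class, its field names, and the nested-tuple AST shape below are assumptions made for this example and do not mirror the fields of Hive's own classes.

```python
from dataclasses import dataclass, field


@dataclass
class QueryBlock:
    # Input source: the tables the query reads from.
    sources: list = field(default_factory=list)
    # Computation: filter predicate, projected columns, grouping keys.
    where: tuple = None
    select_columns: list = field(default_factory=list)
    group_by: list = field(default_factory=list)
    # Output: where the result is written.
    destination: str = None


def build_query_block(node, qb=None):
    """Walk a nested-tuple AST and record what each labeled node contributes."""
    if qb is None:
        qb = QueryBlock()
    label, *children = node
    if label == "TOK_TABREF":
        qb.sources.append(children[0])
    elif label == "TOK_DESTINATION":
        qb.destination = children[0]
    elif label == "TOK_SELEXPR":
        qb.select_columns.append(children[0])
    elif label == "TOK_WHERE":
        qb.where = children[0]
    elif label == "TOK_GROUPBY":
        qb.group_by.extend(children)
    else:
        # Structural nodes (TOK_QUERY, TOK_FROM, TOK_INSERT, TOK_SELECT):
        # keep descending into their children.
        for child in children:
            build_query_block(child, qb)
    return qb


# Hand-written AST for:
#   SELECT dept, age FROM users WHERE age > 18 GROUP BY dept
example_ast = (
    "TOK_QUERY",
    ("TOK_FROM", ("TOK_TABREF", "users")),
    (
        "TOK_INSERT",
        ("TOK_DESTINATION", "TMP_FILE"),
        ("TOK_SELECT", ("TOK_SELEXPR", "dept"), ("TOK_SELEXPR", "age")),
        ("TOK_WHERE", (">", "age", "18")),
        ("TOK_GROUPBY", "dept"),
    ),
)

qb = build_query_block(example_ast)
```

After the walk, `qb.sources` holds the input, `qb.where`, `qb.select_columns`, and `qb.group_by` hold the computation, and `qb.destination` holds the output.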
Traverse the QueryBlocks to generate the OperatorTree. The MapReduce tasks that Hive finally generates consist of operator trees in both the Map and Reduce phases. An Operator performs a single specific operation in the Map phase or the Reduce phase. The OperatorTree is generated by traversing the syntax attributes saved in the QB and QBParseInfo objects produced in the previous step.
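The translation from a query block to an operator tree can be sketched as follows. This is a simplified illustration under stated assumptions: the query block is a plain dictionary with hypothetical keys (`source`, `where`, `group_by`, `select`, `dest`), the tree is a single chain, and the operator order is simplified compared with the plans Hive actually produces. Only the operator names follow Hive's naming.

```python
class Operator:
    """One node of the operator tree: a single operation plus its children."""

    def __init__(self, name, detail=""):
        self.name = name
        self.detail = detail
        self.children = []

    def then(self, child):
        """Attach a child operator and return it, so calls can be chained."""
        self.children.append(child)
        return child


def build_operator_tree(qb):
    """Translate a query-block dictionary into a chain of operators."""
    root = Operator("TableScanOperator", qb["source"])
    tail = root
    if qb.get("where"):
        tail = tail.then(Operator("FilterOperator", qb["where"]))
    if qb.get("group_by"):
        keys = ", ".join(qb["group_by"])
        # Grouping needs all rows with the same key on the same reducer,
        # so a ReduceSinkOperator (the shuffle) precedes the GroupByOperator.
        tail = tail.then(Operator("ReduceSinkOperator", keys))
        tail = tail.then(Operator("GroupByOperator", keys))
    tail = tail.then(Operator("SelectOperator", ", ".join(qb["select"])))
    tail.then(Operator("FileSinkOperator", qb["dest"]))
    return root


def operator_names(root):
    """Follow the first child from the root and collect operator names."""
    names = []
    node = root
    while node is not None:
        names.append(node.name)
        node = node.children[0] if node.children else None
    return names


query_block = {
    "source": "users",
    "where": "age > 18",
    "group_by": ["dept"],
    "select": ["dept", "count(*)"],
    "dest": "TMP_FILE",
}
tree = build_operator_tree(query_block)
```

For this query block, `operator_names(tree)` walks the chain from the table scan through the filter, the shuffle, the aggregation, and the projection down to the file sink. A query without `group_by` produces no ReduceSinkOperator at all, which corresponds to a map-only job.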
Optimize the OperatorTree. Most logical-layer optimizers reduce the number of MapReduce jobs and the volume of shuffled data by transforming the OperatorTree and merging operators.
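As a rough illustration of merging operators, the sketch below applies one toy rule to a linear plan: if a ReduceSinkOperator shuffles on the same key as the previous shuffle, the data is already partitioned on that key and the second shuffle can be dropped. The plan representation (a list of `(operator, argument)` pairs) and the function name are assumptions for this example. The rule also assumes that the operators between the two shuffles preserve the partitioning, which a real optimizer would have to verify.

```python
def merge_redundant_reduce_sinks(plan):
    """Drop a ReduceSinkOperator that repeats the previous shuffle key.

    `plan` is a list of (operator_name, argument) pairs in execution order.
    """
    optimized = []
    last_shuffle_key = None
    for name, argument in plan:
        if name == "ReduceSinkOperator":
            if argument == last_shuffle_key:
                # Rows are already partitioned on this key: skip the shuffle.
                continue
            last_shuffle_key = argument
        optimized.append((name, argument))
    return optimized


def count_shuffles(plan):
    """Number of shuffle boundaries in the plan."""
    return sum(1 for name, _ in plan if name == "ReduceSinkOperator")


# Toy plan: aggregate by dept, then redistribute by dept again.
plan = [
    ("TableScanOperator", "users"),
    ("ReduceSinkOperator", "dept"),
    ("GroupByOperator", "dept"),
    ("ReduceSinkOperator", "dept"),
    ("SelectOperator", "dept, cnt"),
    ("FileSinkOperator", "TMP_FILE"),
]
optimized_plan = merge_redundant_reduce_sinks(plan)
```

Here `count_shuffles(plan)` is 2 and `count_shuffles(optimized_plan)` is 1. Since each extra shuffle in a chain forces an extra MapReduce job, removing one saves both a job and its shuffle traffic.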
Generate MapReduce jobs from the OperatorTree. Traverse the OperatorTree and translate it into MR tasks:
- Generate a MoveTask for the output table
- Perform a depth-first traversal downward from one of the root nodes of the OperatorTree
- Treat each ReduceSinkOperator as the Map/Reduce boundary and as the boundary between multiple jobs
- Traverse the other root nodes and merge MapReduceTasks when a JoinOperator is encountered
- Generate a StatTask to update the metadata
- Cut the operator links between Map and Reduce
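The role of ReduceSinkOperator as a boundary in the steps above can be sketched on a linear operator chain. This is a toy model, not Hive's task compiler: the function name, the job dictionaries, and the `[tmp]` placeholder operators that hand intermediate data from one job to the next are assumptions made for this example. MoveTask, StatTask, and join merging are left out.

```python
def split_into_jobs(operators):
    """Cut a linear operator chain into MapReduce jobs at each ReduceSinkOperator.

    Operators up to and including a ReduceSinkOperator run in the Map phase.
    Operators after it run in the Reduce phase. A second ReduceSinkOperator
    cannot run in the same job, so it starts a new one.
    """
    jobs = []
    current = {"map": [], "reduce": []}
    phase = "map"
    for op in operators:
        if op == "ReduceSinkOperator":
            if phase == "reduce":
                # Job boundary: write the intermediate result out and let
                # the next job's Map phase read it back in.
                current["reduce"].append("FileSinkOperator[tmp]")
                jobs.append(current)
                current = {"map": ["TableScanOperator[tmp]"], "reduce": []}
            current["map"].append(op)
            phase = "reduce"
        else:
            current[phase].append(op)
    jobs.append(current)
    return jobs


# Toy chain for a GROUP BY followed by an ORDER BY: two shuffles.
chain = [
    "TableScanOperator",
    "FilterOperator",
    "ReduceSinkOperator",
    "GroupByOperator",
    "ReduceSinkOperator",
    "SelectOperator",
    "FileSinkOperator",
]
jobs = split_into_jobs(chain)
```

With two ReduceSinkOperators in the chain, `jobs` contains two entries: the first job scans, filters, and shuffles in its Map phase and aggregates in its Reduce phase, and the second job reads the intermediate data, shuffles again, and writes the final output. A chain with no ReduceSinkOperator yields a single job with an empty Reduce phase.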
Optimize the tasks. The physical optimizer optimizes the MR tasks to produce the final execution tasks.
[HIVE] sqlStatement converted into mapreduce - Mr.Ming2 - Blog Park