HQL statement execution process
2022-08-05 05:25:00 【value growth】
HQL statement execution process:
- Syntax parsing: Antlr defines the SQL grammar rules, performs lexical and syntactic analysis, and converts the SQL into an abstract syntax tree (AST);
- Semantic analysis: traverse the AST and abstract it into QueryBlocks, the basic units of a query;
- Generate the logical execution plan: traverse the QueryBlocks and translate them into an operator tree (OperatorTree);
- Optimize the logical execution plan: the logical-layer optimizer transforms the OperatorTree, merging unnecessary ReduceSinkOperators to reduce the amount of shuffled data;
- Generate the physical execution plan: traverse the OperatorTree and translate it into MapReduce tasks;
- Optimize the physical execution plan: the physical-layer optimizer transforms the MapReduce tasks to produce the final execution plan.
How is HQL compiled into MR jobs?
Hive uses Antlr for syntax parsing. Following the SQL grammar rules defined in Antlr, it performs lexical and syntactic analysis of the SQL statement and converts it into an abstract syntax tree (AST).
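To make the parsing step concrete, here is a minimal sketch of lexing and parsing a single `SELECT ... FROM ...` statement into a nested-tuple AST. It is a toy, not Hive's Antlr grammar: `tokenize` and `parse_select` are hypothetical helpers, the supported syntax is an assumption chosen for brevity, and the node labels only imitate the `TOK_*` naming style seen in Hive ASTs.

```python
import re

def tokenize(sql):
    # Lexical analysis: split the text into words, commas and '*'.
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|[,*]", sql)

def parse_select(sql):
    """Parse `SELECT col[, col...] FROM table` into a nested-tuple AST (toy grammar)."""
    tokens = tokenize(sql)
    upper = [t.upper() for t in tokens]
    if not upper or upper[0] != "SELECT" or "FROM" not in upper:
        raise ValueError("only SELECT ... FROM ... is supported in this sketch")
    from_idx = upper.index("FROM")
    columns = [t for t in tokens[1:from_idx] if t != ","]
    if not columns or from_idx + 1 >= len(tokens):
        raise ValueError("missing column list or table name")
    table = tokens[from_idx + 1]
    # Syntactic analysis result: one query node with a FROM branch and a SELECT branch.
    return (
        "TOK_QUERY",
        ("TOK_FROM", ("TOK_TABREF", table)),
        ("TOK_INSERT", ("TOK_SELECT", *[("TOK_SELEXPR", c) for c in columns])),
    )

ast = parse_select("SELECT id, name FROM users")
print(ast)
```

The later stages only ever see this tree, never the original SQL text, which is why the remaining steps are all described as traversals.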
Traverse the AST to generate QueryBlocks, the basic query units. A QueryBlock is the most basic unit of a SQL statement and has three parts: the input source, the computation, and the output.
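The three parts of a QueryBlock can be pictured with a small data structure. The sketch below is an illustrative model only: the `QueryBlock` dataclass, its field names, and `count_blocks` are assumptions made for this example and do not mirror the fields of Hive's actual QB class.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class QueryBlock:
    # Input source: table names, or nested QueryBlocks for subqueries.
    sources: List[Union[str, "QueryBlock"]]
    # Calculation process: filter, grouping and projection.
    where: Optional[str] = None
    group_by: List[str] = field(default_factory=list)
    select: List[str] = field(default_factory=list)
    # Output: where the result of this block is written.
    destination: str = "TMP_FILE"

def count_blocks(qb: QueryBlock) -> int:
    """Count this block plus every nested subquery block."""
    return 1 + sum(count_blocks(s) for s in qb.sources if isinstance(s, QueryBlock))

# SELECT user_id, sum(amount)
# FROM (SELECT user_id, amount FROM orders WHERE amount > 10) t
# GROUP BY user_id
inner = QueryBlock(sources=["orders"], where="amount > 10", select=["user_id", "amount"])
outer = QueryBlock(sources=[inner], group_by=["user_id"],
                   select=["user_id", "sum(amount)"], destination="report")
print(count_blocks(outer))
```

In this model a subquery is simply a QueryBlock used as the input source of another QueryBlock, so a nested query becomes a tree of blocks.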
Traverse the QueryBlocks to generate the OperatorTree. The MapReduce tasks that Hive finally generates consist of OperatorTrees in the Map and Reduce phases. An Operator performs a single specific operation in either the Map phase or the Reduce phase. The OperatorTree is built by traversing the syntax attributes saved in the QB and QBParseInfo objects produced in the previous step.
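As a rough illustration of this translation, the sketch below builds an operator chain for a single-table `GROUP BY` query. The operator names match ones Hive uses, but the `Operator` class, `build_tree`, and the exact order of operators are simplifying assumptions; a real Hive plan for the same query usually contains more operators.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Operator:
    name: str
    detail: str = ""
    children: List["Operator"] = field(default_factory=list)

def chain(*ops: Operator) -> Operator:
    """Link operators parent -> child in order and return the root."""
    for parent, child in zip(ops, ops[1:]):
        parent.children.append(child)
    return ops[0]

def build_tree(table: str, predicate: Optional[str], group_key: str) -> Operator:
    """Translate input / calculation / output into a chain of single-purpose operators."""
    ops = [Operator("TableScanOperator", table)]           # input source
    if predicate:
        ops.append(Operator("FilterOperator", predicate))  # WHERE clause
    ops += [
        Operator("SelectOperator", group_key),             # projection
        Operator("ReduceSinkOperator", group_key),         # shuffle on the group key
        Operator("GroupByOperator", group_key),            # aggregation
        Operator("FileSinkOperator"),                      # output
    ]
    return chain(*ops)

def names(root: Operator) -> List[str]:
    """Walk a linear chain from the root and collect operator names."""
    out, node = [], root
    while node is not None:
        out.append(node.name)
        node = node.children[0] if node.children else None
    return out

print(names(build_tree("orders", "amount > 10", "user_id")))
```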
Optimize the OperatorTree. Most logical-layer optimizers reduce the number of MapReduce jobs and the volume of shuffled data by transforming the OperatorTree and merging operators.
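One way to picture operator merging is a pass that drops a ReduceSinkOperator when the data is already shuffled on the same key. The sketch below is a deliberately simplified stand-in, not Hive's optimizer rule: the plan is a flat list of `(operator, key)` pairs, and `merge_redundant_reduce_sinks` and `count_shuffles` are hypothetical helpers that assume no operator in between changes the partitioning.

```python
def merge_redundant_reduce_sinks(plan):
    """Drop a ReduceSinkOperator whose key equals the key of the previous shuffle."""
    optimized = []
    partition_key = None  # key the rows are currently partitioned on
    for name, key in plan:
        if name == "ReduceSinkOperator":
            if key == partition_key:
                continue  # rows are already grouped by this key: no second shuffle needed
            partition_key = key
        optimized.append((name, key))
    return optimized

def count_shuffles(plan):
    return sum(1 for name, _ in plan if name == "ReduceSinkOperator")

plan = [
    ("TableScanOperator", None),
    ("ReduceSinkOperator", "user_id"),
    ("GroupByOperator", "user_id"),
    ("ReduceSinkOperator", "user_id"),  # redundant: same key as the first shuffle
    ("GroupByOperator", "user_id"),
    ("FileSinkOperator", None),
]
optimized = merge_redundant_reduce_sinks(plan)
print(count_shuffles(plan), "->", count_shuffles(optimized))
```

Because each ReduceSinkOperator implies a shuffle, removing one here also removes a job boundary, which is how this kind of rewrite reduces both the job count and the shuffled data.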
Generate MapReduce jobs from the OperatorTree. Traverse the OperatorTree and translate it into MR tasks:
- Generate a MoveTask for the output table
- Traverse depth-first downward from one of the root nodes of the OperatorTree
- A ReduceSinkOperator marks the boundary between Map and Reduce, and the boundary between multiple jobs
- Traverse the other root nodes, merging MapReduceTasks when a JoinOperator is encountered
- Generate a StatTask to update the metadata
- Cut the operator links between the Map and Reduce phases
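The role of ReduceSinkOperator as a boundary marker in the steps above can be sketched as a walk over a linear operator chain. This is a toy model under stated assumptions: `split_into_jobs` is a hypothetical helper, the chain has a single root (so JoinOperator merging is not covered), and MoveTask and StatTask generation are left out.

```python
def split_into_jobs(operators):
    """Cut a linear operator chain into MapReduce jobs at ReduceSinkOperator boundaries."""
    jobs = []
    current = {"map": [], "reduce": []}
    phase = "map"
    for op in operators:
        if op == "ReduceSinkOperator":
            if phase == "reduce":
                # A second shuffle inside the reduce phase closes this job
                # and starts the next one.
                jobs.append(current)
                current = {"map": [], "reduce": []}
            current["map"].append(op)  # the sink runs at the end of the map side
            phase = "reduce"           # everything after it runs on the reduce side
        else:
            current[phase].append(op)
    jobs.append(current)
    return jobs

operators = [
    "TableScanOperator", "SelectOperator",
    "ReduceSinkOperator", "GroupByOperator",   # first shuffle: GROUP BY
    "ReduceSinkOperator", "SelectOperator",    # second shuffle: e.g. ORDER BY
    "FileSinkOperator",
]
for i, job in enumerate(split_into_jobs(operators), start=1):
    print(f"job {i}: map={job['map']} reduce={job['reduce']}")
```

A chain with no ReduceSinkOperator at all yields a single job with an empty reduce list, which corresponds to a map-only job.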
Optimize the tasks. The physical optimizer optimizes the MR tasks to produce the final execution tasks.
[HIVE] sqlStatement converted into mapreduce - Mr.Ming2 - Blog Park