Spark SQL Source Code Series | The ResolveReferences Rule: count(*) in Detail
2022-06-09 21:54:00 【数据仓库践行者】
This article is based on Spark 3.2.
It works through a homework exercise left over from the last source-debugging sharing session:
1. For select * from TESTDATA2, analyze how [*] is handled and see how it is converted into the corresponding columns. This matches the following code in ResolveReferences: case p: Project if containsStar(p.projectList) => p.copy(projectList = buildExpandedProjectList(p.projectList, p.child))
SQL:
select * from testdata2
Corresponding AST:
The unresolved logical plan, the resolved logical plan, and the rules applied in between:
Overview of all rules used to produce the resolved logical plan
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation [testdata2], [], false
//*********************** Rule 1 ************************
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations
'Project [*]
+- SubqueryAlias testdata2
   +- View (`testData2`, [a#3,b#4])
      +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#3, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#4]
         +- ExternalRDD [obj#2]
//*********************** Rule 2 ************************
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences
Project [a#3, b#4]
+- SubqueryAlias testdata2
   +- View (`testData2`, [a#3,b#4])
      +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#3, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#4]
         +- ExternalRDD [obj#2]
== Analyzed Logical Plan ==
Project [a#3, b#4]
+- SubqueryAlias testdata2
   +- View (`testData2`, [a#3,b#4])
      +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#3, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#4]
         +- ExternalRDD [obj#2]
Source code walkthrough
The main question is how Project [*] becomes Project [a#3, b#4]. The role of the ResolveReferences rule was covered in the source-reading sharing session:
it mainly replaces UnresolvedAttribute with AttributeReference.
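To make that replacement concrete, here is a minimal toy sketch in plain Scala. The types UnresolvedAttr and AttrRef and the resolve function are hypothetical stand-ins, not Catalyst's classes, and the case-insensitive name match is an assumption that mirrors Spark's default resolver.

```scala
object ResolveAttributeSketch {
  // Hypothetical stand-ins for Catalyst expressions; not Spark's real classes.
  sealed trait Expr
  final case class UnresolvedAttr(name: String) extends Expr        // only a name, no identity yet
  final case class AttrRef(name: String, exprId: Int) extends Expr  // a name bound to a unique id

  // Look the name up in the child's output and swap in the attribute that is
  // already resolved there. If nothing matches, the expression stays unresolved.
  def resolve(expr: Expr, childOutput: Seq[AttrRef]): Expr = expr match {
    case UnresolvedAttr(n) =>
      childOutput.find(_.name.equalsIgnoreCase(n)).getOrElse(expr)
    case other => other
  }
}
```

With a child output of a#3 and b#4, resolving the name a yields the existing a#3 rather than a fresh attribute, which is why the ids in the analyzed plan line up across nodes.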
From the code, we can see that * is expanded by the buildExpandedProjectList method:
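The shape of this rule can be sketched with a small self-contained model. Everything here is a simplified assumption: Star, AttrRef, Relation and Project are toy types, and this buildExpandedProjectList handles only a bare star, whereas the real method also covers aliased stars and stars nested inside other expressions.

```scala
object StarExpansionSketch {
  // Toy expression and plan types; names echo Catalyst for readability only.
  sealed trait Expr
  case object Star extends Expr
  final case class AttrRef(name: String, exprId: Int) extends Expr

  sealed trait Plan { def output: Seq[AttrRef] }
  final case class Relation(output: Seq[AttrRef]) extends Plan
  final case class Project(projectList: Seq[Expr], child: Plan) extends Plan {
    def output: Seq[AttrRef] = projectList.collect { case a: AttrRef => a }
  }

  def containsStar(exprs: Seq[Expr]): Boolean = exprs.contains(Star)

  // Replace each star with the child's output attributes; keep everything else.
  def buildExpandedProjectList(exprs: Seq[Expr], child: Plan): Seq[Expr] =
    exprs.flatMap {
      case Star  => child.output
      case other => Seq(other)
    }

  // Same shape as the case quoted from ResolveReferences.
  def resolve(plan: Plan): Plan = plan match {
    case p: Project if containsStar(p.projectList) =>
      p.copy(projectList = buildExpandedProjectList(p.projectList, p.child))
    case other => other
  }
}
```

Applying resolve to a Project whose list is just Star over a relation with output a#3, b#4 gives a Project whose list is a#3, b#4, matching the change between Rule 1 and Rule 2 above.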
[*] is an UnresolvedStar, and UnresolvedStar is a subclass of Star:
So the first case is taken, which calls the expand method, and expand ultimately calls UnresolvedStar's expand method:
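A rough sketch of what that expand step does for this query: with no table qualifier on the star, every attribute of the input is returned. The toy Star below also models a qualified star such as testdata2.*, with deliberately simplified matching (case-insensitive suffix comparison on the qualifier). That matching rule and all type names are assumptions for illustration, not Spark's implementation.

```scala
object StarQualifierSketch {
  // Each attribute remembers the qualifier of the relation that produced it.
  final case class AttrRef(name: String, exprId: Int, qualifier: Seq[String])

  // target = None models `*`; target = Some(Seq("t")) models `t.*`.
  final case class Star(target: Option[Seq[String]]) {
    def expand(input: Seq[AttrRef]): Seq[AttrRef] = target match {
      // Unqualified star: take the whole input output, as in this article's query.
      case None => input
      // Qualified star: keep attributes whose qualifier ends with the target parts.
      // (Simplified: this toy returns an empty list when nothing matches.)
      case Some(parts) =>
        val wanted = parts.map(_.toLowerCase)
        input.filter(a => a.qualifier.map(_.toLowerCase).endsWith(wanted))
    }
  }
}
```

For select * the target is empty, so the result is exactly input.output, which is what the debugging step below inspects.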
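Let's debug and see what input.output contains:
Here input is the SubqueryAlias node, and its output method simply follows the output method of the logical plan, which was covered in detail in the last source-reading sharing session: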
To summarize: at each step, output assigns the attributes at the top from the already resolved attributes at the bottom, which guarantees that both refer to the same attribute.
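This bottom-up propagation can be illustrated with a toy plan tree. The node names echo the plan printed earlier, but the classes are hypothetical simplifications: Leaf stands in for the SerializeFromObject/ExternalRDD pair, and an attribute's identity is reduced to an integer exprId.

```scala
object OutputPropagationSketch {
  final case class AttrRef(name: String, exprId: Int, qualifier: Seq[String] = Nil)

  sealed trait Plan { def output: Seq[AttrRef] }

  // Bottom of the tree: the attributes are created here, each with its own id.
  final case class Leaf(output: Seq[AttrRef]) extends Plan

  // A view simply passes its child's output through.
  final case class View(child: Plan) extends Plan {
    def output: Seq[AttrRef] = child.output
  }

  // An alias keeps the same ids and only attaches its name as the qualifier.
  final case class SubqueryAlias(alias: String, child: Plan) extends Plan {
    def output: Seq[AttrRef] = child.output.map(_.copy(qualifier = Seq(alias)))
  }

  // After star expansion the project list is built from the child's output.
  final case class Project(projectList: Seq[AttrRef], child: Plan) extends Plan {
    def output: Seq[AttrRef] = projectList
  }
}
```

Building Project(alias.output, alias) over SubqueryAlias("testdata2", View(leaf)) leaves the ids unchanged from the leaf to the top, so a#3 at the top of the tree and a#3 at the bottom denote the same attribute.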