union in Spark SQL
2022-08-01 23:23:00 【南风知我意丿】
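Differences between Spark union and Hive union
In Spark, a DataFrame has both the union and unionAll operators, and neither of them removes duplicate rows.
This differs from Hive SQL, where UNION ALL keeps duplicates and UNION removes them.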
Example
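// Assumes a spark-shell session, where sc and spark.implicits._ (needed for toDF) are already in scope.
import org.apache.spark.sql.DataFrame
val df3: DataFrame = sc.makeRDD(Seq((1, "xm"), (2, "xl"))).toDF("id", "name")
val df4: DataFrame = sc.makeRDD(Seq((1, "xm"), (2, "xl"), (3, "xw"))).toDF("id", "name")
// union appends the rows of df4 to df3 and keeps duplicates
df3.union(df4).show(false)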
+---+----+
|id |name|
+---+----+
|1  |xm  |
|2  |xl  |
|1  |xm  |
|2  |xl  |
|3  |xw  |
+---+----+
df3.unionAll(df4).show(false)
+---+----+
|id |name|
+---+----+
|1  |xm  |
|2  |xl  |
|1  |xm  |
|2  |xl  |
|3  |xw  |
+---+----+
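The two outputs above are identical. Outside spark-shell the same check can be run as a small standalone program. The following is only a sketch: the object name UnionRowCounts, the app name, and the local[*] master are illustrative assumptions and not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionRowCounts {
  // Returns (row count after union, row count after unionAll) for the two sample frames.
  def counts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._ // brings toDF into scope for local Seqs
    val df3: DataFrame = Seq((1, "xm"), (2, "xl")).toDF("id", "name")
    val df4: DataFrame = Seq((1, "xm"), (2, "xl"), (3, "xw")).toDF("id", "name")
    (df3.union(df4).count(), df3.unionAll(df4).count())
  }

  def main(args: Array[String]): Unit = {
    // local[*] and the app name are illustrative choices for a local experiment.
    val spark = SparkSession.builder()
      .appName("union-row-counts")
      .master("local[*]")
      .getOrCreate()
    val (unionRows, unionAllRows) = counts(spark)
    // Both counts are 5 (2 + 3 rows), since neither method removes duplicates.
    println(s"union: $unionRows rows, unionAll: $unionAllRows rows")
    spark.stop()
  }
}
```

Comparing counts instead of printed tables makes the result easy to assert in a unit test.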
To get the same result as UNION in Hive, apply the distinct operator after the union
df3.union(df4).distinct().show(false)
+---+----+
|id |name|
+---+----+
|1  |xm  |
|3  |xw  |
|2  |xl  |
+---+----+
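The behavior described in this post belongs to the DataFrame methods. When the query is written as SQL text and run through spark.sql, UNION removes duplicates and UNION ALL keeps them, as in Hive. A minimal sketch of that route follows, assuming the two sample frames are registered as temporary views; the view names t3 and t4 and the object name SqlUnionDedup are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object SqlUnionDedup {
  // Returns (row count from SQL UNION, row count from SQL UNION ALL).
  def counts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._ // brings toDF into scope for local Seqs
    // t3 and t4 are hypothetical view names holding the same data as df3 and df4.
    Seq((1, "xm"), (2, "xl")).toDF("id", "name").createOrReplaceTempView("t3")
    Seq((1, "xm"), (2, "xl"), (3, "xw")).toDF("id", "name").createOrReplaceTempView("t4")
    val deduplicated = spark.sql("SELECT id, name FROM t3 UNION SELECT id, name FROM t4").count()
    val all = spark.sql("SELECT id, name FROM t3 UNION ALL SELECT id, name FROM t4").count()
    (deduplicated, all)
  }

  def main(args: Array[String]): Unit = {
    // local[*] and the app name are illustrative choices for a local experiment.
    val spark = SparkSession.builder()
      .appName("sql-union-dedup")
      .master("local[*]")
      .getOrCreate()
    val (deduplicated, all) = counts(spark)
    // UNION yields 3 distinct rows, UNION ALL yields all 5 rows.
    println(s"UNION: $deduplicated rows, UNION ALL: $all rows")
    spark.stop()
  }
}
```

On this data the SQL UNION returns the same three rows as df3.union(df4).distinct(), so either form can be used depending on whether the surrounding code is SQL text or DataFrame calls.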