Spark: computing the average value of the same key across different partitions (entry level, simple implementation)
2022-07-27 03:24:00 【One's cow】
Goal: compute the average value for each key whose records are spread across different partitions, first with aggregateByKey and then with combineByKey.
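Both operators rely on the same idea: carry a (sum, count) pair per key, fold values into it within each partition, then merge the pairs across partitions and divide at the end. A minimal plain-Scala sketch of the two merge rules (no Spark needed; the names seqOp and combOp are illustrative, matching the two function arguments passed to aggregateByKey below):

```scala
object AvgSketch {
  // in-partition rule: fold one value into the (sum, count) accumulator
  def seqOp(acc: (Int, Int), v: Int): (Int, Int) = (acc._1 + v, acc._2 + 1)

  // cross-partition rule: merge two (sum, count) accumulators
  def combOp(a: (Int, Int), b: (Int, Int)): (Int, Int) =
    (a._1 + b._1, a._2 + b._2)

  def main(args: Array[String]): Unit = {
    // key "A" appears in both simulated partitions: (1, 5) and (8)
    val p1 = List(1, 5).foldLeft((0, 0))(seqOp) // (6, 2)
    val p2 = List(8).foldLeft((0, 0))(seqOp)    // (8, 1)
    val merged = combOp(p1, p2)                 // (14, 3)
    println(merged._1.toDouble / merged._2)     // average of A's values
  }
}
```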
aggregateByKey
import org.apache.spark.{SparkConf, SparkContext}

object RDD_Operator_Transform_aggregateByKey {
  def main(args: Array[String]): Unit = {
    // TODO Create the environment
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // TODO RDD operator -- key-value -- aggregateByKey
    // Compute the average value of the same key across different partitions
    val rdd = sc.makeRDD(List(
      ("A", 1), ("B", 2), ("C", 3), ("D", 4), ("A", 5), ("B", 6), ("C", 7), ("A", 8)
    ), 2)

    // Accumulator is a (sum, count) pair; the zero value is (0, 0)
    val newRDD = rdd.aggregateByKey((0, 0))(
      // within a partition: add the value to the sum and bump the count
      (t, v) => (t._1 + v, t._2 + 1),
      // across partitions: merge the two (sum, count) pairs
      (t1, t2) => (t1._1 + t2._1, t1._2 + t2._2)
    )

    // divide sum by count; toDouble avoids truncating integer division
    val RDD1 = newRDD.mapValues {
      case (num, c) => num.toDouble / c
    }
    RDD1.collect().foreach(println)

    // TODO Shut down the environment
    sc.stop()
  }
}
combineByKey
import org.apache.spark.{SparkConf, SparkContext}

object RDD_Operator_Transform_combineByKey {
  def main(args: Array[String]): Unit = {
    // TODO Create the environment
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // TODO RDD operator -- key-value -- combineByKey
    // Compute the average value of the same key across different partitions
    val rdd = sc.makeRDD(List(
      ("A", 1), ("B", 2), ("C", 3), ("D", 4), ("A", 5), ("B", 6), ("C", 7), ("A", 8)
    ), 2)

    // First argument:  turns the first value seen for a key into the accumulator
    // Second argument: combine rule within a partition
    // Third argument:  combine rule across partitions
    val newRDD = rdd.combineByKey(
      v => (v, 1),
      (t: (Int, Int), v) => (t._1 + v, t._2 + 1),
      (t1: (Int, Int), t2: (Int, Int)) => (t1._1 + t2._1, t1._2 + t2._2)
    )

    // divide sum by count; toDouble avoids truncating integer division
    val RDD1 = newRDD.mapValues {
      case (num, c) => num.toDouble / c
    }
    RDD1.collect().foreach(println)

    // TODO Shut down the environment
    sc.stop()
  }
}
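For comparison, the same (sum, count) accumulator also works with a plain mapValues followed by reduceByKey. A runnable plain-Scala simulation of that pipeline, with groupBy standing in for the shuffle so no SparkContext is needed (the object and function names here are illustrative, not from the original post):

```scala
object ReduceByKeyAvg {
  // simulate rdd.mapValues(v => (v, 1)).reduceByKey(...).mapValues(...) locally
  def averages(data: List[(String, Int)]): Map[String, Double] =
    data
      .map { case (k, v) => (k, (v, 1)) }          // mapValues step
      .groupBy(_._1)                               // stands in for the shuffle
      .map { case (k, kvs) =>
        // reduceByKey step: pairwise-merge the (sum, count) accumulators
        val (sum, cnt) = kvs.map(_._2).reduce((a, b) => (a._1 + b._1, a._2 + b._2))
        k -> sum.toDouble / cnt
      }

  def main(args: Array[String]): Unit = {
    val data = List(("A", 1), ("B", 2), ("C", 3), ("D", 4),
                    ("A", 5), ("B", 6), ("C", 7), ("A", 8))
    averages(data).toList.sorted.foreach(println)
  }
}
```

reduceByKey is sufficient here because, once each value is wrapped as (v, 1), the in-partition and cross-partition merge rules are the same function; aggregateByKey and combineByKey are only needed when the two rules differ or the accumulator type differs from the value type.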