Spark: calculate the average value of the same key in different partitions (entry level - simple implementation)
2022-07-27 03:24:00 【One's cow】
Goal: compute the average value for each key when that key's values are spread across different partitions.
Two implementations are shown: one with aggregateByKey, one with combineByKey.
aggregateByKey
import org.apache.spark.{SparkConf, SparkContext}

object RDD_Operator_Transform_aggregateByKey {
  def main(args: Array[String]): Unit = {
    // TODO Create the environment
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // TODO RDD operator - key-value - aggregateByKey
    // Compute the average value of each key across different partitions
    val rdd = sc.makeRDD(List(
      ("A", 1), ("B", 2), ("C", 3), ("D", 4), ("A", 5), ("B", 6), ("C", 7), ("A", 8)
    ), 2)

    // The accumulator is (sum, count), starting from the zero value (0, 0):
    // the first function folds each value into the accumulator within a partition,
    // the second merges the per-partition accumulators for the same key
    val newRDD = rdd.aggregateByKey((0, 0))(
      (t, v) => (t._1 + v, t._2 + 1),
      (t1, t2) => (t1._1 + t2._1, t1._2 + t2._2)
    )

    // Divide as Double; plain Int division would truncate (e.g. 14 / 3 == 4)
    val avgRDD = newRDD.mapValues {
      case (sum, count) => sum.toDouble / count
    }
    avgRDD.collect().foreach(println)

    // TODO Shut down the environment
    sc.stop()
  }
}
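To see what aggregateByKey is doing without starting a Spark context, here is a minimal plain-Scala sketch of the same fold logic. The object name, the `averages` helper, and the hard-coded two-partition split are illustrative assumptions, not part of the Spark API; only the two functions mirror the ones passed to aggregateByKey above.

```scala
object AggregateByKeySketch {
  type Acc = (Int, Int) // (sum, count) accumulator

  // Within-partition rule: add the value to the sum, bump the count
  def seqOp(t: Acc, v: Int): Acc = (t._1 + v, t._2 + 1)

  // Cross-partition rule: add sums and add counts
  def combOp(t1: Acc, t2: Acc): Acc = (t1._1 + t2._1, t1._2 + t2._2)

  def averages(partitions: List[List[(String, Int)]]): Map[String, Double] = {
    // Within each partition: fold the values of each key, starting from (0, 0)
    val perPartition = partitions.map { p =>
      p.groupBy(_._1).map { case (k, kvs) =>
        k -> kvs.map(_._2).foldLeft((0, 0))(seqOp)
      }
    }
    // Across partitions: merge the per-key accumulators, then divide
    perPartition.flatten
      .groupBy(_._1)
      .map { case (k, accs) => k -> accs.map(_._2).reduce(combOp) }
      .map { case (k, (sum, cnt)) => k -> sum.toDouble / cnt }
  }

  def main(args: Array[String]): Unit = {
    // Two "partitions", matching makeRDD(..., 2) above
    val partitions = List(
      List(("A", 1), ("B", 2), ("C", 3), ("D", 4)),
      List(("A", 5), ("B", 6), ("C", 7), ("A", 8))
    )
    averages(partitions).toList.sortBy(_._1).foreach(println)
  }
}
```

With this data, key A accumulates to (14, 3) across the two partitions, so its average is 14.0 / 3 rather than the truncated 4 that Int division would give.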
combineByKey
import org.apache.spark.{SparkConf, SparkContext}

object RDD_Operator_Transform_combineByKey {
  def main(args: Array[String]): Unit = {
    // TODO Create the environment
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // TODO RDD operator - key-value - combineByKey
    // Compute the average value of each key across different partitions
    val rdd = sc.makeRDD(List(
      ("A", 1), ("B", 2), ("C", 3), ("D", 4), ("A", 5), ("B", 6), ("C", 7), ("A", 8)
    ), 2)

    // First argument: transforms the first value seen for a key into the accumulator
    // Second argument: merge rule within a partition
    // Third argument: merge rule between partitions
    val newRDD = rdd.combineByKey(
      v => (v, 1),
      (t: (Int, Int), v: Int) => (t._1 + v, t._2 + 1),
      (t1: (Int, Int), t2: (Int, Int)) => (t1._1 + t2._1, t1._2 + t2._2)
    )

    // Divide as Double; plain Int division would truncate
    val avgRDD = newRDD.mapValues {
      case (sum, count) => sum.toDouble / count
    }
    avgRDD.collect().foreach(println)

    // TODO Shut down the environment
    sc.stop()
  }
}
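The difference from aggregateByKey is that combineByKey has no zero value: the first value seen for a key is turned into the accumulator by the first function. A minimal plain-Scala sketch of that behavior (the object name, `combinePartition`, and `averages` are illustrative helpers, not Spark API; the three functions mirror the three arguments passed to combineByKey above):

```scala
object CombineByKeySketch {
  type Acc = (Int, Int) // (sum, count) accumulator

  // First function: build the accumulator from the first value of a key
  def createCombiner(v: Int): Acc = (v, 1)

  // Second function: fold a further value into an existing accumulator
  def mergeValue(t: Acc, v: Int): Acc = (t._1 + v, t._2 + 1)

  // Third function: merge accumulators from different partitions
  def mergeCombiners(t1: Acc, t2: Acc): Acc = (t1._1 + t2._1, t1._2 + t2._2)

  // Walk one partition: the first value per key goes through createCombiner,
  // every later value through mergeValue - no (0, 0) zero value is needed
  def combinePartition(p: List[(String, Int)]): Map[String, Acc] =
    p.foldLeft(Map.empty[String, Acc]) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k) match {
        case None    => createCombiner(v)
        case Some(t) => mergeValue(t, v)
      })
    }

  def averages(partitions: List[List[(String, Int)]]): Map[String, Double] =
    partitions.map(combinePartition).flatten
      .groupBy(_._1)
      .map { case (k, accs) => k -> accs.map(_._2).reduce(mergeCombiners) }
      .map { case (k, (sum, cnt)) => k -> sum.toDouble / cnt }
}
```

Skipping the zero value is why combineByKey needs the explicit type annotations on its second and third arguments: the compiler cannot infer the accumulator type from an initial value the way it can for aggregateByKey.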