Spark: calculate the average value of the same key in different partitions (entry level - simple implementation)
2022-07-27 03:24:00 · One's cow
Compute the average value of each key across different partitions, implemented two ways: with aggregateByKey and with combineByKey.
aggregateByKey
import org.apache.spark.{SparkConf, SparkContext}

object RDD_Operator_Transform_aggregateByKey {
  def main(args: Array[String]): Unit = {
    // TODO Create the environment
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // TODO RDD operator - key-value - aggregateByKey
    // Compute the average value of each key across different partitions
    val rdd = sc.makeRDD(List(
      ("A", 1), ("B", 2), ("C", 3), ("D", 4), ("A", 5), ("B", 6), ("C", 7), ("A", 8)
    ), 2)

    // Zero value (0, 0) is the per-key accumulator: (running sum, running count)
    val newRDD = rdd.aggregateByKey((0, 0))(
      // Intra-partition rule: add the value to the sum, bump the count
      (t, v) => (t._1 + v, t._2 + 1),
      // Inter-partition rule: merge the (sum, count) accumulators
      (t1, t2) => (t1._1 + t2._1, t1._2 + t2._2)
    )

    // toDouble avoids integer division truncating the average (e.g. 14 / 3 == 4)
    val avgRDD = newRDD.mapValues {
      case (sum, cnt) => sum.toDouble / cnt
    }
    avgRDD.collect().foreach(println)

    // TODO Shut down the environment
    sc.stop()
  }
}
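The two functions passed to aggregateByKey compose in two phases: the first runs independently inside each partition, the second merges the per-partition accumulators. The same logic can be traced in plain Scala, without Spark, using the same two-partition split as the example (helper names like `seqOp`/`combOp` are illustrative, not Spark API):

```scala
// seqOp folds one value into the per-key accumulator (runningSum, runningCount)
def seqOp(acc: (Int, Int), v: Int): (Int, Int) = (acc._1 + v, acc._2 + 1)
// combOp merges two accumulators coming from different partitions
def combOp(a: (Int, Int), b: (Int, Int)): (Int, Int) = (a._1 + b._1, a._2 + b._2)

// The example data, already split into its two partitions
val partitions = List(
  List(("A", 1), ("B", 2), ("C", 3), ("D", 4)),
  List(("A", 5), ("B", 6), ("C", 7), ("A", 8))
)

// Phase 1: seqOp runs independently inside each partition
val perPartition: List[(String, (Int, Int))] = partitions.flatMap { part =>
  part.groupBy(_._1).map { case (k, kvs) =>
    k -> kvs.map(_._2).foldLeft((0, 0))(seqOp)
  }
}

// Phase 2: combOp merges the per-partition accumulators for each key
val merged: Map[String, (Int, Int)] = perPartition.groupBy(_._1).map {
  case (k, kvs) => k -> kvs.map(_._2).reduce(combOp)
}

val averages = merged.map { case (k, (sum, cnt)) => k -> sum.toDouble / cnt }
// e.g. key "A" accumulates (1,1) in partition 0 and (13,2) in partition 1,
// merging to (14, 3), so averages("A") == 14.0 / 3
```

Tracing it by hand: partition 0 yields A:(1,1), B:(2,1), C:(3,1), D:(4,1); partition 1 yields A:(13,2), B:(6,1), C:(7,1); merging gives A:(14,3), B:(8,2), C:(10,2), D:(4,1).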
combineByKey
import org.apache.spark.{SparkConf, SparkContext}

object RDD_Operator_Transform_combineByKey {
  def main(args: Array[String]): Unit = {
    // TODO Create the environment
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // TODO RDD operator - key-value - combineByKey
    // Compute the average value of each key across different partitions
    val rdd = sc.makeRDD(List(
      ("A", 1), ("B", 2), ("C", 3), ("D", 4), ("A", 5), ("B", 6), ("C", 7), ("A", 8)
    ), 2)

    // First argument : transforms the first value of each key into the accumulator (value, 1)
    // Second argument: intra-partition rule - folds later values into the accumulator
    // Third argument : inter-partition rule - merges accumulators from different partitions
    val newRDD = rdd.combineByKey(
      v => (v, 1),
      (t: (Int, Int), v) => (t._1 + v, t._2 + 1),
      (t1: (Int, Int), t2: (Int, Int)) => (t1._1 + t2._1, t1._2 + t2._2)
    )

    // toDouble avoids integer division truncating the average
    val avgRDD = newRDD.mapValues {
      case (sum, cnt) => sum.toDouble / cnt
    }
    avgRDD.collect().foreach(println)

    // TODO Shut down the environment
    sc.stop()
  }
}
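What distinguishes combineByKey from aggregateByKey is the first function: instead of folding every value into a fixed zero value, the first value seen for a key in a partition is transformed into the initial accumulator. A plain-Scala sketch of the three functions (no Spark required; `combinePartition` is a hypothetical helper for illustration):

```scala
// createCombiner: turns the FIRST value seen for a key in a partition into (value, 1)
def createCombiner(v: Int): (Int, Int) = (v, 1)
// mergeValue: folds a later value from the same partition into the accumulator
def mergeValue(acc: (Int, Int), v: Int): (Int, Int) = (acc._1 + v, acc._2 + 1)
// mergeCombiners: merges accumulators produced in different partitions
def mergeCombiners(a: (Int, Int), b: (Int, Int)): (Int, Int) = (a._1 + b._1, a._2 + b._2)

// Process one partition: createCombiner on the first occurrence of a key,
// mergeValue on every occurrence after that
def combinePartition(part: List[(String, Int)]): Map[String, (Int, Int)] =
  part.foldLeft(Map.empty[String, (Int, Int)]) { case (acc, (k, v)) =>
    acc.get(k) match {
      case None      => acc + (k -> createCombiner(v))   // first value for this key
      case Some(old) => acc + (k -> mergeValue(old, v))  // subsequent value
    }
  }

val p0 = combinePartition(List(("A", 1), ("B", 2), ("C", 3), ("D", 4)))
val p1 = combinePartition(List(("A", 5), ("B", 6), ("C", 7), ("A", 8)))

// Merge the two partitions' results with mergeCombiners
val merged = (p0.toList ++ p1.toList).groupBy(_._1).map {
  case (k, kvs) => k -> kvs.map(_._2).reduce(mergeCombiners)
}
val averages = merged.map { case (k, (sum, cnt)) => k -> sum.toDouble / cnt }
```

This is why combineByKey needs no zero value, and why it fits cases where the accumulator type differs from the value type, as (Int, Int) differs from Int here.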