Aggregate functions by key in Spark

2022-07-06 21:43:00 【Big data Xiaochen】

The following functions can be called only on an RDD in which every element is a key-value pair.
groupByKey
aggregateByKey
- rdd = sc.parallelize([('a', 1), ('b', 1), ('a', 1), ('b', 1), ('a', 1)], 2)
- When aggregating within a partition, the initial value takes part in the calculation (once per key in each partition). When aggregating between partitions, the initial value does not take part.
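
The two stages can be illustrated with a small model. This is a pure-Python sketch, not Spark's implementation: the helper `aggregate_by_key`, the initial value 10, and the way the five pairs are split over two partitions are all assumptions made for illustration.

```python
def aggregate_by_key(partitions, zero_value, seq_func, comb_func):
    """Hypothetical model of aggregateByKey over a list of partitions."""
    merged = {}
    for part in partitions:
        # Stage 1: aggregate inside one partition.
        local = {}
        for key, value in part:
            # The first time a key is seen in this partition, start from zero_value.
            local[key] = seq_func(local.get(key, zero_value), value)
        # Stage 2: merge partial results between partitions (no zero_value here).
        for key, partial in local.items():
            merged[key] = comb_func(merged[key], partial) if key in merged else partial
    return merged


# Assumed partition layout for the five sample pairs.
partitions = [
    [('a', 1), ('b', 1)],
    [('a', 1), ('b', 1), ('a', 1)],
]

result = aggregate_by_key(
    partitions,
    10,                        # initial value (zeroValue)
    lambda acc, v: acc + v,    # within a partition
    lambda x, y: x + y,        # between partitions
)
print(result)  # {'a': 23, 'b': 22}
```

Each key picks up the initial value 10 once in every partition where it appears, so 'a' ends at 23 instead of 3. In PySpark the corresponding call would be `rdd.aggregateByKey(10, seqFunc, combFunc)`; the actual numbers depend on how Spark distributes the pairs over partitions.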

foldByKey
- foldByKey is a simplified form of aggregateByKey.
- When aggregateByKey uses the same logic for aggregation within partitions and between partitions, the two functions can be merged into one, which gives foldByKey.
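
As a rough illustration of that simplification, the pure-Python sketch below uses a single function for both stages. The helper `fold_by_key`, the initial value 10, and the partition layout are assumptions for illustration, not Spark internals.

```python
def fold_by_key(partitions, zero_value, func):
    """Hypothetical model of foldByKey: one function serves both stages."""
    merged = {}
    for part in partitions:
        local = {}
        for key, value in part:
            # Within a partition: start each key from zero_value.
            local[key] = func(local.get(key, zero_value), value)
        for key, partial in local.items():
            # Between partitions: the same func, without zero_value.
            merged[key] = func(merged[key], partial) if key in merged else partial
    return merged


# Assumed partition layout for the sample pairs.
partitions = [
    [('a', 1), ('b', 1)],
    [('a', 1), ('b', 1), ('a', 1)],
]

folded = fold_by_key(partitions, 10, lambda x, y: x + y)
print(folded)  # {'a': 23, 'b': 22}
```

In PySpark this corresponds to `rdd.foldByKey(10, func)`, which should give the same result as `rdd.aggregateByKey(10, func, func)`.
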
reduceByKey
- reduceByKey is a simplified form of foldByKey.
- When the initial value of foldByKey is meaningless, it can be omitted, which gives reduceByKey.
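
A minimal pure-Python sketch of the same idea with the initial value dropped. The helper `reduce_by_key` and the partition layout are assumptions for illustration only.

```python
def reduce_by_key(partitions, func):
    """Hypothetical model of reduceByKey: no initial value at all."""
    merged = {}
    for part in partitions:
        local = {}
        for key, value in part:
            # The first value seen for a key is used as the starting point.
            local[key] = func(local[key], value) if key in local else value
        for key, partial in local.items():
            merged[key] = func(merged[key], partial) if key in merged else partial
    return merged


# Assumed partition layout for the sample pairs.
partitions = [
    [('a', 1), ('b', 1)],
    [('a', 1), ('b', 1), ('a', 1)],
]

reduced = reduce_by_key(partitions, lambda x, y: x + y)
print(reduced)  # {'a': 3, 'b': 2}
```

In PySpark the corresponding call is `rdd.reduceByKey(lambda x, y: x + y)`. For a sum this matches folding with an initial value of 0, which is why the initial value can be left out.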

Copyright notice
This article was written by [Big data Xiaochen]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202131122070763.html
