Understanding the Spark operator aggregateByKey
2022-07-28 07:58:00 【hzp666】

Case study
The aggregateByKey operator essentially performs a map + reduce style aggregation over the values of each distinct key.
Here is a simple piece of code from a production environment.
Some log fields have been cleaned up. After processing, we get an RDD built from a List of (String, (String, String)) elements, where the strings stand for (user name, (visit time, visited page URL)).
The same user may visit different pages, or the same page, at different times. To merge the visit records of each user, the following code uses aggregateByKey.
// Each element is (user, (visit time, visited URL))
val data = sc.parallelize(
  List(
    ("13909029812", ("20170507", "http://www.baidu.com")),
    ("18089376778", ("20170401", "http://www.google.com")),
    ("18089376778", ("20170508", "http://www.taobao.com")),
    ("13909029812", ("20170507", "http://www.51cto.com"))
  )
)
data.aggregateByKey(scala.collection.mutable.Set[(String, String)](), 200)(
  (set, item) => {
    set += item
  },                                 // add each value of a key to that key's set
  (set1, set2) => set1 union set2    // merge the sets built for the same key
).mapValues(x => x.toIterable).collect
Result:
res12: Array[(String, Iterable[(String, String)])] = Array((18089376778,Set((20170401,http://www.google.com), (20170508,http://www.taobao.com))), (13909029812,Set((20170507,http://www.51cto.com), (20170507,http://www.baidu.com))))
Step-by-step analysis
aggregateByKey(parameter 1)(parameter 2, parameter 3)
Process: for each key in data, parameter 1 supplies the initial value. The function in parameter 2 is applied to the running result and each of that key's values in turn, starting from the initial value, within each partition. The partial results for the same key are then merged by the function in parameter 3.
- Parameter 1
scala.collection.mutable.Set[(String, String)]()
Creates an empty Set to serve as the initial value. (The second argument in the example, 200, is the number of partitions of the resulting RDD.)
- Parameter 2
(set, item) => {
set += item
}
A function similar to a map step: each value of a key (here, (visit time, visited URL)) is passed in as item, added to the set, and the set is returned.
Once all of a key's values in a partition have been processed, the result is a set containing those values.
- Parameter 3
(set1, set2) => set1 union set2
Merges the sets built for the same key with union and returns the merged set.
Final result: for each user, the set of (visit time, visited URL) records across all of that user's visits.
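The roles of the three parameters can be sketched in plain Scala, without Spark. This is only an illustration of the semantics described above, not Spark's actual implementation. The names AggregateByKeySketch, aggregateByKeyLocal and partitions are hypothetical, the split of the sample data into two partitions is an assumption, and an immutable Set replaces the mutable one to keep the sketch free of side effects.

```scala
// Plain-Scala sketch of the aggregateByKey semantics (no Spark required).
// All names here are hypothetical and exist only for illustration.
object AggregateByKeySketch {

  def aggregateByKeyLocal[K, V, U](partitions: Seq[Seq[(K, V)]], zero: => U)(
      seqOp: (U, V) => U,
      combOp: (U, U) => U): Map[K, U] = {
    // Step 1 (parameter 2): inside each partition, fold the values of a key
    // into an accumulator that starts from the initial value (parameter 1).
    val perPartition: Seq[Map[K, U]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, U]) { case (acc, (k, v)) =>
        acc.updated(k, seqOp(acc.getOrElse(k, zero), v))
      }
    }
    // Step 2 (parameter 3): merge the per-partition accumulators of the same key.
    perPartition.foldLeft(Map.empty[K, U]) { (merged, partial) =>
      partial.foldLeft(merged) { case (acc, (k, u)) =>
        acc.updated(k, acc.get(k).map(combOp(_, u)).getOrElse(u))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed layout: the four sample records spread over two partitions.
    val partitions = Seq(
      Seq(("13909029812", ("20170507", "http://www.baidu.com")),
          ("18089376778", ("20170401", "http://www.google.com"))),
      Seq(("18089376778", ("20170508", "http://www.taobao.com")),
          ("13909029812", ("20170507", "http://www.51cto.com")))
    )
    val result = aggregateByKeyLocal(partitions, Set.empty[(String, String)])(
      (set, item) => set + item,
      (set1, set2) => set1.union(set2))
    result.foreach { case (user, visits) => println(s"$user -> $visits") }
  }
}
```

Because parameter 3 may be applied to partial results in any order, it should give the same result regardless of how they are grouped or ordered; set union satisfies this.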
Original article: https://www.jianshu.com/p/09912beb1350