32 - Introduction to Spark partition operators, broadcast variables, and accumulators
2022-07-23 10:36:00 【Portrait people under big data】
17.11 Operators (partitioning)
17.11.1 Transformation operators
mapPartitionsWithIndex
- Similar to mapPartitions, except that the function also receives the index of the partition it is processing.
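As a minimal sketch of how the partition index is passed in, the snippet below tags every element with the index of the partition that holds it. The `local[2]` master, the app name, and the sample data are assumptions chosen for illustration; the statements are written script-style, as they would be typed into spark-shell.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("mapPartitionsWithIndexSketch")
val sc = new SparkContext(conf)

// Six elements spread evenly over 3 partitions (illustrative data).
val rdd = sc.parallelize(Seq("a", "b", "c", "d", "e", "f"), 3)

// The function receives (partitionIndex, iterator over that partition's elements).
val tagged = rdd.mapPartitionsWithIndex { (index, iter) =>
  iter.map(elem => (index, elem))
}.collect()

// Prints (0,a) (0,b) (1,c) (1,d) (2,e) (2,f), one pair per line.
tagged.foreach(println)
sc.stop()
```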
repartition
- Increases or decreases the number of partitions. This operator always triggers a shuffle.
coalesce
- coalesce is usually used to reduce the number of partitions. Its second parameter controls whether the repartitioning triggers a shuffle.
- true triggers a shuffle, false does not. The default is false.
- If coalesce asks for more partitions than the original RDD has and the second parameter is false, the call has no effect and the RDD keeps its original number of partitions, because the partition count cannot grow without a shuffle. If the second parameter is true, the effect is the same as repartition.
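The partition counts described above can be checked with `getNumPartitions`. This is a small sketch under assumed settings (a `local[2]` master and a 4-partition sample RDD), not a tuning recommendation.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("coalesceSketch")
val sc = new SparkContext(conf)

// Start from 4 partitions (illustrative data).
val rdd = sc.parallelize(1 to 100, 4)

// Shrinking without a shuffle works: 4 -> 2.
val fewer = rdd.coalesce(2).getNumPartitions

// Growing without a shuffle has no effect: the RDD stays at 4 partitions.
val unchanged = rdd.coalesce(8).getNumPartitions

// Growing with shuffle = true works and matches repartition: 4 -> 8.
val grown = rdd.coalesce(8, shuffle = true).getNumPartitions
val repartitioned = rdd.repartition(8).getNumPartitions

println(s"fewer=$fewer unchanged=$unchanged grown=$grown repartitioned=$repartitioned")
sc.stop()
```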
repartition(numPartitions) = coalesce(numPartitions, true)
groupByKey
- Operates on an RDD of (K, V) pairs and groups the values by key. Applied to (K, V), it returns (K, Iterable<V>).
- Difference between groupByKey and reduceByKey
- reduceByKey is a grouping-and-aggregation operator. It performs map-side aggregation (combine) by default, and the map-side logic must be the same as the reduce-side logic: both use the aggregate function f that is passed in.
- groupByKey is a grouping-and-collection operator. It does not run combine() on the map side. It only gathers the values that share a key, and it takes no function parameter like f.
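To make the difference concrete, the sketch below runs groupByKey and reduceByKey on the same pair RDD. The sample pairs and the `local[2]` master are assumptions for illustration. The grouped values are sorted before comparison because the order inside each group is not guaranteed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("groupVsReduceSketch")
val sc = new SparkContext(conf)

// Illustrative (K, V) data spread over 2 partitions.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)), 2)

// groupByKey only collects values per key: (K, V) => (K, Iterable[V]).
val grouped = pairs.groupByKey().mapValues(_.toList.sorted).collectAsMap()

// reduceByKey applies the function f (here addition) on the map side and again on the reduce side.
val reduced = pairs.reduceByKey(_ + _).collectAsMap()

println(grouped) // a -> List(1, 3, 5), b -> List(2, 4)
println(reduced) // a -> 9, b -> 6
sc.stop()
```

When only an aggregate per key is needed, reduceByKey usually moves less data across the shuffle, since values are combined before they leave each map task.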
zip
- Combines the elements of two RDDs (in KV format or not) pairwise into a single RDD in KV format. The two RDDs must have the same number of partitions and the same number of elements in each partition.
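A minimal sketch of zip, assuming two small RDDs built with the same partition count so that their partitions line up element for element. The data and the `local[2]` master are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("zipSketch")
val sc = new SparkContext(conf)

// Both RDDs have 3 elements in 3 partitions, so every partition holds exactly one element.
val names = sc.parallelize(Seq("a", "b", "c"), 3)
val scores = sc.parallelize(Seq(90, 80, 70), 3)

// Element i of the first RDD becomes the key, element i of the second becomes the value.
val zipped = names.zip(scores).collect()

zipped.foreach(println) // (a,90) (b,80) (c,70), one pair per line
sc.stop()
```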
zipWithIndex
- Pairs each element of the RDD with its index in the RDD (starting from 0), producing (K, V) pairs of (element, index).
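For illustration, the following sketch applies zipWithIndex to a four-element RDD. The data and the `local[2]` master are assumptions. Note that the index is a Long.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("zipWithIndexSketch")
val sc = new SparkContext(conf)

val rdd = sc.parallelize(Seq("a", "b", "c", "d"), 2)

// Each element is paired with its position in the RDD: (element, index: Long).
val indexed = rdd.zipWithIndex().collect()

indexed.foreach(println) // (a,0) (b,1) (c,2) (d,3), one pair per line
sc.stop()
```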
17.11.2 Action operators
- countByKey
- Operates on an RDD of (K, V) pairs and counts the number of elements for each key.
- countByValue
- Counts how many times each distinct element occurs in the dataset and returns a map from element to count.
- reduce
- Aggregates all elements of the dataset with the given aggregation function.
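The three actions above can be sketched together as follows. The sample data and the `local[2]` master are assumptions. Each action returns its result to the Driver rather than producing a new RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("actionSketch")
val sc = new SparkContext(conf)

// countByKey: number of elements per key, returned as Map[K, Long].
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val byKey = pairs.countByKey()

// countByValue: number of occurrences of each distinct element.
val words = sc.parallelize(Seq("x", "y", "x", "x"))
val byValue = words.countByValue()

// reduce: fold all elements together with the given function.
val total = sc.parallelize(1 to 5).reduce(_ + _)

println(byKey)   // a -> 2, b -> 1
println(byValue) // x -> 3, y -> 1
println(total)   // 15
sc.stop()
```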
17.12 Case studies
17.12.1 PV & UV
17.12.2 Secondary sort
17.12.3 Grouped top N
17.13 Broadcast variables and accumulators
17.13.1 Broadcast variables
Using broadcast variables
```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.setMaster("local").setAppName("broadcast")
val sc = new SparkContext(conf)

// Ship the list to each Executor once instead of once per task.
val list = List("hello yjx")
val broadCast = sc.broadcast(list)

// Keep only the lines that appear in the broadcast list.
val lineRDD = sc.textFile("./words.txt")
lineRDD.filter { x => broadCast.value.contains(x) }.foreach(println)
sc.stop()
```
Notes
- A broadcast variable can only be defined on the Driver, not on an Executor.
- The value of a broadcast variable is read-only on the Executors. Only the Driver can change what is broadcast, and it does so by broadcasting a new value.
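As a hedged sketch of these notes, the snippet below broadcasts a small lookup map and reads it on the Executors. It then shows one way the Driver can change the data: since a Broadcast object's value cannot be reassigned, the Driver releases the old copy with `unpersist()` and broadcasts a new map. The map contents, variable names, and `local[2]` master are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("broadcastLookupSketch")
val sc = new SparkContext(conf)

// Hypothetical lookup table, defined and broadcast on the Driver.
val countryNames = Map("CN" -> "China", "US" -> "United States")
val bc = sc.broadcast(countryNames)

val codes = sc.parallelize(Seq("CN", "US", "FR"))

// Executors only read bc.value; unknown codes fall back to a default.
val resolved = codes.map(code => (code, bc.value.getOrElse(code, "unknown"))).collect()

// To change the data, drop the cached copies and broadcast a new value from the Driver.
bc.unpersist()
val bcUpdated = sc.broadcast(countryNames + ("FR" -> "France"))
val resolvedAgain = codes.map(code => (code, bcUpdated.value.getOrElse(code, "unknown"))).collect()

resolved.foreach(println)
resolvedAgain.foreach(println)
sc.stop()
```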
17.13.2 Accumulators
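Using accumulators
```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.setMaster("local").setAppName("accumulator")
val sc = new SparkContext(conf)

// Defined on the Driver with an initial value of 0.
val accumulator = sc.longAccumulator

// Each task adds 1 per line; the updates are merged back on the Driver.
sc.textFile("./words.txt").foreach(_ => accumulator.add(1))

// Read the final value on the Driver.
println(accumulator.value)
sc.stop()
```
Notes
- An accumulator is defined and given its initial value on the Driver. Its value can only be read on the Driver, and it is updated on the Executors.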
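One consequence worth sketching: when an accumulator is updated inside a transformation rather than an action, nothing is added until an action runs, and the updates are applied again each time the uncached RDD is recomputed. The snippet below illustrates this under assumed settings (`local[2]` master, a ten-element sample RDD, and an accumulator named "seen"). It reads the accumulator through `sum`, which returns a primitive Long.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("accumulatorLazinessSketch")
val sc = new SparkContext(conf)

val acc = sc.longAccumulator("seen")

// map is a transformation, so this line alone does not run any task.
val mapped = sc.parallelize(1 to 10).map { x => acc.add(1); x }
val beforeAction = acc.sum // 0: no action has run yet

// The first action executes the map and applies 10 updates.
mapped.count()
val afterFirstAction = acc.sum // 10

// The RDD is not cached, so a second action recomputes it and adds 10 more.
mapped.count()
val afterSecondAction = acc.sum // 20

println(s"$beforeAction $afterFirstAction $afterSecondAction")
sc.stop()
```

Updating the accumulator inside an action such as foreach, as in the example above, or caching the RDD before reusing it, avoids this double counting.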