(Spark Tuning) Choosing Operators Wisely
2022-07-27 00:58:00 【A photographer who can't play is not a good programmer】
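1. map and mapPartitions
1. map applies a function to each element of the RDD, one element at a time.
2. mapPartitions applies a function once per partition; the function receives an iterator over all elements of that partition.
If you need to write data to a database, use mapPartitions: expensive setup such as opening a connection then happens once per partition instead of once per element.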
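The connection-count argument can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark code: an "RDD" is modeled as a list of partitions, and `FakeConnection`, `write_per_element`, and `write_per_partition` are hypothetical names introduced only for this illustration.

```python
# Plain-Python model (no Spark needed): an "RDD" is a list of partitions.
# FakeConnection is a hypothetical stand-in for a real database connection.

class FakeConnection:
    opened = 0  # total number of connections opened so far

    def __init__(self):
        FakeConnection.opened += 1
        self.rows = []

    def write(self, row):
        self.rows.append(row)


def write_per_element(partitions):
    """map-style: the function runs per element, so it opens one connection per element."""
    for part in partitions:
        for row in part:
            conn = FakeConnection()
            conn.write(row)


def write_per_partition(partitions):
    """mapPartitions-style: the function runs per partition and reuses one connection."""
    for part in partitions:
        conn = FakeConnection()
        for row in part:
            conn.write(row)


partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]  # 3 partitions, 9 elements

FakeConnection.opened = 0
write_per_element(partitions)
per_element_connections = FakeConnection.opened    # one per element

FakeConnection.opened = 0
write_per_partition(partitions)
per_partition_connections = FakeConnection.opened  # one per partition
```

In Spark itself the corresponding calls are rdd.map(f) and rdd.mapPartitions(f); for mapPartitions, f takes an iterator over one partition and returns an iterator.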
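2. foreach and foreachPartition
These mirror map and mapPartitions: foreach runs a function on each element, and foreachPartition runs it once per partition.
The difference is that foreach and foreachPartition are action operators, while map and mapPartitions are transformation operators.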
3. groupByKey and reduceByKey
1. groupByKey
Every key-value pair is shuffled across the network, with no aggregation beforehand.
2. reduceByKey
Values are first aggregated locally on the map side, and only the aggregated results are shuffled (map-side pre-aggregation).
(Prefer this operator whenever the per-key computation can be expressed as a reduce.)
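To make the difference in shuffle volume concrete, here is a small plain-Python sketch under simplifying assumptions. It is a model of the two strategies, not Spark's implementation; the function names and the sample data are made up for illustration.

```python
# Plain-Python model: count how many records would cross the shuffle boundary.

def shuffle_records_group_by_key(partitions):
    """groupByKey-style: every (key, value) pair is shuffled as-is."""
    return sum(len(part) for part in partitions)


def shuffle_records_reduce_by_key(partitions, func):
    """reduceByKey-style: combine values per key inside each partition first,
    so at most one record per key per partition is shuffled."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    # Final merge on the reduce side.
    result = {}
    for key, value in shuffled:
        result[key] = func(result[key], value) if key in result else value
    return len(shuffled), result


# Word-count style input: 2 partitions, 6 pairs in total.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("a", 1), ("b", 1)],
]

grouped_count = shuffle_records_group_by_key(partitions)
reduced_count, totals = shuffle_records_reduce_by_key(partitions, lambda x, y: x + y)
```

With this input the groupByKey model moves all 6 pairs, while the reduceByKey model moves 4 pre-aggregated records and still arrives at the same per-key totals. The gap widens as keys repeat more often within a partition.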
4. collect operator
collect returns the entire result to the driver as a single array, which can cause an OOM error. Use with caution!
5. coalesce and repartition
Both operators change the number of partitions.
1. coalesce operator
Reducing the number of partitions does not trigger a shuffle, e.g. data.coalesce(1).
With the default shuffle = false, coalesce cannot increase the number of partitions; a larger target leaves the partition count unchanged. Increasing it requires shuffle = true, which does trigger a shuffle.
It is generally used to go from many partitions to fewer.
2. repartition operator
Under the hood, repartition calls coalesce(numPartitions, shuffle = true), so it always shuffles.
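The behavior of the two operators can be sketched in plain Python. This is a simplified model, not Spark's algorithm: the real coalesce takes data locality into account when grouping partitions, and the real repartition does not assign elements in this exact order. The function names simply echo the Spark operators.

```python
# Plain-Python model: an "RDD" is a list of partitions.

def coalesce(partitions, n):
    """coalesce-style without shuffle: merge whole partitions, never split one."""
    if n >= len(partitions):
        # Without a shuffle the partition count cannot grow.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)  # illustrative grouping only
    return merged


def repartition(partitions, n):
    """repartition-style: every element is redistributed (a full shuffle)."""
    out = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for x in part:
            out[i % n].append(x)
            i += 1
    return out


data = [[1, 2], [3, 4], [5, 6], [7, 8]]  # 4 partitions

fewer = coalesce(data, 2)        # 2 partitions, built from whole input partitions
unchanged = coalesce(data, 8)    # still 4 partitions
more = repartition(data, 8)      # 8 partitions, elements moved individually
```

In this model coalesce only ever concatenates existing partitions, which is why it can avoid a shuffle, whereas repartition touches every element and can therefore both grow and rebalance the partition count.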