Spark Shuffle Process and MR Shuffle Process
2022-07-06 21:44:00 【Big data Xiaochen】
Spark Shuffle
Before Spark 1.2: 【HashShuffle】
Spark 1.2 and later: 【SortShuffle】
MR (MapReduce) shuffle
Spark shuffle
HashShuffleManager (deprecated)
Unoptimized HashShuffle: number of intermediate small files = 【number of upstream tasks】 * 【number of downstream tasks】, which quickly becomes far too many.
Both groupBy and join can trigger a shuffle. The original figure depicted a join, simplified here to a groupBy aggregation.
Optimized HashShuffle: number of intermediate small files = 【number of Executors】 * 【number of downstream tasks】, so the count drops by a factor of (number of upstream tasks / number of Executors).
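The two file-count formulas can be checked with a small Python sketch. The function names and job sizes are hypothetical. Following the text, the optimized variant reuses one group of files per Executor; strictly, Spark 1.x file consolidation reused a file group per concurrently running task, so the two counts coincide when each Executor runs one task at a time.

```python
def unoptimized_hash_files(upstream_tasks: int, downstream_tasks: int) -> int:
    """Every upstream (map) task writes one file per downstream (reduce) task."""
    return upstream_tasks * downstream_tasks


def optimized_hash_files(executors: int, downstream_tasks: int) -> int:
    """Assumption from the text: tasks on the same Executor append to one
    shared group of files, one file per downstream task."""
    return executors * downstream_tasks


if __name__ == "__main__":
    m, r, e = 1000, 1000, 10  # hypothetical job: map tasks, reduce tasks, Executors
    print(unoptimized_hash_files(m, r))  # 1000000
    print(optimized_hash_files(e, r))    # 10000
```

For 1,000 map tasks, 1,000 reduce tasks and 10 Executors, this gives 1,000,000 versus 10,000 files, a 100-fold reduction, which is exactly the ratio of upstream tasks to Executors.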
SortShuffleManager (current default)
Normal mechanism (requires sorting)
1- Choose the data structure: for an aggregating shuffle operator such as reduceByKey, a 【Map】 data structure is used; for a shuffle operator such as join, an 【Array】 data structure is used.
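2- Request memory: when the in-memory structure grows, the amount of memory requested = current memory usage * 2 - memory already granted. If the grant is still not enough, the data is spilled to disk.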
3- Sort: before spilling to a disk file, the data already held in the in-memory structure is 【sorted】 by key.
4- Spill to disk: after sorting, the data is written to disk files in batches. The default batch size is 10,000 records; that is, the sorted data is written to disk 10,000 records at a time.
5- Merge: the spill files are merged into a single data file, and a separate index file records the start offset and end offset of each downstream task's data within that file.
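The five steps above can be sketched as a single-process Python simulation of one map task's write. This is an illustration under stated assumptions, not Spark's implementation: memory is modeled as a record count (`max_buffered`), keys are integers routed by a hypothetical `key % num_partitions` partitioner, records are sorted by (partition, key), and index offsets are record positions rather than byte offsets. All function and parameter names are hypothetical.

```python
from collections import Counter
from functools import reduce
from heapq import merge
from itertools import groupby
from typing import Any, Callable, Dict, List, Optional, Tuple

BATCH_SIZE = 10_000  # step 4: records per batch written to a spill file

Record = Tuple[int, int, Any]  # (partition id, key, value); keys are ints here


def amount_to_request(current_memory: int, already_granted: int) -> int:
    """Step 2: request enough memory to reach twice the current usage."""
    return 2 * current_memory - already_granted


def sort_shuffle_write(
    records: List[Tuple[int, Any]],
    num_partitions: int,
    max_buffered: int,
    combine: Optional[Callable[[Any, Any], Any]] = None,
) -> Tuple[List[Tuple[int, Any]], List[Tuple[int, int]]]:
    """Simulate one map task. Returns the merged data "file" and an index of
    (start, end) record offsets, one pair per downstream partition."""
    spills: List[List[List[Record]]] = []  # each spill file = list of batches
    # Step 1: a map for aggregating operators, an array (list) otherwise.
    buf_map: Dict[Tuple[int, int], Any] = {}
    buf_arr: List[Record] = []

    def spill() -> None:
        if combine is not None:
            items = [(p, k, v) for (p, k), v in buf_map.items()]
            buf_map.clear()
        else:
            items = list(buf_arr)
            buf_arr.clear()
        items.sort(key=lambda r: (r[0], r[1]))  # step 3: sort before spilling
        # Step 4: write the sorted records in fixed-size batches.
        spills.append([items[i:i + BATCH_SIZE] for i in range(0, len(items), BATCH_SIZE)])

    for key, value in records:
        p = key % num_partitions  # hypothetical partitioner for int keys
        if combine is not None:
            slot = (p, key)
            buf_map[slot] = combine(buf_map[slot], value) if slot in buf_map else value
        else:
            buf_arr.append((p, key, value))
        if len(buf_map) + len(buf_arr) >= max_buffered:  # stand-in for "memory full"
            spill()
    if buf_map or buf_arr:
        spill()

    # Step 5: merge every spill file into one data file sorted by (partition, key).
    streams = [[r for batch in spill_file for r in batch] for spill_file in spills]
    merged = merge(*streams, key=lambda r: (r[0], r[1]))
    if combine is not None:
        # Values for the same key may sit in several spills; combine them again.
        rows = [(p, k, reduce(combine, (r[2] for r in grp)))
                for (p, k), grp in groupby(merged, key=lambda r: (r[0], r[1]))]
    else:
        rows = list(merged)
    # The index records where each downstream partition starts and ends.
    counts = Counter(r[0] for r in rows)
    index, start = [], 0
    for p in range(num_partitions):
        index.append((start, start + counts[p]))
        start += counts[p]
    return [(k, v) for _, k, v in rows], index
```

Passing `combine=lambda a, b: a + b` mimics a reduceByKey-style write (Map buffer, values combined before and after spilling); omitting it mimics a join-style write (Array buffer, records kept as-is). A downstream task reads only the slice `data[start:end]` given by its index entry.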
Bypass mechanism (no sorting required)
Conditions: the number of downstream (shuffle read) tasks is less than or equal to the value of the 【spark.shuffle.sort.bypassMergeThreshold】 parameter (default 【200】), and
the operator must not be 【a shuffle operator with map-side aggregation】.
reduceByKey is a shuffle operator with map-side aggregation, so it cannot use the bypass mechanism.
groupByKey is not a shuffle operator with map-side aggregation, so it can.
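A minimal Python sketch of the bypass path, under assumptions: the function names are hypothetical, keys are integers routed by `key % num_partitions`, and offsets are record positions. In Spark's own check, the number compared against `spark.shuffle.sort.bypassMergeThreshold` is the number of downstream (reduce) partitions. Each record goes to its partition's own buffer with no sorting; the buffers are then concatenated into one data file with an index, as in the normal mechanism.

```python
from typing import Any, Dict, List, Tuple

BYPASS_MERGE_THRESHOLD = 200  # default of spark.shuffle.sort.bypassMergeThreshold


def can_bypass(num_downstream_partitions: int, map_side_combine: bool,
               threshold: int = BYPASS_MERGE_THRESHOLD) -> bool:
    """Both conditions from the text must hold."""
    return not map_side_combine and num_downstream_partitions <= threshold


def bypass_write(records: List[Tuple[int, Any]], num_partitions: int
                 ) -> Tuple[List[Tuple[int, Any]], List[Tuple[int, int]]]:
    """Write each record to its partition's own buffer (one temp file per
    downstream partition) without sorting, then concatenate the buffers
    into one data file plus an index of (start, end) offsets."""
    per_partition: Dict[int, List[Tuple[int, Any]]] = {p: [] for p in range(num_partitions)}
    for key, value in records:
        per_partition[key % num_partitions].append((key, value))  # hypothetical partitioner
    data: List[Tuple[int, Any]] = []
    index: List[Tuple[int, int]] = []
    for p in range(num_partitions):
        start = len(data)
        data.extend(per_partition[p])  # records keep arrival order: no sort
        index.append((start, len(data)))
    return data, index


if __name__ == "__main__":
    print(can_bypass(100, map_side_combine=False))  # groupByKey-like: True
    print(can_bypass(100, map_side_combine=True))   # reduceByKey-like: False
    print(can_bypass(500, map_side_combine=False))  # too many partitions: False
```

The design choice is the trade-off the text describes: skipping the sort saves CPU, but each map task briefly holds one open file per downstream partition, which is why the mechanism is capped by a threshold.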