Spark shuffle process and MR shuffle process
2022-07-06 21:44:00 【Big data Xiaochen】
Spark Shuffle
Before Spark 1.2, the default is 【HashShuffle】
From Spark 1.2 onward, the default is 【SortShuffle】
MR shuffle
Spark shuffle
HashShuffleManager (deprecated)
Unoptimized HashShuffle: number of intermediate small files = 【number of upstream tasks】 * 【number of downstream tasks】, which quickly becomes very large.
Both groupBy and join can trigger a shuffle. The figure below simplifies the original join case to a groupBy aggregation.
Optimized HashShuffle: number of intermediate small files = 【number of Executors】 * 【number of downstream tasks】 (this assumes one core per Executor; with several cores, each core keeps its own group of files). The count no longer depends on the number of upstream tasks, so it drops substantially.
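The two file-count formulas can be compared with a little arithmetic. The following is a minimal sketch; the function names and the example cluster sizes are hypothetical, and the `cores_per_executor` parameter is an assumption added here (the text counts one file group per Executor).

```python
def unoptimized_hash_shuffle_files(upstream_tasks: int, downstream_tasks: int) -> int:
    """Every upstream task writes one file per downstream task."""
    return upstream_tasks * downstream_tasks


def optimized_hash_shuffle_files(executors: int, downstream_tasks: int,
                                 cores_per_executor: int = 1) -> int:
    """Upstream tasks that run one after another on the same core reuse one file group.

    cores_per_executor is an assumption of this sketch; with the default of 1
    the result matches the formula in the text.
    """
    return executors * cores_per_executor * downstream_tasks


# Hypothetical job: 1000 upstream tasks, 1000 downstream tasks, 10 Executors.
print(unoptimized_hash_shuffle_files(1000, 1000))  # 1000000
print(optimized_hash_shuffle_files(10, 1000))      # 10000
```

In this example the optimized variant writes 100 times fewer files, which is the ratio of upstream tasks to Executors.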
SortShuffleManager (preferred)
Normal mechanism (sorting required)
1- Choose the data structure: for an aggregating shuffle operator such as reduceByKey, a 【Map】 data structure is used; for a shuffle operator such as join, an 【Array】 data structure is used.
2- Request memory: memory requested = current data memory * 2 - memory granted last time. If the request cannot be satisfied, the data is spilled to disk.
3- Sort: before spilling to a disk file, the data already held in the in-memory data structure is 【sorted】 by key.
4- Spill to disk: after sorting, the data is written to the disk file in batches. The default batch size is 10,000 records, so the sorted data is written 10,000 records at a time.
5- Merge: all temporary spill files are merged into a single data file, and an index file records the start offset and end offset of each downstream task's data in that file.
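The five steps above can be sketched as a small in-memory simulation. This is not Spark's implementation: the function names are hypothetical, spill "files" are plain Python lists, a spill is triggered by a key-count limit instead of a memory request, offsets are record positions instead of byte offsets, and only the Map (reduceByKey-style) case is covered.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, int, int]  # (partition id, key, aggregated value)
BATCH_SIZE = 10_000            # default batch size named in step 4


def memory_to_request(current_bytes: int, granted_bytes: int) -> int:
    """Step 2: current data memory * 2 - memory already granted."""
    return current_bytes * 2 - granted_bytes


def write_in_batches(records: List[Record], batch_size: int = BATCH_SIZE) -> List[Record]:
    """Step 4: copy sorted records into a 'file' (a list here) one batch at a time."""
    spill_file: List[Record] = []
    for start in range(0, len(records), batch_size):
        spill_file.extend(records[start:start + batch_size])
    return spill_file


def sort_shuffle_write(pairs: Iterable[Tuple[int, int]], num_partitions: int,
                       max_keys_in_memory: int) -> Tuple[List[Record], List[Tuple[int, int]]]:
    buffer: Dict[int, int] = {}             # step 1: Map structure for aggregation
    spill_files: List[List[Record]] = []

    def spill() -> None:
        # Step 3: sort by (partition id, key) before writing.
        run = sorted((key % num_partitions, key, value) for key, value in buffer.items())
        spill_files.append(write_in_batches(run))
        buffer.clear()

    for key, value in pairs:
        buffer[key] = buffer.get(key, 0) + value
        if len(buffer) >= max_keys_in_memory:  # stand-in for a failed memory request
            spill()
    if buffer:
        spill()

    # Step 5: merge the sorted spill files, combining values of equal keys.
    data_file: List[Record] = []
    for partition, key, value in heapq.merge(*spill_files):
        if data_file and data_file[-1][:2] == (partition, key):
            data_file[-1] = (partition, key, data_file[-1][2] + value)
        else:
            data_file.append((partition, key, value))

    # Index: (start offset, end offset) of each downstream task's records.
    index: List[Tuple[int, int]] = []
    position = 0
    for partition in range(num_partitions):
        start = position
        while position < len(data_file) and data_file[position][0] == partition:
            position += 1
        index.append((start, position))
    return data_file, index


pairs = [(k % 5, 1) for k in range(20)]  # keys 0..4, each seen four times
data_file, index = sort_shuffle_write(pairs, num_partitions=2, max_keys_in_memory=3)
print(data_file)  # [(0, 0, 4), (0, 2, 4), (0, 4, 4), (1, 1, 4), (1, 3, 4)]
print(index)      # [(0, 3), (3, 5)]
```

A downstream task for partition 1 would read only the records between offsets 3 and 5 of the merged file, which is what the index makes possible.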
Bypass mechanism (no sorting required)
The bypass mechanism is used when both of the following hold:
1- The number of reduce-side partitions (downstream tasks) is less than or equal to the value of the 【spark.shuffle.sort.bypassMergeThreshold】 parameter (default 【200】).
2- The operator is not a 【shuffle operator with map-side aggregation】.
reduceByKey is a shuffle operator with map-side aggregation.
groupByKey is not a shuffle operator with map-side aggregation.
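The two bypass conditions can be expressed as a small decision function. This is a sketch of the rule as described here, not Spark's code: the function name, the `MAP_SIDE_COMBINE` table, and the parameter names are hypothetical, and `num_partitions` stands for the count that is compared against the threshold.

```python
BYPASS_MERGE_THRESHOLD = 200  # default of spark.shuffle.sort.bypassMergeThreshold per the text

# Whether each operator aggregates on the map side, as stated above.
MAP_SIDE_COMBINE = {"reduceByKey": True, "groupByKey": False}


def should_bypass_merge_sort(num_partitions: int, map_side_combine: bool,
                             threshold: int = BYPASS_MERGE_THRESHOLD) -> bool:
    """Return True when the bypass mechanism would apply."""
    if map_side_combine:
        return False  # map-side aggregation rules out bypass
    return num_partitions <= threshold


print(should_bypass_merge_sort(100, MAP_SIDE_COMBINE["groupByKey"]))   # True
print(should_bypass_merge_sort(100, MAP_SIDE_COMBINE["reduceByKey"]))  # False
print(should_bypass_merge_sort(201, MAP_SIDE_COMBINE["groupByKey"]))   # False
```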