Spark Shuffle Process and MR Shuffle Process
2022-07-06 21:44:00 【Big data Xiaochen】
Spark shuffle
Before Spark 1.2: 【HashShuffle】
Spark 1.2 and later: 【SortShuffle】
MapReduce shuffle
Spark shuffle

HashShuffleManager (deprecated)
Unoptimized HashShuffle: number of intermediate small files = 【number of upstream tasks】 × 【number of downstream tasks】, which quickly becomes far too many.
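A minimal Python sketch of the unoptimized write path, assuming records are routed with a hash-modulo rule in the spirit of Spark's HashPartitioner: every upstream task opens one bucket file per downstream task, so the file count is the product of the two. The function `unoptimized_hash_shuffle` and the dict standing in for disk files are hypothetical; the `shuffle_{shuffleId}_{mapId}_{reduceId}` names only loosely follow Spark's shuffle block IDs.

```python
def unoptimized_hash_shuffle(map_outputs, num_reducers):
    """Each upstream (map) task writes one bucket file per downstream (reduce) task.

    map_outputs: element i is the list of (key, value) records produced by upstream task i.
    Returns {file_name: [records]}; the dict is a hypothetical stand-in for files on disk.
    """
    files = {}
    for map_id, records in enumerate(map_outputs):
        # Every map task opens R files up front, even if some of them stay empty
        for reduce_id in range(num_reducers):
            files[f"shuffle_0_{map_id}_{reduce_id}"] = []
        for key, value in records:
            # Hash-modulo partitioning: the same key always lands in the same reducer bucket
            reduce_id = hash(key) % num_reducers
            files[f"shuffle_0_{map_id}_{reduce_id}"].append((key, value))
    return files


map_outputs = [[("a", 1), ("b", 1)], [("a", 1)], [("c", 1)]]  # 3 upstream tasks
files = unoptimized_hash_shuffle(map_outputs, num_reducers=4)
print(len(files))  # 3 upstream x 4 downstream = 12 files
```

Empty buckets still count as files, which is exactly why the number grows so fast.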
Both groupBy and join can trigger a shuffle. The figure below showed the original join case, simplified to a groupBy aggregation.
Optimized HashShuffle: number of intermediate small files = 【number of Executors】 × 【number of downstream tasks】. Tasks that run on the same Executor reuse one shared group of files instead of each opening their own, so the count shrinks by a factor of roughly (upstream tasks ÷ Executors).
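A sketch of the optimized (consolidated) write path under two simplifying assumptions: each Executor runs one task at a time (one core), and upstream tasks are assigned to Executors round-robin. Tasks on the same Executor append to that Executor's existing file group, so the count becomes Executors × downstream tasks. `consolidated_hash_shuffle` and the `group_` file names are hypothetical.

```python
def consolidated_hash_shuffle(map_outputs, num_reducers, num_executors):
    """Optimized HashShuffle sketch: tasks on one Executor share a group of R files.

    Assumes one core per Executor and round-robin task placement (hypothetical simplifications).
    """
    files = {}
    for map_id, records in enumerate(map_outputs):
        executor_id = map_id % num_executors  # hypothetical round-robin scheduling
        for reduce_id in range(num_reducers):
            # setdefault reuses the file if an earlier task on this Executor already created it
            files.setdefault(f"group_{executor_id}_{reduce_id}", [])
        for key, value in records:
            reduce_id = hash(key) % num_reducers
            files[f"group_{executor_id}_{reduce_id}"].append((key, value))
    return files


map_outputs = [[("k%d" % i, 1)] for i in range(100)]  # 100 upstream tasks
print(len(consolidated_hash_shuffle(map_outputs, num_reducers=50, num_executors=4)))  # 4 x 50 = 200
```

With 100 upstream tasks, 4 Executors and 50 downstream tasks this gives 200 files, compared with 100 × 50 = 5,000 for the unoptimized version.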

SortShuffleManager (in use)
Normal mechanism (requires sorting)

1- Choose the data structure: for an aggregating shuffle operator such as reduceByKey, a 【Map】 data structure is used; for a shuffle operator such as join, an 【Array】 data structure is used.
2- Request more memory as the structure grows: memory to request = current memory usage × 2 - memory previously granted. If the grant is too small to hold more data, the structure is spilled to disk.
3- Sort: before spilling to a disk file, the data already in the in-memory data structure is 【sorted】 by key.
4- Spill to disk: after sorting, the data is written to the disk file in batches. The default batch size is 10,000 records, that is, the sorted data is written to the disk file 10,000 records at a time.
5- Merge: all spill files are merged into one final data file, and a separate index file records the start offset and end offset of each downstream task's data within that file.
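Steps 1 and 2 can be sketched as a single buffer class. This is an illustration under stated assumptions, not Spark's code: memory is counted as one unit per stored entry, the pool size and initial threshold are made-up small numbers, and the check runs on every insert (Spark, as far as I know, only checks every 32 records). In Spark's source the Map and Array roles appear to be played by `PartitionedAppendOnlyMap` and `PartitionedPairBuffer`, with the memory rule in `Spillable.maybeSpill`; the class `ShuffleBuffer` below is hypothetical.

```python
class ShuffleBuffer:
    """Hypothetical sketch of steps 1-2 of the sort-based shuffle write.

    aggregate=True  -> map-like structure that merges values per key (reduceByKey style)
    aggregate=False -> array-like structure that only appends records (join style)
    Memory is approximated as one unit per stored entry, a simplifying assumption.
    """

    def __init__(self, aggregate, combine=None, pool_size=6, initial_threshold=2):
        self.aggregate = aggregate
        self.combine = combine
        self.data = self._new_structure()
        self.pool_free = pool_size                # memory still free in the shared pool
        self.initial_threshold = initial_threshold
        self.threshold = initial_threshold        # memory this buffer may currently use
        self.spills = []                          # stands in for spill files on disk

    def _new_structure(self):
        # Step 1: Map for aggregating operators, Array for non-aggregating ones
        return {} if self.aggregate else []

    def insert(self, key, value):
        if self.aggregate:
            self.data[key] = self.combine(self.data[key], value) if key in self.data else value
        else:
            self.data.append((key, value))
        if self._maybe_spill():
            self.spills.append(self.data)
            self.data = self._new_structure()
            # Give back everything granted beyond the initial threshold
            self.pool_free += self.threshold - self.initial_threshold
            self.threshold = self.initial_threshold

    def _maybe_spill(self):
        current = len(self.data)
        if current < self.threshold:
            return False
        # Step 2: request = current memory * 2 - memory already granted
        request = 2 * current - self.threshold
        granted = min(request, self.pool_free)
        self.pool_free -= granted
        self.threshold += granted
        # Spill if the grant was too small to make room for more data
        return current >= self.threshold


buf = ShuffleBuffer(aggregate=False)
for i in range(10):
    buf.insert("k", i)
print(len(buf.spills), len(buf.data))  # one spill of 8 records, 2 records still buffered
```

The doubling request lets the buffer grow geometrically while the pool has room, and the spill happens only once the pool can no longer grant anything.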
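Steps 3 to 5 (sort, batched spill, merge plus index) can be sketched as follows. One assumption goes beyond the text: records are sorted by (downstream partition, key) rather than by key alone, so that each downstream task's data ends up contiguous, which is what makes a start/end offset index possible. Offsets here are record positions rather than byte offsets, and `spill` / `merge_spills` are hypothetical names.

```python
import heapq


def spill(records, batch_size=10_000):
    """Steps 3-4: sort one in-memory buffer, then write it out in batches.

    records: list of (partition_id, key, value). The returned list of batches
    stands in for a single spill file on disk.
    """
    # Sort by downstream partition first, then by key, so each partition is contiguous
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def merge_spills(spill_files, num_partitions):
    """Step 5: merge every spill file into one data file plus an index of
    (start offset, end offset) per downstream partition."""
    # Flatten each spill file's batches; each stream is already sorted
    streams = [[r for batch in f for r in batch] for f in spill_files]
    data_file = list(heapq.merge(*streams, key=lambda r: (r[0], r[1])))
    index, offset = [], 0
    for pid in range(num_partitions):
        start = offset
        while offset < len(data_file) and data_file[offset][0] == pid:
            offset += 1
        index.append((start, offset))  # end offset is exclusive
    return data_file, index


a = spill([(1, "b", 1), (0, "a", 1), (1, "a", 2)], batch_size=2)  # 2 batches
b = spill([(0, "c", 3), (2, "z", 9)])
data, index = merge_spills([a, b], num_partitions=3)
print(index)  # [(0, 2), (2, 4), (4, 5)]
```

Because every spill file is already sorted, the merge is a k-way merge rather than a full re-sort, and a downstream task only needs its (start, end) pair to read its slice of the single data file.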
Bypass mechanism (no sorting needed)
It applies when the number of downstream (shuffle read) tasks, i.e. reduce partitions, is less than or equal to the value of 【spark.shuffle.sort.bypassMergeThreshold】 (default 【200】),
and the operator is not 【a shuffle operator with map-side aggregation】.
reduceByKey is a shuffle operator with map-side aggregation.
groupByKey is not a shuffle operator with map-side aggregation.
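The two bypass conditions and the unsorted write path can be sketched as below. Spark's own check compares the number of downstream (reduce) partitions with `spark.shuffle.sort.bypassMergeThreshold` and rejects map-side combine; the functions `should_bypass` and `bypass_write` are hypothetical, and the in-memory buckets stand in for the per-partition temporary files that are concatenated at the end.

```python
def should_bypass(num_reduce_partitions, map_side_combine, threshold=200):
    """Bypass applies only without map-side aggregation and with few enough
    downstream partitions; threshold mirrors spark.shuffle.sort.bypassMergeThreshold."""
    return (not map_side_combine) and num_reduce_partitions <= threshold


def bypass_write(records, num_reduce_partitions):
    """Bypass write sketch: hash records into per-partition buckets (no sorting),
    then concatenate the buckets into one data file and record offsets in an index."""
    buckets = [[] for _ in range(num_reduce_partitions)]
    for key, value in records:
        buckets[hash(key) % num_reduce_partitions].append((key, value))
    data_file, index, offset = [], [], 0
    for bucket in buckets:
        data_file.extend(bucket)  # records keep their arrival order within a partition
        index.append((offset, offset + len(bucket)))
        offset += len(bucket)
    return data_file, index


print(should_bypass(100, map_side_combine=False))  # groupByKey-like: True
print(should_bypass(100, map_side_combine=True))   # reduceByKey-like: False
```

The output layout (one data file plus an index) matches the normal mechanism; what is skipped is the sort, which is why map-side aggregation, needing all values of a key together, rules bypass out.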