Wide and narrow dependencies in Spark
2022-06-24 23:27:00 【Sunzhongming】
Let's talk about wide dependencies and narrow dependencies.
The key to telling them apart is the relationship between the partitions of the parent RDD and the partitions of the child RDD: does a single parent partition feed more than one child partition?
If it does, the data has to go through a shuffle, in which each child partition is assembled from pieces of multiple parent partitions. This is a wide dependency, and the DAGScheduler splits the job into stages at that point.
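Narrow dependency: NarrowDependency
The partitions of the parent RDD and the child RDD are in a one-to-one relationship (more generally, each parent partition is used by at most one child partition), as with map and filter.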

Wide dependency: ShuffleDependency
In essence this is a shuffle, as with reduceByKey and groupByKey. The data of one parent RDD partition is distributed to multiple partitions of the child RDD.
If there is a shuffle, the dependency is wide; otherwise it is narrow.
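The rule above can be sketched with a toy model in plain Python. This is not the Spark API: an "RDD" here is just a list of partitions, `map_partitions`, `reduce_by_key` and `is_wide` are hypothetical helpers, and the partitioner (`key % num_partitions` on integer keys) is an assumption made for illustration. Spark decides the dependency type from the operator and its partitioner, not by inspecting the data as this sketch does.

```python
from collections import defaultdict
from functools import reduce


def map_partitions(parent, fn):
    """Narrow: child partition i is built only from parent partition i."""
    child = [[fn(rec) for rec in part] for part in parent]
    deps = {i: {i} for i in range(len(parent))}  # child index -> parent indices
    return child, deps


def reduce_by_key(parent, fn, num_partitions):
    """Wide: records are re-bucketed by key, so one parent partition
    can send data to several child partitions (a toy shuffle)."""
    buckets = [defaultdict(list) for _ in range(num_partitions)]
    deps = defaultdict(set)
    for p_idx, part in enumerate(parent):
        for key, value in part:
            c_idx = key % num_partitions  # toy partitioner, integer keys only
            buckets[c_idx][key].append(value)
            deps[c_idx].add(p_idx)
    child = [[(k, reduce(fn, vs)) for k, vs in sorted(b.items())] for b in buckets]
    return child, dict(deps)


def is_wide(deps):
    """Wide if some parent partition is used by more than one child partition."""
    fan_out = defaultdict(int)
    for parents in deps.values():
        for p in parents:
            fan_out[p] += 1
    return any(n > 1 for n in fan_out.values())


# Two parent partitions of (key, value) records.
parent = [[(1, 1), (2, 1)], [(1, 1), (3, 1)]]

mapped, map_deps = map_partitions(parent, lambda kv: (kv[0], kv[1] * 10))
reduced, reduce_deps = reduce_by_key(parent, lambda a, b: a + b, 2)

print("map deps:", map_deps, "wide?", is_wide(map_deps))
print("reduceByKey deps:", reduce_deps, "wide?", is_wide(reduce_deps))
```

In the `map` case every parent partition appears under exactly one child partition. In the `reduce_by_key` case parent partition 0 contributes to both child partitions, which is what makes the dependency wide.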
As a data structure, an RDD is essentially a read-only, partitioned collection of records. One RDD can contain multiple partitions, and each partition is a slice of the dataset.
The distinction matters for two reasons. First, narrow dependencies allow multiple operations to be executed as a pipeline on the same node (that is, within the same stage), for example a map followed immediately by a filter. In contrast, a wide dependency requires all parent partitions to be available, and may need a MapReduce-like operation to transfer data across nodes.
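A minimal sketch of the pipelining idea, again as a plain-Python toy rather than Spark code. `run_narrow_chain` and `split_into_stages` are hypothetical helpers, and treating the wide operation as the first step of the next stage is a simplification of how the shuffle is split between a write side and a read side.

```python
trace = []  # records the order in which functions touch each record


def double(x):
    trace.append(("map", x))
    return x * 2


def greater_than_two(x):
    trace.append(("filter", x))
    return x > 2


def run_narrow_chain(partition, ops):
    """Run a chain of narrow operations over ONE partition.
    Lazy iterators push each record through the whole chain before the
    next record is read, so no intermediate partition is materialized."""
    stream = iter(partition)
    for kind, fn in ops:
        if kind == "map":
            stream = map(fn, stream)
        elif kind == "filter":
            stream = filter(fn, stream)
        else:
            raise ValueError(f"not a narrow operation: {kind}")
    return list(stream)


result = run_narrow_chain([1, 2], [("map", double), ("filter", greater_than_two)])

# Toy stage splitting: a new stage starts at every wide operation.
WIDE_OPS = {"reduceByKey", "groupByKey"}


def split_into_stages(op_names):
    stages = [[]]
    for name in op_names:
        if name in WIDE_OPS and stages[-1]:
            stages.append([])  # shuffle boundary: cut here
        stages[-1].append(name)
    return stages


stages = split_into_stages(["map", "filter", "reduceByKey", "map"])
print(result, trace, stages)
```

The trace interleaves map and filter calls record by record, which is the pipelined behavior described above. The wide operation cannot be folded into that chain because it needs output from every parent partition first.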
Second, consider failure recovery. Recovery is more efficient with narrow dependencies, because only the parent partitions of the lost partition need to be recomputed, and they can be recomputed in parallel on different nodes (if a machine is too slow, the work can be rescheduled onto other nodes). With a wide dependency, a lost partition may depend on every parent partition, so far more has to be recomputed.
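The recovery cost can be illustrated by walking a lineage backwards and collecting the ancestor partitions that a lost partition needs. This is a toy sketch under stated assumptions: `partitions_to_recompute` is a hypothetical helper, nothing is cached or checkpointed, and no shuffle output survives the failure. A real Spark job may be able to reuse surviving data.

```python
def partitions_to_recompute(lineage, lost):
    """lineage: list of dependency maps (child index -> set of parent indices),
    ordered from the final RDD back towards the source.
    Returns, for each ancestor level, the partitions that must be rebuilt."""
    needed = {lost}
    per_level = []
    for deps in lineage:
        # Union of the parents of every partition needed at the level above.
        needed = set().union(*(deps[c] for c in needed))
        per_level.append(needed)
    return per_level


n = 4
narrow = {i: {i} for i in range(n)}           # e.g. map or filter
wide = {i: set(range(n)) for i in range(n)}   # e.g. a shuffle over 4 partitions

# Lineage with only narrow dependencies: one partition per level.
only_narrow = partitions_to_recompute([narrow, narrow], lost=2)

# Final RDD depends narrowly on a middle RDD that was produced by a shuffle.
with_shuffle = partitions_to_recompute([narrow, wide], lost=2)

print(only_narrow)
print(with_shuffle)
```

With only narrow dependencies the lost partition pulls in a single partition at each level. Once a wide dependency appears in the lineage, all four source partitions are needed to rebuild one lost partition.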