当前位置:网站首页>About the transmission pipeline of stage in spark
About the transmission pipeline of stage in spark
2022-07-01 04:31:00 【Daily Xiaoxin】
About Spark in Stage The transmission of Pipeline
First pipeline Pipeline computing mode ,pipeline It's just a computational idea , A pattern , Follow MR differ ,pipeline It is only when the logic is completely completed that the result will be landing ,MR Is to calculate the persistent disk , recompute , This is also MR And Spark The root cause of the speed gap ( Code implementation Stage Medium Pipeline)
object Pipeline {
def main(args: Array[String]): Unit = {
// Create connection
val conf = new SparkConf()
conf.setMaster("local").setAppName("PPLine");
val sc = new SparkContext(conf)
// Set an array of two concurrent numbers (2 Partition number )
val rdd = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7, 8), 2)
val rdd1: RDD[Int] = rdd.map(x => {
println("rdd1----[pid" + TaskContext.get.partitionId + "]----" + x)
x
})
val rdd2: RDD[Int] = rdd1.filter(x => {
println("rdd2----[pid" + TaskContext.get.partitionId + "]----" + x)
true
})
val rdd3: RDD[(String, Int)] = rdd2.map(x => {
println("rdd3----[pid" + TaskContext.get.partitionId + "]----" + x)
Tuple2("yjx" + x % 3, x)
})
// stay RDD4 To perform a partition operation in , stay RDD4 The operator is then divided into a single stage Used to calculate (2——>4 Meet wide dependence )
val rdd4: RDD[(String, Int)] = rdd3.reduceByKey((sum: Int, value: Int) => {
println("rdd4----[pid" + TaskContext.get.partitionId + "]----" + sum + "---" + value)
sum + value
}, 3)
val rdd5: RDD[(String, Int)] = rdd4.map(x => {
println("rdd5----[pid" + TaskContext.get.partitionId + "]----" + x)
x
})
// Start operator
rdd5.count
sc.stop()
}
}
The above code shows that a set of data is in pipeline Execution order of , Divided into two stage, First, RDD1RDD4 A simple print filter between (`stage1`), Then the RDD4RDD5 Between a calculation print (
stage2)
// The meaning is as follows :
/* stage1( for(i=0;i<2;i++){ new Thread( textFile-->rdd1() rdd2() rdd3()-->ShuffleWrite ).start() } ) stage2( for(i=0;i<1;i++){ new Thread( shuffleRead--->rdd4() rdd5() ).start() } ) */


边栏推荐
- Maixll-Dock 快速上手
- 2022年上海市安全员C证考试题模拟考试题库及答案
- The junior college students were angry for 32 days, four rounds of interviews, five hours of soul torture, and won Ali's offer with tears
- How to use maixll dock
- 跳槽一次涨8k,5年跳了3次...
- 【发送邮件报错】535 Error:authentication failed
- [human version] Web3 privacy game in the dark forest
- 【LeetCode】100. Same tree
- Applications and features of VR online exhibition
- 25.k sets of flipped linked lists
猜你喜欢

Odeint and GPU
![[send email with error] 535 error:authentication failed](/img/58/8cd22fed1557077994cd78fd29f596.png)
[send email with error] 535 error:authentication failed

This may be your last chance to join Tencent

Grey correlation cases and codes

JS image path conversion Base64 format

slf4j 简单实现

Task04 | statistiques mathématiques

"Target detection" + "visual understanding" realizes the understanding of the input image

Ten wastes of software research and development: the other side of research and development efficiency

How to use maixll dock
随机推荐
TASK04|数理统计
小程序中自定义组件
Hololens2 development environment building and deploying apps
Day 52 - tree problem
【人话版】WEB3黑暗森林中的隐私博弈
总结全了,低代码还需要解决这4点问题
Grey correlation cases and codes
MySQL function variable stored procedure
Haskell lightweight threads overhead and use on multicores
采购数智化爆发在即,支出宝'3+2'体系助力企业打造核心竞争优势
Account sharing technology enables the farmers' market and reshapes the efficiency of transaction management services
扩展-Fragment
多次跳槽后,月薪等于老同事的年薪
LM small programmable controller software (based on CoDeSys) note 20: PLC controls stepping motor through driver
[godot] unity's animator is different from Godot's animplayer
Concurrent mode of different performance testing tools
Task04 | statistiques mathématiques
OdeInt與GPU
Odeint et GPU
Chen Yu (Aqua) - Safety - & gt; Cloud Security - & gt; Multicloud security