当前位置:网站首页>About the transmission pipeline of stage in spark
About the transmission pipeline of stage in spark
2022-07-01 04:31:00 【Daily Xiaoxin】
About Spark in Stage The transmission of Pipeline
First pipeline Pipeline computing mode ,pipeline It's just a computational idea , A pattern , Follow MR differ ,pipeline It is only when the logic is completely completed that the result will be landing ,MR Is to calculate the persistent disk , recompute , This is also MR And Spark The root cause of the speed gap ( Code implementation Stage Medium Pipeline)
object Pipeline {
def main(args: Array[String]): Unit = {
// Create connection
val conf = new SparkConf()
conf.setMaster("local").setAppName("PPLine");
val sc = new SparkContext(conf)
// Set an array of two concurrent numbers (2 Partition number )
val rdd = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7, 8), 2)
val rdd1: RDD[Int] = rdd.map(x => {
println("rdd1----[pid" + TaskContext.get.partitionId + "]----" + x)
x
})
val rdd2: RDD[Int] = rdd1.filter(x => {
println("rdd2----[pid" + TaskContext.get.partitionId + "]----" + x)
true
})
val rdd3: RDD[(String, Int)] = rdd2.map(x => {
println("rdd3----[pid" + TaskContext.get.partitionId + "]----" + x)
Tuple2("yjx" + x % 3, x)
})
// stay RDD4 To perform a partition operation in , stay RDD4 The operator is then divided into a single stage Used to calculate (2——>4 Meet wide dependence )
val rdd4: RDD[(String, Int)] = rdd3.reduceByKey((sum: Int, value: Int) => {
println("rdd4----[pid" + TaskContext.get.partitionId + "]----" + sum + "---" + value)
sum + value
}, 3)
val rdd5: RDD[(String, Int)] = rdd4.map(x => {
println("rdd5----[pid" + TaskContext.get.partitionId + "]----" + x)
x
})
// Start operator
rdd5.count
sc.stop()
}
}
The above code shows that a set of data is in pipeline Execution order of , Divided into two stage, First, RDD1RDD4 A simple print filter between (`stage1`), Then the RDD4RDD5 Between a calculation print (
stage2)
// The meaning is as follows :
/* stage1( for(i=0;i<2;i++){ new Thread( textFile-->rdd1() rdd2() rdd3()-->ShuffleWrite ).start() } ) stage2( for(i=0;i<1;i++){ new Thread( shuffleRead--->rdd4() rdd5() ).start() } ) */


边栏推荐
- 【无标题】
- How to choose the right server for website data collection?
- 2022年煤气考试题库及在线模拟考试
- Concurrent mode of different performance testing tools
- MySQL function variable stored procedure
- 跳槽一次涨8k,5年跳了3次...
- MySQL winter vacation self-study 2022 12 (5)
- Ospfb notes - five messages [ultra detailed] [Hello message, DD message, LSR message, LSU message, lsack message]
- Question bank and online simulation examination for special operation certificate of G1 industrial boiler stoker in 2022
- 2022 a special equipment related management (elevator) simulation test and a special equipment related management (elevator) certificate examination
猜你喜欢

The junior college students were angry for 32 days, four rounds of interviews, five hours of soul torture, and won Ali's offer with tears

In the innovation community, the "100 cities Tour" of the gold warehouse of the National People's Congress of 2022 was launched

It's settled! 2022 JD cloud summit of JD global technology Explorer conference see you in Beijing on July 13

扩展-Fragment

25.k sets of flipped linked lists

OdeInt与GPU
![[human version] Web3 privacy game in the dark forest](/img/89/e16789b7f3892002748aab309c45e6.png)
[human version] Web3 privacy game in the dark forest

Daily question - line 10

NFT: utilisez EIP - 2981 pour commencer un voyage de redevances NFT

Embedded System Development Notes 79: why should I get the IP address of the local network card
随机推荐
"Target detection" + "visual understanding" realizes the understanding of the input image
Registration for R2 mobile pressure vessel filling test in 2022 and R2 mobile pressure vessel filling free test questions
Leetcode learning - day 36
2022 t elevator repair question bank and simulation test
Selenium opens the Chrome browser and the settings page pops up: Microsoft defender antivirus to reset your settings
Spock单元测试框架介绍及在美团优选的实践___第一章
2022 hoisting machinery command registration examination and hoisting machinery command examination registration
Extension fragment
Possible problems and solutions of using scroll view to implement slider view
Embedded System Development Notes 79: why should I get the IP address of the local network card
Applications and features of VR online exhibition
2022年G1工业锅炉司炉特种作业证考试题库及在线模拟考试
TCP/IP 详解(第 2 版) 笔记 / 3 链路层 / 3.4 桥接器与交换机 / 3.4.2 多属性注册协议(Multiple Registration Protocol (MRP))
2022年化工自动化控制仪表操作证考试题库及答案
LM small programmable controller software (based on CoDeSys) note 19: errors do not match the profile of the target
OdeInt与GPU
Question bank and answers for chemical automation control instrument operation certificate examination in 2022
C language games (I) -- guessing games
Tencent has five years of testing experience. It came to the interview to ask for 30K, and saw the so-called software testing ceiling
2022 polymerization process test questions and simulation test