当前位置:网站首页>About the transmission pipeline of stage in spark
About the transmission pipeline of stage in spark
2022-07-01 04:31:00 【Daily Xiaoxin】
About Spark in Stage The transmission of Pipeline
First pipeline Pipeline computing mode ,pipeline It's just a computational idea , A pattern , Follow MR differ ,pipeline It is only when the logic is completely completed that the result will be landing ,MR Is to calculate the persistent disk , recompute , This is also MR And Spark The root cause of the speed gap ( Code implementation Stage Medium Pipeline)
object Pipeline {
def main(args: Array[String]): Unit = {
// Create connection
val conf = new SparkConf()
conf.setMaster("local").setAppName("PPLine");
val sc = new SparkContext(conf)
// Set an array of two concurrent numbers (2 Partition number )
val rdd = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7, 8), 2)
val rdd1: RDD[Int] = rdd.map(x => {
println("rdd1----[pid" + TaskContext.get.partitionId + "]----" + x)
x
})
val rdd2: RDD[Int] = rdd1.filter(x => {
println("rdd2----[pid" + TaskContext.get.partitionId + "]----" + x)
true
})
val rdd3: RDD[(String, Int)] = rdd2.map(x => {
println("rdd3----[pid" + TaskContext.get.partitionId + "]----" + x)
Tuple2("yjx" + x % 3, x)
})
// stay RDD4 To perform a partition operation in , stay RDD4 The operator is then divided into a single stage Used to calculate (2——>4 Meet wide dependence )
val rdd4: RDD[(String, Int)] = rdd3.reduceByKey((sum: Int, value: Int) => {
println("rdd4----[pid" + TaskContext.get.partitionId + "]----" + sum + "---" + value)
sum + value
}, 3)
val rdd5: RDD[(String, Int)] = rdd4.map(x => {
println("rdd5----[pid" + TaskContext.get.partitionId + "]----" + x)
x
})
// Start operator
rdd5.count
sc.stop()
}
}
The above code shows that a set of data is in pipeline Execution order of , Divided into two stage, First, RDD1RDD4 A simple print filter between (`stage1`), Then the RDD4RDD5 Between a calculation print (
stage2
)
// The meaning is as follows :
/* stage1( for(i=0;i<2;i++){ new Thread( textFile-->rdd1() rdd2() rdd3()-->ShuffleWrite ).start() } ) stage2( for(i=0;i<1;i++){ new Thread( shuffleRead--->rdd4() rdd5() ).start() } ) */
边栏推荐
- [godot] unity's animator is different from Godot's animplayer
- Haskell lightweight threads overhead and use on multicores
- Embedded System Development Notes 80: using QT designer to design the main interface
- [today in history] June 30: von Neumann published the first draft; The semiconductor war in the late 1990s; CBS acquires CNET
- Jenkins automatically cleans up construction history
- 多次跳槽后,月薪等于老同事的年薪
- (12) Somersault cloud case (navigation bar highlights follow)
- slf4j 简单实现
- How to use maixll dock
- 为什么香港服务器最适合海外建站使用
猜你喜欢
2022年煤气考试题库及在线模拟考试
2022 question bank and answers for safety production management personnel of hazardous chemical production units
Do280 management application deployment --rc
Mallbook: how can hotel enterprises break the situation in the post epidemic era?
Daily algorithm & interview questions, 28 days of special training in large factories - the 13th day (array)
[today in history] June 30: von Neumann published the first draft; The semiconductor war in the late 1990s; CBS acquires CNET
2022 polymerization process test questions and simulation test
Ospfb notes - five messages [ultra detailed] [Hello message, DD message, LSR message, LSU message, lsack message]
Maixll dock quick start
Simple implementation of slf4j
随机推荐
OSPF notes [dr and bdr]
嵌入式系统开发笔记81:使用Dialog组件设计提示对话框
Class and object finalization
This may be your last chance to join Tencent
[ue4] event distribution mechanism of reflective event distributor and active call event mechanism
Tencent has five years of testing experience. It came to the interview to ask for 30K, and saw the so-called software testing ceiling
Some small knowledge points
2022年煤气考试题库及在线模拟考试
Account sharing technology enables the farmers' market and reshapes the efficiency of transaction management services
Knowledge supplement: redis' basic data types and corresponding commands
Grey correlation cases and codes
CF1638E colorful operations
Introduction of Spock unit test framework and its practice in meituan optimization___ Chapter I
Registration for R2 mobile pressure vessel filling test in 2022 and R2 mobile pressure vessel filling free test questions
OSPF notes [multiple access, two multicast addresses with OSPF]
Chen Yu (Aqua) - Safety - & gt; Cloud Security - & gt; Multicloud security
How to ensure the idempotency of the high concurrency interface?
Selenium opens the Chrome browser and the settings page pops up: Microsoft defender antivirus to reset your settings
VR线上展览所具备应用及特色
Common thread methods and daemon threads