当前位置:网站首页>About the transmission pipeline of stage in spark
About the transmission pipeline of stage in spark
2022-07-01 04:31:00 【Daily Xiaoxin】
About Spark in Stage The transmission of Pipeline
First pipeline Pipeline computing mode ,pipeline It's just a computational idea , A pattern , Follow MR differ ,pipeline It is only when the logic is completely completed that the result will be landing ,MR Is to calculate the persistent disk , recompute , This is also MR And Spark The root cause of the speed gap ( Code implementation Stage Medium Pipeline)
object Pipeline {
def main(args: Array[String]): Unit = {
// Create connection
val conf = new SparkConf()
conf.setMaster("local").setAppName("PPLine");
val sc = new SparkContext(conf)
// Set an array of two concurrent numbers (2 Partition number )
val rdd = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7, 8), 2)
val rdd1: RDD[Int] = rdd.map(x => {
println("rdd1----[pid" + TaskContext.get.partitionId + "]----" + x)
x
})
val rdd2: RDD[Int] = rdd1.filter(x => {
println("rdd2----[pid" + TaskContext.get.partitionId + "]----" + x)
true
})
val rdd3: RDD[(String, Int)] = rdd2.map(x => {
println("rdd3----[pid" + TaskContext.get.partitionId + "]----" + x)
Tuple2("yjx" + x % 3, x)
})
// stay RDD4 To perform a partition operation in , stay RDD4 The operator is then divided into a single stage Used to calculate (2——>4 Meet wide dependence )
val rdd4: RDD[(String, Int)] = rdd3.reduceByKey((sum: Int, value: Int) => {
println("rdd4----[pid" + TaskContext.get.partitionId + "]----" + sum + "---" + value)
sum + value
}, 3)
val rdd5: RDD[(String, Int)] = rdd4.map(x => {
println("rdd5----[pid" + TaskContext.get.partitionId + "]----" + x)
x
})
// Start operator
rdd5.count
sc.stop()
}
}
The above code shows that a set of data is in pipeline Execution order of , Divided into two stage, First, RDD1RDD4 A simple print filter between (`stage1`), Then the RDD4RDD5 Between a calculation print (
stage2)
// The meaning is as follows :
/* stage1( for(i=0;i<2;i++){ new Thread( textFile-->rdd1() rdd2() rdd3()-->ShuffleWrite ).start() } ) stage2( for(i=0;i<1;i++){ new Thread( shuffleRead--->rdd4() rdd5() ).start() } ) */


边栏推荐
- 1. Mobile terminal touch screen event
- How to use maixll dock
- js 图片路径转换base64格式
- What is uid? What is auth? What is a verifier?
- LM small programmable controller software (based on CoDeSys) note 20: PLC controls stepping motor through driver
- Recommend the best product development process in the Internet industry!
- CF1638E colorful operations
- 高并发下接口幂等性如何保证?
- [learn C and fly] S1E20: two dimensional array
- Simple implementation of slf4j
猜你喜欢

Embedded System Development Notes 79: why should I get the IP address of the local network card

The junior college students were angry for 32 days, four rounds of interviews, five hours of soul torture, and won Ali's offer with tears

CF1638E colorful operations

JMeter learning notes 2 - brief introduction to graphical interface

Internet winter, how to spend three months to make a comeback

Spock单元测试框架介绍及在美团优选的实践___第一章

Loop filtering based on Unet

2022 a special equipment related management (elevator) simulation test and a special equipment related management (elevator) certificate examination

MySQL winter vacation self-study 2022 12 (5)

Programs and processes, process management, foreground and background processes
随机推荐
2. Use of classlist (element class name)
[recommended algorithm] C interview question of a small factory
Web server: how to choose a good web server these five aspects should be paid attention to
OdeInt與GPU
做网站数据采集,怎么选择合适的服务器呢?
After many job hopping, the monthly salary is equal to the annual salary of old colleagues
[learn C and fly] S1E20: two dimensional array
OSPF notes [multiple access, two multicast addresses with OSPF]
MySQL winter vacation self-study 2022 12 (5)
TCP/IP 详解(第 2 版) 笔记 / 3 链路层 / 3.4 桥接器与交换机 / 3.4.2 多属性注册协议(Multiple Registration Protocol (MRP))
Redis (VII) optimization suggestions
分账技术赋能农贸市场,重塑交易管理服务效能
How to use maixll dock
Embedded System Development Notes 81: Using Dialog component to design prompt dialog box
DO280管理应用部署--RC
如何看待智慧城市建设中的改变和机遇?
Task04 | statistiques mathématiques
2022 Shanghai safety officer C certificate examination question simulation examination question bank and answers
TASK04|数理统计
The junior college students were angry for 32 days, four rounds of interviews, five hours of soul torture, and won Ali's offer with tears