MapReduce execution principle record
2022-06-26 03:52:00 【I love meat】
The basic principles of MapReduce are omitted here.


Some notes:
1 One file is sliced into multiple splits, and each split corresponds to one MapTask. Each partition of the map output corresponds to one ReduceTask.
2 When the map-side memory buffer is 80% full, data starts being written to disk. The map task does not stop at that point: it can keep writing to memory while the spill to disk runs.
If all the input has been read and the buffer never reached 80%, a disk file is still written, but only once.
3 Reduce tasks start before all map tasks have finished, once enough of them have completed (the share is controlled by mapreduce.job.reduce.slowstart.completedmaps).
If memory is insufficient, the reduce side writes to disk; in the end everything is merged into one large disk file on which the reduce business logic runs.
4 In the reduce stage, the map output files are read. The index file determines which part of the data each reducer reads.
5 Why is the data sorted?
Because the reduce phase needs to group records, putting records with the same key together so they can be reduced. Sorting in the map phase as well lowers the in-memory sorting pressure in the reduce phase.
For example, for an aggregation within one partition, a single pass is enough: all records of a key are adjacent, so each key is aggregated as soon as it has been traversed. If the data were unordered, you would need to traverse all the files.
6 Map-side spill: three small spill files are merged into one large file.
7 Sorting in memory uses quicksort; merging files uses merge sort.
8 Map-side data has an index file. The reduce side has no index file, because reduce-side data is already ordered.
(In addition, Spark skips the in-memory sort when the operator is not a pre-aggregation operator and the number of downstream partitions is very small, which improves performance.)
Source version: 2.7.7
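Notes 4 and 8 can be illustrated with a small sketch. It assumes a simplified layout that is not Hadoop's actual on-disk format: the map output is a single byte string with the partitions stored back to back, and the index is a list of (offset, length) pairs, one per partition. The names `build_output` and `read_partition` are hypothetical.

```python
def build_output(partitions):
    """Concatenate per-partition byte blocks and record where each one starts.

    Returns (data, index), where index[p] == (offset, length) for partition p.
    This layout is an illustrative assumption, not Hadoop's file format.
    """
    data = bytearray()
    index = []
    for block in partitions:
        index.append((len(data), len(block)))
        data.extend(block)
    return bytes(data), index


def read_partition(data, index, partition):
    """Return only the bytes of one partition, located through the index."""
    offset, length = index[partition]
    return data[offset:offset + length]


# One map output holding three partitions; reducer 1 reads only its own slice.
data, index = build_output([b"a=1;b=2;", b"c=3;", b"d=4;e=5;"])
segment = read_partition(data, index, 1)
```

The point of the index is that a reducer never has to scan the whole map output: it seeks straight to its partition.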

Submit tasks
1 The client parses the MR job and generates the necessary artifacts: the startup script, job.xml, and the jar package (these are submitted to a temporary directory on HDFS).
2 The client obtains a proxy object for the RM and sends the RM an event to submit the application. The event contains (jobid, submitDir).
3 The RM assigns a NodeManager to start the master program, MRAppMaster. MRAppMaster then has other NodeManagers start YarnChild processes to execute the tasks.
MRAppMaster and YarnChild stay in contact with each other.
If all the tasks execute successfully, the master program notifies the RM and the client. The RM then releases the resources, and the client reports the execution as successful.
MRAppMaster corresponds to the Spark driver, and YarnChild corresponds to a Spark executor.

Ring buffer
After receiving the launch command from MRAppMaster, the NodeManager starts a JVM process, which pulls the required resources from HDFS and executes a MapTask or ReduceTask.
The partitioner is called to tag each key-value pair output by the MapTask with a partition number, and the pair is then written to the ring buffer.
The buffer is 100 MB by default. The equator acts as the boundary: record data is written on the right, and fixed-size index (metadata) entries are written on the left. In the 2.7.x source each index entry consists of four ints (16 bytes).
When the buffer is 80% full, a spill to disk starts. The memory is freed for reuse each time its data has landed on disk.
While the spill is running, a new equator is set within the remaining 20% and the map task continues writing to memory. If that space fills up before the spill has finished, the map task blocks and resumes once the 80% has been written to disk.
A major reason for the stability of MR: memory is requested only once and reused throughout, so the task does not keep requesting new memory.
The memory is simply overwritten by new writes; nothing has to be garbage-collected.
Before data is written to disk it is sorted with quicksort, which swaps positions within that 80% of memory:
1 Sort by partition number first
2 Within a partition, sort by key
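The collect, sort, and spill steps above can be sketched as follows. This is a scaled-down, synchronous model under stated assumptions, not Hadoop's implementation: the capacity is counted in characters instead of 100 MB, the spill runs inline rather than in a background thread, Python's built-in sort stands in for quicksort, and `SpillBuffer` and `java_string_hash` are hypothetical names. The partition formula mirrors the default hash partitioner, (hash & Integer.MAX_VALUE) % numReduceTasks.

```python
def java_string_hash(s):
    """Mimic Java's String.hashCode() for ASCII strings (signed 32-bit result)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


class SpillBuffer:
    """Toy map-side buffer: spill once usage reaches spill_percent of capacity."""

    def __init__(self, capacity=100, spill_percent=0.8, num_partitions=2):
        self.threshold = capacity * spill_percent
        self.num_partitions = num_partitions
        self.records = []  # (partition, key, value) tuples held in memory
        self.used = 0      # characters currently buffered
        self.spills = []   # each entry stands in for one sorted spill file

    def collect(self, key, value):
        # Tag the record with its partition, then buffer it.
        partition = (java_string_hash(key) & 0x7FFFFFFF) % self.num_partitions
        self.records.append((partition, key, value))
        self.used += len(key) + len(value)
        if self.used >= self.threshold:
            self.spill()

    def spill(self):
        if not self.records:
            return
        # Sort by partition number first, then by key within a partition.
        self.records.sort(key=lambda r: (r[0], r[1]))
        self.spills.append(self.records)
        self.records = []
        self.used = 0

    def close(self):
        # Input exhausted: whatever is left is written out once.
        self.spill()


buf = SpillBuffer(capacity=10, spill_percent=0.8, num_partitions=2)
for key, value in [("c", "1"), ("a", "1"), ("b", "1"), ("a", "2"), ("d", "1")]:
    buf.collect(key, value)
buf.close()
```

With a capacity of 10 and a threshold of 80%, the fourth record triggers the first spill, and `close()` writes the remaining record as a second, smaller spill.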


Map Merge files
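Notes 6 and 7 above say that the small spill files are merged into one large file using merge sort. A minimal sketch of that merge, assuming each spill is an in-memory list of (partition, key, value) tuples already sorted by partition and key; `merge_spills` is a hypothetical helper, and the standard library's `heapq.merge` performs the k-way merge.

```python
import heapq


def merge_spills(spills):
    """Merge already-sorted spill runs into a single sorted run (k-way merge)."""
    return list(heapq.merge(*spills, key=lambda r: (r[0], r[1])))


# Three small sorted spills, each ordered by (partition, key).
spill_1 = [(0, "b", 1), (1, "a", 1)]
spill_2 = [(0, "a", 1), (1, "c", 1)]
spill_3 = [(0, "b", 2), (1, "a", 2)]
merged = merge_spills([spill_1, spill_2, spill_3])
```

Because every input run is already sorted, the merge only compares the current head of each run, so it never needs to hold or re-sort all records at once.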

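Reduce-side shuffle
Reduce side
Fetched map output is buffered in memory here as well. The share of memory available for this buffer is 0.7 (mapreduce.reduce.shuffle.input.buffer.percent), and the threshold that triggers a write to disk is 0.66 of that buffer (mapreduce.reduce.shuffle.merge.percent). In 2.7.x this buffer is sized as a fraction of the reduce task's heap rather than as a fixed 100 MB.
When reading data, records with the same key are placed in an intermediate container. Reading continues until the next key is different or there is no next record. This works because the data is ordered.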
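The grouping loop described above can be sketched as a single pass over key-sorted records. This is an illustration under the assumption that the input is a list of (key, value) pairs already sorted by key; `reduce_sorted` and the `reducer` callback are hypothetical names.

```python
def reduce_sorted(records, reducer):
    """Single pass over key-sorted (key, value) records; one reducer call per key."""
    results = []
    current_key = None
    container = []  # intermediate container for the values of the current key
    for key, value in records:
        if container and key != current_key:
            # The key changed, so the previous group is complete.
            results.append((current_key, reducer(current_key, container)))
            container = []
        current_key = key
        container.append(value)
    if container:
        # No next record: flush the last group.
        results.append((current_key, reducer(current_key, container)))
    return results


records = [("a", 1), ("a", 2), ("b", 5), ("c", 1), ("c", 1)]
totals = reduce_sorted(records, lambda key, values: sum(values))
```

Only the values of one key are held in the container at a time, which is exactly what the sort order buys.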
