MapReduce (III)
2022-07-27 23:49:00 【JiaXingNashishua】
One: WritableComparable Sorting
Sorting is one of the most important operations in the MapReduce framework.
Both MapTask and ReduceTask sort their data by key. This is Hadoop's default behavior: data in every application is sorted, whether or not the application logic needs it.
By default, keys are sorted in dictionary (lexicographic) order, and the sort is implemented with quicksort.
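Here is what "dictionary order" means in practice. String keys such as Text are compared character by character, so "10" sorts before "9". The plain-Java sketch below uses String as a stand-in for Text, and for ASCII data the two orders agree. It illustrates the ordering only and is not Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: String stands in for a Text key.
public class DictionaryOrderDemo {

    // Character-by-character comparison, like a lexicographic key comparator on ASCII data.
    static List<String> lexicographic(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // The same keys ordered by numeric value, as a numeric key type would order them.
    static List<String> numeric(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparingLong(Long::parseLong));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("9", "10", "2", "100");
        System.out.println(lexicographic(keys)); // [10, 100, 2, 9]
        System.out.println(numeric(keys));       // [2, 9, 10, 100]
    }
}
```

Numeric key types such as IntWritable or LongWritable compare by value. They are therefore the usual choice when numeric order is required.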
For a MapTask, the processing results are first stored temporarily in a ring buffer. When buffer usage reaches a certain threshold, the data in the buffer is quicksorted and the sorted data is spilled to disk. After all input has been processed, all the spill files on disk are merge-sorted.
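The spill-then-merge idea can be modeled in plain Java. This is a toy sketch under simplifying assumptions, not Hadoop's implementation:

- The "buffer" is a list of integers rather than a byte-level circular buffer.
- `spillAt` is a hypothetical record-count threshold. Hadoop's real trigger is a fill ratio, `mapreduce.map.sort.spill.percent`, whose default is 0.80.
- `Collections.sort` stands in for the quicksort that Hadoop uses by default.
- Partitions are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of the map-side sort; spillAt is an illustrative parameter, not a Hadoop setting.
public class SpillMergeSketch {

    // Buffers records and writes a sorted "run" each time the buffer reaches spillAt.
    static List<List<Integer>> spill(List<Integer> records, int spillAt) {
        List<List<Integer>> runs = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        for (int r : records) {
            buffer.add(r);
            if (buffer.size() >= spillAt) {   // threshold reached
                Collections.sort(buffer);     // sort the buffered data in memory
                runs.add(buffer);             // "spill" the sorted run to disk
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {              // flush whatever is left when input ends
            Collections.sort(buffer);
            runs.add(buffer);
        }
        return runs;
    }

    // K-way merge of the sorted runs with a min-heap, producing one sorted output.
    static List<Integer> merge(List<List<Integer>> runs) {
        // Heap entries are {value, runIndex, positionInRun}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {runs.get(i).get(0), i, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int run = top[1];
            int next = top[2] + 1;
            if (next < runs.get(run).size()) {
                heap.add(new int[] {runs.get(run).get(next), run, next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = spill(List.of(7, 3, 9, 1, 8, 2, 6, 5, 4), 4);
        System.out.println(runs);        // [[1, 3, 7, 9], [2, 5, 6, 8], [4]]
        System.out.println(merge(runs)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Each run is sorted on its own, so the final step only needs a merge, not a full re-sort. That is why the map side ends with a merge sort over the spill files.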
For a ReduceTask, the corresponding data file is copied remotely from every MapTask. If a file exceeds a certain size threshold, it is spilled to disk; otherwise it is kept in memory. If the number of files on disk reaches a certain threshold, they are merge-sorted into one larger file. If the size or number of files in memory exceeds a certain threshold, they are merged and then spilled to disk. Once all data has been copied, the ReduceTask performs one final merge sort over all the data in memory and on disk.
Sorting categories
(1) Partial sorting
MapReduce sorts the data set by the keys of the input records, guaranteeing that each output file is sorted internally.
(2) Total sort
The final output is a single file, and that file is sorted. This is implemented by configuring only one ReduceTask. However, the approach is very inefficient for large inputs, because one machine processes all the data, which completely gives up the parallelism that MapReduce provides.
(3) Auxiliary sort (GroupingComparator grouping): groups keys on the Reduce side. Use it when the key is a bean object and you want keys that share one or more fields (while the keys as a whole differ) to enter the same reduce() call.
(4) Secondary sort: in a custom sort, if compareTo contains two comparison conditions, the result is a secondary sort.
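The plain-Java sketch below simulates how a secondary sort and a grouping comparator work together on the reduce side. The data (order ids and item prices) and all names are illustrative assumptions.

- `SORT` plays the role of a key's compareTo with two conditions.
- `GROUP` plays the role of a grouping comparator. In a real job, that would be a WritableComparator subclass registered with `job.setGroupingComparatorClass(...)`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simulation only; OrderKey and its fields are hypothetical.
public class GroupingSketch {

    static final class OrderKey {
        final String orderId;
        final double price;

        OrderKey(String orderId, double price) {
            this.orderId = orderId;
            this.price = price;
        }

        @Override
        public String toString() {
            return orderId + ":" + price;
        }
    }

    // Sort order with two conditions (secondary sort): order id ascending, then price descending.
    static final Comparator<OrderKey> SORT =
            Comparator.comparing((OrderKey k) -> k.orderId)
                    .thenComparing(Comparator.comparingDouble((OrderKey k) -> k.price).reversed());

    // Grouping rule: only the order id decides which keys share one reduce() call.
    static final Comparator<OrderKey> GROUP = Comparator.comparing((OrderKey k) -> k.orderId);

    // Sorts the keys, then splits consecutive keys into groups, as the reduce side iterates them.
    static List<List<OrderKey>> groups(List<OrderKey> keys) {
        List<OrderKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<OrderKey>> result = new ArrayList<>();
        List<OrderKey> current = null;
        for (OrderKey k : sorted) {
            // A new group starts when the grouping comparator sees a different key than the previous one.
            if (current == null || GROUP.compare(current.get(current.size() - 1), k) != 0) {
                current = new ArrayList<>();
                result.add(current);
            }
            current.add(k);
        }
        return result;
    }

    public static void main(String[] args) {
        List<OrderKey> keys = List.of(
                new OrderKey("0000002", 122.4), new OrderKey("0000001", 222.8),
                new OrderKey("0000002", 722.4), new OrderKey("0000001", 25.8),
                new OrderKey("0000003", 232.8));
        for (List<OrderKey> group : groups(keys)) {
            // A reducer that writes only the first key of each group emits the top price per order.
            System.out.println(group.get(0));
        }
    }
}
```

`SORT` places the highest price first within each order, so writing only the first key of each group yields the most expensive item per order. With several ReduceTasks, a custom partitioner on the order id would also be needed so that all keys of one order reach the same reducer.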
Principle of custom sorting with WritableComparable
When a bean object is transmitted as the key, it must implement the WritableComparable interface and override the compareTo method; only then can it be sorted.
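Below is a minimal sketch of such a key. It assumes the FlowBean from the earlier flow-statistics requirement carries three long fields named upFlow, downFlow and sumFlow. The field names, the `set()` helper and the tie-breaking rule are all assumptions, because the original code is not shown. The bean sorts by total flow in descending order. As an example of a two-condition (secondary) sort, it breaks ties by upstream flow in ascending order.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Sketch: field names, set() and the tie-break rule are assumptions, not the original code.
public class FlowBean implements WritableComparable<FlowBean> {

    private long upFlow;
    private long downFlow;
    private long sumFlow;

    // Hadoop creates keys by reflection, so a public no-arg constructor is required.
    public FlowBean() {
    }

    // Hypothetical helper: sets both flows and derives the total.
    public void set(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    public long getUpFlow() { return upFlow; }
    public long getDownFlow() { return downFlow; }
    public long getSumFlow() { return sumFlow; }

    // Serialization: fields must be read back in the same order they were written.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    // Order used by the shuffle: total flow descending, then upstream flow ascending.
    @Override
    public int compareTo(FlowBean other) {
        int bySum = Long.compare(other.sumFlow, this.sumFlow);
        if (bySum != 0) {
            return bySum;
        }
        return Long.compare(this.upFlow, other.upFlow);
    }

    // Consistent with compareTo, so equal keys get equal hash codes for partitioning.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FlowBean)) {
            return false;
        }
        FlowBean b = (FlowBean) o;
        return upFlow == b.upFlow && downFlow == b.downFlow && sumFlow == b.sumFlow;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(upFlow) * 31 + Long.hashCode(downFlow);
    }

    // TextOutputFormat writes non-Text keys and values using toString().
    @Override
    public String toString() {
        return upFlow + "\t" + downFlow + "\t" + sumFlow;
    }
}
```

Keys that compare as equal are passed to a single reduce() call. The Reducer in this case should therefore loop over all its values (the phone numbers) and write out each one, not just the first. Overriding equals() and hashCode() ensures that the default HashPartitioner sends equal keys to the same reducer when more than one ReduceTask is configured.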

WritableComparable sorting case analysis (total sort)

Code implementation :
(1) Add comparison functionality to the FlowBean object from requirement 1

(2) In the Mapper class, the key is FlowBean and the value is Text (the phone number); no other code changes
(3) Write the Reducer class

(4) Write the Driver class

Two: Combiner Merging
(1) A Combiner is a component of an MR program other than the Mapper and the Reducer.
(2) The parent class of a Combiner is Reducer.
(3) A Combiner differs from a Reducer in where it runs: a Combiner runs on the node of each MapTask, whereas a Reducer receives the output of all Mappers.
(4) The purpose of a Combiner is to aggregate the output of each MapTask locally, reducing network traffic.
(5) A Combiner may be applied only if it does not affect the final business logic. In addition, the key/value types of the Combiner's output must match the key/value types of the Reducer's input.
(6) Steps to implement a custom Combiner
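Point (5) is easiest to see with numbers. The plain-Java sketch below uses hypothetical data and is not Hadoop code. It compares a summing Combiner, which is safe, with an averaging one, which changes the result.

In an actual job, the steps in (6) usually amount to three things: subclass Reducer, override reduce(), and register the class with `job.setCombinerClass(...)`. Hadoop may run a Combiner zero, one or several times, so its effect must not depend on how often it runs.

```java
import java.util.List;

// Simulation only: each inner list is the output of one hypothetical MapTask.
public class CombinerSketch {

    // Sum is associative and commutative, so partial sums can be summed again safely.
    static long sumWithCombiner(List<List<Long>> mapOutputs) {
        long total = 0;
        for (List<Long> oneMapTask : mapOutputs) {
            long partial = 0;                  // Combiner: local aggregation on the map side
            for (long v : oneMapTask) {
                partial += v;
            }
            total += partial;                  // Reducer: one value per MapTask crosses the network
        }
        return total;
    }

    // Averaging local averages is NOT equivalent to the true average.
    static double averageOfAverages(List<List<Long>> mapOutputs) {
        double sumOfMeans = 0;
        for (List<Long> oneMapTask : mapOutputs) {
            double partial = 0;
            for (long v : oneMapTask) {
                partial += v;
            }
            sumOfMeans += partial / oneMapTask.size();   // Combiner emits a local mean
        }
        return sumOfMeans / mapOutputs.size();           // Reducer averages the means
    }

    // Reference result computed without any Combiner.
    static double trueAverage(List<List<Long>> mapOutputs) {
        double sum = 0;
        long count = 0;
        for (List<Long> oneMapTask : mapOutputs) {
            for (long v : oneMapTask) {
                sum += v;
                count++;
            }
        }
        return sum / count;
    }

    public static void main(String[] args) {
        List<List<Long>> outputs = List.of(List.of(3L, 5L, 7L), List.of(10L, 20L));
        System.out.println(sumWithCombiner(outputs));   // 45
        System.out.println(averageOfAverages(outputs)); // 10.0 (wrong)
        System.out.println(trueAverage(outputs));       // 9.0
    }
}
```

A summing Combiner can therefore be reused as-is, while an average needs a different design. One option is to have the Combiner emit (sum, count) pairs and let the Reducer divide at the end.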

Three: OutputFormat Data Output
