MapReduce (III)
2022-07-27 23:49:00 【JiaXingNashishua】
One: WritableComparable Sorting
Sorting is one of the most important operations in the MapReduce framework.
Both MapTask and ReduceTask sort data by key. This is Hadoop's default behavior: the data of every application is sorted, whether or not the logic requires it.
The default order is dictionary (lexicographic) order, and the sort is implemented with quicksort.
For a MapTask, processing results are first placed in a ring buffer. When usage of the ring buffer reaches a certain threshold, the data in the buffer is quick-sorted and the sorted data is spilled to disk. Once all data has been processed, the MapTask merge-sorts all the files on disk.
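The MapTask flow above (buffer, sort at a threshold, spill, final merge) can be sketched in plain Java. This is only a simplified in-memory model under stated assumptions: the class `SpillSketch`, its record-count threshold, and the in-memory "spill" lists are hypothetical stand-ins, whereas Hadoop itself works on a byte buffer and real spill files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Simplified model of the map-side spill-and-merge flow (not Hadoop code).
class SpillSketch {
    private final int threshold;                          // records that trigger a spill (assumption)
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> spills = new ArrayList<>();

    SpillSketch(int threshold) { this.threshold = threshold; }

    // Collect one key; spill a sorted run once the buffer reaches the threshold.
    void collect(String key) {
        buffer.add(key);
        if (buffer.size() >= threshold) spill();
    }

    private void spill() {
        if (buffer.isEmpty()) return;
        List<String> run = new ArrayList<>(buffer);
        // Collections.sort stands in for the quicksort mentioned in the text.
        Collections.sort(run);
        spills.add(run);
        buffer.clear();
    }

    // Flush what is left, then k-way merge all sorted runs into one sorted list.
    List<String> finish() {
        spill();
        // Each heap entry is {runIndex, positionInRun}, ordered by the key it points to.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> spills.get(a[0]).get(a[1]).compareTo(spills.get(b[0]).get(b[1])));
        for (int i = 0; i < spills.size(); i++) heap.add(new int[] {i, 0});
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> run = spills.get(top[0]);
            merged.add(run.get(top[1]));
            if (top[1] + 1 < run.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return merged;
    }

    int spillCount() { return spills.size(); }
}
```

With a threshold of 2 and the keys d, a, c, b, e, the sketch produces three sorted runs and merges them into a, b, c, d, e.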
For a ReduceTask, the corresponding data files are copied remotely from every MapTask. If a file exceeds a certain size threshold, it is spilled to disk; otherwise it is kept in memory. If the number of files on disk reaches a certain threshold, a merge sort is performed to produce one larger file. If the size or number of files in memory exceeds a certain threshold, the data is merged and then spilled to disk. Once all data has been copied, the ReduceTask performs a single merge sort over all data in memory and on disk.
Sort categories
(1) Partial sorting
MapReduce sorts the data set by the key of the input records, which guarantees that each output file is internally ordered.
(2) Total sort
The final output is a single file, and that file is internally ordered. This is implemented by configuring only one ReduceTask. However, the approach is very inefficient for large files, because one machine processes all the data and the parallelism that MapReduce provides is lost entirely.
(3) Auxiliary sort (GroupingComparator grouping): groups keys on the Reduce side. Use case: when the received key is a bean object and keys that match on one or several fields (but not on all fields) should go to the same reduce method call, a grouping sort can be used.
(4) Secondary sort: in a custom sort, if compareTo uses two comparison conditions, the result is a secondary sort.
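To make the grouping idea concrete, here is a minimal plain-Java sketch. It assumes a hypothetical `OrderKey` bean with an order id and a price; the class names and fields are illustrative and not from the original. Keys are sorted on both fields, but groups are formed by comparing only the order id, which is roughly what a grouping comparator does on the Reduce side.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration of "sort on all fields, group on some" (not Hadoop code).
class GroupingSketch {
    // Hypothetical bean key: an order id plus a price.
    static final class OrderKey {
        final int orderId;
        final double price;
        OrderKey(int orderId, double price) { this.orderId = orderId; this.price = price; }
    }

    // Sort order: orderId ascending, then price descending.
    static final Comparator<OrderKey> SORT =
        Comparator.comparingInt((OrderKey k) -> k.orderId)
                  .thenComparing(Comparator.comparingDouble((OrderKey k) -> k.price).reversed());

    // Grouping order: only orderId is compared, so keys that differ in price
    // still count as the same key when reduce calls are formed.
    static final Comparator<OrderKey> GROUP =
        Comparator.comparingInt((OrderKey k) -> k.orderId);

    // Sort the keys, then start a new group wherever GROUP reports a change.
    static List<List<OrderKey>> reduceGroups(List<OrderKey> keys) {
        List<OrderKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<OrderKey>> groups = new ArrayList<>();
        for (OrderKey k : sorted) {
            if (groups.isEmpty()
                    || GROUP.compare(groups.get(groups.size() - 1).get(0), k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }
}
```

Because of the sort order, the first key of each group carries the highest price for that order id. In an actual Hadoop job, the grouping logic would live in a comparator registered on the job instead of a helper method like this one.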
How custom sorting with WritableComparable works
When a bean object is transmitted as the key, it must implement the WritableComparable interface and override the compareTo method; the framework can then sort it.
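A minimal sketch of such a bean follows. So that it compiles without Hadoop on the classpath, it declares a local stand-in interface, `WritableComparableSketch`, with the same three methods; in a real job the bean would implement `org.apache.hadoop.io.WritableComparable` instead. The field names, the descending order on total flow, and the tie-breaker on upstream flow are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in for Hadoop's WritableComparable (assumption: same method shapes).
interface WritableComparableSketch<T> extends Comparable<T> {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical flow bean used as a sortable key.
class FlowBeanSketch implements WritableComparableSketch<FlowBeanSketch> {
    long upFlow;
    long downFlow;
    long sumFlow;

    FlowBeanSketch() { }                         // no-arg constructor for deserialization

    FlowBeanSketch(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields are read in the same order they were written.
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    @Override
    public int compareTo(FlowBeanSketch other) {
        // First condition: total flow, descending.
        int bySum = Long.compare(other.sumFlow, this.sumFlow);
        if (bySum != 0) return bySum;
        // Second condition (a secondary sort): upstream flow, ascending.
        return Long.compare(this.upFlow, other.upFlow);
    }

    // Sketch-only helper: serialize a bean and read it back into a new instance.
    static FlowBeanSketch roundTrip(FlowBeanSketch bean) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            bean.write(new DataOutputStream(bytes));
            FlowBeanSketch copy = new FlowBeanSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Sorting a list of these beans with `Collections.sort` puts the largest total first, and beans with equal totals are ordered by upstream flow.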

WritableComparable sorting case analysis (total sort)

Code implementation:
(1) The FlowBean object adds a comparison method on top of the version from requirement 1

(2) In the Mapper class, the key is FlowBean and the value is Text (the phone number); no other code changes
(3) Write the Reducer class

(4) Write the Driver class
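The data flow of these four steps can be simulated in plain Java. This is only a sketch under stated assumptions: no Hadoop classes are used, the input layout `phone<TAB>upFlow<TAB>downFlow` is assumed, the total flow alone stands in for the FlowBean key, and a `TreeMap` plays the role of the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the sort job (not Hadoop code).
class FlowSortSimulation {
    static List<String> run(List<String> lines) {
        // "Shuffle": keys kept in descending order; phones sharing a key are collected together.
        TreeMap<Long, List<String>> shuffled = new TreeMap<>((a, b) -> Long.compare(b, a));

        // "Map" phase: emit (total flow, phone) so that sorting happens on the flow.
        for (String line : lines) {
            String[] fields = line.split("\t");
            long sum = Long.parseLong(fields[1]) + Long.parseLong(fields[2]);
            shuffled.computeIfAbsent(sum, k -> new ArrayList<>()).add(fields[0]);
        }

        // "Reduce" phase: swap back, writing the phone first and the total second.
        // The inner loop matters: several phones can share the same key.
        List<String> output = new ArrayList<>();
        for (Map.Entry<Long, List<String>> entry : shuffled.entrySet()) {
            for (String phone : entry.getValue()) {
                output.add(phone + "\t" + entry.getKey());
            }
        }
        return output;
    }
}
```

The point of the design is that the framework sorts only by key, so the value to sort on goes into the map output key and is moved back to the value position in the reducer.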

Two: Combiner Merging
(1) A Combiner is a component of an MR program separate from the Mapper and the Reducer.
(2) The parent class of a Combiner component is Reducer.
(3) A Combiner and a Reducer differ in where they run: a Combiner runs on the node of each MapTask, whereas the Reducer receives the output of all MapTasks.
(4) The purpose of a Combiner is to locally aggregate the output of each MapTask in order to reduce network traffic.
(5) A Combiner may be used only if it does not affect the final business logic, and the Combiner's output kv types must match the Reducer's input kv types.
(6) Steps for implementing a custom Combiner
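As a rough illustration of points (4) and (5), the sketch below models a word count in plain Java. The class `CombinerSketch` and its method names are hypothetical, and no Hadoop classes are involved. Each MapTask's output is pre-summed locally, and the reducer then sums the partial counts.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of local aggregation before the shuffle (not Hadoop code).
class CombinerSketch {
    // Combiner: runs over one MapTask's output and pre-sums its (word, 1) pairs.
    static Map<String, Integer> combine(List<String> mapOutputWords) {
        Map<String, Integer> local = new HashMap<>();
        for (String word : mapOutputWords) {
            local.merge(word, 1, Integer::sum);
        }
        return local;
    }

    // Reducer: sums the partial counts arriving from every MapTask.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    // Number of (word, count) records that would cross the network after combining.
    static int recordsSent(List<Map<String, Integer>> partials) {
        int n = 0;
        for (Map<String, Integer> partial : partials) {
            n += partial.size();
        }
        return n;
    }
}
```

Summation is safe to combine because partial sums add up to the same total. Averaging is not: averaging (3, 5, 7) and (2, 6) locally gives 5 and 4, whose average is 4.5, while the true average of all five values is 4.6. That is the kind of case where a Combiner would change the business logic.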

Three: OutputFormat Data Output
