MapReduce (III)
2022-07-27 23:49:00 【JiaXingNashishua】
One: WritableComparable Sorting
Sorting is one of the most important operations in the MapReduce framework.
Both MapTask and ReduceTask sort data by key. This is Hadoop's default behavior: the data of every application is sorted, whether or not the application logic requires it.
The default order is dictionary (lexicographic) order, and the sort is implemented with quicksort.
For a MapTask, processing results are first placed in a ring buffer. When buffer utilization reaches a certain threshold, the data in the buffer is quicksorted and the sorted data is spilled to disk. Once all data has been processed, the MapTask merge-sorts all the spill files on disk.
For a ReduceTask, the corresponding data files are copied remotely from each MapTask. If a file exceeds a certain size threshold, it is spilled to disk; otherwise it is kept in memory. If the number of files on disk reaches a certain threshold, they are merge-sorted into one larger file. If the size or number of files in memory exceeds a certain threshold, they are merged and the result is spilled to disk. When all data has been copied, the ReduceTask performs one final merge sort over everything in memory and on disk.
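The spill-then-merge flow described above can be sketched in plain Java, without any Hadoop classes. This is only an illustration under assumptions: `SpillMergeSketch`, `SPILL_THRESHOLD`, and the in-memory lists standing in for spill files are all hypothetical, and `Collections.sort` stands in for the quicksort that the text attributes to Hadoop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class SpillMergeSketch {
    // Hypothetical capacity standing in for the ring buffer's spill threshold.
    static final int SPILL_THRESHOLD = 3;

    // Sorts the buffer each time it fills up and "spills" it as one sorted run.
    static List<List<String>> spillSortedRuns(List<String> keys) {
        List<List<String>> runs = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String key : keys) {
            buffer.add(key);
            if (buffer.size() >= SPILL_THRESHOLD) {
                Collections.sort(buffer); // stand-in for the in-buffer quicksort
                runs.add(buffer);
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) { // final partial buffer is spilled as well
            Collections.sort(buffer);
            runs.add(buffer);
        }
        return runs;
    }

    // k-way merge of the sorted runs, as done once all input has been processed.
    static List<String> mergeRuns(List<List<String>> runs) {
        // Each heap entry is {runIndex, positionInRun}, ordered by the key it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            merged.add(runs.get(top[0]).get(top[1]));
            if (top[1] + 1 < runs.get(top[0]).size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }
}
```

Each run is sorted on its own, so the final merge only ever has to compare the heads of the runs. That is why a merge sort, rather than a full re-sort, is enough at the end.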
Sort categories
(1) Partial sort
MapReduce sorts the data set by the key of the input records, which guarantees that each output file is internally ordered.
(2) Total sort
The final output is a single file whose contents are fully ordered. This is implemented by configuring only one ReduceTask. The approach is very inefficient for large files, because a single machine processes all the data and the parallelism that MapReduce provides is lost entirely.
(3) Auxiliary sort (GroupingComparator grouping): groups keys on the Reduce side. Use it when the key is a bean object and you want keys that match on one or more fields (even though not all fields are equal) to be passed to the same reduce method call.
(4) Secondary sort: in a custom sort, if compareTo compares on two conditions, the result is a secondary sort.
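Categories (3) and (4) can be illustrated together with a small plain-Java sketch. Everything here is an assumption for illustration: `OrderKey`, its `orderId` and `price` fields, and `GroupingSketch` are hypothetical names, and the grouping loop only imitates what the framework does on the Reduce side. In a real job the grouping comparator would be a `WritableComparator` subclass registered with `job.setGroupingComparatorClass(...)`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical key type: one order line with an order id and a price.
class OrderKey implements Comparable<OrderKey> {
    final int orderId;
    final double price;

    OrderKey(int orderId, double price) {
        this.orderId = orderId;
        this.price = price;
    }

    // Secondary sort: two conditions in compareTo.
    // First by orderId ascending, then by price descending.
    @Override
    public int compareTo(OrderKey other) {
        int byOrder = Integer.compare(orderId, other.orderId);
        if (byOrder != 0) {
            return byOrder;
        }
        return Double.compare(other.price, price);
    }
}

class GroupingSketch {
    // Plays the role of a grouping comparator: only orderId decides the group,
    // so keys with different prices still count as "the same key" for grouping.
    static final Comparator<OrderKey> GROUP_BY_ORDER =
        Comparator.comparingInt(k -> k.orderId);

    // Sorts the keys, then starts a new group whenever the grouping comparator
    // reports that the current key differs from the previous one.
    static List<List<OrderKey>> group(List<OrderKey> keys) {
        List<OrderKey> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        List<List<OrderKey>> groups = new ArrayList<>();
        OrderKey previous = null;
        for (OrderKey key : sorted) {
            if (previous == null || GROUP_BY_ORDER.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }
}
```

Because the sort puts the highest price first within each order, the first key of every group is that order's most expensive line, which is a typical reason to combine a secondary sort with grouping.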
Custom sorting with WritableComparable: how it works
When a bean object is transmitted as the key, it must implement the WritableComparable interface and override the compareTo method; the framework can then sort by it.
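As a rough sketch of what such a bean looks like, the class below has the same three methods that `WritableComparable` requires (`write`, `readFields`, `compareTo`) but uses only the JDK so it can be run on its own. The field names `upFlow`, `downFlow`, and `sumFlow` and the descending order by total flow are assumptions, since the original code is not shown. In a real job the class would declare `implements WritableComparable<FlowBean>` from `org.apache.hadoop.io` instead of `Comparable<FlowBean>`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class FlowBean implements Comparable<FlowBean> {
    long upFlow;   // assumed field: upstream traffic
    long downFlow; // assumed field: downstream traffic
    long sumFlow;  // assumed field: total traffic, used as the sort condition

    // A no-argument constructor lets the bean be created first and filled by readFields.
    FlowBean() {}

    FlowBean(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    // Serialization: fields are written in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    // Deserialization: fields must be read in exactly the order they were written.
    public void readFields(DataInput in) throws IOException {
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    // Sort rule (assumed): larger total flow comes first.
    @Override
    public int compareTo(FlowBean other) {
        return Long.compare(other.sumFlow, sumFlow);
    }

    @Override
    public String toString() {
        return upFlow + "\t" + downFlow + "\t" + sumFlow;
    }
}

class FlowBeanDemo {
    // Writes a bean to bytes and reads it back, as happens when a key is shuffled.
    static FlowBean roundTrip(FlowBean bean) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            bean.write(new DataOutputStream(bytes));
            FlowBean copy = new FlowBean();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```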

WritableComparable sorting case study (total sort)

Code implementation:
(1) The FlowBean object adds a comparison method on top of the version from requirement 1

(2) In the Mapper class, the output key is FlowBean and the output value is Text (the phone number); no other code changes
(3) Write the Reducer class

(4) Write the Driver class
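The data flow behind steps (2) to (4) can be imitated in plain Java. This is a simulation under assumptions, not the original Mapper, Reducer, or Driver: `TotalSortSketch` is a hypothetical name, a `Long` total stands in for the FlowBean key, and a `TreeMap` stands in for the shuffle that sorts by key. It shows why the Reducer has to loop over its values: keys that compare as equal arrive in the same reduce call, each with a different phone number.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TotalSortSketch {
    // Input (assumed): phone number -> total flow, as produced by an earlier job.
    static List<String> run(Map<String, Long> totalsByPhone) {
        // "Map" + "shuffle": emit (totalFlow, phone) so that the flow becomes the key.
        // The reversed comparator keeps keys in descending order, and equal keys
        // share one entry, just as equal keys share one reduce call.
        TreeMap<Long, List<String>> shuffled = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, Long> record : totalsByPhone.entrySet()) {
            shuffled.computeIfAbsent(record.getValue(), k -> new ArrayList<>())
                    .add(record.getKey());
        }

        // "Reduce": iterate over every value of each key and swap back to (phone, flow).
        // A single reducer sees all keys, so the output is one totally ordered list.
        List<String> output = new ArrayList<>();
        for (Map.Entry<Long, List<String>> group : shuffled.entrySet()) {
            for (String phone : group.getValue()) {
                output.add(phone + "\t" + group.getKey());
            }
        }
        return output;
    }
}
```

In an actual Driver, the map output types would be declared with `job.setMapOutputKeyClass(FlowBean.class)` and `job.setMapOutputValueClass(Text.class)`, and the job would run with a single ReduceTask to obtain one sorted output file.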

Two: Combiner Merging
(1) The Combiner is a component of an MR program separate from the Mapper and the Reducer.
(2) The parent class of a Combiner is Reducer.
(3) A Combiner and a Reducer differ in where they run: the Combiner runs on the node of each MapTask, whereas the Reducer receives the output of all Mappers.
(4) The purpose of the Combiner is to summarize the output of each MapTask locally in order to reduce network traffic.
(5) A Combiner may only be used if it does not affect the final business logic, and the Combiner's output key-value types must match the Reducer's input key-value types.
(6) Steps for implementing a custom Combiner
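The effect described in points (3) to (5) can be sketched with a plain-Java word count. This is a simulation under assumptions: `CombinerSketch`, `Result`, and the string "splits" are hypothetical, and the summing function is simply reused for the local and the global step. In a real job the custom Combiner would be a class extending `Reducer`, registered in the Driver with `job.setCombinerClass(...)`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CombinerSketch {
    // Final counts plus the number of records that crossed from map side to reduce side.
    static final class Result {
        final Map<String, Integer> counts;
        final int shuffledRecords;

        Result(Map<String, Integer> counts, int shuffledRecords) {
            this.counts = counts;
            this.shuffledRecords = shuffledRecords;
        }
    }

    // Sums the values per key. The same logic serves as the combiner (per map task)
    // and as the reducer (over everything), which is only valid because addition
    // gives the same total however the records are grouped.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Each element of splits plays the role of the input of one MapTask.
    static Result wordCount(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
            for (String word : split.split(" ")) {
                mapOutput.add(new SimpleEntry<>(word, 1)); // map: emit (word, 1)
            }
            if (useCombiner) {
                // Local summary on the "map node" before anything is sent.
                shuffled.addAll(sumByKey(mapOutput).entrySet());
            } else {
                shuffled.addAll(mapOutput);
            }
        }
        return new Result(sumByKey(shuffled), shuffled.size());
    }
}
```

Summation qualifies because partial sums can be added again without changing the result. An average does not: averaging 3, 5, 7 on one MapTask (5) and 2, 6 on another (4) and then averaging those gives 4.5, while the true average of all five values is 4.6, so using such a Combiner would change the business logic.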

Three: OutputFormat Data Output
