Cartoon: what is MapReduce?
2022-07-05 16:09:00 【Xiaohui】
————— the second day —————
What is MapReduce?
MapReduce is a programming model. Its theory comes from one of the three papers published by Google (MapReduce, BigTable, GFS), and it is mainly used for parallel computation over massive data sets.
MapReduce can be understood in two parts, Map and Reduce.
1. Map: the mapping process. A set of data is mapped to new data by a Map function.
2. Reduce: the reduction process. Several groups of mapping results are summarized and output.
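To make the two parts concrete, here is a minimal sketch using Python's built-in `map` and `functools.reduce`. It runs on a single machine and is not a distributed MapReduce; the input list and the squaring function are assumptions chosen only for illustration.

```python
from functools import reduce

# Hypothetical input data, chosen only for illustration.
records = [1, 2, 3, 4]

# Map: apply the same function to every record independently,
# producing a new collection of the same length.
squared = list(map(lambda x: x * x, records))

# Reduce: fold the mapped results into one summary value.
total = reduce(lambda acc, x: acc + x, squared, 0)

print(squared)  # [1, 4, 9, 16]
print(total)    # 30
```

Because each record is mapped independently of the others, the Map part is the one that can be spread across many workers.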
Let's look at a practical example: how can we efficiently count the number of people with each surname in the whole country?
We can apply the MapReduce idea: map the population of each province in parallel to get a set of local results, then sort and summarize those local results:
What does this picture mean? Let's go through the steps:
1. Map:
Taking the province as the unit, multiple threads read the population data of different provinces in parallel, and each record generates a key-value pair. The data shown here is simplified.
2. Shuffle:
Shuffle has not been mentioned so far. The word literally means shuffling, as with a deck of cards. The Shuffle process sorts, groups, and copies the mapped data.
3. Reduce:
Process the previously grouped results, then summarize and output them.
Note that the Shuffle described here is only an abstract concept. In an actual implementation, Shuffle is divided into two parts: one part happens in the Map task and the other in the Reduce task.
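The three steps can be sketched in Python as follows. This is a single-process illustration under stated assumptions, not how Hadoop does it: the province names, the surname lists, and the helper functions `map_province`, `shuffle`, and `reduce_group` are all hypothetical, and Shuffle is written as one function even though a real implementation splits it between the Map and Reduce tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, heavily simplified population records per province.
PROVINCES = {
    "Province A": ["Zhang", "Li", "Zhang"],
    "Province B": ["Wang", "Li"],
    "Province C": ["Zhang", "Wang", "Wang"],
}

def map_province(surnames):
    """Map: emit one (surname, 1) key-value pair per record."""
    return [(surname, 1) for surname in surnames]

def shuffle(mapped_outputs):
    """Shuffle: group the values by key and sort the keys."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_group(key, values):
    """Reduce: summarize one group into a single count."""
    return key, sum(values)

def count_surnames(provinces):
    # Each province is mapped by its own worker thread.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_province, provinces.values()))
    grouped = shuffle(mapped)
    return dict(reduce_group(k, v) for k, v in grouped.items())

result = count_surnames(PROVINCES)
print(result)  # {'Li': 2, 'Wang': 3, 'Zhang': 3}
```

Threads are used only to mirror the description above. On a real cluster the Map work would run on separate machines, and the grouped data would be copied over the network to the Reduce side.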
How does Hadoop implement MapReduce?
Hadoop is a distributed system framework developed by the Apache foundation. It contains multiple components, and its core is HDFS and MapReduce.
For reasons of length, this article won't give a complete introduction to Hadoop. It only briefly covers how Hadoop implements MapReduce.
The figure below shows the whole process of the Hadoop framework executing a MapReduce Job:
Several entities need explaining:
HDFS:
The Hadoop distributed file system. It provides the data source for MapReduce and stores Job information.
Client Node:
The process that runs the MapReduce program. It submits the MapReduce Job.
JobTracker Node:
Splits the whole Job into several Tasks and is responsible for scheduling and coordinating all of them. It plays the Master role.
TaskTracker Node:
Executes the Tasks assigned by the JobTracker. It plays the Worker role. These Tasks are divided into MapTasks and ReduceTasks.
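The Master and Worker roles described above can be illustrated with a toy simulation. This is not Hadoop code and uses no Hadoop API: the classes below only borrow the entity names, and the round-robin assignment is an assumption made for the sketch. Hadoop's real scheduling is more involved.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    kind: str      # "map" or "reduce"
    task_id: int

@dataclass
class TaskTracker:
    """Toy worker: records which tasks it was given."""
    name: str
    completed: list = field(default_factory=list)

    def run(self, task):
        # A real worker would execute user code; here we only log the task.
        self.completed.append(f"{task.kind}-{task.task_id}")

class JobTracker:
    """Toy master: splits a job into tasks and hands them to workers."""

    def __init__(self, trackers):
        self.trackers = trackers

    def split(self, num_maps, num_reduces):
        maps = [Task("map", i) for i in range(num_maps)]
        reduces = [Task("reduce", i) for i in range(num_reduces)]
        # Map tasks are listed first so they are dispatched before reduces.
        return maps + reduces

    def run_job(self, num_maps, num_reduces):
        for i, task in enumerate(self.split(num_maps, num_reduces)):
            # Round-robin assignment: an assumption made for this sketch.
            self.trackers[i % len(self.trackers)].run(task)

workers = [TaskTracker("tracker-1"), TaskTracker("tracker-2")]
JobTracker(workers).run_job(num_maps=3, num_reduces=1)
for worker in workers:
    print(worker.name, worker.completed)
# tracker-1 ['map-0', 'map-2']
# tracker-2 ['map-1', 'reduce-0']
```

The point of the sketch is the division of labor: one coordinator decides what runs where, and the workers only execute what they are given.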
Finally, I wish everyone who aspires to become a big data engineer, and all of Xiaohui's readers, success in achieving your dreams in the new year!
—————END—————