Cartoon: what is MapReduce?
2022-07-05 16:09:00 【Xiaohui】
————— The next day —————
What is MapReduce?
MapReduce is a programming model. Its theory comes from one of the three papers published by Google (MapReduce, BigTable, GFS), and it is mainly used for parallel computation over massive amounts of data.
MapReduce can be understood in two parts, Map and Reduce.
1. Map: the mapping process. A Map function is applied to a set of data, mapping it to new data.
2. Reduce: the reduction process. Several groups of mapping results are aggregated and output.
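The two parts can be illustrated with a minimal, single-machine sketch in Python. The helper names `map_phase` and `reduce_phase` and the sample data are assumptions made for illustration; they are not part of any MapReduce framework.

```python
from functools import reduce


def map_phase(records, map_fn):
    """Map: apply map_fn to every record, producing new data."""
    return [map_fn(record) for record in records]


def reduce_phase(values, reduce_fn, initial):
    """Reduce: fold all mapped values into a single summary result."""
    return reduce(reduce_fn, values, initial)


# Hypothetical input: map each word to its length, then sum the lengths.
words = ["map", "reduce", "shuffle"]
lengths = map_phase(words, len)
total = reduce_phase(lengths, lambda a, b: a + b, 0)
print(lengths, total)
```

Here the Map step is independent for each record, which is what makes it easy to run in parallel; only the Reduce step needs to see results from more than one record.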
Let's look at a practical example: how can we efficiently count the number of people with each surname in the country?
We can apply the MapReduce idea: run the mapping in parallel over the population of each province to get partial results, then collate and aggregate those partial results:
What does this picture mean? Let's walk through the steps:
1. Map:
Working province by province, multiple threads read the population data of different provinces in parallel, and each record produces a key-value pair. The data shown here is simplified.
2. Shuffle
The concept of Shuffle has not been mentioned so far. The word means shuffling, as in shuffling cards. The Shuffle process sorts, groups, and copies the output of the Map step.
3. Reduce
The previously grouped results are processed, then aggregated and output.
Note that the Shuffle described here is only an abstract concept. In an actual implementation, Shuffle is split into two parts: one part runs inside the Map task and the other inside the Reduce task.
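The three steps can be sketched on one machine as follows. This is only an illustration under stated assumptions: the province names, the sample records, and the function names are hypothetical, a thread pool stands in for the parallel readers, and the shuffle is written as a single function rather than being split between the Map and Reduce tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, simplified data: province -> list of "Surname Givenname" records.
PROVINCES = {
    "province_a": ["Zhang Wei", "Li Na", "Zhang Min"],
    "province_b": ["Wang Fang", "Li Qiang"],
    "province_c": ["Zhang Lei", "Wang Jing", "Li Jun"],
}


def map_province(names):
    """Map: emit one (surname, 1) key-value pair per record."""
    return [(name.split()[0], 1) for name in names]


def shuffle(mapped_outputs):
    """Shuffle: group the values of all map outputs by key, sorted by key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for surname, count in pairs:
            groups[surname].append(count)
    return dict(sorted(groups.items()))


def reduce_surname(surname, counts):
    """Reduce: aggregate the grouped values for one key."""
    return surname, sum(counts)


def count_surnames(provinces):
    # One map call per province, run in parallel by the thread pool.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_province, provinces.values()))
    grouped = shuffle(mapped)
    return dict(reduce_surname(k, v) for k, v in grouped.items())


result = count_surnames(PROVINCES)
print(result)
```

Each province is mapped without knowledge of the others, so the partial results only meet in the shuffle, where all counts for the same surname are brought together before being summed.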
How does Hadoop implement MapReduce?
Hadoop is a distributed system framework developed by the Apache Software Foundation. It contains multiple components, and its core consists of HDFS and MapReduce.
For reasons of length, this article will not give a complete introduction to Hadoop. It only briefly describes how Hadoop implements MapReduce.
The figure below shows the whole process of the Hadoop framework executing a MapReduce Job:
Several entities need to be explained:
HDFS:
The Hadoop Distributed File System. It provides MapReduce with its data source and stores Job information.
Client Node:
The process that runs the MapReduce program and submits the MapReduce Job.
JobTracker Node:
Splits the whole Job into several Tasks and is responsible for scheduling and coordinating all of them. It plays the Master role.
TaskTracker Node:
Responsible for executing the Tasks assigned by the JobTracker. It plays the Worker role. These Tasks are divided into MapTask and ReduceTask.
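The Master and Worker roles described above can be pictured with a toy model. The classes below are hypothetical and only borrow the entity names from the text; they are not Hadoop's API, and the round-robin assignment is an assumption chosen for simplicity. Real scheduling is considerably more involved.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Task:
    kind: str      # "map" or "reduce"
    task_id: int


@dataclass
class TaskTracker:
    """Worker role: executes the tasks it is assigned."""
    name: str
    assigned: list = field(default_factory=list)

    def run(self, task):
        self.assigned.append(task)
        return f"{self.name} ran {task.kind} task {task.task_id}"


class JobTracker:
    """Master role: splits a job into tasks and hands them to workers."""

    def __init__(self, trackers):
        self.trackers = trackers

    def split(self, num_map, num_reduce):
        maps = [Task("map", i) for i in range(num_map)]
        reduces = [Task("reduce", i) for i in range(num_reduce)]
        return maps + reduces  # map tasks are scheduled before reduce tasks

    def run_job(self, num_map, num_reduce):
        workers = cycle(self.trackers)  # simple round-robin assignment
        return [next(workers).run(task) for task in self.split(num_map, num_reduce)]


trackers = [TaskTracker("tracker-1"), TaskTracker("tracker-2")]
log = JobTracker(trackers).run_job(num_map=3, num_reduce=1)
for line in log:
    print(line)
```

In this sketch the client's role is reduced to the single `run_job` call, and the job is described only by how many map and reduce tasks it needs.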
Finally, I wish everyone who aspires to become a big data engineer, and all of Xiaohui's readers, the fulfillment of their dreams in the new year!
—————END—————