Cartoon: what is MapReduce?
2022-07-05 16:09:00 【Xiaohui】
What is MapReduce?
MapReduce is a programming model. Its theory comes from one of the three papers published by Google (MapReduce, BigTable, and GFS), and it is mainly used for parallel computation over massive datasets.
MapReduce can be understood as two parts: Map and Reduce.
1. Map: the mapping process, which transforms a set of data into new data through a Map function.
2. Reduce: the reduction process, which aggregates several groups of mapping results and outputs a summary.
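The two phases can be illustrated on a single machine with Python's built-in `map` and `functools.reduce`. This is only a minimal sketch of the idea (no parallelism, no distribution), and the word list is made-up sample data.

```python
from functools import reduce

words = ["map", "reduce", "hadoop", "hdfs"]  # made-up sample data

# Map: apply a function to every record independently, producing new data
lengths = list(map(len, words))

# Reduce: fold the mapped results into a single summary value
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths, total)  # [3, 6, 6, 4] 19
```

Because each record is mapped independently of the others, the Map step can be spread across many threads or machines; that independence is what MapReduce exploits at scale.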
Let's look at a practical example: how can we efficiently count the number of people with each surname across the whole country?
We can apply the MapReduce idea: map the population data of each province in parallel to obtain a set of local results, then consolidate and aggregate those local results:
What does this picture show? Let's go through the steps:
1. Map
Province by province, multiple threads read the population data of different provinces in parallel, and each record produces a Key-Value pair. The data shown here is simplified.
2. Shuffle
Shuffle is a concept we haven't mentioned before; the word literally means shuffling, as with a deck of cards. The Shuffle process sorts, groups, and copies the mapped data.
3. Reduce
The grouped results from the previous step are aggregated and output.
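The three steps above can be sketched in plain Python for the surname example. This is an illustrative single-process sketch under stated assumptions: the province names and surname lists are hypothetical sample data, a `ThreadPoolExecutor` stands in for the parallel Map workers, and the function names `map_province`, `shuffle`, and `reduce_counts` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter

# Hypothetical, simplified population records: province -> surnames of residents
population = {
    "Hebei": ["Zhang", "Wang", "Li", "Zhang"],
    "Shandong": ["Wang", "Li", "Wang", "Wang"],
    "Henan": ["Li", "Zhang"],
}

def map_province(surnames):
    """Map: emit one (surname, 1) key-value pair per record."""
    return [(name, 1) for name in surnames]

def shuffle(pairs):
    """Shuffle: sort the pairs by key, then group the values of equal keys."""
    pairs = sorted(pairs, key=itemgetter(0))
    return {key: [v for _, v in group] for key, group in groupby(pairs, key=itemgetter(0))}

def reduce_counts(grouped):
    """Reduce: sum the values of each group and output the totals."""
    return {key: sum(values) for key, values in grouped.items()}

# Map phase: one task per province, run in parallel threads
with ThreadPoolExecutor() as pool:
    mapped = [pair for result in pool.map(map_province, population.values()) for pair in result]

counts = reduce_counts(shuffle(mapped))
print(counts)  # {'Li': 3, 'Wang': 4, 'Zhang': 3}
```

Threads are used here only to mirror the description; in a real cluster each Map task would run on a different machine and read its own slice of the input.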
Note that the Shuffle described here is only an abstract concept. In actual implementations, Shuffle is split into two parts: one part runs in the Map task and the other in the Reduce task.
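The two halves of Shuffle can be sketched as follows, assuming a simplified model: on the Map side, each task partitions its output by key (one partition per Reduce task) and sorts each partition; on the Reduce side, each task copies its partition from every Map task and merges the sorted runs. The names `NUM_REDUCERS`, `map_side_shuffle`, and `reduce_side_shuffle` are hypothetical, and the CRC32-based partitioner only resembles, in spirit, Hadoop's default hash partitioning.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # hypothetical number of Reduce tasks

def partition(key, num_reducers=NUM_REDUCERS):
    # Deterministic hash, so every Map task sends a given key to the same reducer
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def map_side_shuffle(pairs):
    """Map-task half: split the output into one sorted run per reducer."""
    runs = [[] for _ in range(NUM_REDUCERS)]
    for key, value in pairs:
        runs[partition(key)].append((key, value))
    return [sorted(run, key=itemgetter(0)) for run in runs]

def reduce_side_shuffle(map_outputs, reducer_id):
    """Reduce-task half: copy this reducer's run from every Map task, merge, group."""
    runs = [output[reducer_id] for output in map_outputs]
    merged = heapq.merge(*runs, key=itemgetter(0))  # runs are already sorted by key
    return {key: [v for _, v in grp] for key, grp in groupby(merged, key=itemgetter(0))}

# Outputs of two hypothetical Map tasks
map_outputs = [
    map_side_shuffle([("Zhang", 1), ("Wang", 1), ("Zhang", 1)]),
    map_side_shuffle([("Wang", 1), ("Li", 1)]),
]
for r in range(NUM_REDUCERS):
    print(r, reduce_side_shuffle(map_outputs, r))
```

A stable hash such as CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, and separate Map tasks must agree on which reducer owns each key.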
How does Hadoop implement MapReduce?
Hadoop is a distributed system framework developed by the Apache Software Foundation. It contains multiple components, and its core consists of HDFS and MapReduce.
For reasons of length, this article won't give a complete introduction to Hadoop; it only briefly describes how Hadoop implements MapReduce.
The diagram below shows the complete process by which the Hadoop framework executes a MapReduce Job:
Several entities in it need explaining:
HDFS:
The Hadoop Distributed File System, which provides the data source for MapReduce and stores Job information.
Client Node:
Runs the MapReduce program and submits the MapReduce Job.
JobTracker Node:
Splits the whole Job into several Tasks and is responsible for scheduling and coordinating all of them; it plays the Master role.
TaskTracker Node:
Executes the Tasks assigned by the JobTracker; it plays the Worker role. These Tasks are divided into MapTasks and ReduceTasks.
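The Master/Worker split described above can be illustrated with a toy model. This is a hypothetical simulation, not Hadoop's API or scheduler: the `JobTracker` class creates one MapTask per input split plus a fixed number of ReduceTasks and hands them out round-robin. The real JobTracker assigns tasks as TaskTrackers report in via heartbeats and prefers placing MapTasks near their data; this sketch ignores that, as well as fault tolerance and the dependency of ReduceTasks on Map output. The block names and worker names are made up.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import List, Optional

@dataclass
class Task:
    kind: str                          # "map" or "reduce"
    task_id: int
    input_split: Optional[str] = None  # input block for a MapTask (hypothetical names)

@dataclass
class TaskTracker:
    name: str
    assigned: List[Task] = field(default_factory=list)

    def run(self, task: Task) -> None:
        # A real Worker would launch the task; this toy only records the assignment
        self.assigned.append(task)

class JobTracker:
    """Toy Master: split a Job into Tasks and distribute them round-robin."""

    def __init__(self, trackers: List[TaskTracker]):
        self.trackers = trackers

    def split_job(self, input_splits: List[str], num_reducers: int) -> List[Task]:
        # One MapTask per input split, plus a fixed number of ReduceTasks
        maps = [Task("map", i, split) for i, split in enumerate(input_splits)]
        reduces = [Task("reduce", i) for i in range(num_reducers)]
        return maps + reduces

    def schedule(self, tasks: List[Task]) -> None:
        for task, tracker in zip(tasks, cycle(self.trackers)):
            tracker.run(task)

trackers = [TaskTracker("worker-1"), TaskTracker("worker-2")]
job_tracker = JobTracker(trackers)
tasks = job_tracker.split_job(["block-0", "block-1", "block-2"], num_reducers=2)
job_tracker.schedule(tasks)
for t in trackers:
    print(t.name, [(task.kind, task.task_id) for task in t.assigned])
# worker-1 [('map', 0), ('map', 2), ('reduce', 1)]
# worker-2 [('map', 1), ('reduce', 0)]
```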
Finally, I wish everyone who aspires to become a big data engineer, and all of Xiaohui's readers, success in achieving your dreams in the new year!
—————END—————