Cartoon: what is MapReduce?
2022-07-05 16:09:00 【Xiaohui】
————— the second day —————
————————————
What is MapReduce?
MapReduce is a programming model. It originates from one of the three papers published by Google (MapReduce, BigTable, GFS) and is mainly used for parallel computation over massive data sets.
MapReduce can be understood in two parts: Map and Reduce.
1. Map: the mapping step. A Map function is applied to a set of input data, mapping each item to new data.
2. Reduce: the reduction step. Several groups of mapping results are aggregated and output.
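As a minimal sketch of these two ideas in isolation, Python's built-in `map` and `functools.reduce` can stand in for the two steps. The sample numbers and the helper name `square` are made up for illustration and are not part of Hadoop or of the original papers.

```python
from functools import reduce


def square(x):
    """Map function: turn one input item into one new item."""
    return x * x


numbers = [1, 2, 3, 4]  # illustrative input data

# Map: apply the function to every item independently.
mapped = list(map(square, numbers))  # [1, 4, 9, 16]

# Reduce: fold all mapped results into a single summary value.
total = reduce(lambda acc, value: acc + value, mapped, 0)  # 30

print(mapped, total)
```

Because each call to `square` depends only on its own input, the Map step is the part that can be spread across many workers.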
Let's look at a practical example: how can we efficiently count the number of people with each surname in the country?
We can apply the MapReduce idea: run the mapping in parallel over the population of each province to get partial results, then sort and aggregate those partial results:
What does this picture mean? Let's go through the steps:
1. Map
Working province by province, multiple threads read the population data of different provinces in parallel, and each record produces a Key-Value pair. The data shown here is simplified.
2. Shuffle
Shuffle has not been mentioned before. The word literally means shuffling a deck of cards. The Shuffle step sorts, groups, and copies the output of the Map step.
3. Reduce
Processes the results grouped in the previous step, then aggregates and outputs them.
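The three steps above can be sketched in a few lines of Python. This is only an in-process illustration under assumed data: the province names, the sample records, and the function names (`map_province`, `shuffle`, `reduce_group`, `count_surnames`) are all hypothetical, and a thread pool stands in for the parallel readers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, heavily simplified population records per province.
PROVINCES = {
    "province_a": ["Zhang Wei", "Li Na", "Zhang Min"],
    "province_b": ["Wang Fang", "Li Lei"],
    "province_c": ["Zhang Yong", "Wang Jing", "Wang Tao"],
}


def map_province(names):
    """Map: emit one (surname, 1) Key-Value pair per record."""
    return [(name.split()[0], 1) for name in names]


def shuffle(mapped_lists):
    """Shuffle: group the values by key and order the keys."""
    groups = defaultdict(list)
    for pairs in mapped_lists:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_group(values):
    """Reduce: aggregate all values that share one key."""
    return sum(values)


def count_surnames(provinces):
    # One map call per province, run in parallel threads.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_province, provinces.values()))
    grouped = shuffle(mapped)
    return {surname: reduce_group(vals) for surname, vals in grouped.items()}


result = count_surnames(PROVINCES)
print(result)  # {'Li': 2, 'Wang': 3, 'Zhang': 3}
```

Each province is mapped without knowing about the others. Only the Shuffle step needs to see all partial results at once.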
Note that the Shuffle described here is only an abstract concept. In an actual implementation, Shuffle is split into two parts: one part runs inside the Map task and the other inside the Reduce task.
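One way to picture this split is the sketch below. It assumes two Reduce tasks and a toy partitioner. The names `partition`, `map_side_shuffle`, and `reduce_side_shuffle` are hypothetical and are not Hadoop APIs. On the Map side, each task partitions and sorts its own output. On the Reduce side, each task copies its partition from every Map task, merges the sorted runs, and groups them by key.

```python
import heapq
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # assumed number of Reduce tasks


def partition(key):
    """Toy deterministic partitioner: pick the Reduce task for a key."""
    return sum(key.encode("utf-8")) % NUM_REDUCERS


def map_side_shuffle(pairs):
    """Map side: split one Map task's output per reducer, sort each split."""
    buckets = [[] for _ in range(NUM_REDUCERS)]
    for key, value in pairs:
        buckets[partition(key)].append((key, value))
    return [sorted(bucket) for bucket in buckets]


def reduce_side_shuffle(sorted_runs):
    """Reduce side: merge the sorted runs and group the values by key."""
    merged = heapq.merge(*sorted_runs)
    return [(key, [value for _, value in group])
            for key, group in groupby(merged, key=itemgetter(0))]


# Hypothetical output of two Map tasks.
map_outputs = [
    [("Zhang", 1), ("Li", 1), ("Zhang", 1)],
    [("Wang", 1), ("Li", 1)],
]
per_map = [map_side_shuffle(pairs) for pairs in map_outputs]

totals = {}
for r in range(NUM_REDUCERS):
    # The "copy" step: reducer r fetches its split from every Map task.
    runs = [splits[r] for splits in per_map]
    for key, values in reduce_side_shuffle(runs):
        totals[key] = sum(values)

print(totals)
```

Because the partitioner is deterministic, every pair with the same key ends up at the same Reduce task, which is what makes the final per-key aggregation correct.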
How does Hadoop implement MapReduce?
Hadoop is a distributed system framework developed by the Apache Software Foundation. It contains multiple components, and its core is HDFS and MapReduce.
For reasons of length, this article will not give a complete introduction to Hadoop. It only briefly describes how Hadoop implements MapReduce.
The figure below shows the whole process of the Hadoop framework executing a MapReduce Job:
Several entities need to be explained:
HDFS:
The Hadoop Distributed File System. It provides the data source for MapReduce and stores Job information.
Client Node:
The process that runs the MapReduce program and submits the MapReduce Job.
JobTracker Node:
Splits the whole Job into several Tasks and is responsible for scheduling and coordinating all of them. It plays the Master role.
TaskTracker Node:
Executes the Tasks assigned by the JobTracker. It plays the Worker role. These Tasks are divided into MapTask and ReduceTask.
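The Master and Worker roles described above can be mimicked with a small in-process simulation. This is a toy sketch, not Hadoop code: the Python classes `JobTracker` and `TaskTracker` below only borrow the names of the roles, the round-robin dispatch is an assumption made for simplicity, and everything runs sequentially in one process rather than on separate nodes.

```python
from collections import deque


class TaskTracker:
    """Worker role: executes whatever Task it is handed."""

    def __init__(self, name):
        self.name = name
        self.log = []  # kinds of Task this worker has executed

    def run(self, task):
        kind, func, data = task
        self.log.append(kind)
        return func(data)


class JobTracker:
    """Master role: splits a Job into Tasks and dispatches them."""

    def __init__(self, trackers):
        self.trackers = trackers

    def _dispatch(self, tasks):
        queue, results, i = deque(tasks), [], 0
        while queue:
            # Assumed policy: simple round-robin over the workers.
            worker = self.trackers[i % len(self.trackers)]
            results.append(worker.run(queue.popleft()))
            i += 1
        return results

    def run_job(self, splits, map_func, reduce_func):
        # One MapTask per input split.
        map_out = self._dispatch([("MapTask", map_func, s) for s in splits])
        # Group intermediate pairs by key (the Shuffle step).
        groups = {}
        for pairs in map_out:
            for key, value in pairs:
                groups.setdefault(key, []).append(value)
        # One ReduceTask per key group.
        reduce_out = self._dispatch(
            [("ReduceTask", reduce_func, item) for item in groups.items()]
        )
        return dict(reduce_out)


trackers = [TaskTracker("tt-1"), TaskTracker("tt-2")]
job_tracker = JobTracker(trackers)
splits = [["Zhang", "Li"], ["Zhang", "Wang"], ["Li"]]  # hypothetical input
counts = job_tracker.run_job(
    splits,
    map_func=lambda names: [(n, 1) for n in names],
    reduce_func=lambda item: (item[0], sum(item[1])),
)
print(counts)
```

The client's part is the last few lines: it describes the Job (input splits plus the two functions) and hands it to the master, which decides which worker runs which Task.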
Finally, I wish everyone who aspires to become a big data engineer, and all of Xiaohui's readers, the fulfilment of their dreams in the new year!
—————END—————