Cartoon: what is MapReduce?

2022-07-05 16:09:00 【Small ash】

What is MapReduce?

MapReduce is a programming model. The idea comes from one of the three papers published by Google (MapReduce, BigTable, GFS), and it is mainly used for parallel computation over massive data sets.

MapReduce can be understood in two parts: Map and Reduce.

1. Map: the mapping step. A Map function transforms a set of input data into new data.

2. Reduce: the reduction step. Several groups of mapping results are aggregated and output.
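As a minimal sketch of the two parts, the snippet below uses Python's built-in `map` and `functools.reduce`. The input list and the squaring function are illustrative assumptions, not something taken from the original article.

```python
from functools import reduce

# Hypothetical input data, chosen only for illustration.
numbers = [1, 2, 3, 4]

# Map: apply a function to every element, producing new data.
squared = list(map(lambda x: x * x, numbers))

# Reduce: summarize the mapped results into a single output.
total = reduce(lambda acc, x: acc + x, squared, 0)

print(squared)  # [1, 4, 9, 16]
print(total)    # 30
```

In a real MapReduce system the Map calls run on many machines at once, which is the point of splitting the work this way.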

Let's look at a practical example: how can we efficiently count the number of people with each surname in the country?

We can apply the MapReduce idea: map the population of each province in parallel to get partial results, then sort out and aggregate those partial results:

What does this picture mean? Let's walk through the steps:

1. Map

The data is split by province. Multiple threads read the population data of different provinces in parallel, and each record produces a key-value pair. The data shown here is simplified.

2. Shuffle

Shuffle has not been mentioned so far. The word literally refers to shuffling a deck of cards. The Shuffle step sorts, groups, and copies the Map output.

3. Reduce

Process the previously grouped results, then aggregate and output them.
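The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the province names, the surname lists, and the helper names `map_province`, `shuffle`, `reduce_group`, and `count_surnames` are all hypothetical, and a thread pool on one machine stands in for a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, heavily simplified population data: province -> surnames.
population = {
    "Province A": ["Zhang", "Li", "Zhang"],
    "Province B": ["Wang", "Li"],
    "Province C": ["Zhang", "Wang", "Wang"],
}

def map_province(surnames):
    """Map: emit one (surname, 1) key-value pair per record."""
    return [(surname, 1) for surname in surnames]

def shuffle(mapped_outputs):
    """Shuffle: group the values by key and order the keys."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_group(values):
    """Reduce: summarize one group of values."""
    return sum(values)

def count_surnames(data):
    # A thread pool maps the provinces in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_province, data.values()))
    grouped = shuffle(mapped)
    return {surname: reduce_group(values) for surname, values in grouped.items()}

result = count_surnames(population)
print(result)  # {'Li': 2, 'Wang': 3, 'Zhang': 3}
```

Each province is mapped independently, so adding provinces only adds more Map work that can run side by side; only the Shuffle step needs to see every partial result.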

Note that Shuffle as described here is only an abstract concept. In an actual implementation, Shuffle is split into two parts: one part runs inside the Map task, and the other runs inside the Reduce task.
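One way to picture this split is sketched below. It assumes the Map-side half partitions and sorts each Map task's output, and the Reduce-side half copies the matching partition from every Map output and merges it. The function names, the number of reducers, and the CRC32-based partitioner are illustrative assumptions, not Hadoop's actual code.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # assumed number of Reduce tasks

def partition(key):
    """Pick the Reduce task for a key (stable hash, an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS

def map_side_shuffle(pairs):
    """Map-task half: split the output into one sorted partition per reducer."""
    partitions = [[] for _ in range(NUM_REDUCERS)]
    for key, value in pairs:
        partitions[partition(key)].append((key, value))
    return [sorted(p) for p in partitions]

def reduce_side_shuffle(reducer_id, all_map_outputs):
    """Reduce-task half: copy this reducer's partition from every Map
    output, merge the sorted runs, and group the values by key."""
    runs = [output[reducer_id] for output in all_map_outputs]
    merged = heapq.merge(*runs)
    return {key: [value for _, value in group]
            for key, group in groupby(merged, key=itemgetter(0))}

# Hypothetical output of two Map tasks.
map_outputs = [
    map_side_shuffle([("Zhang", 1), ("Li", 1), ("Zhang", 1)]),
    map_side_shuffle([("Wang", 1), ("Li", 1)]),
]

counts = {}
for reducer_id in range(NUM_REDUCERS):
    grouped = reduce_side_shuffle(reducer_id, map_outputs)
    counts.update({key: sum(values) for key, values in grouped.items()})

print(sorted(counts.items()))  # [('Li', 2), ('Wang', 1), ('Zhang', 2)]
```

Because the partitioner sends every occurrence of a key to the same reducer, each Reduce task can finish its keys without talking to the others.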

How does Hadoop implement MapReduce?

Hadoop is a distributed system framework developed by the Apache Software Foundation. It contains multiple components, and its core is HDFS and MapReduce.

For reasons of length, this article will not give a complete introduction to Hadoop. It only briefly describes how Hadoop implements MapReduce.

The figure below shows the whole process of the Hadoop framework executing a MapReduce Job:

Several entities in the figure need explaining:

HDFS:

The Hadoop Distributed File System. It provides the data source for MapReduce and stores Job information.

Client Node:

The process that runs the MapReduce program and submits the MapReduce Job.

JobTracker Node:

Splits the whole Job into several Tasks and is responsible for scheduling and coordinating all of them. It plays the Master role.

TaskTracker Node:

Executes the Tasks assigned by the JobTracker. It plays the Worker role. These Tasks are divided into MapTasks and ReduceTasks.
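The division of roles can be mimicked with a toy model in plain Python. Everything below is a hypothetical sketch: the classes borrow the entity names from the list above, but none of this is Hadoop's API, and the round-robin assignment is an assumption made only to keep the example short.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    kind: str      # "map" or "reduce"
    task_id: int

@dataclass
class TaskTracker:
    """Worker role: executes the Tasks assigned to it."""
    name: str
    executed: list = field(default_factory=list)

    def run(self, task):
        self.executed.append(task)
        return f"{self.name} ran {task.kind} task {task.task_id}"

@dataclass
class JobTracker:
    """Master role: splits a Job into Tasks and schedules them."""
    trackers: list

    def split(self, num_maps, num_reduces):
        maps = [Task("map", i) for i in range(num_maps)]
        reduces = [Task("reduce", i) for i in range(num_reduces)]
        return maps + reduces  # map tasks come before reduce tasks here

    def submit(self, num_maps, num_reduces):
        # Round-robin assignment; real scheduling is far more involved.
        log = []
        for i, task in enumerate(self.split(num_maps, num_reduces)):
            tracker = self.trackers[i % len(self.trackers)]
            log.append(tracker.run(task))
        return log

# The client's part: submit a Job to the JobTracker.
workers = [TaskTracker("tracker-1"), TaskTracker("tracker-2")]
job_tracker = JobTracker(workers)
log = job_tracker.submit(num_maps=3, num_reduces=1)
for line in log:
    print(line)
```

The model leaves out HDFS entirely; in the real system the Tasks would read their input from it and write their results back to it.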

Finally, I wish everyone who aspires to become a big data engineer, and all of Xiaohui's readers, a new year in which your dreams come true!

—————END—————
