Cartoon: what is MapReduce?
2022-07-05 16:09:00 【Xiaohui】
————— The next day —————
————————————
What is MapReduce?
MapReduce is a programming model. The idea comes from one of the three papers published by Google (MapReduce, BigTable, GFS), and it is mainly used for parallel computation over massive data sets.
MapReduce can be understood as two parts, Map and Reduce.
1. Map: the mapping step. A Map function is applied to a set of data to turn it into new data.
2. Reduce: the reduction step. Several groups of mapped results are aggregated and output.
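As a rough illustration of the two steps on a single machine, the sketch below uses Python's built-in `map` and `functools.reduce`. The sample numbers and the choice of squaring and summing are assumptions made only for this example; they are not from the original article.

```python
from functools import reduce

# Hypothetical sample data, chosen only for illustration.
numbers = [1, 2, 3, 4]

# Map: apply a function to every element to produce new data.
squares = list(map(lambda n: n * n, numbers))

# Reduce: fold the mapped results into one summary value.
total = reduce(lambda acc, n: acc + n, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```

MapReduce applies the same two ideas, but spreads the Map work and the Reduce work across many machines.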
Let's look at a practical example: how can we efficiently count the number of people with each surname in the country?
We can apply the MapReduce idea: map the population of each province in parallel to obtain partial results, then collate and aggregate those partial results:
What does this picture mean? Let's go through the steps:
1. Map
Working province by province, multiple threads read the population data of different provinces in parallel, and each record produces a key-value pair. The data shown here is simplified.
2. Shuffle
Shuffle has not been mentioned so far. The name means shuffling, as in shuffling a deck of cards. The Shuffle step sorts, groups, and copies the mapped data.
3. Reduce
Reduce runs over the groups produced in the previous step, then aggregates and outputs the results.
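The three steps above can be sketched in a few lines of Python. This is a minimal single-process sketch, not how a real cluster does it: the province names, the sample people, and the helper names (`map_province`, `shuffle`, `reduce_group`, `count_surnames`) are all hypothetical, and a thread pool stands in for the parallel readers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, heavily simplified records: province -> full names
# (surname first). None of this data comes from the article.
PROVINCES = {
    "province_a": ["Zhang Wei", "Li Na", "Zhang Min"],
    "province_b": ["Wang Fang", "Li Qiang"],
    "province_c": ["Zhang Lei", "Wang Jun", "Wang Li"],
}

def map_province(names):
    """Map: emit one (surname, 1) key-value pair per record."""
    return [(name.split()[0], 1) for name in names]

def shuffle(mapped_outputs):
    """Shuffle: group the values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for surname, count in pairs:
            groups[surname].append(count)
    return dict(sorted(groups.items()))

def reduce_group(surname, counts):
    """Reduce: sum the counts that belong to one surname."""
    return surname, sum(counts)

def count_surnames(provinces):
    # One thread per province reads and maps its records in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_province, provinces.values()))
    grouped = shuffle(mapped)
    return dict(reduce_group(s, c) for s, c in grouped.items())

result = count_surnames(PROVINCES)
print(result)  # {'Li': 2, 'Wang': 3, 'Zhang': 3}
```

Each province is mapped independently, so the Map step parallelizes naturally; only the Shuffle step needs to see the output of every mapper.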
Note that the Shuffle described here is only an abstract concept. In an actual implementation, Shuffle is split into two parts: one part runs inside the Map task and the other inside the Reduce task.
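One way to picture this split is sketched below, under simplifying assumptions: the map-side half partitions each mapper's output by key and sorts every partition, and the reduce-side half copies its partition from every mapper, merges the sorted runs, and groups the values by key. The function names, the CRC32-based partitioner, and `NUM_REDUCERS` are hypothetical choices for this example and are not Hadoop APIs.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # hypothetical number of reduce tasks

def partition(key):
    """Pick a reducer for a key (deterministic stand-in for a hash partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS

def map_side_shuffle(pairs):
    """Map-task half: split the output into one sorted run per reducer."""
    runs = [[] for _ in range(NUM_REDUCERS)]
    for key, value in pairs:
        runs[partition(key)].append((key, value))
    return [sorted(run) for run in runs]

def reduce_side_shuffle(reducer_id, all_map_outputs):
    """Reduce-task half: copy this reducer's run from every map output,
    merge the sorted runs, and group the values by key."""
    copied = [output[reducer_id] for output in all_map_outputs]
    merged = heapq.merge(*copied)
    return {key: [value for _, value in group]
            for key, group in groupby(merged, key=itemgetter(0))}

# Two hypothetical map tasks, each with its own (surname, 1) output.
map_outputs = [
    map_side_shuffle([("Zhang", 1), ("Li", 1), ("Zhang", 1)]),
    map_side_shuffle([("Wang", 1), ("Li", 1)]),
]

grouped = {}
for reducer_id in range(NUM_REDUCERS):
    grouped.update(reduce_side_shuffle(reducer_id, map_outputs))

print(sorted(grouped.items()))
# [('Li', [1, 1]), ('Wang', [1]), ('Zhang', [1, 1])]
```

Because every key is routed to exactly one reducer, each reducer sees all values for the keys it owns, no matter which map task produced them.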
How does Hadoop implement MapReduce?
Hadoop is a distributed system framework developed by the Apache Foundation. It contains multiple components, and its core is HDFS and MapReduce.
For reasons of space, this article will not give a complete introduction to Hadoop; it only briefly describes how Hadoop implements MapReduce.
The diagram below shows the whole process of the Hadoop framework executing a MapReduce Job:
Several entities need some explanation:
HDFS:
The Hadoop Distributed File System, which provides the data source and Job information storage for MapReduce.
Client Node:
The process that runs the MapReduce program and submits the MapReduce Job.
JobTracker Node:
Splits the whole Job into several Tasks and is responsible for scheduling and coordinating all of them. It plays the Master role.
TaskTracker Node:
Executes the Tasks assigned by the JobTracker. It plays the Worker role. These Tasks are divided into MapTasks and ReduceTasks.
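The Master and Worker relationship described above can be modeled with a toy simulation. This is only a sketch of the roles, not Hadoop code: the classes reuse the entity names for readability, the `Task` fields and the `split` and `run_job` methods are hypothetical, and the round-robin assignment is an assumption made to keep the example short (a real scheduler takes more into account, such as which workers are free and where the data lives).

```python
from dataclasses import dataclass, field
from itertools import cycle

@dataclass
class Task:
    kind: str      # "map" or "reduce"
    task_id: int

@dataclass
class TaskTracker:
    """Worker role: executes whatever Tasks it is assigned."""
    name: str
    executed: list = field(default_factory=list)

    def run(self, task):
        # A real worker would run user code here; this toy only records the task.
        self.executed.append(task)

class JobTracker:
    """Master role: splits a Job into Tasks and hands them to workers."""

    def __init__(self, trackers):
        self.trackers = trackers

    def split(self, num_maps, num_reduces):
        maps = [Task("map", i) for i in range(num_maps)]
        reduces = [Task("reduce", i) for i in range(num_reduces)]
        return maps + reduces  # map tasks are scheduled before reduce tasks

    def run_job(self, num_maps, num_reduces):
        workers = cycle(self.trackers)  # assumed round-robin assignment
        for task in self.split(num_maps, num_reduces):
            next(workers).run(task)

trackers = [TaskTracker("tt-1"), TaskTracker("tt-2")]
JobTracker(trackers).run_job(num_maps=3, num_reduces=1)

for tracker in trackers:
    print(tracker.name, [(t.kind, t.task_id) for t in tracker.executed])
# tt-1 [('map', 0), ('map', 2)]
# tt-2 [('map', 1), ('reduce', 0)]
```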
Finally, to everyone who aspires to become a big data engineer, and to all of Xiaohui's readers: may you achieve your dreams in the new year!
—————END—————