Analysis of MapReduce and YARN Principles
2022-07-02 07:07:00 【Boring n day】
Preface
This post focuses mainly on theory, with the goal of better understanding the MR (MapReduce) computing framework and YARN.
1. MapReduce
(1) The development history of MapReduce
Hadoop 1.x consisted of HDFS and MapReduce; there was no YARN. Resource scheduling was handled by two MapReduce components, the JobTracker and the TaskTracker. In Hadoop 2.x, resource management was split out of MapReduce and became today's YARN, which is very powerful and is introduced in detail below. Since Hadoop 2.x, MapReduce is simply a distributed computing framework.
(2) MapReduce workflow
Split: the splitting (sharding) operation
The main purpose of MR is distributed parallel processing, so a large dataset must first be divided into splits before it is processed.
The split size can be customized; generally, making it equal to one HDFS block is the most appropriate.
- Splitting divides the dataset into many small pieces. Note that this division is not physical: a split is only a logical definition (a position and a length within the input), and the data itself is not moved.
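To make the "logical, not physical" point concrete, here is a minimal Python sketch that only computes (offset, length) pairs for a file. The function name `compute_splits` and the 128 MiB default are assumptions for illustration; Hadoop's real `FileInputFormat` applies additional rules (for example, it allows the last split to be slightly larger than the split size), which this sketch ignores.

```python
from typing import List, Tuple

# Assumed default split size: 128 MiB, a common HDFS block size.
DEFAULT_SPLIT_SIZE = 128 * 1024 * 1024


def compute_splits(file_length: int,
                   split_size: int = DEFAULT_SPLIT_SIZE) -> List[Tuple[int, int]]:
    """Return logical splits as (offset, length) pairs.

    Only metadata is produced: the file itself is never read or copied.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits: List[Tuple[int, int]] = []
    offset = 0
    while offset < file_length:
        # The last split may be shorter than split_size.
        length = min(split_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits
```

For a 300 MiB file with 128 MiB splits, this yields three splits of 128, 128 and 44 MiB, and no bytes are read.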
RR: RecordReader
The input and output of a map task are both key-value (k-v) pairs, so a RecordReader (RR) is needed to convert the raw data into k-v form.
- Using each split's position and length information, the RR reads the relevant parts of the corresponding blocks from HDFS and outputs them as k-v pairs.
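A RecordReader has to cope with split boundaries that fall at arbitrary byte positions, often in the middle of a line. The sketch below is a simplified, in-memory stand-in for a line-oriented reader: it assumes the convention that a line belongs to the split containing its first byte, and yields (byte offset, line) pairs, the same key and value meaning used by Hadoop's `TextInputFormat`. The function `read_records` and the in-memory `bytes` input are hypothetical simplifications; a real reader streams from HDFS.

```python
from typing import Iterator, Tuple


def read_records(data: bytes, offset: int, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (byte_offset, line) pairs for the lines owned by one split.

    Assumed convention: a line belongs to the split that contains its first
    byte, so a split skips a line that started in the previous split and
    finishes its own last line even if it runs past the split's end.
    """
    end = min(offset + length, len(data))
    if offset == 0:
        pos = 0
    else:
        # First line start at or after `offset`: just past the first newline
        # found from offset - 1 (handles a split that starts exactly on a line).
        newline = data.find(b"\n", offset - 1)
        if newline == -1:
            return  # no line starts inside this split
        pos = newline + 1
    while pos < end:
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        yield pos, data[pos:stop].decode("utf-8")
        pos = stop + 1
```

Reading the two splits (0, 14) and (14, 10) of `b"hello world\nfoo bar\nbaz\n"` returns each line exactly once, even though the first boundary falls inside "foo bar".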
Map: the map function
The map function holds user-defined processing logic. It processes each input record according to that logic and, when finished, emits a set of intermediate k-v pairs.
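As an illustration, here is the classic word-count map function written as a plain Python generator. It is a hypothetical user function, not Hadoop's `Mapper` API: it ignores the byte-offset key and emits one intermediate (word, 1) pair per word.

```python
from typing import Iterator, Tuple


def word_count_map(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical user map function.

    key is the line's byte offset (unused here); value is the line text.
    """
    for word in value.split():
        yield word, 1
```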
Shuffle
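These intermediate results are sent to the reduce tasks through the shuffle, which partitions them by key, sorts them, and groups them so that all values for the same key reach the same reduce task.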
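The shuffle step can be sketched in memory as "partition, then group and sort by key". This is a simplification under stated assumptions: the functions `partition` and `shuffle` are hypothetical, `zlib.crc32` stands in for the key hash (Python's built-in `hash` of a string is randomized per process, so it would not be stable), and the spilling, merging and network transfer done by Hadoop are left out. Hadoop's default `HashPartitioner` follows the same hash-modulo idea.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Pick the reduce task for a key with a stable hash-modulo rule."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs: Iterable[Tuple[str, int]],
            num_reducers: int) -> List[List[Tuple[str, List[int]]]]:
    """Route intermediate pairs to reducers, grouping values by key.

    Returns one list per reducer of (key, [values]) sorted by key, so each
    reducer sees all values of its keys together and in key order.
    """
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]
```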
Reduce: the reduce function
The reduce function also holds user-defined processing logic. It processes the grouped data according to that logic and produces the final output.
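Continuing the word-count illustration, a hypothetical reduce function receives one key with all of its grouped values and emits the final count. The driver `run_reduce` is also an assumption; it simply applies the function to each group in the order the shuffle delivered them.

```python
from typing import Iterable, Iterator, List, Tuple


def word_count_reduce(key: str, values: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Hypothetical user reduce function: sum the 1s emitted for one word."""
    yield key, sum(values)


def run_reduce(groups: Iterable[Tuple[str, List[int]]]) -> List[Tuple[str, int]]:
    """Apply the reduce function to each (key, [values]) group of one reducer."""
    output: List[Tuple[str, int]] = []
    for key, values in groups:
        output.extend(word_count_reduce(key, values))
    return output
```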
2. YARN
Introduction to YARN
YARN architecture: ResourceManager (RM), ApplicationMaster (AM), NodeManager (NM)
- The ResourceManager (RM) is the global resource manager. It manages and allocates the resources of the whole system and handles client requests. It mainly consists of two components: the Scheduler and the Application Manager (ApplicationsManager).
- The ApplicationMaster (AM) handles task scheduling and task monitoring. It requests resources for the application's job and assigns them to the individual tasks (map or reduce tasks), which is a second-level allocation of resources. It also communicates with the NMs to monitor the tasks, and recovers a task when it fails (that is, it requests resources again and restarts the task).
- The NodeManager (NM) is an agent that runs on every node of a YARN cluster. It monitors the resource usage of each container, reports the jobs' resource usage and each container's running status to the RM, and receives various requests from the ApplicationMaster.
The scheduler receives resource requests from ApplicationMasters and allocates cluster resources to the requesting applications in the form of "containers". When choosing containers, it usually considers where the data the application will process is located and prefers nearby nodes, so that "computation moves to the data".
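The "move computation to the data" preference can be sketched as a three-level choice: a node that stores the data, then a node on the same rack, then any node with enough free capacity. The `Node` class, the `pick_node` function and the memory-only capacity check are hypothetical simplifications; YARN's real schedulers also apply queue and other policies.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    rack: str
    free_memory_mb: int


def pick_node(nodes: List[Node], data_hosts: Set[str], memory_mb: int) -> Optional[Node]:
    """Choose a node for a container, preferring data locality."""
    candidates = [n for n in nodes if n.free_memory_mb >= memory_mb]
    data_racks = {n.rack for n in nodes if n.name in data_hosts}
    for n in candidates:  # 1) node-local: the node already stores the data
        if n.name in data_hosts:
            return n
    for n in candidates:  # 2) rack-local: same rack as a node holding the data
        if n.rack in data_racks:
            return n
    return candidates[0] if candidates else None  # 3) anywhere with room
```

If the node holding the data lacks capacity, the request falls back to another node on the same rack before going off-rack.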
Container: a container is the unit of dynamic resource allocation. Each container holds a certain amount of CPU, memory, disk, and other resources, which limits how many resources each application can use.
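A container can be modeled as a bundle of resources carved out of a node's capacity and returned when released. The sketch below is a hypothetical model that tracks only memory and CPU cores (disk is left out to keep it short); the class names and the `container_N` IDs are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def __add__(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb + other.memory_mb, self.vcores + other.vcores)

    def __sub__(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)


class NodeResources:
    """Hypothetical per-node bookkeeping of containers carved from a fixed capacity."""

    def __init__(self, total: Resource) -> None:
        self.total = total
        self.available = total
        self.containers: Dict[str, Resource] = {}
        self._next_id = 1

    def allocate(self, request: Resource) -> Optional[str]:
        """Reserve resources for a container; None if the node cannot fit it."""
        if not request.fits_in(self.available):
            return None
        container_id = f"container_{self._next_id}"
        self._next_id += 1
        self.containers[container_id] = request
        self.available = self.available - request
        return container_id

    def release(self, container_id: str) -> None:
        """Return a finished container's resources to the node."""
        self.available = self.available + self.containers.pop(container_id)
```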
Scheduler: the scheduler is designed as a pluggable component. YARN not only provides several ready-to-use schedulers, but also lets users design their own scheduler to fit their needs.
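Pluggability means the ResourceManager depends only on a scheduler interface, so the policy can be swapped without touching the rest of the system. The sketch below shows that idea with a hypothetical `Scheduler` base class, a FIFO policy, and a toy smallest-request-first policy; it is not YARN's scheduler API, and YARN's own built-in schedulers (FIFO, Capacity and Fair) are far more elaborate.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    app_id: str
    memory_mb: int


class Scheduler(ABC):
    """Hypothetical plug-in point: decide which pending requests get resources."""

    @abstractmethod
    def schedule(self, pending: List[Request], free_memory_mb: int) -> List[Request]:
        ...


class FifoScheduler(Scheduler):
    def schedule(self, pending: List[Request], free_memory_mb: int) -> List[Request]:
        granted: List[Request] = []
        for req in pending:  # strict arrival order
            if req.memory_mb > free_memory_mb:
                break  # the head-of-line request blocks everything behind it
            granted.append(req)
            free_memory_mb -= req.memory_mb
        return granted


class SmallestFirstScheduler(Scheduler):
    """Toy alternative policy, only to show that policies are interchangeable."""

    def schedule(self, pending: List[Request], free_memory_mb: int) -> List[Request]:
        granted: List[Request] = []
        for req in sorted(pending, key=lambda r: r.memory_mb):
            if req.memory_mb > free_memory_mb:
                break  # all remaining requests are at least this large
            granted.append(req)
            free_memory_mb -= req.memory_mb
        return granted


def grant(scheduler: Scheduler, pending: List[Request], free_memory_mb: int) -> List[str]:
    """The 'ResourceManager' side: it only knows the Scheduler interface."""
    return [r.app_id for r in scheduler.schedule(pending, free_memory_mb)]
```

With 8000 MB free and requests of 6000, 1000 and 3000 MB (in that order), the FIFO policy grants app1 and app2, while the toy policy grants app2 and app3.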
Application Manager: responsible for managing all applications in the system, mainly including handling application submissions, negotiating resources with the scheduler to start the ApplicationMaster, monitoring the ApplicationMaster's running state, and restarting it when it fails.
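The Application Manager's restart duty amounts to a bounded retry loop around launching the ApplicationMaster. In this sketch `launch_am` is a hypothetical callback that reports whether an attempt succeeded; on a real cluster the attempt limit is a configuration setting (`yarn.resourcemanager.am.max-attempts`), and the default of 2 used here mirrors that setting's usual default.

```python
from typing import Callable


def run_with_am_restarts(launch_am: Callable[[int], bool], max_attempts: int = 2) -> str:
    """Launch the ApplicationMaster, relaunching it after each failure.

    launch_am(attempt) stands in for starting the AM in a new container and
    waiting for it to finish; it returns True on success.
    """
    for attempt in range(1, max_attempts + 1):
        if launch_am(attempt):
            return f"SUCCEEDED on attempt {attempt}"
    return f"FAILED after {max_attempts} attempts"
```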
YARN Workflow
First, the user writes an application and submits it to YARN. The submission includes:
- the ApplicationMaster program
- the command that starts the ApplicationMaster (this varies with the computing framework)
- the user program
The RM accepts and processes the client's request, allocates an isolated container for the application, and starts an ApplicationMaster in it; the ApplicationMaster then manages the scheduling and execution of the whole job.
Once created, the ApplicationMaster first registers with the RM, because the RM only monitors and manages registered ApplicationMasters in real time.
The ApplicationMaster then requests resources from the RM by polling.
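The registration rule and the polling loop can be sketched with a toy in-memory RM. Everything here is hypothetical: `ToyResourceManager`, its method names (loosely modeled on the register and allocate calls an AM makes), and the fixed number of containers granted per poll. A real AM also waits between heartbeats and may receive containers over many rounds.

```python
from typing import List, Set, Tuple


class ToyResourceManager:
    """Hypothetical RM that only serves registered ApplicationMasters."""

    def __init__(self, containers_per_poll: int) -> None:
        self.registered: Set[str] = set()
        self.containers_per_poll = containers_per_poll
        self._next_id = 1

    def register_application_master(self, app_id: str) -> None:
        self.registered.add(app_id)

    def allocate(self, app_id: str, wanted: int) -> List[str]:
        """Answer one poll: grant up to containers_per_poll new containers."""
        if app_id not in self.registered:
            raise PermissionError(f"{app_id} is not registered with the RM")
        granted: List[str] = []
        for _ in range(min(wanted, self.containers_per_poll)):
            granted.append(f"container_{self._next_id}")
            self._next_id += 1
        return granted


def request_containers(rm: ToyResourceManager, app_id: str, needed: int,
                       max_polls: int = 100) -> Tuple[List[str], int]:
    """AM side: register first, then poll until enough containers arrive."""
    rm.register_application_master(app_id)
    containers: List[str] = []
    polls = 0
    while len(containers) < needed and polls < max_polls:
        containers.extend(rm.allocate(app_id, needed - len(containers)))
        polls += 1  # a real AM would wait for its heartbeat interval here
    return containers, polls
```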
Each running application is associated with one ApplicationMaster, which acts as its housekeeper and manages the resource scheduling and execution of the whole job. The job is split into multiple tasks (for example, several map or reduce tasks), and each task needs a container to run in, so the ApplicationMaster has to request these resources from the RM. The RM allocates the resources to the AM in the form of containers.
After receiving the containers, the AM redistributes the resources it obtained and assigns them to the tasks it manages.
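The AM's second-level allocation can be sketched as matching the containers it received against its pending tasks. The ordering rule used here (map tasks, with IDs starting with `m_`, before reduce tasks, with IDs starting with `r_`, because reducers consume map output) and the ID format itself are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def assign_containers(containers: List[str],
                      pending_tasks: List[str]) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Hand the AM's containers to its tasks.

    Returns (task -> container, tasks still waiting, containers left unused).
    """
    # Assumed policy: map tasks first; sorted() is stable, so arrival order
    # is kept within each group.
    ordered = sorted(pending_tasks, key=lambda t: 0 if t.startswith("m_") else 1)
    assignment = dict(zip(ordered, containers))
    waiting = ordered[len(assignment):]
    unused = containers[len(assignment):]
    return assignment, waiting, unused
```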
While running, each task in a container reports its current status and progress to the AM through the NM; if a task fails, the AM requests resources again to recover it.
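Task recovery follows a bounded-retry pattern in which the AM asks for a fresh container for each new attempt. In this sketch `run_attempt` is a hypothetical callback standing in for launching the task and waiting for its final status report, the container names are invented, and the limit of 4 attempts mirrors the usual default of Hadoop MapReduce's configurable `mapreduce.map.maxattempts` setting.

```python
from typing import Callable, List


def run_task_with_retries(task_id: str,
                          run_attempt: Callable[[str, int], bool],
                          max_attempts: int = 4) -> List[str]:
    """Return the status log the AM would collect for one task."""
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        # Hypothetical: each attempt runs in a freshly requested container.
        container = f"container_{task_id}_{attempt}"
        succeeded = run_attempt(task_id, attempt)
        status = "SUCCEEDED" if succeeded else "FAILED"
        log.append(f"{task_id} attempt {attempt} in {container}: {status}")
        if succeeded:
            return log
    log.append(f"{task_id} FAILED permanently")
    return log
```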
When the whole application (job) has finished, the AM unregisters from the RM's Application Manager and shuts itself down, releasing its resources.
YARN is a pure resource scheduling and management framework: as long as a corresponding AM is implemented, different computing frameworks can run on it.
summary
The above is what I learned today; it is mainly theoretical.