Analysis of MapReduce and YARN principles
2022-07-02 07:07:00 【Boring n day】
Preface
This post leans toward theory. The goal is a better understanding of the MR computing framework and YARN.
1. MapReduce
(1) History of MapReduce
Hadoop 1.x consisted of HDFS and MapReduce. There was no YARN, and resource scheduling was handled by two MapReduce components (JobTracker and TaskTracker). In Hadoop 2.x, resource management was split out of MapReduce and became today's YARN. YARN is very capable and is covered in detail later. From Hadoop 2.x onward, MapReduce is purely a distributed computing framework.
(2) MapReduce workflow

Split: splitting the input
The main purpose of MR is distributed parallel processing, so a large dataset must first be divided into splits.
The split size is configurable. In general, one HDFS block per split is the most appropriate size.
- Splitting divides the dataset into many small pieces. Note that this division is logical, not physical.
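The logical nature of splitting can be illustrated with a minimal sketch. The function name `compute_splits` and the 128 MiB block size are assumptions made for illustration; Hadoop's real split calculation has additional rules that are not modeled here. The point is only that a split is a description (offset, length), not a copy of the data.

```python
from typing import List, Tuple


def compute_splits(file_length: int, split_size: int) -> List[Tuple[int, int]]:
    """Return logical (offset, length) splits; no data is read, copied, or moved."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_length:
        # The last split may be shorter than split_size.
        length = min(split_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits


MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB  # assumed block size, used here as the split size

# A hypothetical 300 MiB file yields two full splits and one 44 MiB remainder.
splits = compute_splits(300 * MIB, BLOCK_SIZE)
for offset, length in splits:
    print(f"offset={offset // MIB} MiB, length={length // MIB} MiB")
```

Each tuple would later be handed to one map task, which is what lets the tasks run in parallel.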
RR: Record reader
The input and output of a map task are both key-value (k-v) pairs, so the RR is needed to convert the data into k-v form.
- Using each split's position and length information, the RR reads the relevant portion of each block from HDFS and outputs it as k-v pairs.
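A record reader can be sketched as follows, assuming line-oriented text input where the key is the byte offset of a line and the value is the line's text. The function `read_records` and the rule "a line belongs to the split that contains its first byte" are simplifications chosen for this sketch, not Hadoop's actual implementation, and an in-memory `bytes` object stands in for a file on HDFS.

```python
from typing import Iterator, Tuple


def read_records(data: bytes, offset: int, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line text) pairs for the lines that start inside the split."""
    end = offset + length
    pos = offset
    # A split that starts in the middle of a line skips ahead to the next
    # line start; the previous split is responsible for that partial line.
    if pos > 0 and data[pos - 1:pos] != b"\n":
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        # A line may extend past the split boundary; it is still read whole.
        yield pos, data[pos:line_end].decode("utf-8")
        pos = line_end + 1


data = b"hello world\nfoo bar\nbaz\n"
# Three logical splits that deliberately cut through the middle of lines.
records = []
for split_offset, split_length in [(0, 10), (10, 10), (20, 4)]:
    records.extend(read_records(data, split_offset, split_length))
print(records)
```

Even though the split boundaries fall mid-line, every line is emitted exactly once, which is the property a record reader has to guarantee.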
Map: Map function
The map function holds user-defined processing logic. Running it over the input records performs the data processing and produces a batch of intermediate k-v results.
Shuffle
These intermediate results are sent to Reduce through the shuffle.
- Example: [figure]

Reduce: Reduce function
The reduce function also holds user-defined processing logic. Running it over the shuffled data performs the final processing and produces the final output.
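The Map, Shuffle, and Reduce stages described above can be sketched in plain Python with the classic word-count example. This runs in a single process and only imitates the data flow; the names `map_fn`, `shuffle`, and `reduce_fn` are illustrative and are not Hadoop APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(key: int, line: str) -> Iterator[Tuple[str, int]]:
    """User-defined map logic: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key and sort the keys, as a shuffle would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """User-defined reduce logic: sum the counts collected for one word."""
    return key, sum(values)


# Input records in the (byte offset, line) form produced by a record reader.
records = [(0, "hello world"), (12, "hello yarn")]

intermediate = [pair for key, line in records for pair in map_fn(key, line)]
grouped = shuffle(intermediate)
result = dict(reduce_fn(key, values) for key, values in grouped.items())
print(result)  # {'hello': 2, 'world': 1, 'yarn': 1}
```

In a real cluster the map calls run on different nodes and the shuffle moves data across the network; the logic of each stage is the same.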
2. YARN
Introduction to YARN
YARN architecture: ResourceManager (RM), ApplicationMaster (AM), NodeManager (NM)
- ResourceManager (RM) is the global resource manager. It is responsible for resource management and allocation across the whole system and handles client requests. It consists mainly of two components: the Scheduler and the Application Manager.
- ApplicationMaster (AM) handles task scheduling and task monitoring. It requests resources for the application's job and assigns them to the individual tasks (Map or Reduce tasks), which is a second level of resource allocation. It keeps in contact with the NMs to monitor the tasks, and it recovers failed tasks (that is, it requests resources again and restarts the task).
- NodeManager (NM) is an agent that resides on every node of a YARN cluster. It monitors the resource usage of each container, reports the jobs' resource usage and each container's running status to the RM, and receives the various requests coming from the ApplicationMaster.
The Scheduler receives resource requests from the ApplicationMaster and assigns cluster resources to the requesting application in the form of containers. Container selection usually takes into account the location of the data the application will process and prefers nearby nodes, so that "computation moves to the data".
Container: a container is the unit of dynamic resource allocation. Each container bundles a certain amount of resources such as CPU, memory, and disk, which limits how much each application can use.
Scheduler: the Scheduler is designed as a pluggable component. YARN ships with several ready-to-use schedulers and also lets users design their own scheduler to fit their needs.
Application Manager: responsible for managing all applications in the system. This mainly covers application submission, negotiating resources with the Scheduler to start the ApplicationMaster, monitoring the ApplicationMaster's running state, and restarting it on failure.
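The idea of containers as resource bundles, together with the preference for nodes close to the data, can be sketched as a toy allocator. Everything here (`Node`, `Container`, `allocate`, the node names, and the memory and vcore figures) is hypothetical and far simpler than any scheduler YARN actually ships; it only shows the "try the data-local node first, then fall back" decision.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class Container:
    node: str
    memory_mb: int
    vcores: int


def allocate(nodes: Dict[str, Node], memory_mb: int, vcores: int,
             preferred: List[str]) -> Optional[Container]:
    """Grant a container, trying nodes that hold the data before any other node."""
    candidates = [name for name in preferred if name in nodes]
    candidates += [name for name in nodes if name not in candidates]
    for name in candidates:
        node = nodes[name]
        if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
            # Reserve the resources so no other request can claim them.
            node.free_memory_mb -= memory_mb
            node.free_vcores -= vcores
            return Container(name, memory_mb, vcores)
    return None  # the cluster cannot satisfy the request right now


nodes = {
    "node1": Node("node1", free_memory_mb=2048, free_vcores=2),
    "node2": Node("node2", free_memory_mb=4096, free_vcores=4),
}
first = allocate(nodes, 2048, 1, preferred=["node1"])   # data-local placement
second = allocate(nodes, 1024, 1, preferred=["node1"])  # node1 is full, falls back
third = allocate(nodes, 8192, 1, preferred=[])          # too large for any node
print(first, second, third)
```

Because the container fixes the amount of memory and vcores at allocation time, an application cannot use more than it was granted.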
YARN workflow
First, the user writes an application and submits it to YARN. The submission includes:
- the ApplicationMaster program
- the command that starts the ApplicationMaster (this varies with the computing framework)
- the user program
The RM accepts and handles the client's request, allocates a container for the application, and starts an ApplicationMaster in it. The ApplicationMaster then manages the scheduling and execution of the whole job.
Once created, the ApplicationMaster first registers with the RM, because the RM only monitors and manages ApplicationMasters that have registered.
The ApplicationMaster requests resources from the RM by polling.
Each running application is associated with one ApplicationMaster, which acts as its steward and manages resource scheduling and execution for the whole job. The job is divided into multiple tasks (for example, several Map or Reduce tasks). Each task needs container resources to run in, so the ApplicationMaster has to request these resources from the RM.
The RM allocates resources to the AM in the form of containers.
After obtaining the containers, the AM allocates the resources a second time, assigning them to the tasks it manages.
Each task running in a container then reports its current status and progress to the AM through the NM. If a task fails, the AM requests resources again to restart it.
When the whole application (job) has finished, the AM unregisters from the RM's Application Manager, shuts itself down, and releases its resources.
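The lifecycle above (register, request containers, run tasks, retry failures, unregister) can be walked through with a small single-process simulation. The classes and methods below are hypothetical stand-ins written for this sketch and do not correspond to the real YARN client APIs; NodeManagers are omitted, and a task is just a Python callable that returns True on success.

```python
from typing import Callable, Dict, List, Set


class ResourceManager:
    """Toy RM that only tracks registered AMs and a pool of identical containers."""

    def __init__(self, total_containers: int) -> None:
        self.free = total_containers
        self.registered: Set[str] = set()

    def register(self, am_id: str) -> None:
        self.registered.add(am_id)

    def request_containers(self, am_id: str, wanted: int) -> int:
        if am_id not in self.registered:
            raise RuntimeError("ApplicationMaster must register first")
        granted = min(wanted, self.free)
        self.free -= granted
        return granted

    def release(self, count: int) -> None:
        self.free += count

    def unregister(self, am_id: str) -> None:
        self.registered.discard(am_id)


class ApplicationMaster:
    def __init__(self, am_id: str, rm: ResourceManager,
                 tasks: Dict[str, Callable[[], bool]], max_attempts: int = 3) -> None:
        self.am_id, self.rm, self.tasks = am_id, rm, tasks
        self.max_attempts = max_attempts

    def run(self) -> Dict[str, str]:
        self.rm.register(self.am_id)
        pending: List[str] = list(self.tasks)
        attempts = {name: 0 for name in pending}
        status: Dict[str, str] = {}
        while pending:
            granted = self.rm.request_containers(self.am_id, len(pending))
            if granted == 0:
                break  # a real AM would keep polling; the sketch just stops
            batch, pending = pending[:granted], pending[granted:]
            for name in batch:  # second-level allocation: one container per task
                attempts[name] += 1
                succeeded = self.tasks[name]()
                self.rm.release(1)
                if succeeded:
                    status[name] = "SUCCEEDED"
                elif attempts[name] < self.max_attempts:
                    pending.append(name)  # ask for a container again and retry
                else:
                    status[name] = "FAILED"
        self.rm.unregister(self.am_id)
        return status


def flaky_task() -> Callable[[], bool]:
    """Build a task that fails on its first attempt and succeeds afterwards."""
    calls = {"n": 0}

    def task() -> bool:
        calls["n"] += 1
        return calls["n"] > 1

    return task


rm = ResourceManager(total_containers=2)
tasks = {"map-0": lambda: True, "map-1": flaky_task(), "reduce-0": lambda: True}
status = ApplicationMaster("app-1", rm, tasks).run()
print(status)
```

With only two containers for three tasks, the AM has to request resources in more than one round, and the failed attempt of `map-1` is retried in a later round before the AM unregisters.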
YARN is purely a resource scheduling and management framework. Any computing framework can run on it, as long as a corresponding AM is implemented for that framework.
Summary
The above is what I learned today, with the emphasis on theory.