
Introduction to yarn (one article is enough)

2022-07-07 05:57:00 Yang Linwei

01 Introduction

Reference material: "Yarn [Framework, Principles, Multi-Queue Configuration]"

YARN is a resource scheduling platform responsible for providing server compute resources to computing programs. It is, in effect, a distributed operating system platform, while frameworks such as MapReduce are the applications that run on top of that operating system.


02 YARN Architecture

YARN consists mainly of the ResourceManager, NodeManager, ApplicationMaster, and Container components, as shown in the figure below:

2.1 ResourceManager

The main responsibilities of the ResourceManager (RM) are:

  1. Handling client requests;
  2. Monitoring NodeManagers;
  3. Starting and monitoring ApplicationMasters;
  4. Allocating and scheduling resources.

2.2 NodeManager

The main responsibilities of the NodeManager (NM) are:

  1. Managing the resources on a single node;
  2. Processing commands from the ResourceManager;
  3. Processing commands from the ApplicationMaster.

2.3 ApplicationMaster

The ApplicationMaster (AM) is responsible for:

  1. Splitting the input data;
  2. Requesting resources for the application and assigning them to its internal tasks;
  3. Monitoring tasks and handling failures (fault tolerance).

2.4 Container

A Container is YARN's abstraction of resources. It encapsulates the multi-dimensional resources of a node, such as memory, CPU, disk, and network.
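As a rough illustration, a Container can be thought of as a record bundling a node's resources together. The sketch below is a simplification for intuition only; the field names are assumptions, and the real YARN Container (`org.apache.hadoop.yarn.api.records.Container` in Java) carries much more, such as a container id and a node id.

```python
# Minimal sketch of the idea behind a Container: a bundle of
# multi-dimensional resources allocated on one node.
# Field names are illustrative, not the actual Hadoop API.
from dataclasses import dataclass

@dataclass
class Container:
    node: str        # which NodeManager hosts this container
    memory_mb: int   # memory dimension
    vcores: int      # CPU dimension

c = Container(node="worker-1", memory_mb=2048, vcores=2)
print(c)
```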

03 YARN Working Principle

3.1 YARN Working Mechanism

The YARN working mechanism is shown in the figure below (image from: https://www.cnblogs.com/wh984763176/p/13225690.html). The process is as follows:

  1. The MR program is submitted to the node where the client runs.
  2. The YarnRunner requests an Application from the ResourceManager.
  3. The RM returns the application's resource submission path to the YarnRunner.
  4. The program submits the resources it needs to HDFS.
  5. Once the resources are submitted, the client requests that an MRAppMaster be run.
  6. The RM turns the user's request into a Task.
  7. One of the NodeManagers picks up the Task.
  8. That NodeManager creates a Container and spawns the MRAppMaster.
  9. The Container copies the resources from HDFS to the local node.
  10. The MRAppMaster requests resources from the RM to run MapTasks.
  11. The RM assigns the MapTasks to two other NodeManagers, which pick up the tasks and create containers.
  12. The MRAppMaster sends program startup scripts to the two NodeManagers that received the tasks; each launches a MapTask, and the MapTasks partition and sort the data.
  13. After all MapTasks have finished, the MRAppMaster requests containers from the RM to run ReduceTasks.
  14. The ReduceTasks fetch the data of their corresponding partitions from the MapTasks.
  15. After the program finishes, the MRAppMaster asks the RM to unregister it.

3.2 YARN Job Submission Process

① Job submission

  • Step 1: The Client calls the job.waitForCompletion method to submit a MapReduce job to the cluster.
  • Step 2: The Client requests a job id from the RM.
  • Step 3: The RM returns the resource submission path and the job id to the Client.
  • Step 4: The Client submits the jar package, split information, and configuration files to the specified resource submission path.
  • Step 5: After submitting the resources, the Client asks the RM to run an MRAppMaster.

② Job initialization

  • Step 6: When the RM receives the Client's request, it adds the job to the capacity scheduler.
  • Step 7: An idle NM picks up the job.
  • Step 8: That NM creates a Container and spawns the MRAppMaster.
  • Step 9: The resources submitted by the Client are downloaded to the local node.

③ Task assignment

  • Step 10: The MRAppMaster requests resources from the RM to run multiple MapTasks.
  • Step 11: The RM assigns the MapTasks to two other NodeManagers, which pick up the tasks and create containers.

④ Task execution

  • Step 12: The MRAppMaster sends program startup scripts to the two NodeManagers that received the tasks; each launches a MapTask, and the MapTasks partition and sort the data.
  • Step 13: After all MapTasks have finished, the MRAppMaster requests containers from the RM to run ReduceTasks.
  • Step 14: The ReduceTasks fetch the data of their corresponding partitions from the MapTasks.
  • Step 15: After the program finishes, the MRAppMaster asks the RM to unregister it.

⑤ Progress and status updates

  • Tasks in YARN report their progress and status (including counters) back to the ApplicationMaster. The client polls the ApplicationMaster for progress updates once per second (configurable via mapreduce.client.progressmonitor.pollinterval) and displays them to the user.

⑥ Job completion

  • In addition to polling the ApplicationMaster for progress, the client checks every 5 seconds whether the job has completed by calling waitForCompletion(). This interval can be set via mapreduce.client.completion.pollinterval. When the job is done, the ApplicationMaster and the Containers clean up their working state. The job information is stored by the job history server for later inspection by the user.
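Both polling intervals are ordinary client-side properties and can be overridden in mapred-site.xml. A sketch of such an override follows, matching the style of the config excerpt later in this article; the values shown are, to my understanding, the defaults (in milliseconds), so treat them as illustrative:

<property>
    <name>mapreduce.client.progressmonitor.pollinterval</name>
    <value>1000</value>
</property>
<property>
    <name>mapreduce.client.completion.pollinterval</name>
    <value>5000</value>
</property>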

04 YARN Resource Schedulers

Hadoop provides three job schedulers: FIFO, the Capacity Scheduler, and the Fair Scheduler.

In Hadoop 3.1.3 the default resource scheduler is the Capacity Scheduler.

The setting can be found in the yarn-default.xml file:

<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
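To use a different scheduler, this property is overridden in yarn-site.xml. As a hedged example (the class name below is the standard Fair Scheduler implementation shipped with YARN, but verify it against your Hadoop version):

<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>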

4.1 FIFO Scheduler

The FIFO scheduler has a single queue, and only one job runs in that queue at a time.

4.2 Capacity Scheduler

The Capacity Scheduler is, in effect, multi-queue FIFO: within each queue only one job runs at a time, so the degree of parallelism equals the number of queues.

The Capacity Scheduler supports multiple queues. Each queue can be configured with a certain share of resources, and each queue uses a FIFO scheduling policy.

To prevent one user's jobs from monopolizing a queue's resources, the scheduler limits the resources that jobs submitted by the same user may occupy.

  • First, it computes, for each queue, the ratio of the number of running tasks to the queue's share of compute resources, and selects the queue with the lowest ratio (i.e., the most idle queue);
  • Second, it orders the tasks within that queue by job priority and submission time, while also taking user resource and memory limits into account.

As shown in the figure above, the three queues execute their tasks in order and at the same time; for example, job11, job21, and job31 are at the heads of their queues, so they run first, and they run in parallel.
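The queue-selection rule described above can be sketched in a few lines. This is a toy model under stated assumptions, not the actual Hadoop implementation: each queue is represented as a count of running tasks plus a configured capacity share, and the scheduler picks the queue whose running/capacity ratio is lowest.

```python
# Toy model of the Capacity Scheduler's queue-selection step:
# pick the queue with the lowest running-tasks / capacity-share ratio
# (i.e. the most idle queue). Illustrative names, not the Hadoop API.

def pick_queue(queues):
    """queues: list of dicts with 'name', 'running', 'capacity' keys."""
    return min(queues, key=lambda q: q["running"] / q["capacity"])

queues = [
    {"name": "a", "running": 6, "capacity": 0.40},  # ratio 15.0
    {"name": "b", "running": 3, "capacity": 0.30},  # ratio 10.0  <- most idle
    {"name": "c", "running": 4, "capacity": 0.30},  # ratio ~13.3
]
print(pick_queue(queues)["name"])  # queue "b" is selected
```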

4.3 Fair Scheduler

The Fair Scheduler is also multi-queue. Each queue allocates resources to start tasks according to the size of each job's resource deficit ("vacancy"), and multiple jobs can run in the same queue at the same time, so the degree of parallelism is greater than or equal to the number of queues.


The Fair Scheduler has the following characteristics:

  • It supports multiple queues and multiple jobs, and each queue can be configured independently;
  • Jobs in the same queue share the queue's resources according to priority and execute concurrently;
  • Each job can set a minimum resource amount, and the scheduler guarantees the job at least that much;
  • The design goal is that, over time, all jobs receive a fair share of resources. The gap between the resources a job should receive and the resources it actually holds at a given moment is called its "vacancy";
  • When allocating resources, the scheduler gives priority to the jobs with the largest vacancy.
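The vacancy rule above can also be sketched as a toy model. This is an illustration under stated assumptions, not the actual Hadoop Fair Scheduler: each job records its fair share and its currently allocated resources, and jobs are served in order of decreasing vacancy (fair share minus allocation).

```python
# Toy model of the Fair Scheduler's "vacancy" rule: a job's vacancy is
# the gap between its fair share and what it actually holds, and jobs
# with larger vacancies are served first. Illustrative names only.

def by_vacancy(jobs):
    """jobs: list of dicts with 'name', 'fair_share', 'allocated' keys."""
    return sorted(jobs,
                  key=lambda j: j["fair_share"] - j["allocated"],
                  reverse=True)

jobs = [
    {"name": "job1", "fair_share": 30, "allocated": 25},  # vacancy 5
    {"name": "job2", "fair_share": 30, "allocated": 10},  # vacancy 20
    {"name": "job3", "fair_share": 30, "allocated": 18},  # vacancy 12
]
print([j["name"] for j in by_vacancy(jobs)])  # job2 is served first
```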

05 Conclusion

This article covered YARN's components, its working mechanism, and its three resource schedulers. Thanks for reading; that's all for this article!

Original article: https://yzsam.com/2022/188/202207070033400672.html

Copyright notice: this article was written by Yang Linwei; please include the original link when reposting. Thank you.