YARN architecture

2022-07-06 09:34:00 【Prism 7】

Table of contents

1. YARN cluster architecture and working principles
2. YARN task submission process
3. The three YARN resource schedulers

1. YARN cluster architecture and working principles

The basic design idea of YARN is to split the JobTracker of MapReduce V1 into two separate services: the ResourceManager and the ApplicationMaster.

ResourceManager (responsible for resource management and allocation across the whole system): The RM is a global resource manager. It consists of two main components: the Scheduler and the Applications Manager.

The Scheduler allocates the system's resources to running applications, subject to constraints such as capacity and queues. While guaranteeing capacity, fairness, and service levels, it optimizes cluster resource utilization so that all resources are put to full use.
The Applications Manager manages all applications in the system. This includes accepting application submissions, negotiating resources with the Scheduler to start each ApplicationMaster, monitoring the ApplicationMaster's running state, and restarting it if it fails.

ApplicationMaster (responsible for managing a single application): Each application submitted by a user has one ApplicationMaster. Its main functions are:

Negotiating with the RM Scheduler to obtain resources, which are expressed as Containers.
Further assigning the obtained resources to its internal tasks.
Communicating with the NMs to start and stop tasks.
Monitoring the status of all internal tasks and, when a task fails, requesting resources again to restart it.

NodeManager: The NodeManager is the resource and task manager on each node. On the one hand, it periodically reports the node's resource usage and the running state of each Container to the RM. On the other hand, it receives and handles Container start and stop requests from the AM.
Container: A Container is YARN's resource abstraction and encapsulates the various resources on a node. When an application is assigned a Container, it can use only the resources described in that Container. A Container is a dynamic unit of resource division, which allows resources to be used more efficiently.
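
To make the Container idea concrete, here is a minimal Python sketch of a node that hands out containers against a fixed resource budget. It is an illustration only, not YARN's API: the names Resource, Node, allocate, and release are hypothetical, and memory and vcores are assumed to be the only resource dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A bundle of resources (assumed dimensions: memory and vcores)."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


class Node:
    """Toy stand-in for a NodeManager that tracks its free capacity."""

    def __init__(self, total: Resource):
        self.free = total
        self.containers = {}  # container id -> Resource held by it
        self._next_id = 1

    def allocate(self, request: Resource):
        """Return a new container id, or None when the request does not fit."""
        if not request.fits_in(self.free):
            return None
        cid = self._next_id
        self._next_id += 1
        self.containers[cid] = request
        self.free = Resource(self.free.memory_mb - request.memory_mb,
                             self.free.vcores - request.vcores)
        return cid

    def release(self, cid: int) -> None:
        """Give a finished container's resources back to the node."""
        res = self.containers.pop(cid)
        self.free = Resource(self.free.memory_mb + res.memory_mb,
                             self.free.vcores + res.vcores)


node = Node(Resource(memory_mb=8192, vcores=4))
first = node.allocate(Resource(4096, 2))
second = node.allocate(Resource(4096, 2))
third = node.allocate(Resource(1024, 1))   # node is full, so this is None
node.release(first)
fourth = node.allocate(Resource(1024, 1))  # fits once `first` is released
```

Because each container carries its own resource description, the node can hand freed capacity to a differently sized request, which is the sense in which containers are a dynamic unit of division.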

2. YARN task submission process

After an application is submitted to YARN, YARN runs it in two stages: first, it starts the ApplicationMaster; second, the ApplicationMaster creates the application, requests resources for it, and monitors it until it finishes.

Specific steps:
(1) The user submits an application to YARN and specifies the ApplicationMaster program.

(2) The ResourceManager allocates a Container, communicates with the corresponding NodeManager, and starts the ApplicationMaster in that Container.

(3) The ApplicationMaster registers with the ResourceManager, splits the work into internal tasks, requests resources for each task, and then monitors the tasks until they finish.

(4) The ApplicationMaster requests resources from the RM by polling.

(5) Once the AM has obtained resources, it communicates with the corresponding NodeManager to start the task.

(6) After a task starts, it reports its status and progress to the AM, so that if the task fails the AM can request resources again and restart it.

(7) When all tasks are complete, the AM unregisters from the RM and shuts itself down.
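
The seven steps can be traced with a small Python simulation. This is a sketch under simplifying assumptions, not YARN's client or RPC API: the ResourceManager and ApplicationMaster classes and their methods are hypothetical, each task needs exactly one container, tasks never fail, and a batch of tasks finishes before the next poll.

```python
class ResourceManager:
    """Toy RM that only counts free containers and records events."""

    def __init__(self, free_containers):
        self.free = free_containers
        self.log = []

    def submit(self, app_id, tasks):
        # Steps 1-2: accept the application, then start its AM in a container.
        self.log.append(f"submit {app_id}")
        self.free -= 1
        self.log.append(f"start AM {app_id}")
        return ApplicationMaster(app_id, tasks, self)

    def request(self, wanted):
        # Step 4: grant as many containers as are currently free.
        granted = min(wanted, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        self.free += count


class ApplicationMaster:
    def __init__(self, app_id, tasks, rm):
        self.app_id, self.pending, self.rm = app_id, list(tasks), rm

    def run(self):
        log = self.rm.log
        log.append(f"register {self.app_id}")             # step 3
        while self.pending:
            granted = self.rm.request(len(self.pending))  # step 4 (one poll)
            if granted == 0:
                raise RuntimeError("no free containers in this toy model")
            batch, self.pending = self.pending[:granted], self.pending[granted:]
            for task in batch:
                log.append(f"run {task}")                 # steps 5-6
            self.rm.release(len(batch))                   # batch finished
        log.append(f"unregister {self.app_id}")           # step 7
        self.rm.release(1)                                # AM container freed


rm = ResourceManager(free_containers=3)
rm.submit("app-1", ["t1", "t2", "t3"]).run()
```

With three containers in the cluster, one is taken by the AM itself, so the first poll is granted only two containers and the third task has to wait for a second poll.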

3. The three YARN resource schedulers

YARN offers three schedulers to choose from: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler.
Apache Hadoop uses the Capacity Scheduler by default; CDH uses the Fair Scheduler by default.
FIFO Scheduler (first in, first out):
The FIFO Scheduler places applications in a queue in the order they are submitted. When allocating resources, it first serves the application at the head of the queue, moves on to the next application only once the head application's needs are met, and so on.
The FIFO Scheduler is the simplest scheduler and the easiest to understand, and it needs no configuration, but it is not suitable for shared clusters. A large application may take up all the cluster's resources and block every other application. For example, if a large task is running and occupying all the resources, a small task submitted afterwards stays blocked until the large task finishes.
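
The blocking effect can be reproduced with a short Python sketch. It is a simplified model rather than YARN's scheduler code: the function name fifo_schedule is hypothetical, all jobs are assumed to be submitted at tick 0 in list order, and each job holds a fixed number of containers for a fixed duration.

```python
import heapq


def fifo_schedule(capacity, jobs):
    """Schedule jobs strictly in submission order.

    jobs: list of (name, containers_needed, duration_ticks).
    Returns {name: (start_tick, finish_tick)}.
    """
    running = []  # min-heap of (finish_tick, containers_held)
    free = capacity
    now = 0
    result = {}
    for name, need, duration in jobs:
        if need > capacity:
            raise ValueError(f"{name} can never fit in the cluster")
        # The head of the queue waits until enough containers are released;
        # nothing behind it is considered in the meantime.
        while free < need:
            finish, released = heapq.heappop(running)
            now = max(now, finish)
            free += released
        result[name] = (now, now + duration)
        heapq.heappush(running, (now + duration, need))
        free -= need
    return result


times = fifo_schedule(10, [("big", 10, 100), ("small", 1, 5)])
# times == {"big": (0, 100), "small": (100, 105)}
```

In this run the small job needs a single container for 5 ticks, yet it cannot start until tick 100, when the big job releases the cluster.
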
Capacity Scheduler:
With the Capacity Scheduler, a dedicated queue can be set aside for small tasks. However, that queue reserves a share of the cluster's resources in advance, so large tasks finish later than they would under the FIFO Scheduler.
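
The trade-off can be put into numbers with a small Python sketch. It is a deliberately strict model and not the real scheduler: the function capacity_finish_times is hypothetical, each queue is assumed to own a fixed percentage of the containers and never borrows idle capacity from another queue, jobs inside a queue run one after another, and work is measured in container-ticks.

```python
import math


def capacity_finish_times(total_containers, queue_percent, jobs):
    """Finish tick of each job when queues own fixed shares of the cluster.

    queue_percent: {queue_name: integer percentage of the cluster}.
    jobs: list of (name, queue_name, work_in_container_ticks),
          all submitted at tick 0 in list order.
    """
    clock = {queue: 0 for queue in queue_percent}  # busy-until tick per queue
    finish = {}
    for name, queue, work in jobs:
        containers = total_containers * queue_percent[queue] // 100
        if containers == 0:
            raise ValueError(f"queue {queue} has no containers")
        clock[queue] += math.ceil(work / containers)
        finish[name] = clock[queue]
    return finish


# Two queues: the small job starts at once, the big job runs on 8 containers.
split = capacity_finish_times(
    10, {"default": 80, "small": 20},
    [("big", "default", 1000), ("small", "small", 10)])

# One queue holding everything behaves like FIFO in this model.
single = capacity_finish_times(
    10, {"only": 100},
    [("big", "only", 1000), ("small", "only", 10)])
```

Under these assumptions the split configuration finishes the small job at tick 5 instead of tick 101, while the big job finishes at tick 125 instead of tick 100.
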
Fair Scheduler:
With the Fair Scheduler, no system resources need to be reserved in advance. The Fair Scheduler dynamically rebalances system resources among all running jobs.
For example, when the first large job is submitted and is the only job running, it gets all the cluster's resources. When a second, small job is submitted, the Fair Scheduler allocates half of the resources to it, so that the two jobs share the cluster fairly.
Note that with the Fair Scheduler there is a delay between the submission of the second job and the moment it obtains resources, because it has to wait for the first job to release the Containers it occupies. When the small job finishes, it releases its resources and the large job gets all the system resources again. The end result is that the Fair Scheduler achieves high resource utilization while still ensuring that small jobs complete in a timely manner.
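
The behaviour described above can be sketched as a tick-by-tick Python simulation. It is an idealized model, not the Fair Scheduler's implementation: the function fair_finish_times is hypothetical, every active job receives an equal share of the containers on each tick, work is measured in container-ticks, and the delay while the first job releases its Containers is not modeled.

```python
def fair_finish_times(total_containers, jobs):
    """Equal-share simulation.

    jobs: list of (name, submit_tick, work_in_container_ticks).
    Returns {name: finish_tick}.
    """
    pending = sorted(jobs, key=lambda job: job[1])
    remaining = {}  # active job name -> work left
    finish = {}
    tick = 0
    while pending or remaining:
        # Admit every job whose submit time has arrived.
        while pending and pending[0][1] <= tick:
            name, _, work = pending.pop(0)
            remaining[name] = work
        if remaining:
            # Split the cluster evenly among the jobs active on this tick.
            share = total_containers / len(remaining)
            for name in list(remaining):
                remaining[name] -= share
                if remaining[name] <= 0:
                    del remaining[name]
                    finish[name] = tick + 1
        tick += 1
    return finish


result = fair_finish_times(10, [("big", 0, 1000), ("small", 10, 10)])
# result == {"small": 12, "big": 101}
```

Here the small job, submitted at tick 10, finishes at tick 12, and the big job finishes at tick 101. The 1010 container-ticks of total work on 10 containers take exactly 101 ticks, so no capacity sits idle in this run.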

Copyright notice
This article was written by [Prism 7]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/187/202207060901047319.html