YARN is Hadoop's resource scheduling platform.
1. YARN basic architecture
YARN consists of the following components: Resource Manager, Node Manager, Application Master, and Container.
- Resource Manager: the RM is the global resource manager, responsible for resource management and task scheduling across the whole system. It has two core components: the Scheduler and the Applications Manager (ASM).
  The Scheduler allocates resources to applications. The ASM starts and monitors each AM (Application Master).
- Node Manager:
  - Periodically reports the node's resources and the running state of its Containers to the RM.
  - Receives commands from the AM to start and stop Containers.
- Application Master: each application submitted by the user has its own AM. The main functions of the AM are:
  - Split the input data.
  - Request resources (Containers) from the RM on behalf of the application.
  - Communicate with the NMs to start and stop tasks.
  - Monitor the running status of tasks and provide a fault tolerance mechanism.
- Container:
  A Container is an abstraction of resources (memory and CPU).
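
To make the Node Manager and Container roles above more concrete, here is a minimal sketch in Python. It is a toy model, not the YARN API: the class names `Resource`, `Container`, and `NodeManager`, their fields, and the heartbeat format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """A bundle of memory and CPU, the two dimensions named in the text."""
    memory_mb: int
    vcores: int


@dataclass
class Container:
    """Hypothetical container record: an id plus the resources it holds."""
    container_id: int
    resource: Resource
    state: str = "RUNNING"


@dataclass
class NodeManager:
    """Toy node manager that tracks the containers running on one node."""
    node_id: str
    total: Resource
    containers: dict = field(default_factory=dict)

    def available(self) -> Resource:
        # Free resources = node total minus what running containers hold.
        used_mem = sum(c.resource.memory_mb for c in self.containers.values())
        used_cores = sum(c.resource.vcores for c in self.containers.values())
        return Resource(self.total.memory_mb - used_mem,
                        self.total.vcores - used_cores)

    def start_container(self, container_id: int, resource: Resource) -> Container:
        # Called on behalf of an AM; refuse if the node cannot fit the request.
        free = self.available()
        if resource.memory_mb > free.memory_mb or resource.vcores > free.vcores:
            raise RuntimeError(f"insufficient resources on {self.node_id}")
        container = Container(container_id, resource)
        self.containers[container_id] = container
        return container

    def stop_container(self, container_id: int) -> None:
        # Stopping a container returns its resources to the node.
        self.containers.pop(container_id, None)

    def heartbeat(self) -> dict:
        # What the node would periodically report to the RM in this model.
        return {
            "node_id": self.node_id,
            "available": self.available(),
            "containers": {cid: c.state for cid, c in self.containers.items()},
        }
```

In this model, starting a container on a node reduces what `heartbeat()` reports as available, and stopping it restores the original amount.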
2. Job submission process
After a user submits an application to YARN, YARN runs it in two phases. The first phase starts the AM. In the second phase, the AM creates the application, requests resources for its tasks, and monitors their execution until they complete.
Starting the AM
1) The client submits the application to YARN and asks the RM to allocate resources.
2) The RM starts a Container and runs the AM in it.
The AM creates the application
3) The AM registers with the ASM in the RM.
4) The AM requests and obtains resources from the RM.
5) After obtaining resources, the AM connects to the NM that hosts them.
6) The tasks start, and the corresponding Containers register with the AM.
7) Each task periodically reports its progress to the AM.
8) When the run finishes, the AM unregisters from the RM and shuts itself down.
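
The eight steps above can be traced with a small simulation. This is only a sketch of the ordering of the steps: the classes `ResourceManager` and `ApplicationMaster` below are hypothetical stand-ins written for this note, not the real YARN client API, and Node Managers are reduced to a log message.

```python
class ResourceManager:
    """Toy stand-in for the RM that records each protocol step in order."""

    def __init__(self):
        self.log = []            # ordered record of the steps taken
        self.registered = set()  # ids of AMs currently registered
        self._next_id = 0        # next container id to hand out

    def allocate(self, count=1):
        # Hand out `count` fresh container ids (no real capacity checks).
        ids = list(range(self._next_id, self._next_id + count))
        self._next_id += count
        return ids

    def submit(self, app_id, num_tasks):
        # Phase 1: start the AM (steps 1 and 2).
        self.log.append(f"1 client submits {app_id}")
        (am_container,) = self.allocate()
        self.log.append(f"2 RM starts AM in container {am_container}")
        # Phase 2: the AM drives the rest of the application.
        am = ApplicationMaster(app_id, self)
        am.run(num_tasks)
        return am


class ApplicationMaster:
    """Toy AM that walks through steps 3 to 8."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.progress = {}  # container id -> fraction of work done

    def run(self, num_tasks):
        log = self.rm.log
        self.rm.registered.add(self.app_id)
        log.append("3 AM registers with RM")
        containers = self.rm.allocate(num_tasks)
        log.append(f"4 AM obtains containers {containers}")
        log.append("5 AM contacts the NMs hosting them")
        for cid in containers:
            self.progress[cid] = 0.0   # task started, nothing done yet
        log.append("6 tasks start")
        for cid in containers:
            self.progress[cid] = 1.0   # task reports completion
        log.append("7 tasks report progress to AM")
        self.rm.registered.discard(self.app_id)
        log.append("8 AM unregisters and exits")
```

Calling `ResourceManager().submit("app-1", 3)` fills `log` with one entry per step, which makes it easy to see that the AM's own Container is allocated before any task Container.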
3. [Needs improvement] Resource scheduler
Hadoop has three job schedulers: the first-in, first-out scheduler (FIFO), the Capacity Scheduler, and the Fair Scheduler. The default is the Capacity Scheduler.
(1) FIFO: jobs are kept in a single queue and run in arrival order (first come, first served).
(2) Capacity Scheduler:
Supports multiple queues; each queue uses a FIFO scheduling policy.
(3) Fair Scheduler:
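
As a rough illustration of items (1) and (2) only, the sketch below models a FIFO queue and a capacity-style scheduler built from several such queues. The names `FifoQueue` and `ToyCapacityScheduler`, the percentage shares, and the idea of measuring demand in whole containers are assumptions made for this example. The real Capacity Scheduler has more features (for instance, letting a queue borrow idle capacity), which are omitted here.

```python
from collections import deque


class FifoQueue:
    """One queue with a fixed capacity that serves jobs strictly in arrival order."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # max containers this queue may hold at once
        self.used = 0             # containers currently held by running jobs
        self.jobs = deque()       # waiting (job, demand) pairs, oldest first

    def submit(self, job, demand):
        self.jobs.append((job, demand))

    def release(self, demand):
        # A running job finished and gave its containers back.
        self.used -= demand

    def schedule(self):
        started = []
        # Strict FIFO: if the oldest job does not fit, later jobs wait too.
        while self.jobs and self.used + self.jobs[0][1] <= self.capacity:
            job, demand = self.jobs.popleft()
            self.used += demand
            started.append(job)
        return started


class ToyCapacityScheduler:
    """Splits a cluster into queues by percentage; each queue is FIFO inside."""

    def __init__(self, total_containers, shares_percent):
        self.queues = {
            name: FifoQueue(name, total_containers * pct // 100)
            for name, pct in shares_percent.items()
        }

    def submit(self, queue, job, demand):
        self.queues[queue].submit(job, demand)

    def schedule(self):
        # Each queue is scheduled independently of the others.
        return {name: q.schedule() for name, q in self.queues.items()}
```

With a 10-container cluster split 70/30 between hypothetical queues `prod` and `dev`, a large job waiting at the head of `prod` holds back the jobs behind it in `prod`, but has no effect on `dev`.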