
[CDH] CDH 5.16: the YARN setting that limits concentrated (batch) container allocation does not take effect

2022-07-06 11:31:00 【kiraraLou】

Preface

A record of why, on our CDH cluster, the YARN setting meant to stop containers from being concentrated on a few nodes did not take effect.

Environment

CDH 5.16
Hadoop 2.6.0
YARN with the Fair Scheduler

Background

Recently, our operations team reported that one NodeManager in our big data cluster had exceeded its memory-usage alert threshold and triggered an alarm.

Troubleshooting showed a large load gap between the compute nodes (NodeManagers). I immediately suspected YARN batch allocation (assigning multiple containers per heartbeat), which was enabled.

All of our workloads are stream-processing jobs, and each job needs only a few containers, which makes the effect worse. Some nodes were using 175 GB of memory while others used only 40 GB.

Troubleshooting steps

1. Solutions found online

At this point we knew that YARN batch allocation was enabled and that our job profile makes the imbalance worse. All that remained was to fix the configuration.

Most solutions found online are one of the following:

Method 1:

Set yarn.scheduler.fair.assignmultiple to false.


Method 2:

Set yarn.scheduler.fair.assignmultiple to true.


Set yarn.scheduler.fair.max.assign to a small value (for example 3 to 5).


Note: the right value depends on how many compute nodes you have and how many containers your jobs launch. There is no single correct value.
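The two methods boil down to a few yarn-site.xml `<property>` entries. The sketch below renders them with Python's standard `xml.etree.ElementTree`. The helper name `yarn_site_properties` is hypothetical and is not a Cloudera or Hadoop tool, and the value 3 is only an example within the suggested range. The output is the kind of fragment you would paste into a hand-managed yarn-site.xml or into a Cloudera Manager safety valve.

```python
import xml.etree.ElementTree as ET


def yarn_site_properties(props):
    """Render a dict of YARN properties as yarn-site.xml <property> entries."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(conf, encoding="unicode")


# Method 1: at most one container per NodeManager heartbeat.
METHOD_1 = {"yarn.scheduler.fair.assignmultiple": "false"}

# Method 2: keep batch assignment but cap it (3 is just an example value).
METHOD_2 = {
    "yarn.scheduler.fair.assignmultiple": "true",
    "yarn.scheduler.fair.max.assign": "3",
}

if __name__ == "__main__":
    print(yarn_site_properties(METHOD_2))
```

On CDH 5.9 and later, Method 2 alone is not enough, for the reason explained in section 3.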

2. The configuration does not take effect

Following Method 2, we set yarn.scheduler.fair.max.assign to 2, restarted the ResourceManager and resubmitted the jobs. The setting had no effect: all containers of a single job were still placed on one node. In other words, the batch-allocation cap yarn.scheduler.fair.max.assign was not being applied.


As a test, we then set yarn.scheduler.fair.assignmultiple to false and repeated the procedure. This time batch allocation was indeed disabled, and each job's containers were spread across different nodes.


So here is the problem: setting yarn.scheduler.fair.assignmultiple to true together with yarn.scheduler.fair.max.assign to 2 does not take effect.

3. Why is the setting ignored?

After a lot of searching, I finally found the explanation:

Starting with CDH 5.9, continuous scheduling is disabled by default for new clusters (that is, clusters not upgraded from CDH 5.8 to a later CDH version), regardless of how many NodeManagers the cluster runs. In these clusters, yarn.scheduler.fair.continuous-scheduling-enabled is set to false, yarn.scheduler.fair.assignmultiple is set to true, and yarn.scheduler.fair.dynamic.max.assign is also set to true by default.

So in CDH 5.9 and later (which are based on Hadoop 2.6.0), yarn.scheduler.fair.dynamic.max.assign is present in the service configuration by default and set to true. Open-source Hadoop 2.6.0 has no such property; it was only added in Hadoop 2.8.0.

The configurations we had copied were written for the corresponding open-source Hadoop version, so they did not account for yarn.scheduler.fair.dynamic.max.assign. Because that property was left at true, our max.assign setting was ignored.

This also showed me the real difference between CDH and open-source Hadoop. I used to think CDH was mostly better packaging with few code changes, but that view turns out to be somewhat off.

Note: starting with CDH 6.1.0, yarn.scheduler.fair.dynamic.max.assign and yarn.scheduler.fair.max.assign are exposed in Cloudera Manager, so no safety valve is needed.

4. Final solution

In summary, this applies to clusters running CDH 5.8 or later in the CDH 5.x line (or upgraded from CDH 5.8 to a later version):

Set yarn.scheduler.fair.assignmultiple to true.
Optional: set yarn.scheduler.fair.dynamic.max.assign to true. This must be done through the safety valve "ResourceManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml".
If yarn.scheduler.fair.dynamic.max.assign is true, yarn.scheduler.fair.max.assign is ignored even if it is set. For a fixed cap such as max.assign = 2 to apply, set yarn.scheduler.fair.dynamic.max.assign to false through the same safety valve.
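To make the precedence rules concrete, the sketch below reads a yarn-site.xml fragment, fills in the CDH 5.9+ defaults described above, and reports which per-heartbeat limit applies. The names `effective_assign_policy` and `CDH59_DEFAULTS` are hypothetical. The precedence logic follows the property descriptions in the Apache FairScheduler documentation, where max.assign only matters when dynamic.max.assign is false and defaults to -1 (no limit). The sketch is not Cloudera code.

```python
import xml.etree.ElementTree as ET

# Defaults for new CDH 5.9+ clusters as described above; the max.assign
# default of -1 (no limit) comes from the Apache FairScheduler docs.
CDH59_DEFAULTS = {
    "yarn.scheduler.fair.assignmultiple": "true",
    "yarn.scheduler.fair.dynamic.max.assign": "true",
    "yarn.scheduler.fair.max.assign": "-1",
}


def effective_assign_policy(yarn_site_xml):
    """Describe which per-heartbeat limit the FairScheduler would apply."""
    props = dict(CDH59_DEFAULTS)
    for prop in ET.fromstring(yarn_site_xml).iter("property"):
        props[prop.findtext("name").strip()] = prop.findtext("value").strip()

    if props["yarn.scheduler.fair.assignmultiple"].lower() != "true":
        return "one container per heartbeat"
    if props["yarn.scheduler.fair.dynamic.max.assign"].lower() == "true":
        # In this branch max.assign is ignored, whatever its value.
        return "dynamic: about half of the node's unallocated resources"
    limit = int(props["yarn.scheduler.fair.max.assign"])
    return "unlimited" if limit <= 0 else f"at most {limit} containers per heartbeat"


if __name__ == "__main__":
    ours = ("<configuration><property><name>yarn.scheduler.fair.max.assign</name>"
            "<value>2</value></property></configuration>")
    print(effective_assign_policy(ours))  # dynamic mode wins: max.assign ignored
```

Adding a `<property>` that sets yarn.scheduler.fair.dynamic.max.assign to false to the same fragment switches the result to "at most 2 containers per heartbeat".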

How it works

Why continuous scheduling causes problems

On large clusters with many submitted or running applications, each FairScheduler continuous-scheduling pass can take a long time. This can make the ResourceManager unresponsive, because continuous scheduling ends up consuming most of the ResourceManager's time.

As the number of applications and/or nodes in the cluster grows, iterating over the nodes takes longer. Because continuous scheduling acquires a lock, it reduces the share of time the ResourceManager can spend on its other functions, including regular container allocation.

Why YARN performance degrades

NodeManager -> ResourceManager heartbeat
In a YARN cluster, every NodeManager (NM) periodically sends a heartbeat to the ResourceManager (RM). The interval is set by the property yarn.resourcemanager.nodemanagers.heartbeat-interval-ms.

In each heartbeat, the NM tells the RM how much unused capacity it has, and the FairScheduler assigns one or more containers to run on that NM. By default the heartbeat interval is 1 second (one heartbeat per second).
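A back-of-the-envelope sketch shows why batch assignment exists. With one container per heartbeat and the 1-second default interval, each NodeManager can receive at most one container per second. The function `seconds_to_place` and the job sizes below are hypothetical illustrations, not measurements from our cluster. The model also ignores queues, locality and resource limits.

```python
import math


def seconds_to_place(containers, nodes, per_heartbeat, heartbeat_ms=1000):
    """Approximate time to place `containers` when each of `nodes` NodeManagers
    heartbeats every `heartbeat_ms` and receives at most `per_heartbeat`
    containers per heartbeat (ignores queues, locality and resource limits)."""
    per_round = nodes * per_heartbeat          # containers placed per heartbeat round
    rounds = math.ceil(containers / per_round)
    return rounds * heartbeat_ms / 1000


# Hypothetical job: 100 containers on a 10-node cluster.
print(seconds_to_place(100, 10, per_heartbeat=1))   # 10.0
print(seconds_to_place(100, 10, per_heartbeat=5))   # 2.0
```

Raising the number of containers per heartbeat trades placement speed for spread, which is exactly the imbalance described at the start of this post.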

Heartbeat and container allocation
The number of containers assigned in each heartbeat depends on the yarn.scheduler.fair.* settings in yarn-site.xml (assignmultiple, dynamic.max.assign and max.assign).

The property yarn.scheduler.fair.dynamic.max.assign was introduced in CDH 5.9 (and upstream in YARN-5035).
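The interplay of the three properties during a single heartbeat can be modeled roughly as below. This is a simplified Python model written from the Apache FairScheduler property descriptions, not the actual Java scheduler code. In the model, dynamic mode stops once roughly half of the node's unallocated resources have been handed out, only memory is counted, and there is one application with unlimited demand. The function name `containers_per_heartbeat` is hypothetical.

```python
def containers_per_heartbeat(free_mb, container_mb, assign_multiple,
                             dynamic_max_assign, max_assign=-1):
    """Simplified model of how many equal-sized containers one NodeManager
    heartbeat receives (memory only, one app with unlimited demand)."""
    half_of_free = free_mb * 0.5
    assigned, assigned_mb = 0, 0
    while free_mb - assigned_mb >= container_mb:
        assigned += 1
        assigned_mb += container_mb
        if not assign_multiple:
            break                       # one container per heartbeat
        if dynamic_max_assign:
            if assigned_mb > half_of_free:
                break                   # ~half of free resources; max_assign ignored
        elif 0 < max_assign <= assigned:
            break                       # static cap from yarn.scheduler.fair.max.assign
    return assigned


# Hypothetical node with 100 GB free and 4 GB containers.
free, size = 100 * 1024, 4 * 1024
print(containers_per_heartbeat(free, size, True, True, max_assign=2))   # 13
print(containers_per_heartbeat(free, size, True, False, max_assign=2))  # 2
print(containers_per_heartbeat(free, size, False, True))                # 1
```

Under this model, a streaming job that needs only about ten 4 GB containers can receive all of them from a single heartbeat of one large, idle node. That matches the symptom we saw, and it explains why setting max.assign = 2 changes nothing while dynamic mode is on.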

Container allocation with continuous scheduling

Besides regular (heartbeat-based) container allocation, the FairScheduler also supports continuous scheduling, which is controlled by the property yarn.scheduler.fair.continuous-scheduling-enabled. When this property is set to true, the FairScheduler starts continuous scheduling.

With continuous scheduling, a separate thread performs container allocation and then sleeps for the number of milliseconds given by yarn.scheduler.fair.continuous-scheduling-sleep-ms.

In CDH this value defaults to 5 milliseconds. Non-scheduling RM functions run during this sleep.

Continuous scheduling was introduced to cut scheduling latency well below the default node heartbeat interval of 1 s. The continuous-scheduling thread iterates over the submitted and running applications while looking for free resources on the cluster's nodes. This works well on small clusters, where the scheduler can traverse all nodes very quickly (in a few milliseconds).
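The thread's behavior can be sketched as a simple loop. The sketch walks every node and tries to schedule on it while holding the scheduler lock, then sleeps for the configured interval. It is a simplified Python illustration, not the FairScheduler's Java code. `try_schedule` is a hypothetical hook, and the lock granularity in the real scheduler may differ. The point is that each pass costs time proportional to the number of nodes, and that this work competes with the rest of the RM for the same lock.

```python
import threading
import time


def continuous_scheduling(nodes, try_schedule, scheduler_lock, passes, sleep_ms=5):
    """Simplified continuous-scheduling thread body: walk every node, trying to
    place containers under the scheduler lock, then sleep sleep_ms."""
    for _ in range(passes):                 # the real thread loops until shutdown
        for node in nodes:
            with scheduler_lock:            # competes with heartbeats and other RM work
                try_schedule(node)          # hypothetical hook: try to place a container
        time.sleep(sleep_ms / 1000)         # yarn.scheduler.fair.continuous-scheduling-sleep-ms


if __name__ == "__main__":
    visited = []
    worker = threading.Thread(
        target=continuous_scheduling,
        args=(["nm1", "nm2", "nm3"], visited.append, threading.Lock(), 3),
    )
    worker.start()
    worker.join()
    print(len(visited))  # 9: 3 passes x 3 nodes
```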

As the number of applications and/or nodes grows, however, each pass over the nodes takes longer. Because continuous scheduling acquires a lock, the RM is left with a smaller share of time for its other functions, including regular container allocation.

[Figure: panels A, B and C showing continuous-scheduling thread run time (red) versus remaining RM availability]

In the three figures above, the run time of the continuous-scheduling thread is shown in red, and the rest is RM availability. On a small, lightly loaded cluster, RM availability looks like Figure A. On a larger, more heavily loaded cluster it looks more like Figure B, where the RM is available only half the time. On the heavily loaded cluster in Figure C, the RM may appear unresponsive because nearly all of its time can go to continuous scheduling. This can in turn make other clients (for example Cloudera Manager or Oozie) unresponsive as well.
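The figures' simplified model reduces to one ratio. If a continuous-scheduling pass takes `pass_ms` and the thread then sleeps `sleep_ms` (5 ms by default in CDH), the RM is free for other work only about `sleep_ms / (pass_ms + sleep_ms)` of the time. The pass durations below are hypothetical values chosen to mimic panels A, B and C. The model also assumes the RM can do nothing else while a pass runs, which is the figures' simplification rather than a measured property.

```python
def rm_availability(pass_ms, sleep_ms=5.0):
    """Share of time left for other RM work in the figures' simplified model:
    a continuous-scheduling pass (red) takes pass_ms, then the thread sleeps."""
    return sleep_ms / (pass_ms + sleep_ms)


# Hypothetical pass durations with the 5 ms default sleep.
for label, pass_ms in [("A", 0.5), ("B", 5.0), ("C", 495.0)]:
    print(label, f"{rm_availability(pass_ms):.0%}")
```

This prints roughly 91%, 50% and 1%. Once a single pass takes as long as the sleep interval, the RM is available only half the time, as in Figure B.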

Summary:

Batch (multiple) container allocation is a YARN optimization that speeds up container allocation and task scheduling.
However, it can easily unbalance the load across the cluster.
Configure it according to your own cluster size and workload, so that this optimization does not become a stumbling block.
The property yarn.scheduler.fair.dynamic.max.assign was introduced in CDH 5.9 (based on Hadoop 2.6.0) and in open-source Hadoop 2.8.0.

References:

1. https://my.cloudera.com/knowledge/FairScheduler-Tuning-With-assignmultiple-and-Continuous?id=76442
2. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
3. https://blog.csdn.net/nazeniwaresakini/article/details/105137788

Copyright notice
This article was written by [kiraraLou]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/187/202207060913061955.html