[CDH] CDH 5.16: YARN batch container assignment settings do not take effect
2022-07-06 11:31:00 【kiraraLou】
Preface

This post records a case where the YARN batch container assignment configuration on a CDH cluster did not take effect.
Environment

- CDH 5.16
- Hadoop 2.6.0
- YARN fair scheduler
The problem

Recently, operations reported that one node (NodeManager) in our big data cluster had exceeded the memory usage alarm threshold, triggering an alert.

Troubleshooting showed a large load gap between NodeManagers, which immediately suggested that YARN batch assignment was enabled.

In addition, all of our jobs are streaming jobs, each requiring only a few containers, which aggravates the imbalance: some nodes were using 175 GB of memory while others used only 40 GB.
Solution steps

1. Online solutions

At this point we had established that YARN batch assignment was enabled and that our job mix aggravates the problem, so we set out to fix it.

Most solutions found online take one of two forms:

Method 1:
- Set yarn.scheduler.fair.assignmultiple to false.

Method 2:
- Set yarn.scheduler.fair.assignmultiple to true.
- Set yarn.scheduler.fair.max.assign to a small value (for example, 3-5).

Note: the right value depends on the number of compute nodes and the number of containers each job launches; there is no universally correct fixed value.
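A sketch of what method 2 looks like as a yarn-site.xml fragment (in CDH these properties would normally go into a ResourceManager safety valve; the value 3 is illustrative, per the note above):

```xml
<!-- Enable batch assignment: allow more than one container per NM heartbeat -->
<property>
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>true</value>
</property>
<!-- Cap the number of containers assigned in a single heartbeat -->
<property>
  <name>yarn.scheduler.fair.max.assign</name>
  <value>3</value>
</property>
```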
2. The configuration did not take effect

We followed method 2 and set yarn.scheduler.fair.max.assign to 2. After restarting the ResourceManager and resubmitting the job, all of the job's containers were still scheduled onto a single node: the batch-assignment cap yarn.scheduler.fair.max.assign had not taken effect.

As a test, we then set yarn.scheduler.fair.assignmultiple to false and repeated the experiment. This time batch assignment was indeed disabled, and the job's containers were spread across different nodes.

So the question became: with yarn.scheduler.fair.assignmultiple set to true and yarn.scheduler.fair.max.assign set to 2, why does the latter not take effect?
3. Why is it ignored?

After much searching, we finally found the answer in Cloudera's documentation:

Starting with CDH 5.9, for new clusters (that is, clusters not upgraded from CDH 5.8 to a higher CDH version), continuous scheduling is disabled by default no matter how many NodeManagers the cluster runs: yarn.scheduler.fair.continuous-scheduling-enabled is set to false, yarn.scheduler.fair.assignmultiple is set to true, and yarn.scheduler.fair.dynamic.max.assign also defaults to true.

So from CDH 5.9 onward (still based on Hadoop 2.6.0), yarn.scheduler.fair.dynamic.max.assign is part of the default service configuration, with a default of true. Open-source Hadoop 2.6.0, however, does not have this property; it was only added in Hadoop 2.8.0.

The configuration advice we had followed was written for the corresponding open-source version, so it said nothing about yarn.scheduler.fair.dynamic.max.assign, and that property silently overrode our setting.

This also clarified for me the real difference between CDH and the open-source releases. I used to think CDH was merely better packaging with few code changes; that understanding was clearly off.
Note: starting with C6.1.0, yarn.scheduler.fair.dynamic.max.assign and yarn.scheduler.fair.max.assign are exposed in Cloudera Manager, so no safety valve is required.
4. Final solution

In other words, for clusters in C5.x running CDH 5.8 and higher (or upgraded from CDH 5.8 to a higher version):

- Set yarn.scheduler.fair.assignmultiple to true.
- Optional: set yarn.scheduler.fair.dynamic.max.assign to true. This requires a safety valve: ResourceManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml.
- If yarn.scheduler.fair.dynamic.max.assign is set, yarn.scheduler.fair.max.assign is ignored even if it is also set.
Principle

Why the problem occurs

On large clusters with many submitted or running applications, FairScheduler continuous scheduling can take too long. This may make the ResourceManager unresponsive, because the time spent on continuous scheduling dominates and erodes ResourceManager availability.

As the number of applications and/or the number of nodes in the cluster grows, iterating over the nodes takes longer and longer. Because continuous scheduling holds a lock while doing so, it reduces the proportion of time the ResourceManager can spend on its other functions, including regular container allocation.
Why YARN performance degrades

NodeManager -> ResourceManager heartbeat

In a YARN cluster, every NodeManager (NM) periodically sends a heartbeat to the ResourceManager (RM), at an interval controlled by the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms property. In each heartbeat the NM tells the RM how much unused capacity it has, and the FairScheduler may allocate one or more containers to run on that NM. By default, the heartbeat interval is 1 second (one heartbeat per second).
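The heartbeat interval itself is configurable; a minimal yarn-site.xml fragment spelling out the 1-second default mentioned above:

```xml
<!-- NM -> RM heartbeat interval, in milliseconds (default: 1000, i.e. once per second) -->
<property>
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>1000</value>
</property>
```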
Heartbeat-driven container allocation

How many containers are allocated per heartbeat depends on the settings in fair-scheduler.xml. The original post illustrates the decision flow with a flow chart (not reproduced here).

The property yarn.scheduler.fair.dynamic.max.assign was introduced in CDH 5.9 (see YARN-5035).
Container allocation with continuous scheduling

Besides the regular (heartbeat-based) container allocation, FairScheduler also supports continuous scheduling, enabled by the property yarn.scheduler.fair.continuous-scheduling-enabled. When it is set to true, a continuous-scheduling thread is started inside the FairScheduler.

With continuous scheduling, a separate thread performs container allocation and then sleeps for the number of milliseconds given by yarn.scheduler.fair.continuous-scheduling-sleep-ms. In CDH this value defaults to 5 milliseconds; while the thread sleeps, the RM can perform its non-scheduling functions.
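For reference, a sketch of how continuous scheduling would be turned on (the opposite of the CDH 5.9+ default); the 5 ms sleep matches the CDH default mentioned above:

```xml
<!-- Enable the continuous-scheduling thread (disabled by default on new CDH 5.9+ clusters) -->
<property>
  <name>yarn.scheduler.fair.continuous-scheduling-enabled</name>
  <value>true</value>
</property>
<!-- Sleep between continuous-scheduling passes, in milliseconds -->
<property>
  <name>yarn.scheduler.fair.continuous-scheduling-sleep-ms</name>
  <value>5</value>
</property>
```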
Continuous scheduling was introduced to push scheduling latency well below the default 1-second node heartbeat. The continuous-scheduling thread performs scheduling by iterating over the submitted and running applications while looking for free resources on the cluster's nodes. This works well on small clusters, where the scheduler can traverse all nodes very quickly (within a few milliseconds).
As the number of applications and/or nodes increases, iterating over the nodes takes longer, and because the continuous-scheduling thread holds a lock while it runs, less and less RM time is left for other functions, including regular container allocation.
The original post shows three figures in which the running time of the continuous-scheduling thread is drawn in red and the remainder represents RM availability. On a small, lightly loaded cluster, RM availability looks like figure A. On a larger, more heavily loaded cluster it looks more like figure B, where the RM is available only half the time. On the heavily loaded cluster of figure C, the RM may appear unresponsive, because almost all of its time is spent in continuous scheduling; clients such as Cloudera Manager and Oozie may then appear unresponsive as well.
Summary:

- Batch/continuous assignment is a YARN optimization that speeds up container allocation and job scheduling.
- Batch assignment can easily lead to an unbalanced cluster load.
- It must be tuned to your cluster size and workload; don't let this optimization become a stumbling block.
- The property yarn.scheduler.fair.dynamic.max.assign was introduced in CDH 5.9 (based on Hadoop 2.6.0) and in open-source Hadoop 2.8.0.
References:
1. https://my.cloudera.com/knowledge/FairScheduler-Tuning-With-assignmultiple-and-Continuous?id=76442
2. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
3. https://blog.csdn.net/nazeniwaresakini/article/details/105137788