[CDH] CDH 5.16: YARN batch container allocation settings do not take effect
2022-07-06 11:31:00 【kiraraLou】
Preface
This post records an issue where the configuration controlling batch (concentrated) container allocation for the YARN service in a CDH cluster did not take effect.
Environment
- CDH 5.16
- Hadoop 2.6.0
- YARN Fair Scheduler mode
How the problem arose
Recently, our operations team reported that one node (NodeManager) in our big data cluster had exceeded its memory usage alarm threshold and triggered an alert.
Troubleshooting showed a large load gap between compute nodes (NodeManagers), and I immediately suspected that this was because batch allocation was enabled in YARN.
In addition, since our jobs are all streaming-computation jobs, a single job does not need many containers, which aggravates the imbalance: some nodes were using 175 GB of memory while others were using only 40 GB.
Solution steps
1. Consulting online solutions
At this point we had established that YARN batch allocation was enabled, and that our workload type aggravates the phenomenon. Now let's fix the problem.
Most of the solutions found online look like this:
Method 1:
- Set yarn.scheduler.fair.assignmultiple to false.

Method 2:
- Set yarn.scheduler.fair.assignmultiple to true.
- Set yarn.scheduler.fair.max.assign to a smaller value (such as 3 - 5).
Note: the right value depends on the number of compute nodes and on how many containers each job launches; it is not some universal fixed value. A yarn-site.xml sketch of both methods is shown below.
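For reference only, here is a minimal yarn-site.xml sketch of the two approaches (pick one; in CDH these properties are normally set through Cloudera Manager or a safety valve rather than by editing the file directly, and the value 3 is purely illustrative):

```xml
<!-- Method 1: turn off batch allocation (at most one container per NodeManager heartbeat) -->
<property>
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>false</value>
</property>

<!-- Method 2: keep batch allocation but cap how many containers one heartbeat may assign -->
<property>
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>true</value>
</property>
<property>
  <!-- illustrative value; tune to your node count and per-job container count -->
  <name>yarn.scheduler.fair.max.assign</name>
  <value>3</value>
</property>
```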
2. The configuration did not take effect
Following Method 2, we set yarn.scheduler.fair.max.assign to 2. After restarting the ResourceManager service and rescheduling the jobs, we found that the configuration had not taken effect: all the containers of a single job were still scheduled onto one node. In other words, the batch-allocation limit yarn.scheduler.fair.max.assign was being ignored.

We then tried setting yarn.scheduler.fair.assignmultiple to false and repeated the steps above. This time batch allocation was indeed turned off, and the containers of a single job were spread across different nodes.

So the question became: with yarn.scheduler.fair.assignmultiple set to true and yarn.scheduler.fair.max.assign set to 2, why do the two not take effect together?
3. Why is the setting ignored?
After much searching, I finally found the explanation:
Starting with CDH 5.9, for new clusters (i.e. clusters not upgraded to a higher CDH version from CDH 5.8), continuous scheduling is disabled by default no matter how many NodeManagers are running in the cluster: yarn.scheduler.fair.continuous-scheduling-enabled is set to false, yarn.scheduler.fair.assignmultiple is set to true, and yarn.scheduler.fair.dynamic.max.assign also defaults to true.
From this we know that in CDH 5.9 and later (still based on Hadoop 2.6.0), yarn.scheduler.fair.dynamic.max.assign is present in the service configuration by default and set to true. In open-source Hadoop, however, this configuration does not exist in Hadoop 2.6.0; it was only added in Hadoop 2.8.0 and later.
The configurations we had referenced earlier were written for the corresponding open-source versions, so they said nothing about yarn.scheduler.fair.dynamic.max.assign, and that is why our setting did not take effect.
This also made me understand the real difference between CDH and the open-source version. I used to think CDH was just better packaging with few code changes; apparently that understanding was somewhat off.
Note: starting with C6.1.0, yarn.scheduler.fair.dynamic.max.assign and yarn.scheduler.fair.max.assign are exposed in Cloudera Manager, so no safety valve is required.
4. Final solution
In other words, for C5.x clusters running CDH 5.8 and higher (or upgraded from CDH 5.8 to a higher version):
- Set the property yarn.scheduler.fair.assignmultiple to true.
- Optional: set the property yarn.scheduler.fair.dynamic.max.assign to true. This requires a safety valve: ResourceManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml.
- If you set yarn.scheduler.fair.dynamic.max.assign, then yarn.scheduler.fair.max.assign is ignored even if it is set (see the sketch after this list).
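In our case we wanted the fixed cap to be honored rather than the dynamic behaviour, so a minimal sketch of a ResourceManager safety-valve snippet for yarn-site.xml might look like this (the value 2 matches the test described above):

```xml
<property>
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>true</value>
</property>
<property>
  <!-- CDH 5.9+ defaults this to true, which causes max.assign to be ignored;
       set it to false explicitly so the fixed cap below takes effect -->
  <name>yarn.scheduler.fair.dynamic.max.assign</name>
  <value>false</value>
</property>
<property>
  <!-- hard cap on containers assigned per NodeManager heartbeat -->
  <name>yarn.scheduler.fair.max.assign</name>
  <value>2</value>
</property>
```

Leaving yarn.scheduler.fair.dynamic.max.assign at true instead lets the scheduler hand out roughly half of a node's unallocated resources per heartbeat, in which case yarn.scheduler.fair.max.assign is ignored.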
Principle
Why the problem occurs
FairScheduler continuous scheduling takes too long on large clusters with many applications submitted or running. This can make the ResourceManager unresponsive, because the time spent on continuous scheduling dominates and eats into the ResourceManager's availability.
As the number of applications and/or the number of nodes in the cluster grows, iterating over the nodes can take a long time. Because continuous scheduling holds a lock while it runs, this reduces the proportion of time the ResourceManager can spend on its other functions, including regular container allocation.
Why YARN performance degrades
NodeManager -> ResourceManager heartbeat
In a YARN cluster, every NodeManager (NM) periodically sends a heartbeat to the ResourceManager (RM), at the interval given by the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms property. During each heartbeat, the NM tells the RM how much unused capacity it has, and the FairScheduler allocates one or more containers to run on that NM. By default the heartbeat interval is 1 second (1 heartbeat per second).
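For reference, the heartbeat interval mentioned above corresponds to the following yarn-site.xml property; 1000 ms is the default, i.e. one heartbeat per second:

```xml
<property>
  <!-- how often each NodeManager sends a heartbeat to the ResourceManager, in milliseconds -->
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>1000</value>
</property>
```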
Heartbeat and container allocation
How many containers are allocated per heartbeat depends on the FairScheduler settings. (Flow chart of the allocation decision omitted.)
The property yarn.scheduler.fair.dynamic.max.assign was introduced in CDH 5.9 (and YARN-5035).
Continuously scheduled container allocation
Besides the regular (heartbeat-driven) container allocation, the FairScheduler also supports continuous scheduling. It is enabled through the property yarn.scheduler.fair.continuous-scheduling-enabled; when this property is set to true, continuous scheduling is started inside the FairScheduler.
With continuous scheduling, a separate thread performs container allocation and then sleeps for the number of milliseconds given by yarn.scheduler.fair.continuous-scheduling-sleep-ms. In CDH this value defaults to 5 milliseconds, during which the RM attends to its non-scheduling functions.
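As a sketch only, these are the two yarn-site.xml properties that control continuous scheduling; the 5 ms sleep matches the CDH default mentioned above, and setting the flag to true is what turns the feature on (the rest of this section explains why that can hurt a large, busy cluster):

```xml
<property>
  <!-- run a dedicated scheduling thread instead of scheduling only on NM heartbeats -->
  <name>yarn.scheduler.fair.continuous-scheduling-enabled</name>
  <value>true</value>
</property>
<property>
  <!-- sleep between continuous-scheduling passes, in milliseconds -->
  <name>yarn.scheduler.fair.continuous-scheduling-sleep-ms</name>
  <value>5</value>
</property>
```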
Continuous scheduling was introduced to push the scheduling delay well below the default 1-second node heartbeat. The continuous-scheduling thread schedules by iterating over the submitted and running applications while looking for free resources on the nodes of the cluster. This works well for small clusters, where the scheduler can traverse all nodes very quickly (within a few milliseconds).
As the number of applications and/or the number of nodes in the cluster grows, iterating over the nodes can take a long time. Because continuous scheduling holds a lock while it runs, this reduces the proportion of time the RM can spend on its other functions, including regular container allocation.

The three figures referenced here (not reproduced) show the running time of the continuous-scheduling thread in red, with the remainder being RM availability. On a small, lightly loaded cluster, RM availability looks like Figure A. On a larger and more heavily loaded cluster, it looks more like Figure B, where the RM is available only about half of the time. On the heavily loaded cluster of Figure C, the RM can appear unresponsive because nearly all of its time is spent on continuous scheduling; this can also make other clients (for example Cloudera Manager, Oozie, etc.) appear unresponsive.
Summary:
- Batch/continuous allocation is a YARN optimization that can speed up container allocation and job scheduling.
- Batch/continuous allocation can easily lead to an unbalanced cluster load.
- It must be configured sensibly for your own cluster size and workload, so that this optimization does not become a stumbling block.
- The property yarn.scheduler.fair.dynamic.max.assign was introduced in CDH 5.9 (Hadoop 2.6.0) and in open-source Hadoop 2.8.0.
References:
1. https://my.cloudera.com/knowledge/FairScheduler-Tuning-With-assignmultiple-and-Continuous?id=76442
2. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
3. https://blog.csdn.net/nazeniwaresakini/article/details/105137788