
[yarn] Configuring Capacity Scheduler batch allocation for YARN on a CDP cluster

2022-07-06 11:32:00 kiraraLou

I. Preface

We will soon upgrade a CDH cluster to CDP. On CDH, the YARN service uses the Fair Scheduler by default, whereas CDP uses the Capacity Scheduler. In the past, an unreasonable batch-allocation setting in the scheduler caused tasks to be concentrated on a few nodes, which left the cluster's resource load extremely unbalanced.

To avoid the same problem on the CDP cluster, we investigated in advance whether the Capacity Scheduler also concentrates allocation in this way. During the investigation we observed some unexpected behavior, which still needs follow-up.

II. Batch allocation in CDH

As mentioned earlier, starting with CDH 5.8 (Hadoop 2.6.0), the Fair Scheduler provides the following settings, which speed up allocation for small tasks.

| Property | Description |
| --- | --- |
| yarn.scheduler.fair.max.assign | Maximum assignments: if assignmultiple is true and dynamic.max.assign is false, the maximum number of containers that can be allocated in one heartbeat. |
| yarn.scheduler.fair.assignmultiple | Assign multiple: whether multiple containers may be allocated in one heartbeat. |
| yarn.scheduler.fair.dynamic.max.assign | If assignmultiple is true, whether to determine dynamically how many resources one heartbeat can allocate. When enabled, about half of the node's unallocated resources are allocated to containers in a single heartbeat. Defaults to true. |

With a sensible configuration, we can use batch allocation without widening the load differences across the cluster.
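To show how the three Fair Scheduler properties interact, here is a simplified Python model of how many equal-sized containers a single heartbeat places on one node. It is a sketch, not the Fair Scheduler source: it assumes memory is the only resource, every container has the same size, and the "about half of the unallocated resources" rule is applied as a budget computed once at the start of the heartbeat. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FairAssignConfig:
    # Hypothetical holder for the three Fair Scheduler properties.
    assignmultiple: bool       # yarn.scheduler.fair.assignmultiple
    dynamic_max_assign: bool   # yarn.scheduler.fair.dynamic.max.assign
    max_assign: int            # yarn.scheduler.fair.max.assign (<= 0 treated as "no limit" here)


def containers_per_heartbeat(cfg: FairAssignConfig, node_free_mb: int,
                             container_mb: int, pending: int) -> int:
    """Simplified model: fixed-size containers placed on a node in one heartbeat."""
    assigned = 0
    assigned_mb = 0
    free_mb = node_free_mb
    # Dynamic-mode budget: roughly half of the node's unallocated memory,
    # computed once when the heartbeat starts (an assumption of this model).
    budget_mb = node_free_mb * 0.5
    while pending > 0 and free_mb >= container_mb:
        # Place one container.
        free_mb -= container_mb
        pending -= 1
        assigned += 1
        assigned_mb += container_mb
        if not cfg.assignmultiple:
            break  # only one container per heartbeat
        if cfg.dynamic_max_assign:
            if assigned_mb > budget_mb:
                break  # stop once the half-of-free budget has been exceeded
        elif 0 < cfg.max_assign <= assigned:
            break  # fixed per-heartbeat cap reached
    return assigned


if __name__ == "__main__":
    node_mb, container_mb, pending = 64 * 1024, 2048, 100
    for cfg in (FairAssignConfig(False, True, -1),
                FairAssignConfig(True, False, 5),
                FairAssignConfig(True, True, -1)):
        print(cfg, containers_per_heartbeat(cfg, node_mb, container_mb, pending))
```

With a 64 GB node and 2 GB containers, this model gives 1 container per heartbeat when assignmultiple is off, 5 with a fixed max.assign of 5, and 17 in dynamic mode: it keeps assigning while the total stays within half of the free memory, so the last container overshoots the budget by one.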

How to configure this on a CDH cluster is not covered here.

III. Batch allocation in CDP

CDP clusters already use the Capacity Scheduler as the default scheduler. According to the Apache Hadoop and Cloudera documentation, the Capacity Scheduler can also allocate multiple containers per NodeManager heartbeat. The relevant properties are:

| Property | Description |
| --- | --- |
| yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled | Whether multiple containers may be allocated in one NodeManager heartbeat. Defaults to true. |
| yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments | If multiple-assignments-enabled is true, the maximum number of containers that can be allocated in one NodeManager heartbeat. The default is 100; setting it to -1 removes the limit. |
| yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments | If multiple-assignments-enabled is true, the maximum number of off-switch containers (neither node-local nor rack-local) that can be allocated in one NodeManager heartbeat. The default is 1, meaning only one off-switch allocation is allowed per heartbeat. |
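The following Python sketch models the three properties in the table: after each container, it checks whether the same heartbeat may place another. It is a simplified, hypothetical model rather than the Capacity Scheduler source. Real scheduling also depends on queue capacities, locality delay and reservations, which are ignored here, and the node is assumed to have room for every request.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CapacityAssignConfig:
    # Hypothetical holder for the three per-node-heartbeat properties.
    multiple_assignments_enabled: bool   # per-node-heartbeat.multiple-assignments-enabled
    maximum_container_assignments: int   # per-node-heartbeat.maximum-container-assignments (-1: no limit)
    maximum_offswitch_assignments: int   # per-node-heartbeat.maximum-offswitch-assignments


def can_allocate_more(cfg: CapacityAssignConfig, offswitch: int, assigned: int) -> bool:
    """Decide whether the same heartbeat may place another container."""
    if offswitch >= cfg.maximum_offswitch_assignments:
        return False  # off-switch quota for this heartbeat is used up
    if not cfg.multiple_assignments_enabled:
        return False  # only one container per heartbeat
    limit = cfg.maximum_container_assignments
    return limit == -1 or assigned < limit


def assign_on_heartbeat(cfg: CapacityAssignConfig, requests: List[str]) -> List[str]:
    """Simplified model: which pending requests one NodeManager heartbeat satisfies.

    Each request is labeled "node-local", "rack-local" or "off-switch".
    """
    placed: List[str] = []
    offswitch = 0
    for locality in requests:
        placed.append(locality)
        if locality == "off-switch":
            offswitch += 1
        if not can_allocate_more(cfg, offswitch, len(placed)):
            break
    return placed


if __name__ == "__main__":
    requests = ["node-local"] * 10 + ["off-switch"] * 5
    cfg = CapacityAssignConfig(True, 100, 1)
    print(len(assign_on_heartbeat(cfg, requests)))
```

With the defaults from the table (enabled, 100, 1) and 10 node-local plus 5 off-switch requests, the modeled heartbeat places all 10 node-local containers and stops right after the first off-switch one, 11 in total.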

How to configure

  1. In Cloudera Manager, select Clusters > YARN Queue Manager UI service.

  2. In the YARN Queue Manager window, click the Scheduler Configuration tab.

  3. Go to the Scheduler Configuration window.

  4. Select the Enable Multiple Assignments Per Heartbeat check box to allow multiple containers to be allocated in one NodeManager heartbeat.

  5. Configure the following NodeManager heartbeat properties:

  • Maximum Container Assignments Per Heartbeat: the maximum number of containers that can be allocated in one NodeManager heartbeat. Setting this value to -1 disables the limit.
  • Maximum Off-Switch Assignments Per Heartbeat: the maximum number of off-switch containers that can be allocated in one NodeManager heartbeat.
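The three UI fields correspond to the properties listed in the table above. As a hedged illustration, the sketch below renders those properties as a standard Hadoop `<configuration>` XML fragment using only the Python standard library. The function name and the example values (a lowered cap of 10) are hypothetical, and where such a fragment would be applied depends on how your cluster is managed; on CDP the Queue Manager UI is the documented path.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.scheduler.capacity.per-node-heartbeat."


def heartbeat_properties_xml(enabled: bool, max_containers: int, max_offswitch: int) -> str:
    """Render the per-heartbeat settings as a Hadoop-style <configuration> fragment."""
    values = {
        "multiple-assignments-enabled": "true" if enabled else "false",
        "maximum-container-assignments": str(max_containers),  # -1 disables the limit
        "maximum-offswitch-assignments": str(max_offswitch),
    }
    root = ET.Element("configuration")
    for suffix, value in values.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PREFIX + suffix
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values: keep batching on, but cap it at 10 containers per heartbeat.
    print(heartbeat_properties_xml(True, 10, 1))
```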

IV. Summary

  • The Capacity Scheduler has per-heartbeat batch allocation settings similar to those of the Fair Scheduler.
  • On CDP, the Capacity Scheduler enables batch allocation by default with a limit of 100 containers per heartbeat; this value needs to be lowered.
  • In our tests so far, the setting does not seem to take effect; this needs further follow-up from experts.
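Why a per-heartbeat limit of 100 concentrates load can be illustrated with an idealized round in which nodes heartbeat one after another and each takes as many pending containers as the per-heartbeat cap and its own free capacity allow. This is a hypothetical model (uniform containers, a fixed per-node capacity, no queue limits or locality), not a measurement from our cluster, and the function name is made up for illustration.

```python
from typing import List


def one_heartbeat_round(pending: int, nodes: int, per_heartbeat_cap: int,
                        node_capacity: int) -> List[int]:
    """Idealized round: each node heartbeats once, in order, and takes
    min(cap, node capacity, remaining) containers. A cap of -1 means no cap."""
    counts: List[int] = []
    for _ in range(nodes):
        if per_heartbeat_cap == -1:
            limit = node_capacity
        else:
            limit = min(per_heartbeat_cap, node_capacity)
        take = min(limit, pending)
        counts.append(take)
        pending -= take
    return counts


if __name__ == "__main__":
    for cap in (100, 10):
        print(cap, one_heartbeat_round(pending=200, nodes=20,
                                       per_heartbeat_cap=cap, node_capacity=150))
```

With 200 pending containers, 20 nodes and room for 150 containers per node, a cap of 100 puts everything on the first two nodes, while a cap of 10 spreads the same work across all 20.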

References

  1. https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/yarn-allocate-resources/topics/yarn-set-user-limits.html
  2. https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/yarn-allocate-resources/topics/yarn-configure-nm-heartbeat.html
  3. https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Reviewing_the_configuration_of_the_CapacityScheduler

Copyright notice
This article was written by [kiraraLou]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/187/202207060913060954.html