
Descheduler: secondary scheduling for a more balanced Kubernetes load

2022-06-12 06:43:00 Chenshaowen's website

1. Why secondary scheduling is needed

The role of the Kubernetes scheduler is to bind a Pod to an optimal node. To do this, the scheduler runs a series of filtering and scoring steps.

Kubernetes scheduling decisions are based on resource Requests, but each Pod's actual usage changes dynamically. After running for a while, the load across nodes becomes unbalanced: some nodes are overloaded while others sit at very low utilization.

Therefore we need a mechanism that lets Pods be distributed across cluster nodes in a healthier, more balanced, and more dynamic way, instead of being pinned to one host after a single scheduling decision.

2. Ways to run descheduler

descheduler is a sub-project under kubernetes-sigs. First clone the code locally and enter the project directory:

git clone https://github.com/kubernetes-sigs/descheduler
cd descheduler

If your environment cannot pull images from gcr, replace k8s.gcr.io/descheduler/descheduler with k8simage/descheduler.
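
The image reference can be swapped in place in the manifests before applying them; a minimal sketch using sed (manifest paths are the ones used in the commands below):

# replace the gcr.io image repository with the mirror in the Job, CronJob and Deployment manifests
sed -i 's#k8s.gcr.io/descheduler/descheduler#k8simage/descheduler#g' \
  kubernetes/job/job.yaml kubernetes/cronjob/cronjob.yaml kubernetes/deployment/deployment.yaml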

  • One-off Job

Runs only once:

kubectl create -f kubernetes/base/rbac.yaml
kubectl create -f kubernetes/base/configmap.yaml
kubectl create -f kubernetes/job/job.yaml
  • Scheduled CronJob

The default schedule is */2 * * * *, i.e. once every 2 minutes:

kubectl create -f kubernetes/base/rbac.yaml
kubectl create -f kubernetes/base/configmap.yaml
kubectl create -f kubernetes/cronjob/cronjob.yaml
  • Long-running Deployment

The default is --descheduling-interval 5m, i.e. once every 5 minutes:

kubectl create -f kubernetes/base/rbac.yaml
kubectl create -f kubernetes/base/configmap.yaml
kubectl create -f kubernetes/deployment/deployment.yaml
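
To run with a different interval, adjust the --descheduling-interval flag in the container args of kubernetes/deployment/deployment.yaml before applying it. A minimal sketch; the 10m value and the policy file path are illustrative:

# kubernetes/deployment/deployment.yaml (excerpt, illustrative)
spec:
  template:
    spec:
      containers:
        - name: descheduler
          args:
            - --policy-config-file=/policy-dir/policy.yaml   # policy mounted from the ConfigMap
            - --descheduling-interval=10m                     # default is 5m
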
  • CLI command line

First generate a policy file locally, then run the descheduler command:

descheduler -v=3 --evict-local-storage-pods --policy-config-file=pod-life-time.yml
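
A minimal sketch of what a pod-life-time.yml policy file could contain, assuming the PodLifeTime strategy described in section 4 (the 7-day threshold is illustrative):

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      podLifeTime:
        # evict Pods that have been running for more than 7 days (illustrative value)
        maxPodLifeTimeSeconds: 604800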

descheduler has a --help flag for viewing its help documentation.

descheduler --help
The descheduler evicts pods which may be bound to less desired nodes

Usage:
  descheduler [flags]
  descheduler [command]

Available Commands:
  completion  generate the autocompletion script for the specified shell
  help        Help about any command
  version     Version of descheduler

3. Testing the scheduling effect

  • Cordon some nodes so that only one node participates in scheduling
kubectl get node
NAME    STATUS                     ROLES                         AGE   VERSION
node2   Ready,SchedulingDisabled   worker                        69d   v1.23.0
node3   Ready                      control-plane,master,worker   85d   v1.23.0
node4   Ready,SchedulingDisabled   worker                        69d   v1.23.0
node5   Ready,SchedulingDisabled   worker                        85d   v1.23.0
  • Run an application with 40 replicas

You can observe that all replicas of this application land on node3.
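
A minimal sketch of such a test Deployment, assuming a plain nginx image (the Deployment name and replica count match the test; the resource requests are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 40
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          resources:
            requests:
              cpu: 10m       # small request so 40 replicas fit on one node (illustrative)
              memory: 32Mi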

kubectl get pod -o wide | grep nginx-645dcf64c8 | grep node3 | wc -l
40
  • Deploy descheduler in the cluster

Here we deploy it in the Deployment mode.

kubectl -n kube-system get pod | grep descheduler
descheduler-8446895b76-7vq4q   1/1   Running   0   6m9s
  • Uncordon the nodes

Before rescheduling, all replicas are concentrated on node3:

kubectl top node
NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node2   218m         6%     3013Mi          43%
node3   527m         14%    4430Mi          62%
node4   168m         4%     2027Mi          28%
node5   93m          15%    785Mi           63%

Now uncordon the nodes so they can accept Pods again.
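
This can be done with kubectl uncordon on the previously cordoned nodes:

kubectl uncordon node2 node4 node5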

kubectl get node
NAME    STATUS   ROLES                         AGE   VERSION
node2   Ready    worker                        69d   v1.23.0
node3   Ready    control-plane,master,worker   85d   v1.23.0
node4   Ready    worker                        69d   v1.23.0
node5   Ready    worker                        85d   v1.23.0
  • View the descheduler logs

Once the configured interval elapses, descheduler starts evicting Pods according to its strategies.

kubectl -n kube-system logs descheduler-8446895b76-7vq4q -f
I0610 10:00:26.673573   1 event.go:294] "Event occurred" object="default/nginx-645dcf64c8-z9n8k" fieldPath="" kind="Pod" apiVersion="v1" type="Normal" reason="Descheduled" message="pod evicted by sigs.k8s.io/deschedulerLowNodeUtilization"
I0610 10:00:26.798506   1 evictions.go:163] "Evicted pod" pod="default/nginx-645dcf64c8-2qm5c" reason="RemoveDuplicatePods" strategy="RemoveDuplicatePods" node="node3"
I0610 10:00:26.799245   1 event.go:294] "Event occurred" object="default/nginx-645dcf64c8-2qm5c" fieldPath="" kind="Pod" apiVersion="v1" type="Normal" reason="Descheduled" message="pod evicted by sigs.k8s.io/deschedulerRemoveDuplicatePods"
I0610 10:00:26.893932   1 evictions.go:163] "Evicted pod" pod="default/nginx-645dcf64c8-9ps2g" reason="RemoveDuplicatePods" strategy="RemoveDuplicatePods" node="node3"
I0610 10:00:26.894540   1 event.go:294] "Event occurred" object="default/nginx-645dcf64c8-9ps2g" fieldPath="" kind="Pod" apiVersion="v1" type="Normal" reason="Descheduled" message="pod evicted by sigs.k8s.io/deschedulerRemoveDuplicatePods"
I0610 10:00:26.992410   1 evictions.go:163] "Evicted pod" pod="default/nginx-645dcf64c8-kt7zt" reason="RemoveDuplicatePods" strategy="RemoveDuplicatePods" node="node3"
I0610 10:00:26.993064   1 event.go:294] "Event occurred" object="default/nginx-645dcf64c8-kt7zt" fieldPath="" kind="Pod" apiVersion="v1" type="Normal" reason="Descheduled" message="pod evicted by sigs.k8s.io/deschedulerRemoveDuplicatePods"
I0610 10:00:27.122106   1 evictions.go:163] "Evicted pod" pod="default/nginx-645dcf64c8-lk9pd" reason="RemoveDuplicatePods" strategy="RemoveDuplicatePods" node="node3"
I0610 10:00:27.122776   1 event.go:294] "Event occurred" object="default/nginx-645dcf64c8-lk9pd" fieldPath="" kind="Pod" apiVersion="v1" type="Normal" reason="Descheduled" message="pod evicted by sigs.k8s.io/deschedulerRemoveDuplicatePods"
I0610 10:00:27.225304   1 evictions.go:163] "Evicted pod" pod="default/nginx-645dcf64c8-mztjb" reason="RemoveDuplicatePods" strategy="RemoveDuplicatePods" node="node3"
  • Pod distribution after secondary scheduling

Looking at node load, node3 has dropped while the other nodes have risen slightly:

kubectl top node
NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node2   300m         8%     3158Mi          45%
node3   450m         12%    3991Mi          56%
node4   190m         5%     2331Mi          32%
node5   111m         18%    910Mi           73%

Pod distribution across nodes, in a scenario with no affinity or anti-affinity configured:

Node     Pods (40 replicas in total)
node2    11
node3    10
node4    11
node5    8

The Pod count distribution is well balanced; node2-node4 are virtual machines with identical specs, while node5 has a lower spec.

4. descheduler scheduling strategies

View the default policy configuration recommended by the official repository:

cat kubernetes/base/configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates":
         enabled: true
      "RemovePodsViolatingInterPodAntiAffinity":
         enabled: true
      "LowNodeUtilization":
         enabled: true
         params:
           nodeResourceUtilizationThresholds:
             thresholds:
               "cpu" : 20
               "memory": 20
               "pods": 20
             targetThresholds:
               "cpu" : 50
               "memory": 50
               "pods": 50

By default, the RemoveDuplicates, RemovePodsViolatingInterPodAntiAffinity, and LowNodeUtilization strategies are enabled. They can be configured according to the actual scenario.

descheduler currently provides the following scheduling strategies (a combined policy sketch follows the list):

  • RemoveDuplicates

Evicts Pods when more than one Pod of the same workload (ReplicaSet, Deployment, StatefulSet, Job) is running on the same node

  • LowNodeUtilization

Finds under-utilized nodes and evicts Pods from the other, over-utilized nodes so they can be rescheduled onto them

  • HighNodeUtilization

Finds nodes with low utilization and evicts their Pods so that workloads can be compacted onto fewer nodes

  • RemovePodsViolatingInterPodAntiAffinity

Evicts Pods that violate inter-Pod anti-affinity rules

  • RemovePodsViolatingNodeAffinity

Evicts Pods that violate node affinity rules

  • RemovePodsViolatingNodeTaints

Evicts Pods that violate NoSchedule taints

  • RemovePodsViolatingTopologySpreadConstraint

Evicts Pods that violate topology spread constraints

  • RemovePodsHavingTooManyRestarts

Evicts Pods that have restarted too many times

  • PodLifeTime

Evicts Pods that have been running longer than a specified lifetime

  • RemoveFailedPods

Evicts Pods in the Failed state
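
As mentioned above, here is a sketch of a custom policy combining two of these strategies. It uses the same v1alpha1 format as the default ConfigMap; the threshold values are illustrative:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsHavingTooManyRestarts":
    enabled: true
    params:
      podsHavingTooManyRestarts:
        podRestartThreshold: 100        # illustrative threshold
        includingInitContainers: true
  "RemoveFailedPods":
    enabled: true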

5. Applicable scenarios for descheduler

descheduler takes a dynamic perspective, covering two aspects: Nodes and Pods. Node dynamics means that a Node's labels, taints, configuration, count, and so on change over time. Pod dynamics means that a Pod's actual resource usage and its distribution across Nodes are not constant.

Based on these dynamic characteristics, the following applicable scenarios can be summarized:

  • After new nodes are added
  • After a node restarts
  • After modifying a node's topology domains or taints, when you want existing Pods to also satisfy the new topology domains and taints
  • When Pods are not evenly distributed across nodes

If the problem is that a Pod's actual usage far exceeds its Request value, a better approach is to adjust the Request rather than reschedule the Pod.
