kubelet node pressure eviction
2022-08-01 15:02:00 【Liangkel】
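The kubelet monitors resources on cluster nodes such as CPU, memory, disk space, and filesystem inodes. Based on the eviction policy configured in the kubelet startup parameters, when one or more of these resources reach a specific consumption level, the kubelet can proactively evict one or more pods on the node to reclaim resources and relieve resource pressure on the node.
Based on Kubernetes v1.17.4.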
1. When does eviction happen
The kubelet combines the following inputs to make an eviction decision:
(1) eviction signals;
(2) eviction policy;
(3) eviction monitoring interval;
1.1 Eviction signals
Resources on a node such as memory, nodefs, and pid each have eviction signals. The kubelet makes its eviction decision by comparing the eviction signals against the eviction policy.
The eviction signals are listed below:
(1) memory.available;
(2) nodefs.available;
(3) nodefs.inodesFree;
(4) imagefs.available;
(5) imagefs.inodesFree;
(6) pid.available;
The kubelet supports the following filesystem partitions:
(1) nodefs: the node's primary filesystem, used for local disk volumes, emptyDir, log storage, and so on. For example, nodefs contains /var/lib/kubelet/;
(2) imagefs: an optional filesystem that the container runtime uses to store container images and container writable layers.
1.2 Eviction policy
The kubelet has two kinds of node pressure eviction: soft eviction and hard eviction.
Soft eviction
With soft eviction, once a resource on the node such as memory or nodefs crosses its threshold, the kubelet keeps observing it for a period of time (the grace period). If the resource recovers during that period so that the condition no longer holds, no pod is evicted. If the condition holds for the whole grace period, pod eviction is triggered.
kubelet startup parameters related to soft eviction:
(1) eviction-soft: a set of soft eviction conditions, e.g. memory.available<1.5Gi,nodefs.available<500Mi. If an eviction condition persists longer than its corresponding grace period, pod eviction is triggered;
(2) eviction-soft-grace-period: a set of soft eviction grace periods, e.g. memory.available=1m30s,nodefs.available=1m30s;
(3) eviction-max-pod-grace-period: the maximum grace period, in seconds, for stopping the containers of a pod that is being soft-evicted; the default is 0;
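The grace-period behaviour described above can be sketched as a small state tracker. This is only an illustration, not the kubelet's code: the names SoftThreshold and SoftEvictionTracker are hypothetical, and time is passed in explicitly as seconds so the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoftThreshold:
    signal: str          # e.g. "memory.available"
    limit: float         # the condition holds while the observed value < limit
    grace_period: float  # seconds the condition must persist before eviction


class SoftEvictionTracker:
    """Remembers when a soft eviction condition was first observed (hypothetical helper)."""

    def __init__(self, threshold: SoftThreshold) -> None:
        self.threshold = threshold
        self.first_met_at: Optional[float] = None

    def observe(self, value: float, now: float) -> bool:
        """Return True once the condition has held for the whole grace period."""
        if value >= self.threshold.limit:
            # The resource recovered, so the observation window starts over.
            self.first_met_at = None
            return False
        if self.first_met_at is None:
            self.first_met_at = now
        return now - self.first_met_at >= self.threshold.grace_period
```

With memory.available<1.5Gi and a 1m30s grace period, a brief dip below 1.5Gi that recovers within 90 seconds never returns True; only an uninterrupted 90 seconds below the limit does.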
Hard eviction
Hard eviction has no grace period. When a hard eviction condition is met, the kubelet triggers pod eviction immediately, without graceful termination.
kubelet startup parameters related to hard eviction:
(1) eviction-hard: a set of hard eviction conditions, e.g. memory.available<1Mi,nodefs.available<1Mi,nodefs.inodesFree<1. The kubelet's default hard eviction conditions are memory.available<100Mi,nodefs.available<10%,imagefs.available<15%,nodefs.inodesFree<5%;
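To make the threshold syntax concrete, here is a minimal sketch of how a string such as the default eviction-hard value could be parsed and checked. The function names are hypothetical, only the Ki/Mi/Gi suffixes and percentages are handled, and the kubelet's own parser is more complete.

```python
UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_quantity(text: str) -> float:
    """Convert '100Mi' or '1.5Gi' to bytes; plain numbers pass through."""
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    return float(text)


def parse_thresholds(spec: str) -> dict:
    """Parse 'signal<value,...' into {signal: (kind, value)}.

    kind is 'percent' (value is a fraction of capacity) or 'absolute'.
    """
    result = {}
    for item in spec.split(","):
        signal, _, raw = item.partition("<")
        signal, raw = signal.strip(), raw.strip()
        if raw.endswith("%"):
            result[signal] = ("percent", float(raw[:-1]) / 100)
        else:
            result[signal] = ("absolute", parse_quantity(raw))
    return result


def threshold_met(threshold: tuple, available: float, capacity: float) -> bool:
    """The eviction condition holds when the available amount is below the limit."""
    kind, value = threshold
    limit = value * capacity if kind == "percent" else value
    return available < limit
```

For example, with nodefs.available<10% and a 100 GB disk, the condition holds as soon as less than 10 GB remains available.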
Other eviction parameters
(1) Minimum eviction reclaim: --eviction-minimum-reclaim;
In some cases, evicting a pod reclaims only a small amount of resources, so the eviction condition is met again soon afterwards and evictions are triggered repeatedly.
To address this, configure the --eviction-minimum-reclaim parameter. When an eviction signal triggers eviction, the kubelet does not stop as soon as the reclaimed resources bring the signal back past the eviction threshold; it keeps reclaiming until it has additionally recovered the amount configured by --eviction-minimum-reclaim;
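The effect of the minimum reclaim can be sketched as a single comparison. The function name pressure_relieved is hypothetical; the point is only that the target moves from the threshold to the threshold plus the minimum reclaim.

```python
def pressure_relieved(available: float, threshold: float, minimum_reclaim: float = 0.0) -> bool:
    """Pressure counts as relieved only when the available amount reaches
    the eviction threshold plus the configured minimum reclaim."""
    return available >= threshold + minimum_reclaim


MI = 1024 ** 2
GI = 1024 ** 3

# Assumed example values: nodefs.available<1Gi with a minimum reclaim of 500Mi.
# After an eviction, 1.2Gi is available: above the threshold, below the target.
without_reclaim = pressure_relieved(1.2 * GI, 1 * GI)
with_reclaim = pressure_relieved(1.2 * GI, 1 * GI, 500 * MI)
```

Without a minimum reclaim, 1.2Gi would already count as recovered and the node could cross the 1Gi threshold again almost immediately; with it, reclamation continues until at least 1Gi + 500Mi is available.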
1.3 Eviction monitoring interval
If a round of the eviction logic does not evict any pod, the kubelet waits 10s before running the next round;
2. Which pods are evicted
2.1 Memory
For evictions caused by memory pressure, the kubelet determines the pod eviction order by the following criteria:
(1) whether the pod's actual resource usage exceeds its request; pods that exceed their request are evicted first;
(2) the pod's priority (pod.Spec.Priority); the lower the value, the sooner the pod is evicted;
(3) the difference between the pod's actual resource usage and its request (usage minus request); the larger the difference, the sooner the pod is evicted;
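The three criteria form a multi-key sort, which can be sketched as follows. PodMemory and memory_eviction_order are hypothetical names, and the numbers are made up. For criterion (3) the sketch puts the pod with the larger usage-minus-request first, which is how the kubelet's memory ranking function orders pods.

```python
from dataclasses import dataclass


@dataclass
class PodMemory:
    name: str
    priority: int  # pod.Spec.Priority
    usage: int     # observed memory usage (any consistent unit)
    request: int   # memory request (same unit)


def memory_eviction_order(pods: list) -> list:
    """Return the pods sorted so that the first element is evicted first."""
    return sorted(
        pods,
        key=lambda p: (
            not (p.usage > p.request),  # (1) pods exceeding their request come first
            p.priority,                 # (2) lower priority comes first
            -(p.usage - p.request),     # (3) larger usage above request comes first
        ),
    )


pods = [
    PodMemory("a", priority=0, usage=300, request=200),
    PodMemory("b", priority=0, usage=500, request=200),
    PodMemory("c", priority=100, usage=900, request=100),
    PodMemory("d", priority=0, usage=100, request=200),
]
order = [p.name for p in memory_eviction_order(pods)]
```

Here b and a both exceed their requests at the lowest priority, and b exceeds by more, so b goes first; c exceeds its request by the most but has a higher priority; d stays within its request and is ranked last.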
2.2 pid
For evictions caused by pid pressure, the kubelet determines the pod eviction order by the following criterion:
(1) the pod's priority (pod.Spec.Priority); the lower the value, the sooner the pod is evicted;
2.3 Filesystem (fs)
2.3.1 With a dedicated imagefs filesystem
For evictions caused by pressure on nodefs available space or nodefs inodes, the kubelet determines the pod eviction order by the following criteria:
(1) the pod's actual resource usage (the pod's local volumes plus the logs of all containers in the pod); pods whose actual usage exceeds their request are evicted first;
(2) the difference between that usage (the pod's local volumes plus the logs of all containers in the pod) and the pod's request (usage minus request); the larger the difference, the sooner the pod is evicted;
For evictions caused by pressure on imagefs available space or imagefs inodes, the kubelet determines the pod eviction order by the following criteria:
(1) the pod's actual usage of container writable layers; pods whose actual usage exceeds their request are evicted first;
(2) the difference between the pod's actual usage of container writable layers and its request (usage minus request); the larger the difference, the sooner the pod is evicted;
2.3.2 Without a dedicated imagefs filesystem
For evictions caused by pressure on filesystem available space or inodes, the kubelet determines the pod eviction order by the following criteria:
(1) the pod's actual resource usage (the pod's container writable layers, the pod's local volumes, and the logs of all containers in the pod); pods whose actual usage exceeds their request are evicted first;
(2) the difference between that usage (the pod's container writable layers, the pod's local volumes, and the logs of all containers in the pod) and the pod's request (usage minus request); the larger the difference, the sooner the pod is evicted;
How the kubelet decides whether there is a dedicated imagefs
When nodefs (the kubelet's root filesystem) and imagefs (the filesystem where docker stores images) are on the same partition, the kubelet concludes that there is no dedicated imagefs filesystem; otherwise it concludes that there is one;
In short, nodefs is the partition that contains the directory given by the kubelet startup parameter --root-dir, and imagefs is the partition that contains the docker installation directory;
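The same-partition check can be approximated from a shell or script by comparing the device IDs of the two directories. This is only a rough stand-in for what the kubelet does (it gets filesystem information from its stats provider rather than calling stat itself), and the function name and example paths below are assumptions for illustration.

```python
import os


def has_dedicated_imagefs(kubelet_root_dir: str, image_dir: str) -> bool:
    """Return True when the two directories live on different devices.

    st_dev identifies the device that holds a path, so different values
    mean the two directories are on different partitions.
    """
    return os.stat(kubelet_root_dir).st_dev != os.stat(image_dir).st_dev


# Example call on a node, assuming the common default locations:
# has_dedicated_imagefs("/var/lib/kubelet", "/var/lib/docker")
```

If the function returns False, the node behaves as described in 2.3.2: one set of filesystem signals covers container writable layers, local volumes, and logs together.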
3. How pods are evicted
Pod eviction flow
(1) Read the eviction policy configuration from the kubelet startup parameters;
(2) Collect statistics from cAdvisor and the CRI runtime, such as the total amount and usage of each resource on the node and the resource requests and usage of each container;
(3) Compare the eviction policy configuration with those statistics and select the eviction signals whose conditions are met;
(4) Sort the selected eviction signals, placing memory eviction signals ahead of all other signals, and take the first signal from the sorted result;
(5) Proactively try to reclaim fs and inode resources; if enough is reclaimed, return directly without running the pod eviction logic below;
(6) Sort the pod list with the ranking function that corresponds to the selected eviction signal;
(7) Walk the sorted pod list and try to evict a pod;
Notes:
(1) Each eviction pass evicts at most one pod;
(2) After an eviction pass finishes, if it evicted a pod, the next pass starts immediately; if it evicted no pod, the kubelet waits 10s before running the next pass;
(3) Evicting a pod only sets pod.status.phase to Failed and attaches the reason Evicted together with details of what triggered the eviction; the pod is not deleted. Once pod.status.phase is Failed, the replicaset controller creates a new pod, which is scheduled onto another node, and this achieves the effect of evicting the pod;
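Steps (4) to (7) and the first two notes can be condensed into a short sketch. It is a simplified model, not the kubelet source: eviction_pass, next_delay, and the three callbacks are hypothetical names, and steps (1) to (3) are assumed to have already produced the list of met signals.

```python
def eviction_pass(met_signals, reclaim_node_resources, rank_pods, evict):
    """Run one eviction pass and return the evicted pod, or None.

    met_signals: eviction signals whose conditions are currently met.
    reclaim_node_resources(signal) -> bool: True if node-level reclaim sufficed.
    rank_pods(signal) -> list: pods in eviction order for that signal.
    evict(pod) -> bool: True if the pod was evicted.
    """
    if not met_signals:
        return None
    # Step (4): memory signals sort ahead of all others (sorted() is stable).
    signal = sorted(met_signals, key=lambda s: s != "memory.available")[0]
    # Step (5): try node-level reclaim first and stop if it was enough.
    if reclaim_node_resources(signal):
        return None
    # Steps (6) and (7): walk the ranked pods and stop after the first eviction.
    for pod in rank_pods(signal):
        if evict(pod):
            return pod
    return None


def next_delay(evicted_pod) -> int:
    """Seconds to wait before the next pass: none after an eviction, else 10."""
    return 0 if evicted_pod is not None else 10
```

Returning as soon as one pod is evicted is what limits each pass to a single eviction, and the zero delay afterwards lets the kubelet re-measure and evict again quickly if the pressure persists.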
Proactive reclamation of fs and inode resources
When fs or inode pressure makes pod eviction necessary, the kubelet first tries to reclaim fs and inode resources proactively before evicting any pod;
With a dedicated imagefs filesystem
For evictions caused by pressure on nodefs available space or nodefs inodes, no proactive reclamation of fs and inode resources is triggered;
For evictions caused by pressure on imagefs available space or imagefs inodes, the following actions are triggered to reclaim fs and inode resources:
(1) delete stopped containers;
(2) delete unused container images;
Without a dedicated imagefs filesystem
For evictions caused by pressure on filesystem available space or inodes, the following actions are triggered to reclaim fs and inode resources:
(1) delete stopped containers;
(2) delete unused container images;
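The rules above amount to a lookup from the triggering signal to a list of reclaim actions. A minimal sketch, with the hypothetical function node_reclaim_actions and action names given as plain strings:

```python
FS_SIGNALS = (
    "nodefs.available",
    "nodefs.inodesFree",
    "imagefs.available",
    "imagefs.inodesFree",
)

RECLAIM_ACTIONS = ["delete stopped containers", "delete unused container images"]


def node_reclaim_actions(signal: str, dedicated_imagefs: bool) -> list:
    """Return the node-level reclaim actions tried before evicting pods."""
    if signal not in FS_SIGNALS:
        return []  # memory and pid pressure have no node-level reclaim here
    if dedicated_imagefs and signal.startswith("nodefs."):
        return []  # containers and images live on imagefs, so nothing to gain
    return list(RECLAIM_ACTIONS)
```

The nodefs case returns nothing on a node with a dedicated imagefs because deleting containers and images would free space on the other partition, not on the one under pressure.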
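Summary
The kubelet monitors resources on cluster nodes such as CPU, memory, disk space, and filesystem inodes. Based on the eviction policy configured in the kubelet startup parameters, when one or more of these resources reach a specific consumption level, the kubelet can proactively evict one or more pods on the node to reclaim resources and relieve resource pressure on the node.
This article analyzed kubelet node pressure eviction from three angles: when eviction happens, which pods are evicted, and how pods are evicted.
The next article will analyze the source code of kubelet node pressure eviction.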