Troubleshooting disk pressure on Kubernetes nodes
2022-07-27 14:43:00 【New titanium cloud suit】

In this article, you will learn how to handle disk pressure on a Kubernetes node correctly, including what causes disk pressure and each step of troubleshooting it.
Whatever application you run, it needs some basic resources. CPU, memory, and disk space are the common ones and are used by every application. Most engineers have a sound understanding of how to manage CPU and memory, but not everyone takes the time to understand how to use disk correctly.

In a Kubernetes environment, this can become catastrophic over time, because once a node is overloaded, Kubernetes starts to "save" itself. It does so by killing pods, which reduces the load on the node. If the application does not handle sudden termination gracefully, this can cause problems, or it can leave too few resources to handle a given load.
This article explains how to understand and deal with this kind of disk failure.
What is node disk pressure?
Node disk pressure is what the name suggests: the disk attached to a node is under pressure. You are unlikely to run into it often, because Kubernetes has built-in measures to avoid it, but it does happen from time to time. Many factors can cause node disk pressure, but two causes are the most likely.
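Kubernetes surfaces this state as a node condition named DiskPressure. A minimal sketch for listing the affected nodes follows. The helper name nodes_under_pressure is hypothetical, and the kubectl call appears only as a comment because it needs a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: reads "<node-name> <status>" pairs, one per line,
# and prints the names of nodes whose DiskPressure status is "True".
nodes_under_pressure() {
    awk '$2 == "True" { print $1 }'
}

# Usage against a live cluster (not run here). The jsonpath expression
# selects the DiskPressure condition from each node's status:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}' \
#     | nodes_under_pressure
```

An empty result means no node currently reports the condition.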
The first possible cause is that Kubernetes does not clean up unused images in time. By default this should not happen, because Kubernetes regularly checks for unused images and deletes them. It is therefore unlikely to be the source of node disk pressure, but it is worth keeping in mind.
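Image cleanup is driven by how full the disk is, so a quick look at filesystem usage on the node can show whether cleanup is keeping up. The sketch below reads `df -P` output and prints the filesystems at or above a threshold. The function name is hypothetical, and the default of 85 is an assumption chosen to mirror the kubelet's documented default for imageGCHighThresholdPercent; check your own kubelet configuration before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: reads `df -P` output on stdin and prints every
# mount point whose used capacity is at or above the given percentage.
# The default of 85 is an assumption; adjust it to your kubelet settings.
filesystems_over() {
    threshold="${1:-85}"
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the df header line
            used = $5
            sub(/%/, "", used)        # "91%" -> "91"
            if (used + 0 >= limit + 0) print $6 " " used "%"
        }'
}

# Usage on a node (not run here):
#   df -P | filesystems_over 85
```

The helper takes its input on stdin so that it can be tried on saved `df -P` output as well as on a live node.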
The other, more likely cause is log accumulation. By default, Kubernetes keeps logs in two cases: it keeps the logs of every running container, and it keeps the logs of the most recently exited container to help with troubleshooting. This is an attempt to balance keeping important logs against deleting useless ones over time. However, if a long-running container produces a large volume of logs, they can accumulate until they exceed the capacity of the node's disk.
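To see whether logs are the culprit, you can rank the per-pod log directories on a node by size. The sketch below assumes the kubelet's usual pod log location, /var/log/pods, which may differ on your nodes. The function name largest_dirs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the N largest entries directly under a
# directory, biggest first, with sizes in KiB (du -k).
largest_dirs() {
    root="$1"
    count="${2:-10}"
    du -sk "$root"/* 2>/dev/null | sort -n -r | head -n "$count"
}

# Usage on a node (not run here; /var/log/pods is an assumed path):
#   largest_dirs /var/log/pods 5
```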
To find out exactly what the problem is, you need to find out which files take up the most space.
Troubleshooting node disk pressure
To resolve node disk pressure, you need to work out which files take up the most space. Because Kubernetes runs on Linux, this is easy to do with the du command. You can connect to each Kubernetes node manually over SSH, or you can use a DaemonSet (https://www.containiq.com/post/using-kubernetes-daemonsets-effectively).
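For the manual SSH route, a du pipeline wrapped in a small function is enough. This is a sketch, not a definitive tool: the function name top_usage and its defaults are hypothetical, and the -x flag is an addition here that keeps du on a single filesystem.

```shell
#!/bin/sh
# Hypothetical helper: lists the N largest files and directories under a
# path, biggest first. -a includes files, -x stays on one filesystem so
# that mounts such as /proc are not traversed.
top_usage() {
    dir="${1:-/}"
    count="${2:-20}"
    du -a -x "$dir" 2>/dev/null | sort -n -r | head -n "$count"
}

# Usage on a node over SSH (not run here):
#   top_usage /var 20
```

Sizes are printed in du's default block unit, so compare the numbers with each other and do not read them as bytes.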
Deploying and understanding the DaemonSet
To deploy the DaemonSet, you can use this GitHub Gist (https://gist.githubusercontent.com/omerlh/cc5724ffeea17917eb06843dbff987b7/raw/1e58c8850aeeb6d22d8061338f09e5e1534ab638/daemonset.yaml), or create a file with the following contents:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: disk-checker
  labels:
    app: disk-checker
spec:
  selector:
    matchLabels:
      app: disk-checker
  template:
    metadata:
      labels:
        app: disk-checker
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        image: busybox
        imagePullPolicy: IfNotPresent
        name: disk-checked
        command: ["/bin/sh"]
        args: ["-c", "du -a /host | sort -n -r | head -n 20"]
        volumeMounts:
        - name: host
          mountPath: "/host"
      volumes:
      - name: host
        hostPath:
          path: "/"

Now you can run the following command:
$ kubectl apply -f https://gist.githubusercontent.com/omerlh/cc5724ffeea17917eb06843dbff987b7/raw/1e58c8850aeeb6d22d8061338f09e5e1534ab638/daemonset.yaml

Before using the DaemonSet for troubleshooting, it is important to understand what it does. If you look at the manifest above, you will notice that it is a very simple service. Much of it is boilerplate, but the important fields are command and args. This is where the du command is set up to run and then print the top 20 results. Further down, you can also see that the host's root filesystem is mounted into the container at the path /host.
Using the DaemonSet
First, make sure the DaemonSet was deployed correctly by running kubectl get pods -l app=disk-checker. The output should look like this:
$ kubectl get pods -l app=disk-checker
NAME READY STATUS RESTARTS AGE
disk-checker-bwkbj 1/1 Running 0 2s

The number of pods you see here depends on the number of nodes running in the cluster. After confirming that the pods are running, you can run kubectl logs -l app=disk-checker to inspect their logs. Note that when a label selector is used, kubectl logs prints only the last 10 lines per pod by default; pass --tail=-1 to see the full list. This may take some time, but eventually you should see a list of files and their sizes, which gives you a clearer picture of what is taking up space on the node. What you do next depends on which files are taking up the space: you need to examine the DaemonSet output, understand what is happening, and determine whether log files, application files, or other files are using your disk space.
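The raw output is a list of block counts and paths prefixed with /host, which can be hard to scan. The filter below is a sketch that converts it to sizes in MiB and node-relative paths. It assumes du reports 1 KiB blocks, which is the usual default for the busybox image but worth verifying. The function name humanize_du is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: converts "<blocks> <path>" lines from du into
# "<size in MiB> <path on the node>", assuming 1 KiB blocks and the
# /host mount prefix used by the DaemonSet.
humanize_du() {
    awk '{
        size = $1
        sub(/^[0-9]+[ \t]+/, "")       # drop the size column, keep the path
        sub(/^\/host/, "")             # strip the container mount prefix
        if ($0 == "") $0 = "/"         # /host itself is the node root
        printf "%.1f MiB\t%s\n", size / 1024, $0
    }'
}

# Usage (not run here):
#   kubectl logs -l app=disk-checker --tail=-1 | humanize_du
```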
Possible solutions
Analyzing and understanding the DaemonSet output is essential, because it points to how the problem can be solved. There are two possible solutions.
You may find that the problem is caused by application data, so the files cannot be deleted. In that case, you have to increase the size of the node's disk to make sure there is enough space for the application's files. This is a relatively simple fix, but it raises the cost of running the cluster. A better approach is therefore to look at how the application is structured first and see whether you can reduce its reliance on those files, lowering overall disk usage.
Alternatively, you may find that your application generates a large number of files that are no longer needed. In that case, the fix is as simple as deleting the unnecessary files. Depending on how your application is set up for availability, you may only need to restart the pod, which causes Kubernetes to clean up the files in the container automatically. Note that this applies only to ephemeral volumes, not to persistent volumes.
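Before deleting anything, it helps to list the candidates first. The sketch below prints regular files that are older than a given number of days and larger than a given size. The function name and the default thresholds are hypothetical, and the function itself deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: prints regular files under a directory that were
# last modified more than DAYS days ago and are larger than MIN_KB KiB.
# It only lists candidates; deletion is left as a deliberate manual step.
stale_files() {
    dir="$1"
    days="${2:-7}"
    min_kb="${3:-1024}"
    # find counts -size in 512-byte blocks, so 1 KiB equals 2 blocks.
    find "$dir" -type f -mtime +"$days" -size +"$((min_kb * 2))" 2>/dev/null
}

# Usage (not run here; the path is only an example):
#   stale_files /var/lib/myapp/cache 14 10240
```

Review the list with the application's owners before removing any file, since age and size alone do not prove that a file is unused.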
Conclusion
By now, you should know what it means when a node runs into disk pressure, and what your first step should be when it happens: collect the relevant error logs.
You may have to increase the size of the disks in the cluster, or clean up unused files. Whatever the cause and the solution, you are now better equipped to understand the problem.
