Troubleshooting of high CPU load but low CPU usage
2022-07-03 06:49:00 【The way of research and development】
The story
Recently our services kept triggering high cpu load alerts. The alerts mostly fired in the early morning, during the off-peak period, so user traffic was clearly not the cause, and cpu busy stayed very low the whole time. Checking memory usage, mem.memused was close to 100%. Checking the disk, swap.used rose periodically (about every 30 minutes); disk.io.util was low, but disk.io.avgqu-sz (the average request queue length) also rose about every 30 minutes, at the same frequency as the high cpu load. We then ran crontab -l on the machine and looked for scheduled tasks with a 30-minute period. The task we found was puppet, and its execution times matched the cpu load spikes.
All of these phenomena move at the same frequency, but that only shows they are strongly correlated, like the "beer and diapers" story. What is the actual causal chain? Every link in the chain needs evidence.
Conclusion
`mem.memused` high (the OS is short of memory) -> `swap.used` high -> `disk.io.avgqu-sz` high (disk operations queue up) -> "cpu load" high -> alert triggered
The periodic `puppet` task performs a large number of disk reads.
Analyzing the problem
Our machine has 8G of memory. JVM parameters:
-Xmx6g -Xms6g -Xmn3g
- Question 1: Why does mem.memused keep steadily approaching 8G when only about half of the 6G JVM heap is in use? Surely that cannot fill up 8G?
memused = MemTotal - MemFree - Buffers/Cached. Looking at this formula, as long as the JVM does not release memory back to the operating system, MemFree and Buffers/Cached do not change. A JVM GC only frees memory logically: the memory is still managed by the JVM and is not physically released (which is why top shows about 6G in the RES column for the Java process). So an indicator such as jvm.memory.used sensitively tracks the changes that GC makes to JVM memory, while from the operating system's point of view nearly 6G is in use.
- Question 2: Why is memory usage so high that the swap partition is used?
When the machine was provisioned, tomcat was preinstalled (we do not actually need it). After the service was deployed there were two Java processes on the machine, one of them started by tomcat. The following command shows that it uses about 1.5G of memory:
$ ps -p 3408 -o rss,vsz
RSS VSZ
1554172 8672328
As the business service's JVM requests more and more memory from the operating system, top shows its RES column gradually growing to nearly 6G. Total memory usage = JVM1 (6G) + JVM2 (tomcat, 1.5G) + non-JVM memory. Eventually the OS runs short of available memory and starts using the swap partition.
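The memory accounting in Questions 1 and 2 can be sketched in Python. This is a minimal sketch under stated assumptions: that "Buffers/Cached" in the formula means the Buffers and Cached fields of /proc/meminfo, that 8G and 6G mean GiB, and that MemTotal is exactly 8 GiB (on a real machine it is usually somewhat lower). The function names and the sample meminfo text are invented for illustration; only the 6G heap and the 1554172 kB tomcat RSS come from the figures above.

```python
KIB_PER_GIB = 1024 * 1024


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def memused_kb(meminfo):
    """memused = MemTotal - MemFree - Buffers - Cached (assumed reading of the formula)."""
    return (meminfo["MemTotal"] - meminfo["MemFree"]
            - meminfo["Buffers"] - meminfo["Cached"])


def headroom_kb(total_kb, *resident_kb):
    """Memory left for everything else once the listed processes are resident."""
    return total_kb - sum(resident_kb)


# Hypothetical sample, invented for illustration (not taken from the real machine).
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:          150000 kB
Buffers:           20000 kB
Cached:           130000 kB
"""

info = parse_meminfo(SAMPLE_MEMINFO)
print("memused kB:", memused_kb(info))

# 8 GiB machine, 6 GiB business JVM, tomcat RSS as reported by ps above.
left = headroom_kb(8 * KIB_PER_GIB, 6 * KIB_PER_GIB, 1554172)
print("left for the OS and other processes, kB:", left)
```

Under these assumptions only about 530 MB would remain for the kernel, the page cache, and every other process, which is consistent with the machine falling back to swap.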
- Question 3: Why is cpu load high while cpu usage is low?
Too many processes are waiting for disk I/O to complete, so the process queue grows too long even though very few processes are actually running on the CPU. The load is high, but CPU usage is low.
- Question 4: Why can a long disk request queue lead to high cpu load?
Commands such as uptime and top show the load average indicator. The three numbers, from left to right, are the 1-minute, 5-minute, and 15-minute load averages:
$ uptime
11:44:47 up 46 days 14:54, 2 users, load average: 2.98, 3.08, 3.02
- If the averages are 0.0, the system is idle
- If the 1-minute average is higher than the 5-minute or 15-minute average, the load is increasing
- If the 1-minute average is lower than the 5-minute or 15-minute average, the load is decreasing
- If they are higher than the number of CPUs in the system, the system is likely to have a performance problem (depending on the situation)
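The rules of thumb above can be sketched as a small Python helper. This is only an illustration: the function names are invented, the trend check is simplified to comparing the 1-minute value against the 5-minute value, and the CPU count of 4 is a hypothetical figure, since the article does not state how many CPUs the machine has.

```python
import re

# Matches the "load average: a, b, c" tail printed by uptime and top.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_averages(uptime_output):
    """Return the (1min, 5min, 15min) load averages as floats."""
    match = LOAD_RE.search(uptime_output)
    if match is None:
        raise ValueError("no load average found in input")
    return tuple(float(value) for value in match.groups())


def describe_load(one, five, fifteen, cpu_count):
    """Apply the rules of thumb: return (trend, above_cpu_count)."""
    if one == five == fifteen == 0.0:
        trend = "idle"
    elif one > five:
        trend = "increasing"
    elif one < five:
        trend = "decreasing"
    else:
        trend = "steady"
    return trend, one > cpu_count


line = "11:44:47 up 46 days 14:54, 2 users, load average: 2.98, 3.08, 3.02"
one, five, fifteen = parse_load_averages(line)
print(describe_load(one, five, fifteen, cpu_count=4))  # cpu_count=4 is hypothetical
```

A result above the CPU count is only a hint; as the following paragraphs explain, on Linux the number alone does not say whether the CPU or the disk is the bottleneck.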
In Linux, load averages are "system load averages": for the system as a whole, they measure the number of threads that are running or waiting (for CPU, disk, or uninterruptible locks), which includes processes in uninterruptible sleep. Unlike the definition of cpu load in other operating systems, on Linux it does not reflect only the load on CPU resources. The advantage is that it covers the demand for different resources.
The drawback is that when you see a high load average, you cannot tell whether there are too many runnable processes or too many processes in uninterruptible sleep, so you cannot tell whether the CPU is insufficient or an I/O device is the bottleneck.
When a process running on the CPU needs to access a file on disk, it asks the kernel to read the file, and the kernel fetches the data from disk via DMA. Meanwhile the CPU switches to another process or goes idle, and the task moves to the uninterruptible sleep state. When there are too many read and write requests, too many processes end up in uninterruptible sleep, which produces high load with low CPU usage.
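One way to tell the two cases apart is to count processes by state. The sketch below parses text in the shape printed by `ps -eo state,comm`, where `R` is runnable and `D` is uninterruptible sleep. The function names and the sample output are invented for illustration and do not come from the machine in this article.

```python
from collections import Counter


def count_states(ps_output):
    """Count processes per one-letter state from `ps -eo state,comm` output."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 1)
        if fields:
            counts[fields[0][0]] += 1
    return counts


def load_contributors(counts):
    """Processes that count toward the Linux load average: R plus D."""
    return counts.get("R", 0) + counts.get("D", 0)


# Hypothetical sample output, invented for illustration.
SAMPLE_PS = """\
S COMMAND
S systemd
D java
D java
D puppet
R ps
S sshd
"""

counts = count_states(SAMPLE_PS)
print("runnable:", counts["R"], "uninterruptible:", counts["D"])
print("contributing to load:", load_contributors(counts))
```

Many `D` entries next to few `R` entries would point at I/O rather than at a shortage of CPU. On a live system the same text could be obtained by running `ps -eo state,comm` and passing its output to `count_states`.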
#define LOAD_FREQ (5*HZ+1) /* 5 sec intervals */

/*
 * The global load average is an exponentially decaying average of nr_running +
 * nr_uninterruptible.
 *
 * Once every LOAD_FREQ:
 *
 *   nr_active = 0;
 *   for_each_possible_cpu(cpu)
 *       nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;
 *
 *   avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)
 */
HZ is the kernel timer frequency, which is defined when compiling the kernel. On my system, it's 250:
% grep "CONFIG_HZ=" /boot/config-$(uname -r)
CONFIG_HZ=250
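The update rule in the kernel comment above can be simulated in a few lines of Python. This is a simplified sketch: it uses floating point where the kernel uses fixed-point arithmetic, it assumes the decay factor for a window is exp(-5s / window), and the function names are invented for illustration.

```python
import math

SAMPLE_INTERVAL_S = 5  # LOAD_FREQ is roughly five seconds


def decay_factor(window_s):
    """exp_n for an averaging window of window_s seconds."""
    return math.exp(-SAMPLE_INTERVAL_S / window_s)


def update(load, nr_active, window_s):
    """One step of: avenrun = avenrun * exp_n + nr_active * (1 - exp_n)."""
    e = decay_factor(window_s)
    return load * e + nr_active * (1.0 - e)


def simulate(nr_active_samples, window_s=60, start=0.0):
    """Feed one nr_active sample per 5-second tick and return the final load."""
    load = start
    for nr_active in nr_active_samples:
        load = update(load, nr_active, window_s)
    return load


# Three tasks stuck in uninterruptible sleep and nothing running on the CPU:
# the 1-minute load climbs toward 3 although CPU usage is close to zero.
for minutes in (1, 2, 5):
    ticks = minutes * 60 // SAMPLE_INTERVAL_S
    print(minutes, "min:", round(simulate([3] * ticks), 2))
```

Because nr_active adds nr_uninterruptible to nr_running, tasks blocked on disk I/O raise the average exactly as runnable tasks do, which is the effect described in Question 4.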
Solving the problem
- Remove the preinstalled tomcat
- Reduce the configured maximum JVM heap size
Problem solved!
Appendix:
The top command:
# top
top - 12:13:22 up 167 days, 20:47, 2 users, load average: 0.00, 0.01, 0.05
Tasks: 272 total, 1 running, 271 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.1 sy, 0.0 ni, 99.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65759080 total, 58842616 free, 547908 used, 6368556 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 64264884 avail Mem
................
Explanation of the third line above:
us (user cpu time): the share of CPU time spent in user mode. A high value means user processes are consuming a lot of CPU time; if it stays above 50% for a long time, for example, the program's algorithms or code should be optimized.
sy (system cpu time): the share of CPU time spent in kernel mode.
ni (user nice cpu time): the share of user-mode CPU time spent on processes whose priority was adjusted with nice.
id (idle cpu time): the share of time the CPU is idle. If this value stays at 0 and sy is twice us, the system is generally short of CPU resources.
wa (io wait cpu time): the share of time the CPU spends waiting for disk I/O to complete. A high value means I/O waits are serious, which may be caused by a large amount of random disk access or by a bottleneck in disk performance.
hi (hardware irq): time spent handling hardware interrupts.
si (software irq): time spent handling software interrupts.
st (steal time): time stolen from this virtual machine by the hypervisor.
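The fields explained above can be pulled out of the summary line with a short Python sketch. The function names and the 20% iowait threshold are hypothetical choices made for illustration, not values recommended by top or by this article.

```python
import re


def parse_cpu_line(line):
    """Parse a top '%Cpu(s):' summary line into {field: percent}."""
    _, _, rest = line.partition(":")
    pairs = re.findall(r"([\d.]+)\s+([a-z]{2})", rest)
    return {name: float(value) for value, name in pairs}


def looks_io_bound(cpu, wa_threshold=20.0):
    """Flag a sample whose iowait share reaches the (hypothetical) threshold."""
    return cpu.get("wa", 0.0) >= wa_threshold


# The third line of the top output shown in this appendix.
TOP_LINE = ("%Cpu(s):  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,"
            "  0.0 wa,  0.0 hi,  0.0 si,  0.0 st")

cpu = parse_cpu_line(TOP_LINE)
print(cpu)
print("io bound:", looks_io_bound(cpu))
```

The sample machine in the appendix is idle, so the check returns False; on a host that is swapping heavily one would expect a much larger wa value.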