Fault analysis: a case of excessive CPU load caused by a large number of short-lived processes
2022-06-09 03:25:00 [ActionTech]
Author: Ren Kun
Based in Zhuhai. He previously worked as a full-time Oracle and MySQL DBA and is now mainly responsible for maintaining MySQL, MongoDB, and Redis.
Source: original contribution.
* Produced by the aikesheng open source community. Original content may not be used without authorization; for reprints, please contact the editor and cite the source.
1. Background
In the development environment of one project, a single virtual machine hosts one mongo cluster for testing: 1 mongos + a 3-node config server set + 1 shard × 3 replicas, 7 mongo instances in total. The mongo version is 4.2.19 and the OS is CentOS 7.9.
After testing for a while, the CPU load stayed at around 50%, and then mongo QPS dropped to 0.
Only mongo is installed on this machine. After all mongo instances were shut down, the CPU load immediately returned to normal. After they were started again, the CPU load began to soar again within a short while. The problem is reproducible and is confirmed to be related to the mongo instances.
2. Diagnosis
Running top shows CPU usr at 40%, but the %CPU values of the top few processes come nowhere near adding up to that.

Checking mongos QPS confirms that no user commands are being executed.

Use dstat to view the overall load (vmstat output is poorly formatted; its last few columns are always misaligned).

Apart from the abnormal CPU load, all other metrics are normal. Interrupts and context switches are not high, so they are unlikely to be the trigger.
Run perf record -ag -- sleep 10 && perf report to see where the CPU time is spent.

There are many mongo calls, but the API names are not intuitive, so it is impossible to guess the corresponding execution logic.
At this point the problem is confirmed to be caused by the mongo instances. However, mongo has 0 application connections, and the API call stacks yield no useful information.
Going back to the beginning of this article: the combined %CPU of the processes listed by top is far less than the overall CPU load. Most likely, frequent short-lived processes are consuming this part of the CPU and exit before top has a chance to capture them in its statistics.
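This reasoning can also be checked directly from /proc. The following minimal Python sketch is an illustration and not part of the original troubleshooting. It assumes a Linux /proc filesystem and compares the system-wide busy CPU time in /proc/stat with the utime + stime of the processes visible in /proc/<pid>/stat at two samples. Processes that start and exit inside the interval are invisible to it, just as they are to top. A large unattributed share therefore points toward short-lived processes. Interrupt and softirq time is never attributed to any process, so a small gap is normal. PID reuse is ignored.

```python
"""Sketch: how much busy CPU time is not attributed to any visible process?

Linux only. Compares system-wide busy jiffies from /proc/stat with the
utime + stime of processes found in /proc/<pid>/stat at two samples.
"""
import os
import time


def total_busy_jiffies(stat_text):
    """Non-idle jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal [guest guest_nice]
            values = [int(v) for v in fields[1:9]]
            # guest time is already included in user/nice, so it is not summed
            return sum(values) - values[3] - values[4]
    raise ValueError("no aggregate 'cpu' line found")


def process_jiffies(stat_line):
    """utime + stime from one /proc/<pid>/stat line (comm may contain spaces)."""
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so utime (field 14) is rest[11], stime rest[12]
    return int(rest[11]) + int(rest[12])


def snapshot():
    """Return (system busy jiffies, {pid: utime+stime}) for the current moment."""
    procs = {}
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/stat") as f:
                procs[pid] = process_jiffies(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
    with open("/proc/stat") as f:
        return total_busy_jiffies(f.read()), procs


def unattributed_share(before, after):
    """Fraction of busy CPU time in the interval not seen in visible processes."""
    busy = after[0] - before[0]
    if busy <= 0:
        return 0.0
    old, new = before[1], after[1]
    # processes born during the interval are counted from zero
    attributed = sum(max(0, new[p] - old.get(p, 0)) for p in new)
    return max(0.0, 1.0 - attributed / busy)


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        first = snapshot()
        time.sleep(2)
        second = snapshot()
        share = unattributed_share(first, second)
        print(f"busy CPU time not attributed to visible processes: {share:.0%}")
    else:
        print("this sketch needs a Linux /proc filesystem")
```

Short-lived children that have already exited show up only in their parent's cutime/cstime, which this sketch deliberately leaves out. Their CPU time therefore lands in the unattributed share.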
Use sar -w 1 to view the number of processes created per second (the proc/s column). On average, more than 80 new processes are created every second, which is very likely the cause.
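The proc/s column of sar is the number of tasks created per second. On Linux the same information is exposed as the cumulative "processes" counter (forks since boot) in /proc/stat, so two samples are enough for a quick cross-check. Below is a minimal Python sketch, assuming a Linux /proc filesystem. The helper names are illustrative.

```python
"""Sketch: processes created per second, from the 'processes' line of /proc/stat."""
import os
import time


def fork_count(stat_text):
    """Return the cumulative number of forks since boot."""
    for line in stat_text.splitlines():
        name, _, value = line.partition(" ")
        if name == "processes":
            return int(value)
    raise ValueError("no 'processes' line found")


def forks_per_second(before, after, interval):
    """Average creation rate between two counter samples taken interval seconds apart."""
    return (after - before) / interval


def read_fork_count(path="/proc/stat"):
    with open(path) as f:
        return fork_count(f.read())


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        interval = 1.0  # same one-second interval as `sar -w 1`
        before = read_fork_count()
        time.sleep(interval)
        after = read_fork_count()
        rate = forks_per_second(before, after, interval)
        print(f"processes created per second: {rate:.1f}")
    else:
        print("this sketch needs a Linux /proc filesystem")
```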

To find the application that keeps creating short-lived processes, you can use execsnoop. This tool uses ftrace to monitor process exec() calls in real time and prints basic information about each short-lived process, including its PID/PPID and command-line arguments.
# Download execsnoop
cd /usr/bin
wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/execsnoop
chmod 755 execsnoop
The output shows that all of these processes come from the monitoring system. It keeps connecting to mongo and filtering the results with grep, and every check spawns new threads/processes. More than 400 records were captured in 10 seconds.
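When execsnoop prints hundreds of lines, tallying them by command makes the dominant source obvious. The Python sketch below is a hypothetical helper, not part of perf-tools. It assumes execsnoop's default "PID PPID ARGS" layout with no extra timestamp column, saved to a file (for example by redirecting execsnoop's output and stopping it with Ctrl-C). It then counts exec() events per command basename.

```python
"""Sketch: rank commands in saved execsnoop output by number of exec() events."""
import os
import sys
from collections import Counter


def top_commands(lines, n=10):
    """Return [(command, count), ...] for the n most frequently exec'd commands.

    Data lines look like 'PID PPID ARGS'. Banner and header lines are skipped
    because their first two fields are not numeric.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            # count /usr/bin/grep and grep as the same command
            counts[os.path.basename(fields[2])] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for command, count in top_commands(f):
                print(f"{count:6d}  {command}")
    else:
        print(f"usage: {sys.argv[0]} EXECSNOOP_OUTPUT_FILE")
```

For example, `python3 top_execs.py exec.log` (both file names are placeholders). In a case like this one, grep and the mongo client would be expected near the top of the list.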

After the zabbix processes were shut down, the CPU load immediately returned to normal. The culprit had been found.
Our other environments also use zabbix for monitoring, but they have not run into this problem.
This node runs 7 mongo instances, and by default zabbix monitors each of them separately. That multiplies the monitoring overhead by 7, all on a single virtual machine with only 4 CPU cores.
The problem broke out because all of these factors came together. Since this is a development environment, zabbix monitoring has been turned off for now. Later, the monitoring logic should be optimized to minimize the number of database connections and shorten the grep pipelines.
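One possible way to cut both the connection count and the grep pipelines is sketched below in Python, under stated assumptions. Instead of every monitoring item opening its own mongo connection and piping the result through grep, each item reads a shared cached copy of one status document (for example serverStatus serialized as JSON) and extracts its field in-process. The `fetch` callable, the cache path, and the TTL are hypothetical. The article does not show the actual zabbix scripts, and each zabbix check would still start one process, just a much cheaper one.

```python
"""Sketch: share one cached status document across many monitoring items."""
import json
import os
import tempfile
import time


def cached_status(fetch, cache_path, ttl=30.0):
    """Return the parsed status, calling fetch() at most about once per ttl seconds.

    fetch is a hypothetical callable that returns the status as a JSON string
    (e.g. by running a single mongo shell command). It is only called when the
    cache file is missing, unreadable, or older than ttl.
    """
    try:
        if time.time() - os.path.getmtime(cache_path) < ttl:
            with open(cache_path) as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # no usable cache: fall through and refresh it
    text = fetch()
    doc = json.loads(text)
    # write to a temp file and rename it, so readers never see a partial file
    cache_dir = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.replace(tmp_path, cache_path)
    return doc


def metric(doc, dotted_key):
    """Look up a nested field such as 'connections.current' without grep."""
    value = doc
    for part in dotted_key.split("."):
        value = value[part]
    return value
```

With one cache file per mongo instance, the 7 instances on this host would cost at most 7 status queries per TTL window, however many items are monitored.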
3. Summary
When a machine's CPU load keeps rising but top cannot catch the processes responsible, execsnoop can be used to capture short-lived processes. Similar tools include iosnoop and opensnoop.