Analysis of CPU surge in production environment service
2022-07-02 14:04:00 【Big fish eating cats】
Preface
Recently I ran into a high-CPU problem with a service in our production environment. This post records the whole troubleshooting process and shares it in the hope that it helps anyone who runs into the same problem.
I. How it started
Operations staff in the group reported that the system (a project built three years ago) was slow. I tried it myself and found that data loading on every page was very slow. Then the server alarm fired: this service's CPU had soared to 100%. My initial guess was a memory leak.
II. Analyzing the stack log
1. List the running processes
Command: top -c
In the output above, the process with PID=21398 is using more than 100% CPU.
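top measures the process from the outside. If you also want the JVM to report its own CPU usage (for example to feed an alarm like the one above; that wiring is an assumption, not part of the original setup), HotSpot-based JDKs expose it through com.sun.management.OperatingSystemMXBean. A minimal sketch:

```java
import java.lang.management.ManagementFactory;

public class ProcessCpuSketch {
    /**
     * Returns this JVM's recent CPU load in [0.0, 1.0], normalized over all cores,
     * or a negative value if the platform cannot report it (yet).
     */
    static double processCpuLoad() {
        java.lang.management.OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuLoad();
        }
        return -1.0; // JVM without the com.sun.management extension: metric unavailable
    }

    public static void main(String[] args) {
        double load = processCpuLoad();
        int cores = Runtime.getRuntime().availableProcessors();
        if (load < 0) {
            System.out.println("Process CPU load not available yet");
        } else {
            // top (in its default mode) sums per-core usage, so multi-threaded processes can exceed 100% there
            System.out.printf("CPU load: %.1f%% of %d cores (about %.0f%% in top's terms)%n",
                    load * 100, cores, load * 100 * cores);
        }
    }
}
```

Note the difference in scale: getProcessCpuLoad() is normalized over all cores, so a process that top shows at 100% on a 4-core machine reads roughly 0.25 here.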
2. Print the process's GC statistics
Command: jstat -gcutil 21398 1000 1000 (sample every 1000 ms, 1000 times)
Is the survivor space abnormal?
S0 0.00, S1 100.00? Doesn't the survivor space look strange? I didn't understand it at first.
On closer inspection, the JVM parameters specify G1 GC. G1's heap layout differs from that of the other HotSpot collectors: it has only one logical survivor space, instead of the two distinct, fixed-address survivor spaces the other HotSpot collectors use. So when you run jstat against G1, survivor space 0 always shows 0% and survivor space 1 shows 100%. This is normal.
Old-generation occupancy exceeded 82%. High old-generation occupancy is usually caused by problems in the code, which matches the memory-leak guess above (in addition, after restarting the service to stop the bleeding, everything worked normally again, which pins it down as a memory leak).
Method area (Metaspace, the M column) occupancy exceeded 94.66%. The method area stores loaded class information, constants, static variables, and so on. In my experience, method-area overflow is caused by incompatibilities in newly introduced frameworks. I checked and no unfamiliar framework had been introduced, so the analysis then focused on constants and static variables.
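jstat -gcutil reports each space's utilization as a percentage of that space's current capacity. The same view is available from inside the JVM through the standard java.lang.management API, which exposes one MemoryPoolMXBean per space. Under G1 you will typically see a single "G1 Survivor Space" pool, consistent with the single logical survivor space described above. A minimal sketch, assuming you want these numbers in-process (for a metrics endpoint, say):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

public class MemoryPoolsSketch {
    /** Pool name -> used as a percentage of committed (current) capacity, like jstat -gcutil. */
    static Map<String, Double> poolUtilization() {
        Map<String, Double> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long capacity = usage.getCommitted(); // jstat divides by current capacity, not by the max size
            double pct = capacity > 0 ? 100.0 * usage.getUsed() / capacity : 0.0;
            result.put(pool.getName(), pct);
        }
        return result;
    }

    public static void main(String[] args) {
        // Under G1 the names are typically "G1 Eden Space", "G1 Survivor Space", "G1 Old Gen", "Metaspace", ...
        poolUtilization().forEach((name, pct) -> System.out.printf("%-35s %6.2f%%%n", name, pct));
    }
}
```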
3. List the threads that consume the most CPU in the process
Command: top -Hp 21398
In the output above, the thread with PID=21771 is already at 99.9% CPU.
21771 is a decimal number, but thread IDs in the stack snapshot are hexadecimal, so convert it to hex: 550b.
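The conversion is a one-liner. The sketch below formats the decimal thread ID from top -Hp the way a JDK 8-era jstack dump prints the native thread ID (nid=0x...), so the result can be searched for directly. Check the format against your own dump, since newer JDKs may print it differently.

```java
public class ThreadIdHex {
    /** Formats a decimal native thread ID (from top -Hp) as it appears in a JDK 8-style jstack dump. */
    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid); // Long.toHexString emits lowercase hex digits
    }

    public static void main(String[] args) {
        System.out.println(toNid(21771)); // prints nid=0x550b
    }
}
```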
4. Export a thread snapshot of the process
Command: jstack -l 21398 > ./21398.stack
Note: the stack file is very small.
5. Inspect the abnormal threads
Locate the hot thread in 21398.stack by its hexadecimal ID (nid=0x550b).

The stack log only shows that many threads are waiting and that there are deadlocks. An experienced developer might guess from the large number of threads in TIMED_WAITING that the TCP connection pool is at fault, but a global search of the project showed that no thread pool had been configured at all, so pinning down the actual problem required analyzing the dump file.
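Reading a dump by eye for piles of waiting threads and deadlocks can be complemented in-process with ThreadMXBean. This is an addition for illustration, not what the author ran. dumpAllThreads(true, true) is the in-process analogue of jstack -l, and findDeadlockedThreads() reports threads deadlocked on monitors or java.util.concurrent locks. A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateSketch {
    /** Counts live threads by state, e.g. to spot a large number of TIMED_WAITING threads. */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // (lockedMonitors, lockedSynchronizers) = (true, true) mirrors the extra lock info of jstack -l
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Number of threads deadlocked on monitors or ownable synchronizers; 0 if none. */
    static int deadlockedCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length; // null means no deadlock was found
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
        System.out.println("Deadlocked threads: " + deadlockedCount());
    }
}
```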
6. Export the process's dump file
jmap -dump:format=b,file=21398.dmp 21398
Note: this dumps the entire heap, so the file is as large as the heap. It is large and very slow to produce, and it may affect the service.
jmap -dump:live,format=b,file=21398.hprof 21398
Note: this dumps only the live objects in the heap, so the file should be smaller than the heap (recommended). The .hprof file is what MAT analyzes.
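When running jmap on the host is inconvenient, the same HPROF dump can be triggered from inside the JVM through HotSpotDiagnosticMXBean, which HotSpot-based JDKs provide. Passing live=true corresponds to jmap -dump:live: only reachable objects are written, and HotSpot runs a full GC first to find them. The file name below is hypothetical. A minimal sketch:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumpSketch {
    /**
     * Writes an HPROF heap dump that MAT can open. The target must not exist yet,
     * and recent JDKs require the name to end in ".hprof".
     */
    static Path dumpHeap(Path target, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diag = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diag.dumpHeap(target.toString(), live); // live=true: reachable objects only, like jmap -dump:live
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dump");
        Path file = dumpHeap(dir.resolve("app-live.hprof"), true);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

As with jmap, the dump pauses the application while it is written, so the same caution about production impact applies.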
III. Dump file analysis
1. MAT analysis report

2. Histogram: large-object analysis

Histogram columns:
- Class Name: the Java class name;
- Objects: the number of instances of the class;
- Shallow Heap: the memory used by the object itself, excluding the objects it references;
- Retained Heap: the total memory GC could reclaim if the object were collected, i.e. the object plus everything reachable only through it;
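The difference between shallow and retained size is easiest to see on a toy object graph. MAT computes retained sizes efficiently with a dominator tree. The brute-force sketch below instead applies the definition directly: remove the object, recompute what is still reachable from the roots, and sum the shallow sizes of everything that dropped out. The graph and sizes are made up for illustration.

```java
import java.util.*;

public class RetainedSizeSketch {
    /** Objects reachable from the roots, treating `removed` (may be null) as if it no longer existed. */
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots, String removed) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        for (String r : roots) {
            if (!r.equals(removed)) stack.push(r);
        }
        while (!stack.isEmpty()) {
            String obj = stack.pop();
            if (!seen.add(obj)) continue; // already visited
            for (String next : refs.getOrDefault(obj, List.of())) {
                if (!next.equals(removed) && !seen.contains(next)) stack.push(next);
            }
        }
        return seen;
    }

    /** Retained size: shallow sizes of all objects that become unreachable once `target` is gone (target included). */
    static long retainedSize(Map<String, List<String>> refs, Map<String, Long> shallow,
                             List<String> roots, String target) {
        Set<String> before = reachable(refs, roots, null);
        Set<String> after = reachable(refs, roots, target);
        long total = 0;
        for (String obj : before) {
            if (!after.contains(obj)) total += shallow.get(obj);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical graph: root -> A, D; A -> B, C; D -> C. B is reachable only via A; C is shared.
        Map<String, List<String>> refs = Map.of(
                "root", List.of("A", "D"),
                "A", List.of("B", "C"),
                "D", List.of("C"));
        Map<String, Long> shallow = Map.of("root", 16L, "A", 24L, "B", 1000L, "C", 500L, "D", 24L);
        List<String> roots = List.of("root");
        System.out.println(retainedSize(refs, shallow, roots, "A")); // prints 1024: A (24) + B (1000)
        System.out.println(retainedSize(refs, shallow, roots, "D")); // prints 24: C stays alive via A
    }
}
```

A's shallow size is only 24, but its retained size is 1024 because B would be freed with it. This is why a small holder object can still dominate the histogram's Retained Heap column.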
From the results above, we can conclude:
- The Retained Heap figures show no objects that GC can reclaim, which is hard evidence of a memory leak;
- The "culprit" of the leak is the byte[] objects, which take up more than 1 GB, so they are the focus of the analysis;
- char[] is also a potential problem and needs a closer look.
byte[] content analysis :
The byte[] arrays hold HTTP request headers. A copied request header is only a few hundred bytes, so why does every one of these objects uniformly occupy about 10 MB? Suspecting a configuration problem, I checked the configuration file first and found the "culprit": a setting whose value matches the size of the byte[] arrays.
This could be counted as a Spring Boot bug: once this parameter is configured, an array of the configured size is allocated by default, so under heavy request volume the service can go straight to OOM.
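The post does not name the setting, so the sketch below is a hypothetical model rather than Tomcat or Spring Boot code. It assumes a server that allocates a header buffer of the configured maximum size for every request, however small the actual headers are. The arithmetic shows why a roughly 10 MB setting adds up so quickly.

```java
import java.nio.charset.StandardCharsets;

public class HeaderBufferSketch {
    // Hypothetical model: one buffer per request, sized by configuration rather than by content.
    static byte[] allocateHeaderBuffer(int configuredMaxHeaderBytes) {
        return new byte[configuredMaxHeaderBytes];
    }

    /** Memory held by header buffers while `inFlight` requests are being processed at the same time. */
    static long pinnedBytes(int inFlight, int configuredMaxHeaderBytes) {
        return (long) inFlight * configuredMaxHeaderBytes; // widen to long before multiplying
    }

    public static void main(String[] args) {
        String headers = "GET /api/orders HTTP/1.1\r\nHost: example.com\r\nAccept: application/json\r\n\r\n";
        int actual = headers.getBytes(StandardCharsets.US_ASCII).length; // tiny compared with the buffer
        int configured = 10 * 1024 * 1024;                               // ~10 MB, the array size seen in MAT
        byte[] buffer = allocateHeaderBuffer(configured);
        System.out.printf("headers use %d of %d bytes%n", actual, buffer.length);
        // 100 such buffers alive at once already pin about 1 GB
        System.out.println(pinnedBytes(100, configured) / (1024 * 1024) + " MB"); // prints 1000 MB
    }
}
```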
Then I traced it git The historical submission record found that this configuration was submitted the first time the project was built ( Obviously, it is copying the configuration of other projects ), Buried a big hole , It can be seen how important the project scaffold is .
Solution Just remove this configuration .
char[] content analysis :
This problem is obviously a log problem , Check the service log carefully , It is found that there are repeated logs ( use aop Log , stay controller There are also logs in the layer ), The daily log volume is about 2.5G about ( The service business is not complicated , The log volume is obviously wrong ), It is also found that the log printing is done by string splicing , According to the above analysis, the log transformation is as follows :
- The log output is changed from string splicing to placeholder ;
- private static final Logger logger = LoggerFactory.getLogger(xxx.class); Use annotation instead .
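Why placeholders help: with concatenation, the message string (including every argument's toString()) is built before the logger checks whether the level is enabled. With placeholders, disabled messages are dropped before any formatting happens. The project uses SLF4J's {} placeholders. To stay dependency-free, the sketch below swaps in java.util.logging, whose {0} placeholders show the same effect. The Order class is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogPlaceholderSketch {
    private static final Logger LOG = Logger.getLogger(LogPlaceholderSketch.class.getName());

    /** Hypothetical stand-in for an object with an expensive toString(); counts how often it is called. */
    static final class Order {
        int toStringCalls = 0;

        @Override
        public String toString() {
            toStringCalls++;
            return "Order{...large payload...}";
        }
    }

    /** Logs the same message both ways at a disabled level and returns how many times toString() ran. */
    static int demo() {
        LOG.setLevel(Level.INFO); // FINE is disabled, as debug-level logging usually is in production
        Order order = new Order();
        LOG.fine("processing " + order);              // concatenation: toString() runs even though FINE is off
        LOG.log(Level.FINE, "processing {0}", order); // placeholder: level check happens first, no formatting
        return order.toStringCalls;
    }

    public static void main(String[] args) {
        System.out.println("toString() calls: " + demo()); // prints toString() calls: 1
    }
}
```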
That essentially completes the analysis.
3. Results after optimization
After nearly half a month of continuous observation, the high-CPU problem has been completely resolved.
Summary
In practice, a CPU surge usually has to be analyzed from several angles. That is far more complicated than writing a demo that reproduces a memory overflow, where you can locate the faulty code directly. Finally, I have to say that MAT is genuinely easy to use.