Production problem troubleshooting reference
2022-06-11 16:00:00 【5ycode】
While tidying up old files, I found a troubleshooting manual I put together several years ago. Sharing it here.
Basic principles for handling production problems:
Restore service first (this is the key point)
If a restart does not fix the problem, roll back if you can
If the release cannot be rolled back, you have to find a fix on the spot (reaching this point means a major version change went out without a plan B)
Analyze the root cause only after service has been restored

Run the top command

Focus on the load average; call its three values A (0.41, last 1 minute), B (0.32, last 5 minutes) and C (0.32, last 15 minutes)
That is: 0.41 is the load over the last 1 minute, the first 0.32 is the load over the last 5 minutes, and the second 0.32 is the load over the last 15 minutes;
Assume the machine has 4 CPU cores: when the load goes above 4, the CPUs are saturated and tasks are queuing, so raise an alert;
If A > B > C and A keeps growing, the load is still climbing and a restart is usually the only option (preserve the evidence first);
If A < B < C and A keeps shrinking, the load is recovering and you can keep observing;
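The A/B/C rules above can be sketched as a small shell function. This is only an illustration: the function name classify_load and its output words are made up here, and the usage lines assume a Linux host where /proc/loadavg and nproc are available.

```shell
# classify_load A B C CORES
# A, B, C are the three load average values in the order top prints them.
# Prints "rising", "falling" or "steady", followed by " overloaded" when the
# most recent value (A) is above the number of CPU cores.
classify_load() {
  awk -v a="$1" -v b="$2" -v c="$3" -v n="$4" 'BEGIN {
    trend = "steady"
    if (a > b && b > c) trend = "rising"        # A > B > C: load is climbing
    else if (a < b && b < c) trend = "falling"  # A < B < C: load is recovering
    if (a > n) trend = trend " overloaded"      # more runnable tasks than cores
    print trend
  }'
}

# Usage on Linux (assumption: /proc/loadavg and nproc exist):
#   read -r a b c _ < /proc/loadavg
#   classify_load "$a" "$b" "$c" "$(nproc)"
```

A result of "rising overloaded" corresponds to the restart case above; "falling" corresponds to the keep-observing case.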
Run sh show-busy-java-threads.sh to find the threads consuming the most CPU
https://github.com/oldratlee/useful-scripts/blob/master/show-busy-java-threads
If the top entries are GC threads with a high share of CPU, the program is doing frequent full GCs: objects are not being released (held by locks, transactions, etc.), or large objects are being allocated;
If the top entries are log4j threads with a high share of CPU, log output is backlogged; check the log files, which by then are usually lagging far behind;
If the top entries are Tomcat threads, the connections are exhausted and new requests keep waiting for one (high concurrency or slow endpoints);
If the top entries are business threads, analyze them case by case;
Example :
show-busy-java-threads -p 1111 # Acquisition process 1111 Most expensive cpu Of 5 Threads
show-busy-java-threads 1 10 # Every other second , Total implementation 10 Time
show-busy-java-threads -a 1.log # Output the result to 1.log In file
show-busy-java-threads -S ~/test/ # take jstack Output to the current user test Under the table of contents
sh show-busy-java-threads.sh 1 10 -S ~/test1/ -a 2.log
Full usage
# Find the threads consuming the most CPU across all running Java processes (5 by default) and print their stacks
# Searching all Java processes is the default, which is the most convenient
# You can also specify the Java process id to analyze, so that only the process you care about is shown
show-busy-java-threads -p <Java process id>
show-busy-java-threads -c <number of thread stacks to show>
# Run repeatedly; these 2 arguments work like those of the vmstat command
show-busy-java-threads <seconds between runs> [<number of runs>]
# Record to a file for later review
show-busy-java-threads -a <file to record the output in>
# Specify the directory for jstack output files, handy for later analysis
show-busy-java-threads -S <directory for jstack output files>
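When the script is not at hand, roughly the same lookup can be done by hand: find the busiest thread id with top -H, convert it to hex, and search the jstack dump for that nid. The sketch below is an assumption-laden illustration, not the script's actual implementation: the helper names tid_to_nid and find_thread_stack are invented here, and it assumes a JDK whose jstack prints nid in hex (as JDK 8 era dumps do).

```shell
# tid_to_nid TID: convert a decimal thread id (as shown by `top -H`)
# to the hex form that jstack prints in its nid= field.
tid_to_nid() {
  printf '0x%x\n' "$1"
}

# find_thread_stack TID [LINES]: read a jstack dump on stdin and print the
# matching thread header plus the next LINES lines of its stack (default 20).
# The trailing space in the pattern avoids matching a longer nid by prefix.
find_thread_stack() {
  grep -A "${2:-20}" "nid=$(tid_to_nid "$1") "
}

# Usage (1111 and 1130 are made-up ids):
#   top -H -b -n 1 -p 1111 | head -20      # note the busiest thread id
#   jstack 1111 | find_thread_stack 1130
```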
Run the net.sh script
# Connections on the current service's port (a high count means too many incoming requests)
echo "Port 9016 connections:" `netstat -nat|grep -i "9016"|wc -l`
# Connections to the database port (a high count suggests slow SQL; remember to check every data source)
echo "Port 3306 connections:" `netstat -nat|grep -i "3306"|wc -l`
# Connections to Redis (a high count means too many Redis connections; remember to check every data source)
echo "Port 6379 connections:" `netstat -nat|grep -i "6379"|wc -l`
# TCP connection states and their counts (if TIME_WAIT > ESTABLISHED, the server's kernel parameters need tuning)
echo "TCP connection states and counts:" `netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a,S[a]}'`
# Remote IPs holding TIME_WAIT connections, with the count for each
echo `netstat -natp|grep TIME_WAIT|awk '{print $5}'|awk -F ":" '{print $1}'|sort -n|uniq -c|sort -nr`
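The TIME_WAIT versus ESTABLISHED check mentioned in the script comments can be automated. A minimal sketch follows; the function name tcp_state_summary and the WARN wording are invented for illustration, and it expects plain netstat -nat output (without -p) so that the last column is the TCP state.

```shell
# tcp_state_summary: read `netstat -nat` output on stdin, print the number of
# connections per TCP state, and warn when TIME_WAIT outnumbers ESTABLISHED.
tcp_state_summary() {
  awk '/^tcp/ { s[$NF]++ }              # last column is the state
       END {
         for (k in s) print k, s[k]     # order of states is not guaranteed
         if (s["TIME_WAIT"] > s["ESTABLISHED"])
           print "WARN: TIME_WAIT exceeds ESTABLISHED"
       }'
}

# Usage:
#   netstat -nat | tcp_state_summary
```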
Printing stack information with jstack
See also Arthas: https://alibaba.github.io/arthas/install-detail.html
jstack pid > pid.log
For example: jstack 7040 > 7040.log
# Count the threads in each state in the dump
awk -F: '/java.lang.Thread.State:/ {++S[$2]} END {for(a in S) print a,S[a]}' 7040.log

If there are many BLOCKED threads, the program has heavy lock contention and possibly a deadlock; search the file for BLOCKED to locate the code involved
If there are many RUNNABLE threads, check whether they are in the same class; if so, a slow endpoint or a queue is being processed continuously (log output, for example)
public enum State {
/**
* Thread state for a thread which has not yet started.
* Created, but not yet started
*/
NEW,
/**
* Thread state for a runnable thread. A thread in the runnable
* state is executing in the Java virtual machine but it may
* be waiting for other resources from the operating system
* such as processor.
* Running
*/
RUNNABLE,
/**
* Thread state for a thread blocked waiting for a monitor lock.
* A thread in the blocked state is waiting for a monitor lock
* to enter a synchronized block/method or
* reenter a synchronized block/method after calling
* {@link Object#wait() Object.wait}.
* Blocked, waiting for a lock (a critical resource), e.g. entering a synchronized block/method or re-entering one.
* Note: Java monitor locks are reentrant.
*/
BLOCKED,
/**
* Thread state for a waiting thread.
* A thread is in the waiting state due to calling one of the
* following methods:
* Waiting: the thread waits indefinitely for another thread to perform a specific action. The common cases are:
* <ul>
* <li>{@link Object#wait() Object.wait} with no timeout</li>
* <li>{@link #join() Thread.join} with no timeout</li>
* <li>{@link LockSupport#park() LockSupport.park}</li>
* </ul>
* </ul>
*
* <p>A thread in the waiting state is waiting for another thread to
* perform a particular action.
*
* For example, a thread that has called {@code Object.wait()}
* on an object is waiting for another thread to call
* {@code Object.notify()} or {@code Object.notifyAll()} on
* that object. A thread that has called {@code Thread.join()}
* is waiting for a specified thread to terminate.
*/
WAITING,
/**
* Thread state for a waiting thread with a specified waiting time.
* A thread is in the timed waiting state due to calling one of
* the following methods with a specified positive waiting time:
* Timed waiting: the thread waits for another thread's action, with a specific waiting time (timeout) set
* <ul>
* <li>{@link #sleep Thread.sleep}</li>
* <li>{@link Object#wait(long) Object.wait} with timeout</li>
* <li>{@link #join(long) Thread.join} with timeout</li>
* <li>{@link LockSupport#parkNanos LockSupport.parkNanos}</li>
* <li>{@link LockSupport#parkUntil LockSupport.parkUntil}</li>
* </ul>
*/
TIMED_WAITING,
/**
* Thread state for a terminated thread.
* The thread has completed execution.
* Terminated: the thread has finished execution.
*/
TERMINATED;
}
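When the state counts show many BLOCKED threads, it helps to see which monitors they are queuing on and whether jstack itself reported a deadlock. The following is a rough sketch; the function name lock_contention and its status messages are made up here, and it relies on the "waiting to lock <address>" and "Java-level deadlock" lines that HotSpot thread dumps normally contain.

```shell
# lock_contention: read a jstack dump on stdin.
# Line 1 says whether jstack reported a Java-level deadlock; the remaining
# lines list each contended monitor address with the number of waiting
# threads, busiest monitor first.
lock_contention() {
  _dump=$(cat)
  if printf '%s\n' "$_dump" | grep -q 'Java-level deadlock'; then
    echo "deadlock reported by jstack"
  else
    echo "no deadlock reported by jstack"
  fi
  printf '%s\n' "$_dump" | grep -o 'waiting to lock <[^>]*>' \
    | sort | uniq -c | sort -rn
}

# Usage (7040.log is the dump written earlier):
#   lock_contention < 7040.log
```

Searching the dump for "locked <address>" with the busiest address then shows which thread currently holds that monitor.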
Viewing GC information
jstat -gcutil pid 1000
# 7040 is the process id; 1000 is the sampling interval in milliseconds
jstat -gcutil 7040 1000

If O (old generation usage) reaches 100% and FGC keeps increasing quickly, the JVM is doing frequent full GCs, and every full GC is a stop-the-world pause;
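The O/FGC rule above can be watched automatically instead of by eye. A minimal sketch, assuming jstat -gcutil output with a header row starting with S0; the function name fullgc_watch, the default threshold of 90 and the ALERT text are invented for illustration. Columns are looked up by header name because the column set differs between JDK versions.

```shell
# fullgc_watch [OLD_PCT]: read `jstat -gcutil <pid> <interval>` output on
# stdin and print an alert whenever FGC has increased since the previous
# sample while old generation usage (O) is at or above OLD_PCT (default 90).
fullgc_watch() {
  awk -v limit="${1:-90}" '
    $1 == "S0" {                        # header row: remember column positions
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    ("O" in col) && ("FGC" in col) {
      o = $(col["O"]); fgc = $(col["FGC"])
      if (seen && fgc > prev && o >= limit)
        printf "ALERT: old gen %s%%, FGC %d -> %d\n", o, prev, fgc
      prev = fgc; seen = 1
    }'
}

# Usage:
#   jstat -gcutil 7040 1000 | fullgc_watch 95
```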
Dumping heap information
Run jmap on its own first to see usage examples for the specific commands
jmap -heap pid
Example: jmap -dump:live,format=b,file=heap.bin <pid>
jmap -dump:live,format=b,file=7040.bin 7040 # write a heap dump file for offline analysis (not recommended while CPU is very high, because it causes a stop-the-world pause)
jmap -heap 7040 # print the current heap summary

Appendix:
Fixing excessive TIME_WAIT
vim /etc/sysctl.conf
# Add the following
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
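# Note: tcp_tw_recycle breaks clients behind NAT and was removed in Linux 4.12; omit it on newer kernels
net.ipv4.tcp_tw_recycle = 1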
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 2000 65500
net.ipv4.tcp_max_syn_backlog = 20480
net.ipv4.tcp_max_tw_buckets = 62000
net.core.somaxconn = 10240
vm.overcommit_memory=1
vm.swappiness = 1
/sbin/sysctl -p # apply the changes
Fixing a shortage of native threads:
vim /etc/security/limits.d/90-nproc.conf # adjust the thread limit for appuser
appuser soft nproc 10240
appuser hard nproc 10240
Use ulimit -a to check the current user's limits (max user processes)