Production problem troubleshooting reference
2022-06-11 16:00:00 【5ycode】
While tidying up my notes, I found a troubleshooting manual I put together several years ago, so I am sharing it here.
Basic principles for handling production problems:
Restore the business as quickly as possible (this is the key point).
If a restart does not solve the problem, roll back if you can.
If the change cannot be rolled back, you have to find a fix for the problem itself. (Ending up here means a major version change shipped without a plan B.)
Analyze the root cause only after the business has been restored.

Run the top command

Focus on the load average, for example: load average: 0.41, 0.32, 0.32. Call the three values A (last 1 minute), B (last 5 minutes) and C (last 15 minutes).
Here 0.41 is the load over the last 1 minute, the first 0.32 is the load over the last 5 minutes, and the second 0.32 is the load over the last 15 minutes.
Assuming the machine has 4 CPU cores, a load above 4 means the CPUs are saturated and should trigger an alert.
If A > B > C and A keeps growing, the load is still climbing; at this point the only option is a restart (preserve the evidence first).
If A < B < C and A keeps shrinking, the load is gradually recovering, and you can keep observing.
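The rule above can be sketched as a small shell helper. This is only an illustration: the function name classify_load and its output words are made up here, and it assumes the three values are the 1-, 5- and 15-minute load averages that top and /proc/loadavg report.

```shell
#!/bin/sh
# classify_load A B C CORES  (hypothetical helper, not part of any tool)
# A, B, C: 1-, 5- and 15-minute load averages; CORES: number of CPU cores.
# Prints "<status> <trend>", e.g. "saturated rising".
classify_load() {
    awk -v a="$1" -v b="$2" -v c="$3" -v n="$4" 'BEGIN {
        # Load above the core count means tasks are queuing for CPU.
        if (a > n) status = "saturated"
        else       status = "ok"

        # Compare the three windows to see which way the load is moving.
        if (a > b && b > c)      trend = "rising"
        else if (a < b && b < c) trend = "recovering"
        else                     trend = "steady"

        print status, trend
    }'
}

# Live usage on Linux (assumes /proc/loadavg and nproc are available):
#   read a b c _ < /proc/loadavg
#   classify_load "$a" "$b" "$c" "$(nproc)"
```

A result of "saturated rising" corresponds to the restart case described above, while "recovering" corresponds to the case where you can keep observing.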
Run sh show-busy-java-threads.sh to find the threads consuming the most CPU
https://github.com/oldratlee/useful-scripts/blob/master/show-busy-java-threads
If the top entries are GC threads with high CPU usage, the program is running full GCs frequently: objects are not being released (held by locks, transactions, etc.), or large objects are being allocated.
If the top entries are log4j threads with high CPU usage, log output is backed up; check the log output at that point, which is usually lagging far behind.
If the top entries are Tomcat related, the connections are exhausted and new requests keep waiting to obtain one (high concurrency or slow endpoints).
If the top entries are other business threads, they need case-by-case analysis.
Example:
show-busy-java-threads -p 1111 # Show the 5 threads using the most CPU in process 1111
show-busy-java-threads 1 10 # Run once per second, 10 times in total
show-busy-java-threads -a 1.log # Record the output to the file 1.log
show-busy-java-threads -S ~/test/ # Store the jstack output in the current user's test directory
sh show-busy-java-threads.sh 1 10 -S ~/test1/ -a 2.log
Full usage:
# Find the threads consuming the most CPU across all running Java processes (5 by default) and print their thread stacks
# Searching all Java processes by default makes the script more convenient to use
# You can also specify the Java process ID to analyze, so that only the process you care about is shown
show-busy-java-threads -p <Java process ID>
show-busy-java-threads -c <number of thread stacks to display>
# Run repeatedly; these 2 parameters work the same way as in the vmstat command
show-busy-java-threads <interval in seconds> [<repeat count>]
# Record to a file for later review
show-busy-java-threads -a <file to record the output to>
# Specify the directory for jstack output files, so they are kept for later analysis
show-busy-java-threads -S <directory for jstack output files>
Run the net.sh script
# Connections on the current service's port (too many means too many inbound requests)
echo "Connections on port 9016:" `netstat -nat|grep -i "9016"|wc -l`
# Connections to the database port (too many may indicate slow SQL; watch out for multiple data sources)
echo "Connections on port 3306:" `netstat -nat|grep -i "3306"|wc -l`
# Connections to Redis (too many means the Redis connection count is too high; watch out for multiple data sources)
echo "Connections on port 6379:" `netstat -nat|grep -i "6379"|wc -l`
# TCP connection states and their counts (if TIME_WAIT > ESTABLISHED, the server parameters need tuning)
echo "TCP connection states and counts:" `netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a,S[a]}'`
# Number of TIME_WAIT connections per remote IP
echo `netstat -natp|grep TIME_WAIT|awk '{print $5}'|awk -F ":" '{print $1}'|sort -n|uniq -c|sort -nr`
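The TIME_WAIT versus ESTABLISHED comparison mentioned in the script comments can be automated with a short filter. The sketch below is an assumption-laden illustration: the function name tcp_state_check and the "tune"/"ok" verdict words are hypothetical, and it expects netstat -n style lines on standard input with the state in the last column.

```shell
#!/bin/sh
# tcp_state_check (hypothetical helper): reads `netstat -n` style output on
# stdin and reports whether TIME_WAIT connections outnumber ESTABLISHED ones.
tcp_state_check() {
    awk '/^tcp/ { s[$NF]++ }          # count connections per TCP state
         END {
             tw  = s["TIME_WAIT"] + 0     # "+ 0" turns a missing state into 0
             est = s["ESTABLISHED"] + 0
             verdict = (tw > est) ? "tune" : "ok"
             printf "TIME_WAIT=%d ESTABLISHED=%d %s\n", tw, est, verdict
         }'
}

# Live usage (assumes netstat is installed):
#   netstat -n | tcp_state_check
```

A "tune" verdict is the situation in which the TIME_WAIT settings in the appendix are worth reviewing.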
Print stack information with jstack
Arthas is also worth a look: https://alibaba.github.io/arthas/install-detail.html
jstack <pid> > <pid>.log
For example: jstack 7040 > 7040.log
# Count the threads in each state in the dump
awk -F: '/java.lang.Thread.State:/ {++S[$2]} END {for(a in S) print a,S[a]}' 7040.log

If there are many BLOCKED threads, the program has heavy lock contention or possibly a deadlock; search the file for BLOCKED to locate the corresponding code.
If there are many RUNNABLE threads, check whether they are in the same class; if so, a slow endpoint or a queue is still being processed (for example log output).
public enum State {
/**
* Thread state for a thread which has not yet started.
* Created, but not yet started.
*/
NEW,
/**
* Thread state for a runnable thread. A thread in the runnable
* state is executing in the Java virtual machine but it may
* be waiting for other resources from the operating system
* such as processor.
* Running.
*/
RUNNABLE,
/**
* Thread state for a thread blocked waiting for a monitor lock.
* A thread in the blocked state is waiting for a monitor lock
* to enter a synchronized block/method or
* reenter a synchronized block/method after calling
* {@link Object#wait() Object.wait}.
* Blocked, waiting for a lock (a critical resource), e.g. entering a synchronized block/method or re-entering one after Object.wait().
* Note: Java monitor locks are reentrant.
*/
BLOCKED,
/**
* Thread state for a waiting thread.
* A thread is in the waiting state due to calling one of the
* following methods:
* Waiting indefinitely for another thread to perform a specific action; the common cases are:
* <ul>
* <li>{@link Object#wait() Object.wait} with no timeout</li>
* <li>{@link #join() Thread.join} with no timeout</li>
* <li>{@link LockSupport#park() LockSupport.park}</li>
* </ul>
*
* <p>A thread in the waiting state is waiting for another thread to
* perform a particular action.
*
* For example, a thread that has called {@code Object.wait()}
* on an object is waiting for another thread to call
* {@code Object.notify()} or {@code Object.notifyAll()} on
* that object. A thread that has called {@code Thread.join()}
* is waiting for a specified thread to terminate.
*/
WAITING,
/**
* Thread state for a waiting thread with a specified waiting time.
* A thread is in the timed waiting state due to calling one of
* the following methods with a specified positive waiting time:
* Timed waiting: waiting for another thread to act, with a specific waiting time (timeout) set.
* <ul>
* <li>{@link #sleep Thread.sleep}</li>
* <li>{@link Object#wait(long) Object.wait} with timeout</li>
* <li>{@link #join(long) Thread.join} with timeout</li>
* <li>{@link LockSupport#parkNanos LockSupport.parkNanos}</li>
* <li>{@link LockSupport#parkUntil LockSupport.parkUntil}</li>
* </ul>
*/
TIMED_WAITING,
/**
* Thread state for a terminated thread.
* The thread has completed execution.
* Terminated: the thread has finished executing.
*/
TERMINATED;
}
View GC information
jstat -gcutil <pid> 1000
# 7040 is the process ID; 1000 is the sampling interval in milliseconds
jstat -gcutil 7040 1000

If O (old generation usage) reaches 100% and FGC keeps increasing quickly, full GCs are happening frequently, and every full GC stops the world (STW).
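The check described above can be applied to jstat output automatically. The following is a minimal sketch, not an official tool: the function name gc_pressure and the default threshold of 95 are assumptions, and it relies only on the O and FGC column names that jstat -gcutil prints in its header line.

```shell
#!/bin/sh
# gc_pressure [LIMIT] (hypothetical helper): reads `jstat -gcutil <pid> <ms>`
# output on stdin and prints a line whenever the full GC count (FGC) rose
# since the previous sample while old-gen usage (O) is at or above LIMIT percent.
gc_pressure() {
    awk -v limit="${1:-95}" '
        NR == 1 {                         # header row: map column names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            o   = $(col["O"]) + 0         # old generation usage in percent
            fgc = $(col["FGC"]) + 0       # cumulative number of full GCs
            if (seen && fgc > prev && o >= limit)
                printf "O=%.2f%% FGC %d -> %d: frequent full GC suspected\n", o, prev, fgc
            prev = fgc
            seen = 1
        }'
}

# Live usage (7040 is the example process ID used in this article):
#   jstat -gcutil 7040 1000 | gc_pressure 95
```

Looking the columns up by header name keeps the sketch independent of the exact column order, which differs between JDK versions.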
Dump heap information
Running jmap on its own first prints its usage and command examples.
jmap -heap <pid>
Example: jmap -dump:live,format=b,file=heap.bin <pid>
jmap -dump:live,format=b,file=7040.bin 7040 # Dump the heap to a file for offline analysis (not recommended while CPU usage is very high, because it causes a stop-the-world pause)
jmap -heap 7040 # Print the current heap information

Appendix:
Resolving TIME_WAIT
vim /etc/sysctl.conf
# Add the following
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
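# Note: tcp_tw_recycle was removed in Linux 4.12 and breaks clients behind NAT; omit it on newer kernels
net.ipv4.tcp_tw_recycle = 1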
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 2000 65500
net.ipv4.tcp_max_syn_backlog = 20480
net.ipv4.tcp_max_tw_buckets = 62000
net.core.somaxconn = 10240
vm.overcommit_memory=1
vm.swappiness = 1
/sbin/sysctl -p # Apply the changes
Fixing an insufficient native thread limit:
vim /etc/security/limits.d/90-nproc.conf # Adjust the thread limit for appuser
appuser soft nproc 10240
appuser hard nproc 10240
Use ulimit -a to check the current user's limits (max user processes).