Production problem troubleshooting reference

2022-06-11 16:00:00 5ycode

While organizing my files, I found a troubleshooting manual I put together several years ago, so I am sharing it here.

Basic principles for handling production problems:

  • Restore business service first (this is the key point)

  • If a restart does not solve the problem, roll back if you can

  • If the release cannot be rolled back, you have to find a fix on the spot (reaching this point means a major version change shipped without a plan B)

  • Analyze the root cause only after service has been restored

Run the top command

Focus on load average. Call the three values A (0.41, last 1 minute), B (0.32, last 5 minutes), and C (0.32, last 15 minutes).

  • For example: 0.41 is the load over the last 1 minute, the first 0.32 is the load over the last 5 minutes, and the second 0.32 is the load over the last 15 minutes;

  • Assume the machine has 4 CPU cores: when the load exceeds 4, the CPU is fully saturated and an alert should fire;

  • If A > B > C and A keeps growing, the load is still rising; at this point the only option is to restart (preserve the evidence first)

  • If A < B < C and A keeps shrinking, the load is gradually recovering; you can keep observing;
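
The rules above can be sketched as a small shell helper. This is only an illustration, not part of the original manual: the function name `assess_load` and its output labels are hypothetical, and the three averages are passed as arguments so the logic can be tried without a loaded machine.

```shell
#!/bin/sh
# assess_load A B C CORES
# A, B, C: the three load averages, in the order top prints them.
# CORES:   number of CPU cores.
# Prints one label: saturated, rising, recovering, or steady
# (the labels are illustrative, not from any standard tool).
assess_load() {
    awk -v a="$1" -v b="$2" -v c="$3" -v n="$4" 'BEGIN {
        a += 0; b += 0; c += 0; n += 0      # force numeric comparison
        if (a > n)               print "saturated"
        else if (a > b && b > c) print "rising"
        else if (a < b && b < c) print "recovering"
        else                     print "steady"
    }'
}

# Usage on Linux (assumes /proc/loadavg and nproc are available):
#   read -r a b c _ < /proc/loadavg
#   assess_load "$a" "$b" "$c" "$(nproc)"
```

A single sample cannot show whether A is still growing, so in practice the helper would be run several times a few seconds apart.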

Run sh show-busy-java-threads.sh to find the threads consuming the most CPU

https://github.com/oldratlee/useful-scripts/blob/master/show-busy-java-threads

  • If the top entries are GC threads with high CPU usage, the program is running full GC frequently: objects are not being released (held by locks, transactions, etc.), or large objects are being allocated;

  • If the top entries are log4j threads with high CPU usage, log output is backlogged. Check the log output at this point; the logs are usually lagging far behind real time

  • If the top entries are Tomcat threads, the connections are exhausted and new requests keep waiting to obtain one (high concurrency or slow interfaces)

  • If the top entries are business threads, they need case-by-case analysis;
    Example:

show-busy-java-threads -p 1111  # show the 5 threads using the most CPU in process 1111
show-busy-java-threads 1 10   # run every 1 second, 10 times in total
show-busy-java-threads -a 1.log  # append the output to 1.log
show-busy-java-threads -S ~/test/ # store the jstack output in the current user's test directory
sh show-busy-java-threads.sh 1 10 -S ~/test1/ -a 2.log

  Summary
# Find the threads consuming the most CPU (5 by default) across all running Java processes and print their stacks
# Searching all Java processes by default is the most convenient way to use it
# You can also specify the Java process ID to analyze, so that only the process you care about is shown
show-busy-java-threads -p <Java process ID>
show-busy-java-threads -c <number of thread stacks to display>
# Run repeatedly; these 2 arguments work the same way as in the vmstat command
show-busy-java-threads <seconds between repetitions> [<number of repetitions>]
# Record the output to a file for later review
show-busy-java-threads -a <file to append the output to>
# Specify the directory for jstack output files, so they are kept for later analysis
show-busy-java-threads -S <directory for jstack output files>
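
On a host where the script is not installed, the same lookup can be approximated by hand: find the busiest thread ID with `top -H -p <pid>`, convert it to hex, and search a saved jstack dump for the matching `nid`. The sketch below assumes the dump prints `nid` in hex (as JDK 8 does; check the format of your own dump). The helper names `tid_to_nid` and `show_thread_stack` are hypothetical, and the 20 lines of context are an arbitrary choice.

```shell
#!/bin/sh
# Convert a decimal OS thread ID (as shown by `top -H`) to the hex form
# that jstack prints in the nid= field of each thread header.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Print one thread's stack from a saved jstack dump.
# $1: jstack dump file, $2: decimal thread ID.
# Shows the thread header line plus the 20 lines that follow it.
# The trailing space in the pattern stops 0x1b8 from matching 0x1b81.
show_thread_stack() {
    grep -A 20 "nid=$(tid_to_nid "$2") " "$1"
}

# Usage:
#   jstack 7040 > 7040.log
#   show_thread_stack 7040.log 7041
```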

Run the net.sh script

# Connections on the current service's port (too many means too many inbound requests)
echo "Connections on port 9016:" `netstat -nat|grep -i "9016"|wc -l`
# Total connections to the database port (too many may indicate slow SQL; watch out for multiple data sources)
echo "Connections on port 3306:" `netstat -nat|grep -i "3306"|wc -l`
# Total Redis connections (too many means Redis connection usage is too high; watch out for multiple data sources)
echo "Connections on port 6379:" `netstat -nat|grep -i "6379"|wc -l`
# TCP connection states and their counts (if TIME_WAIT > ESTABLISHED, the server's kernel parameters need tuning)
echo "TCP connection states and counts:" `netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a,S[a]}'`
# Remote IPs with connections in TIME_WAIT, with the count for each
echo `netstat -natp|grep TIME_WAIT|awk '{print $5}'|awk -F ":" '{print $1}'|sort -n|uniq -c|sort -nr`
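
The TIME_WAIT versus ESTABLISHED comparison mentioned in the script's comments can be automated. The following is a minimal sketch with a hypothetical function name, `summarize_tcp_states`. It reads `netstat -n` style output on stdin and assumes the connection state is the last column, which does not hold when netstat is run with `-p`.

```shell
#!/bin/sh
# Count TCP connections per state from `netstat -n` output on stdin and
# print a warning line when TIME_WAIT outnumbers ESTABLISHED.
summarize_tcp_states() {
    awk '/^tcp/ { count[$NF]++ }
         END {
             for (s in count) print s, count[s]
             if (count["TIME_WAIT"] + 0 > count["ESTABLISHED"] + 0)
                 print "WARN: TIME_WAIT exceeds ESTABLISHED"
         }' | sort
}

# Usage: netstat -n | summarize_tcp_states
```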

Printing stack information with jstack

Arthas is also worth a look: https://alibaba.github.io/arthas/install-detail.html

jstack <pid> > <pid>.log
# For example:
jstack 7040 > 7040.log
# Count the threads in each state in the dump
awk  -F: '/java.lang.Thread.State:/ {++S[$2]} END {for(a in S) print a,S[a]}'   7040.log

If there are too many BLOCKED threads, the program has heavy lock contention and possibly a deadlock (jstack reports a real deadlock explicitly with "Found one Java-level deadlock"). Search the file for BLOCKED and locate the corresponding code.
If there are too many RUNNABLE threads, check whether they are all in the same class. If so, a slow interface or a queue (such as log output) is still busy processing.
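
When many threads are BLOCKED, it helps to see which monitor they are queued on. The sketch below counts the `- waiting to lock <0x...>` lines that jstack prints under blocked threads. The function name `top_contended_locks` is hypothetical.

```shell
#!/bin/sh
# List the monitors that the most threads are waiting to acquire,
# busiest first. $1: saved jstack dump file.
# Each output line is: <count> <lock address> (a <class of the lock object>)
top_contended_locks() {
    grep 'waiting to lock' "$1" |
        sed 's/.*waiting to lock //' |
        sort | uniq -c | sort -rn
}

# Usage: top_contended_locks 7040.log
```

Searching the same dump for `locked <address>` with the top address then shows which thread currently holds that monitor.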

  public enum State {
        /**
         * Thread state for a thread which has not yet started.
         * Created, but not yet started.
         */
        NEW,

        /**
         * Thread state for a runnable thread.  A thread in the runnable
         * state is executing in the Java virtual machine but it may
         * be waiting for other resources from the operating system
         * such as processor.
         * Running (or ready to run).
         */
        RUNNABLE,

        /**
         * Thread state for a thread blocked waiting for a monitor lock.
         * A thread in the blocked state is waiting for a monitor lock
         * to enter a synchronized block/method or
         * reenter a synchronized block/method after calling
         * {@link Object#wait() Object.wait}.
         * Blocked, waiting for a lock (critical section), e.g. entering a synchronized block/method or re-entering one.
         * Note: Java monitor locks are reentrant.
         */
        BLOCKED,

        /**
         * Thread state for a waiting thread.
         * A thread is in the waiting state due to calling one of the
         * following methods:
         * Waiting: waits indefinitely for another thread to perform a specific action. Common causes:
         * <ul>
         *   <li>{@link Object#wait() Object.wait} with no timeout</li>
         *   <li>{@link #join() Thread.join} with no timeout</li>
         *   <li>{@link LockSupport#park() LockSupport.park} with no timeout</li>
         * </ul>
         *
         * <p>A thread in the waiting state is waiting for another thread to
         * perform a particular action.
         *
         * For example, a thread that has called {@code Object.wait()}
         * on an object is waiting for another thread to call
         * {@code Object.notify()} or {@code Object.notifyAll()} on
         * that object. A thread that has called {@code Thread.join()}
         * is waiting for a specified thread to terminate.
         */
        WAITING,

        /**
         * Thread state for a waiting thread with a specified waiting time.
         * A thread is in the timed waiting state due to calling one of
         * the following methods with a specified positive waiting time:
         * Timed waiting: waits for another thread to perform an action, with a specific timeout set.
         * <ul>
         *   <li>{@link #sleep Thread.sleep}</li>
         *   <li>{@link Object#wait(long) Object.wait} with timeout</li>
         *   <li>{@link #join(long) Thread.join} with timeout</li>
         *   <li>{@link LockSupport#parkNanos LockSupport.parkNanos}</li>
         *   <li>{@link LockSupport#parkUntil LockSupport.parkUntil}</li>
         * </ul>
         */
        TIMED_WAITING,

        /**
         * Thread state for a terminated thread.
         * The thread has completed execution.
         * Terminated: the thread has finished executing.
         */
        TERMINATED;
    }

Viewing GC information

jstat -gcutil <pid> 1000
# 7040 is the process ID; 1000 is the sampling interval in milliseconds
jstat -gcutil 7040 1000


If O (old generation usage) reaches 100% and FGC (full GC count) climbs quickly, the JVM is running full GC frequently, and every full GC is a stop-the-world (STW) pause.
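
This check can be applied to a stream of jstat samples. The sketch below is an illustration only: the function name `watch_full_gc` and the default 95% threshold are assumptions. It finds the O and FGC columns from the header row, so it does not depend on the exact column layout of a given JDK version.

```shell
#!/bin/sh
# Read `jstat -gcutil <pid> <interval>` output on stdin and print a warning
# for each sample where old-gen usage (O) is at or above a threshold and
# the full-GC count (FGC) grew since the previous sample.
# $1: threshold in percent (defaults to 95, an arbitrary choice).
watch_full_gc() {
    awk -v limit="${1:-95}" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # header row
        {
            o = $(col["O"]) + 0; fgc = $(col["FGC"]) + 0
            if (seen && o >= limit + 0 && fgc > prev)
                print "WARN: O=" o "% FGC " prev " -> " fgc
            prev = fgc; seen = 1
        }'
}

# Usage: jstat -gcutil 7040 1000 | watch_full_gc 95
```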

Dumping heap information

# Run jmap on its own first to see its usage and command examples
jmap -heap <pid>
# Example from the usage text: jmap -dump:live,format=b,file=heap.bin <pid>

jmap -dump:live,format=b,file=7040.bin  7040  # write a heap dump file for offline analysis (not recommended while CPU is very high, because it causes an STW pause)
jmap -heap 7040  # print the current heap information (on JDK 9 and later, use jhsdb jmap --heap instead)


Appendix:
Fixing excessive TIME_WAIT connections

vim /etc/sysctl.conf  
# Add the following 
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
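# Caution: tcp_tw_recycle breaks clients behind NAT and was removed in Linux 4.12; leave it out on newer kernels
net.ipv4.tcp_tw_recycle = 1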
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 2000 65500
net.ipv4.tcp_max_syn_backlog = 20480
net.ipv4.tcp_max_tw_buckets = 62000
net.core.somaxconn = 10240
vm.overcommit_memory=1
vm.swappiness = 1
 
/sbin/sysctl -p  # apply the changes

Fixing a shortage of native threads:

vim /etc/security/limits.d/90-nproc.conf         # adjust the thread limit for appuser
appuser  soft nproc 10240 
appuser  hard nproc 10240
# Use ulimit -a to check the limits for the current user (max user processes)
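
To see how close a user is to the nproc limit, the threads owned by that user can be counted. This is a rough sketch under stated assumptions: the function name `thread_usage` is hypothetical, and the input is expected to be one user name per thread, as produced by `ps -eLo user=` on Linux.

```shell
#!/bin/sh
# Report how many threads a user owns relative to a limit.
# Reads one user name per line (one line per thread) on stdin.
# $1: user name, $2: limit to compare against (e.g. the nproc value).
thread_usage() {
    awk -v u="$1" -v limit="$2" '
        $1 == u { n++ }
        END {
            limit += 0
            pct = (limit > 0) ? n * 100 / limit : 0
            printf "%s: %d of %d threads (%d%%)\n", u, n, limit, pct
        }'
}

# Usage: ps -eLo user= | thread_usage appuser 10240
```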
Copyright notice
This article was written by [5ycode]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/03/202203011938346284.html