当前位置:网站首页>记录一次线上死锁的定位分析
记录一次线上死锁的定位分析
2022-07-25 11:21:00 【JackLi0812】
线上死锁定位分析
现在 IT 界普遍高并发, 分布式环境, 难免遇到
死锁,死循环等问题, 平时开发我们一般都可以停掉服务, 然后打 trace ---> 编译 ---> 修改源代码 ---> 重新编译 ---> ..... ---> 解决问题, 或者通过集成开发环境(如:IDEA,Eclipse)提供的Debug功能打断点 ---> watch variable ---> 进入/跳出方法 ---> .....---> 解决问题, 但是, 当遇到生产环境却往往不知所措, 因为生产环境是不能随便让你停止服务的, 并且生产环境是不会开放 debug 端口的(debug 需要开放调试<<-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005>>), 那么, 线上死锁如何定位? 怎么---> 解决问题?
实际上, Java 已经为我们提供了一些好用的定位分析工具, 当然, 这是一句废话, 你肯定知道, 你还知道哪些命令在 JDK 的安装目录下的
bin下和java以及javac在一块躺着. 是的, 的确如此, 你可真是一个小机灵鬼哦(奈何从来没用过是不是? 今天帅帅给你一个应用场景, 哈哈…)
(昨晚睡觉前提了点代码到 jfoa 怎么也没想到导致了线上死锁…)

1. jps
1.1 jps 虚拟机进程状况工具
要定位分析 java 死锁, 死循环你得先知道发生问题的
LVMID(Logical Volume Manager ID), 就好比你要看 Windows 上的一个应用的详细信息, 你得先找到这个应用运行在哪个进程上一样.
或者你可以将LVMID理解为PID, 通常LVMID和 进程PID是一致的, 但是并不非得一样.
因此, 要想知道
LVMID, 可以通过查询 PID, 但是其一这是不保险的, 其二, 当同时启动了多个虚拟机进程时, 无法通过名称区分, 就必须得使用jps了(任务管理器里面有 n 个 java.exe).
JackLi:~ dreamli$ jps
1461
1847 GradleDaemon
1210
1851 RunnerApplication
1852 Jps
JackLi:~ dreamli$
上面是一个简单的使用, 第一列的数字就是
LVMID, 后面跟的是启动的主类. 帅帅使用的gradle构建工具, 所以有一个GradleDaemon, 然后jfoa项目的启动入口在RunnerApplication
1.2 jps 参数
jps 有以下几个参数
-q只输出LVMID
JackLi:~ dreamli$ jps -q
1875
1461
1847
1210
1851
-m输出传给 main 函数的参数
JackLi:~ dreamli$ jps -m
1876 Jps -m
1461
1847 GradleDaemon 6.2.2
1210
1851 RunnerApplication
-l输出主类的全名, 如果进程执行的是 jar, 则会输出 jar 路径
JackLi:~ dreamli$ jps -l
1461
1877 sun.tools.jps.Jps
1847 org.gradle.launcher.daemon.bootstrap.GradleDaemon
1210
1851 club.javafamily.runner.RunnerApplication
-v输出虚拟机进程启动时 JVM 参数
JackLi:~ dreamli$ jps -v
1461 -Dosgi.requiredJavaVersion=1.8 -Dosgi.instance.area.default=@user.home/eclipse-workspace -XX:+UseG1GC -XX:+UseStringDeduplication -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -Dosgi.requiredJavaVersion=1.8 -Xms256m -Xmx1024m -Xdock:icon=../Resources/Eclipse.icns -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -javaagent:/Applications/Eclipse.app/Contents/Eclipse/lombok.jar
1878 Jps -Dapplication.home=/Users/dreamli/Software/java/zulu-jfx-252/zulu-8.jdk/Contents/Home -Xms8m
1847 GradleDaemon -Xmx2g -Dfile.encoding=UTF-8 -Duser.country=CN -Duser.language=zh -Duser.variant
1210 -Xms128m -Xmx2048m -XX:ReservedCodeCacheSize=240m -XX:+UseCompressedOops -Dfile.encoding=UTF-8 -XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=50 -ea -Dsun.io.useCanonCaches=false -Djava.net.preferIPv4Stack=true -Djdk.http.auth.tunneling.disabledSchemes="" -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -Xverify:none -XX:ErrorFile=/Users/dreamli/java_error_in_idea_%p.log -XX:HeapDumpPath=/Users/dreamli/java_error_in_idea.hprof -javaagent:/Applications/IntelliJ IDEA.app/Contents/bin/jetbrains-agent.jar -Djb.vmOptionsFile=/Users/dreamli/Library/Preferences/IntelliJIdea2019.3/idea.vmoptions -Didea.home.path=/Applications/IntelliJ IDEA.app/Contents -Didea.executable=idea -Didea.paths.selector=IntelliJIdea2019.3
1851 RunnerApplication -Djava.rmi.server.hostname=127.0.0.1 -Dspring.profiles.active=dev -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/Users/dreamli/Workspace/MyRepository/javafamily/jfoa/runner/build/server-temp -Duser.country=CN -Duser.language=zh -Duser.variant
2 jstack
2.1 jstack: java 堆栈跟踪工具
jstack命令用于生成当前时刻线程快照(一般称为threaddump文件或者javacore文件), 线程快照就是当前虚拟机内每一条线程正在执行的方法堆栈集合, 生成线程快照的目的是定位线程出现长时间停顿的原因, 如: 线程死锁, 死循环, 请求外部资源的长时间等待等都是导致线程长时间停顿的常见原因.
基本命令格式如下:jstack [options] lvmid
JackLi:~ dreamli$ jps -m
1461
1847 GradleDaemon 6.2.2
1210
1851 RunnerApplication
1980 Jps -m
JackLi:~ dreamli$ jstack 1851
2020-11-26 23:59:27
Full thread dump OpenJDK 64-Bit Server VM (25.252-b14 mixed mode):
"Attach Listener" #62 daemon prio=9 os_prio=31 tid=0x00007faece823800 nid=0xac03 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"MessageBroker-2" #61 prio=5 os_prio=31 tid=0x00007faed3018800 nid=0x15403 waiting on condition [0x000070000ab6c000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000773124fa8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
2.2 jstack 参数
jstack有一下几个参数:
-F: 当正常输出的请求不被响应时, 强制输出线程堆栈-l: 除堆栈外显示关于锁的信息-m: 调用本地方法的话, 可以显示 C/C++ 的堆栈
3. 死锁定位分析
昨晚代码撸到半夜, 提交了很多代码, 没想到…死锁------留下了心疼自己的泪水…
3.1 jps 定位 lvmid
第一步不用说了, 先找到
lvmid
3.2 jstack 获取线程堆栈
3.1 提点 Linux 小知识
- 将线程堆栈写入本地文件
jstack -m 1851 >> ....您的路径../jfoa/jstack-2020-11-26.log
- 获取远程 Linux 上的堆栈文件
scp [email protected]:/root/jfoa-logs/jstack-2020-11-26.log ./jstack.txt
3.2 线程堆栈文件分析
这里贴出来一些核心的地方
"http-nio-80-exec-6" #58 daemon prio=5 os_prio=0 tid=0x00007efc15ccb800 nid=0x40 waiting on condition [0x00007efbdd490000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e051e308> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:107)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:33)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- None
"jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at club.javafamily.runner.web.portal.service.SubjectVoteService.getVoteCurrentCount(SubjectVoteService.java:159)
at club.javafamily.runner.web.portal.service.SubjectVoteService.changeVote(SubjectVoteService.java:93)
at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:749)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:115)
at org.springframework.aop.interceptor.AsyncExecutionInterceptor$$Lambda$848/1976926828.call(Unknown Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x00000000dd01f1a0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"https-jsse-nio-443-exec-5" #45 daemon prio=5 os_prio=0 tid=0x00007efc15a17800 nid=0x33 waiting on condition [0x00007efbde19a000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at club.javafamily.runner.web.portal.service.SubjectVoteService.getVoteCurrentCount(SubjectVoteService.java:159)
at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
...
Locked ownable synchronizers:
- <0x00000000e0520d38> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"https-jsse-nio-443-exec-10" #50 daemon prio=5 os_prio=0 tid=0x00007efc15c2d800 nid=0x38 waiting on condition [0x00007efbddc94000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at club.javafamily.runner.web.portal.service.SubjectVoteService.getSubjectVoteDto(SubjectVoteService.java:61)
at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
Locked ownable synchronizers:
- <0x00000000e051fc38> (a java.util.concurrent.ThreadPoolExecutor$Worker)
这里主要分析
Locked ownable synchronizers以及"jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]
Locked ownable synchronizers显示了锁信息"jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]如下
"线程名" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [等待锁状态, 0(0x0000000000000000) 代表没锁]
java.lang.Thread.State: 线程状态
然后结合方法调用堆栈分析死锁, 死循环的原因.
4 问题重现
帅帅的问题最终定位到原因是
ReentrantReadWriteLock在读锁中获取写锁.(在写锁中获取读锁不会死锁)
- Service
package org.jack.thread01;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
/** * @Description: 在读锁中获取写锁 * @Warning: 死锁 * @Author: Jack Li * @Package: thread-01 - ThreadService * @Date: Nov 26, 2020 10:57:49 PM * @Version: 1.0.0 * @TimeComplexity: Required[O(n)] ---- Current[O(n^2)] * @ExecuteResult: Success! * @Status: Accepted */
public class ThreadService {
public void method1() {
System.out.println("Method 1 准备获取读锁");
lock.readLock().lock();
System.out.println("Method 1 得到读锁");
try {
Thread.sleep(2000);
System.out.println("Method 1 准备获取写锁");
method3();
System.out.println("Method 1 获得写锁并释放");
}
catch(Exception e) {
e.printStackTrace();
}
finally {
lock.readLock().unlock();
System.out.println("Method 1 释放读锁");
}
}
public void method2() {
System.out.println("Method 2 准备获取写锁");
lock.writeLock().lock();
System.out.println("Method 2 获得写锁");
try {
Thread.sleep(1000);
}
catch(Exception e) {
e.printStackTrace();
}
finally {
lock.writeLock().unlock();
System.out.println("Method 2 释放写锁");
}
}
public void method3() {
lock.writeLock().lock();
try {
Thread.sleep(500);
}
catch(Exception e) {
e.printStackTrace();
}
finally {
lock.writeLock().unlock();
}
}
private final ReadWriteLock lock = new ReentrantReadWriteLock();
}
- 程序入口
public class DealLockTest {
public static void main(String[] args) {
new Thread(() -> {
threadService.method1();
}) .start();
new Thread(() -> {
threadService.method2();
}) .start();
}
private static ThreadService threadService = new ThreadService();
// private static ThreadService2 threadService = new ThreadService2();
}
- 线程堆栈
"Thread-1" #11 prio=5 os_prio=31 tid=0x00007fbda01c2000 nid=0xa903 waiting on condition [0x0000700003e8c000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076abbc1b8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at org.jack.thread01.ThreadService.method2(ThreadService.java:50)
at org.jack.thread01.DealLockTest.lambda$1(DealLockTest.java:11)
at org.jack.thread01.DealLockTest$$Lambda$2/295530567.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"Thread-0" #10 prio=5 os_prio=31 tid=0x00007fbda01be800 nid=0x5503 waiting on condition [0x0000700003d89000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076abbc1b8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at org.jack.thread01.ThreadService.method3(ThreadService.java:68)
at org.jack.thread01.ThreadService.method1(ThreadService.java:31)
at org.jack.thread01.DealLockTest.lambda$0(DealLockTest.java:7)
at org.jack.thread01.DealLockTest$$Lambda$1/250421012.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
Thread-0 先获取了读锁, 然后在读锁还没释放时又去获取写锁, 获取写锁时又要等待所有的读锁释放.从而导致了死锁.

边栏推荐
- brpc源码解析(三)—— 请求其他服务器以及往socket写数据的机制
- 投屏收费背后:爱奇艺季度盈利,优酷急了?
- 'C:\xampp\php\ext\php_zip.dll' - %1 不是有效的 Win32 应用程序 解决
- PHP 上传ftp路径文件到外网服务器上 curl base64图片
- Brpc source code analysis (I) -- the main process of RPC service addition and server startup
- LeetCode 50. Pow(x,n)
- PHP curl post x-www-form-urlencoded
- Video caption (cross modal video summary / subtitle generation)
- Transformer变体(Sparse Transformer,Longformer,Switch Transformer)
- 【AI4Code】《InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees》ICSE‘21
猜你喜欢

投屏收费背后:爱奇艺季度盈利,优酷急了?

Transformer变体(Sparse Transformer,Longformer,Switch Transformer)

【GCN-RS】Are Graph Augmentations Necessary? Simple Graph Contrastive Learning for RS (SIGIR‘22)
![[GCN multimodal RS] pre training representations of multi modal multi query e-commerce search KDD 2022](/img/9c/0434d40fa540078309249d415b3659.png)
[GCN multimodal RS] pre training representations of multi modal multi query e-commerce search KDD 2022

Application of comparative learning (lcgnn, videomoco, graphcl, XMC GaN)

对比学习的应用(LCGNN,VideoMoCo,GraphCL,XMC-GAN)

知识图谱用于推荐系统问题(MVIN,KERL,CKAN,KRED,GAEAT)

Application and innovation of low code technology in logistics management

【AI4Code】《Pythia: AI-assisted Code Completion System》(KDD 2019)

Intelligent information retrieval(智能信息检索综述)
随机推荐
JS interview question: handwriting throttle function
Transformer变体(Sparse Transformer,Longformer,Switch Transformer)
[multimodal] hit: hierarchical transformer with momentum contract for video text retrieval iccv 2021
Learning to pre train graph neural networks
php 一台服务器传图片到另一台上 curl post file_get_contents保存图片
Return and finally? Everyone, please look over here,
[high concurrency] I summarized the best learning route of concurrent programming with 10 diagrams!! (recommended Collection)
30 sets of Chinese style ppt/ creative ppt templates
Brpc source code analysis (I) -- the main process of RPC service addition and server startup
Management of software defects
[untitled]
brpc源码解析(一)—— rpc服务添加以及服务器启动主要过程
Classification parameter stack of JS common built-in object data types
Multi-Label Image Classification(多标签图像分类)
dirReader. Readentries compatibility issues. Exception error domexception
银行理财子公司蓄力布局A股;现金管理类理财产品整改加速
brpc源码解析(二)—— brpc收到请求的处理过程
【高并发】我用10张图总结出了这份并发编程最佳学习路线!!(建议收藏)
氢能创业大赛 | 国家能源局科技司副司长刘亚芳:构建高质量创新体系是我国氢能产业发展的核心
Figure neural network for recommending system problems (imp-gcn, lr-gcn)