当前位置:网站首页>记录一次线上死锁的定位分析
记录一次线上死锁的定位分析
2022-07-25 11:21:00 【JackLi0812】
线上死锁定位分析
现在 IT 界普遍高并发, 分布式环境, 难免遇到
死锁,死循环等问题, 平时开发我们一般都可以停掉服务, 然后打 trace ---> 编译 ---> 修改源代码 ---> 重新编译 ---> ..... ---> 解决问题, 或者通过集成开发环境(如:IDEA,Eclipse)提供的Debug功能打断点 ---> watch variable ---> 进入/跳出方法 ---> .....---> 解决问题, 但是, 当遇到生产环境却往往不知所措, 因为生产环境是不能随便让你停止服务的, 并且生产环境是不会开放 debug 端口的(debug 需要开放调试<<-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005>>), 那么, 线上死锁如何定位? 怎么---> 解决问题?
实际上, Java 已经为我们提供了一些好用的定位分析工具, 当然, 这是一句废话, 你肯定知道, 你还知道哪些命令在 JDK 的安装目录下的
bin下和java以及javac在一块躺着. 是的, 的确如此, 你可真是一个小机灵鬼哦(奈何从来没用过是不是? 今天帅帅给你一个应用场景, 哈哈…)
(昨晚睡觉前提了点代码到 jfoa 怎么也没想到导致了线上死锁…)

1. jps
1.1 jps 虚拟机进程状况工具
要定位分析 java 死锁, 死循环你得先知道发生问题的
LVMID(Logical Volume Manager ID), 就好比你要看 Windows 上的一个应用的详细信息, 你得先找到这个应用运行在哪个进程上一样.
或者你可以将LVMID理解为PID, 通常LVMID和 进程PID是一致的, 但是并不非得一样.
因此, 要想知道
LVMID, 可以通过查询 PID, 但是其一这是不保险的, 其二, 当同时启动了多个虚拟机进程时, 无法通过名称区分, 就必须得使用jps了(任务管理器里面有 n 个 java.exe).
JackLi:~ dreamli$ jps
1461
1847 GradleDaemon
1210
1851 RunnerApplication
1852 Jps
JackLi:~ dreamli$
上面是一个简单的使用, 第一列的数字就是
LVMID, 后面跟的是启动的主类. 帅帅使用的gradle构建工具, 所以有一个GradleDaemon, 然后jfoa项目的启动入口在RunnerApplication
1.2 jps 参数
jps 有以下几个参数
-q只输出LVMID
JackLi:~ dreamli$ jps -q
1875
1461
1847
1210
1851
-m输出传给 main 函数的参数
JackLi:~ dreamli$ jps -m
1876 Jps -m
1461
1847 GradleDaemon 6.2.2
1210
1851 RunnerApplication
-l输出主类的全名, 如果进程执行的是 jar, 则会输出 jar 路径
JackLi:~ dreamli$ jps -l
1461
1877 sun.tools.jps.Jps
1847 org.gradle.launcher.daemon.bootstrap.GradleDaemon
1210
1851 club.javafamily.runner.RunnerApplication
-v输出虚拟机进程启动时 JVM 参数
JackLi:~ dreamli$ jps -v
1461 -Dosgi.requiredJavaVersion=1.8 -Dosgi.instance.area.default=@user.home/eclipse-workspace -XX:+UseG1GC -XX:+UseStringDeduplication -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -Dosgi.requiredJavaVersion=1.8 -Xms256m -Xmx1024m -Xdock:icon=../Resources/Eclipse.icns -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -javaagent:/Applications/Eclipse.app/Contents/Eclipse/lombok.jar
1878 Jps -Dapplication.home=/Users/dreamli/Software/java/zulu-jfx-252/zulu-8.jdk/Contents/Home -Xms8m
1847 GradleDaemon -Xmx2g -Dfile.encoding=UTF-8 -Duser.country=CN -Duser.language=zh -Duser.variant
1210 -Xms128m -Xmx2048m -XX:ReservedCodeCacheSize=240m -XX:+UseCompressedOops -Dfile.encoding=UTF-8 -XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=50 -ea -Dsun.io.useCanonCaches=false -Djava.net.preferIPv4Stack=true -Djdk.http.auth.tunneling.disabledSchemes="" -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -Xverify:none -XX:ErrorFile=/Users/dreamli/java_error_in_idea_%p.log -XX:HeapDumpPath=/Users/dreamli/java_error_in_idea.hprof -javaagent:/Applications/IntelliJ IDEA.app/Contents/bin/jetbrains-agent.jar -Djb.vmOptionsFile=/Users/dreamli/Library/Preferences/IntelliJIdea2019.3/idea.vmoptions -Didea.home.path=/Applications/IntelliJ IDEA.app/Contents -Didea.executable=idea -Didea.paths.selector=IntelliJIdea2019.3
1851 RunnerApplication -Djava.rmi.server.hostname=127.0.0.1 -Dspring.profiles.active=dev -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/Users/dreamli/Workspace/MyRepository/javafamily/jfoa/runner/build/server-temp -Duser.country=CN -Duser.language=zh -Duser.variant
2 jstack
2.1 jstack: java 堆栈跟踪工具
jstack命令用于生成当前时刻线程快照(一般称为threaddump文件或者javacore文件), 线程快照就是当前虚拟机内每一条线程正在执行的方法堆栈集合, 生成线程快照的目的是定位线程出现长时间停顿的原因, 如: 线程死锁, 死循环, 请求外部资源的长时间等待等都是导致线程长时间停顿的常见原因.
基本命令格式如下:jstack [options] lvmid
JackLi:~ dreamli$ jps -m
1461
1847 GradleDaemon 6.2.2
1210
1851 RunnerApplication
1980 Jps -m
JackLi:~ dreamli$ jstack 1851
2020-11-26 23:59:27
Full thread dump OpenJDK 64-Bit Server VM (25.252-b14 mixed mode):
"Attach Listener" #62 daemon prio=9 os_prio=31 tid=0x00007faece823800 nid=0xac03 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"MessageBroker-2" #61 prio=5 os_prio=31 tid=0x00007faed3018800 nid=0x15403 waiting on condition [0x000070000ab6c000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000773124fa8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
2.2 jstack 参数
jstack有一下几个参数:
-F: 当正常输出的请求不被响应时, 强制输出线程堆栈-l: 除堆栈外显示关于锁的信息-m: 调用本地方法的话, 可以显示 C/C++ 的堆栈
3. 死锁定位分析
昨晚代码撸到半夜, 提交了很多代码, 没想到…死锁------留下了心疼自己的泪水…
3.1 jps 定位 lvmid
第一步不用说了, 先找到
lvmid
3.2 jstack 获取线程堆栈
3.1 提点 Linux 小知识
- 将线程堆栈写入本地文件
jstack -m 1851 >> ....您的路径../jfoa/jstack-2020-11-26.log
- 获取远程 Linux 上的堆栈文件
scp [email protected]:/root/jfoa-logs/jstack-2020-11-26.log ./jstack.txt
3.2 线程堆栈文件分析
这里贴出来一些核心的地方
"http-nio-80-exec-6" #58 daemon prio=5 os_prio=0 tid=0x00007efc15ccb800 nid=0x40 waiting on condition [0x00007efbdd490000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e051e308> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:107)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:33)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- None
"jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at club.javafamily.runner.web.portal.service.SubjectVoteService.getVoteCurrentCount(SubjectVoteService.java:159)
at club.javafamily.runner.web.portal.service.SubjectVoteService.changeVote(SubjectVoteService.java:93)
at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:749)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:115)
at org.springframework.aop.interceptor.AsyncExecutionInterceptor$$Lambda$848/1976926828.call(Unknown Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x00000000dd01f1a0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"https-jsse-nio-443-exec-5" #45 daemon prio=5 os_prio=0 tid=0x00007efc15a17800 nid=0x33 waiting on condition [0x00007efbde19a000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at club.javafamily.runner.web.portal.service.SubjectVoteService.getVoteCurrentCount(SubjectVoteService.java:159)
at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
...
Locked ownable synchronizers:
- <0x00000000e0520d38> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"https-jsse-nio-443-exec-10" #50 daemon prio=5 os_prio=0 tid=0x00007efc15c2d800 nid=0x38 waiting on condition [0x00007efbddc94000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at club.javafamily.runner.web.portal.service.SubjectVoteService.getSubjectVoteDto(SubjectVoteService.java:61)
at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
Locked ownable synchronizers:
- <0x00000000e051fc38> (a java.util.concurrent.ThreadPoolExecutor$Worker)
这里主要分析
Locked ownable synchronizers以及"jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]
Locked ownable synchronizers显示了锁信息"jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]如下
"线程名" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [等待锁状态, 0(0x0000000000000000) 代表没锁]
java.lang.Thread.State: 线程状态
然后结合方法调用堆栈分析死锁, 死循环的原因.
4 问题重现
帅帅的问题最终定位到原因是
ReentrantReadWriteLock在读锁中获取写锁.(在写锁中获取读锁不会死锁)
- Service
package org.jack.thread01;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
/** * @Description: 在读锁中获取写锁 * @Warning: 死锁 * @Author: Jack Li * @Package: thread-01 - ThreadService * @Date: Nov 26, 2020 10:57:49 PM * @Version: 1.0.0 * @TimeComplexity: Required[O(n)] ---- Current[O(n^2)] * @ExecuteResult: Success! * @Status: Accepted */
public class ThreadService {
public void method1() {
System.out.println("Method 1 准备获取读锁");
lock.readLock().lock();
System.out.println("Method 1 得到读锁");
try {
Thread.sleep(2000);
System.out.println("Method 1 准备获取写锁");
method3();
System.out.println("Method 1 获得写锁并释放");
}
catch(Exception e) {
e.printStackTrace();
}
finally {
lock.readLock().unlock();
System.out.println("Method 1 释放读锁");
}
}
public void method2() {
System.out.println("Method 2 准备获取写锁");
lock.writeLock().lock();
System.out.println("Method 2 获得写锁");
try {
Thread.sleep(1000);
}
catch(Exception e) {
e.printStackTrace();
}
finally {
lock.writeLock().unlock();
System.out.println("Method 2 释放写锁");
}
}
public void method3() {
lock.writeLock().lock();
try {
Thread.sleep(500);
}
catch(Exception e) {
e.printStackTrace();
}
finally {
lock.writeLock().unlock();
}
}
private final ReadWriteLock lock = new ReentrantReadWriteLock();
}
- 程序入口
public class DealLockTest {
public static void main(String[] args) {
new Thread(() -> {
threadService.method1();
}) .start();
new Thread(() -> {
threadService.method2();
}) .start();
}
private static ThreadService threadService = new ThreadService();
// private static ThreadService2 threadService = new ThreadService2();
}
- 线程堆栈
"Thread-1" #11 prio=5 os_prio=31 tid=0x00007fbda01c2000 nid=0xa903 waiting on condition [0x0000700003e8c000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076abbc1b8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at org.jack.thread01.ThreadService.method2(ThreadService.java:50)
at org.jack.thread01.DealLockTest.lambda$1(DealLockTest.java:11)
at org.jack.thread01.DealLockTest$$Lambda$2/295530567.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"Thread-0" #10 prio=5 os_prio=31 tid=0x00007fbda01be800 nid=0x5503 waiting on condition [0x0000700003d89000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076abbc1b8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at org.jack.thread01.ThreadService.method3(ThreadService.java:68)
at org.jack.thread01.ThreadService.method1(ThreadService.java:31)
at org.jack.thread01.DealLockTest.lambda$0(DealLockTest.java:7)
at org.jack.thread01.DealLockTest$$Lambda$1/250421012.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
Thread-0 先获取了读锁, 然后在读锁还没释放时又去获取写锁, 获取写锁时又要等待所有的读锁释放.从而导致了死锁.

边栏推荐
- 一文入门Redis
- 奉劝那些刚参加工作的学弟学妹们:要想进大厂,这些并发编程知识是你必须要掌握的!完整学习路线!!(建议收藏)
- 【高并发】我用10张图总结出了这份并发编程最佳学习路线!!(建议收藏)
- 浅谈低代码技术在物流管理中的应用与创新
- The JSP specification requires that an attribute name is preceded by whitespace
- [USB device design] - composite device, dual hid high-speed (64BYTE and 1024byte)
- OSPF comprehensive experiment
- brpc源码解析(三)—— 请求其他服务器以及往socket写数据的机制
- 【AI4Code】《Unified Pre-training for Program Understanding and Generation》 NAACL 2021
- 【AI4Code】《CodeBERT: A Pre-Trained Model for Programming and Natural Languages》 EMNLP 2020
猜你喜欢

Meta learning (meta learning and small sample learning)
![[multimodal] transferrec: learning transferable recommendation from texture of modality feedback arXiv '22](/img/02/5f24b4af44f2f9933ce0f031d69a19.png)
[multimodal] transferrec: learning transferable recommendation from texture of modality feedback arXiv '22

Knowledge maps are used to recommend system problems (mvin, Ctrl, ckan, Kred, gaeat)

【图攻防】《Backdoor Attacks to Graph Neural Networks 》(SACMAT‘21)

【GCN-RS】Are Graph Augmentations Necessary? Simple Graph Contrastive Learning for RS (SIGIR‘22)

Transformer variants (routing transformer, linformer, big bird)

dirReader. Readentries compatibility issues. Exception error domexception

Brpc source code analysis (VI) -- detailed explanation of basic socket

OSPF comprehensive experiment

知识图谱用于推荐系统问题(MVIN,KERL,CKAN,KRED,GAEAT)
随机推荐
'C:\xampp\php\ext\php_ zip. Dll'-%1 is not a valid Win32 Application Solution
[multimodal] hit: hierarchical transformer with momentum contract for video text retrieval iccv 2021
web编程(二)CGI相关
Solutions to the failure of winddowns planning task execution bat to execute PHP files
【GCN-RS】Learning Explicit User Interest Boundary for Recommendation (WWW‘22)
创新突破!亚信科技助力中国移动某省完成核心账务数据库自主可控改造
【AI4Code】《CoSQA: 20,000+ Web Queries for Code Search and Question Answering》 ACL 2021
[USB device design] - composite device, dual hid high-speed (64BYTE and 1024byte)
图神经网络用于推荐系统问题(IMP-GCN,LR-GCN)
brpc源码解析(七)—— worker基于ParkingLot的bthread调度
[high concurrency] Why is the simpledateformat class thread safe? (six solutions are attached, which are recommended for collection)
【云驻共创】AI在数学界有哪些作用?未来对数学界会有哪些颠覆性影响?
【无标题】
Innovation and breakthrough! AsiaInfo technology helped a province of China Mobile complete the independent and controllable transformation of its core accounting database
Objects in JS
【多模态】《HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval》ICCV 2021
银行理财子公司蓄力布局A股;现金管理类理财产品整改加速
知识图谱用于推荐系统问题(MVIN,KERL,CKAN,KRED,GAEAT)
【AI4Code】《Unified Pre-training for Program Understanding and Generation》 NAACL 2021
苹果供应链十年浮沉:洋班主任和它的中国学生们