Troubleshooting Process for Online Problems
2022-08-05 06:13:00 【sick caterpillar】
Troubleshooting
- This post summarizes troubleshooting approaches for common online (production) problems.
Business problems
- Most online problems are caused by business logic. How do we troubleshoot when most requests in production succeed but some requests, or a single user's requests, fail?
- In a typical microservice architecture there is usually a distributed tracing system and an ELK logging stack, so the monitoring platform can show where the problem occurs:
- Find the exception logs

- From the trace, we can then retrieve the affected user's request information:

- Use the Arthas watch command to monitor the failing interface, take the request parameters from the logs, and replay the online user's request with Dubbo's invoke command to reproduce the problem and then fix it.
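The watch-then-replay loop can be illustrated with a small in-process Java sketch. It is an analogy only: Arthas `watch` instruments bytecode in the running JVM (a command of the form `watch <class> <method> '{params, throwExp}' -e -x 2` prints the arguments and exception of calls that throw), and Dubbo's `invoke` is issued from the Dubbo telnet console. Here a `java.lang.reflect.Proxy` records the parameters of failing calls, and the recorded arguments are then replayed to reproduce the failure. `OrderService`, `BuggyOrderService` and `WatchAndReplay` are hypothetical names invented for this example, and the record type assumes Java 16 or later.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical business interface standing in for a remote Dubbo service.
interface OrderService {
    String queryOrder(long userId, String orderNo);
}

// Hypothetical implementation with a bug that only one kind of request triggers.
class BuggyOrderService implements OrderService {
    @Override
    public String queryOrder(long userId, String orderNo) {
        if (orderNo.isEmpty()) {
            throw new IllegalArgumentException("empty orderNo for user " + userId);
        }
        return "order " + orderNo;
    }
}

public class WatchAndReplay {

    /** One captured failing call: the method, its arguments and what it threw. */
    record Invocation(Method method, Object[] params, Throwable thrown) {}

    /** Wraps target so that calls which throw are recorded, similar in spirit to `watch ... -e`. */
    @SuppressWarnings("unchecked")
    static <T> T watch(T target, Class<T> iface, List<Invocation> sink) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Record the exception thrown by the business method itself, then rethrow it.
                sink.add(new Invocation(method, args, e.getCause()));
                throw e.getCause();
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    /** Re-invokes a captured call with the same arguments, like replaying logged params. */
    static Object replay(Object target, Invocation call) throws ReflectiveOperationException {
        return call.method().invoke(target, call.params());
    }

    public static void main(String[] args) {
        List<Invocation> captured = new ArrayList<>();
        OrderService service = watch(new BuggyOrderService(), OrderService.class, captured);

        service.queryOrder(1L, "A100"); // a normal request: nothing is recorded
        try {
            service.queryOrder(42L, ""); // the one user whose request fails
        } catch (IllegalArgumentException ignored) {
            // the caller sees the failure; the watcher has already captured it
        }

        Invocation failing = captured.get(0);
        System.out.println(failing.method().getName() + Arrays.toString(failing.params())
                + " threw " + failing.thrown());

        // Replay the captured arguments against a fresh instance to reproduce the bug.
        try {
            replay(new BuggyOrderService(), failing);
        } catch (ReflectiveOperationException e) {
            System.out.println("reproduced: " + e.getCause());
        }
    }
}
```

Recording only calls that throw mirrors the `-e` filter, which keeps the captured data small on a busy service.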
Non-business problems
- Arthas is a convenient tool for diagnosing problems on live systems, and it is easy to install.
- When troubleshooting non-business problems, first check the machine's core resources such as CPU, memory, and threads.
- The Arthas dashboard command shows this information for the service and refreshes it every few seconds.
- The thread area shows each thread's ID, name, state, CPU usage, whether it is a daemon thread, and more.
- The memory area shows heap memory, the Eden space, the Survivor space, the old generation, and the method area (Metaspace).
- The runtime area shows the machine's status.
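For intuition about where numbers like these come from, the same kinds of data can be read inside any JVM through the standard `java.lang.management` API. This is a minimal sketch, not how Arthas is implemented; `MiniDashboard` and its methods are hypothetical, memory pool names (for example `G1 Eden Space`) depend on the garbage collector in use, and the record type assumes Java 16 or later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MiniDashboard {

    /** One row of the thread area: id, name, state, accumulated CPU time and daemon flag. */
    record ThreadRow(long id, String name, Thread.State state, long cpuNanos, boolean daemon) {}

    static List<ThreadRow> threads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<ThreadRow> rows = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            // getThreadCpuTime returns -1 when measurement is disabled or the thread has died.
            long cpu = mx.isThreadCpuTimeSupported() ? mx.getThreadCpuTime(info.getThreadId()) : -1;
            rows.add(new ThreadRow(info.getThreadId(), info.getThreadName(),
                    info.getThreadState(), cpu, info.isDaemon()));
        }
        return rows;
    }

    /** Heap total first, then every memory pool (Eden, Survivor, old gen, Metaspace, ...). */
    static List<String> memory() {
        List<String> lines = new ArrayList<>();
        lines.add(format("heap", ManagementFactory.getMemoryMXBean().getHeapMemoryUsage()));
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                lines.add(format(pool.getName(), usage));
            }
        }
        return lines;
    }

    private static String format(String name, MemoryUsage u) {
        // getMax() is -1 when no maximum is defined (typical for Metaspace).
        String max = u.getMax() < 0 ? "-" : (u.getMax() >> 20) + "MB";
        return String.format("%-28s used=%dMB max=%s", name, u.getUsed() >> 20, max);
    }

    /** A small subset of the runtime/machine area. */
    static String machine() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // getSystemLoadAverage() is negative on platforms where it is unavailable.
        return "cpus=" + os.getAvailableProcessors() + " load=" + os.getSystemLoadAverage()
                + " java=" + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(machine());
        memory().forEach(System.out::println);
        // Top 5 threads by accumulated CPU time (the dashboard shows recent usage instead).
        threads().stream()
                .sorted(Comparator.comparingLong(ThreadRow::cpuNanos).reversed())
                .limit(5)
                .forEach(System.out::println);
    }
}
```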

From the thread area above, we can pick out the ID of the thread that matters (for example, the one using the most CPU).
Then run thread <thread_id> to see that thread's current execution stack, without taking a full thread dump.
There is also the jad command, which decompiles a loaded class so you can check the source of the code actually running online; this is also handy for troubleshooting.
However, during most online incidents there is no time for this kind of live investigation: a production system cannot wait while we locate the problem online.
So I proceed as follows:
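A rough in-process equivalent of picking the busiest thread and printing its stack can be sketched with `ThreadMXBean`. It samples per-thread CPU time twice and takes the largest delta, which approximates what Arthas `thread -n 1` reports; `HotThreadStack` and its methods are hypothetical names, and CPU time measurement may be unsupported or disabled on some JVMs, in which case the sketch returns -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class HotThreadStack {

    /**
     * Samples per-thread CPU time twice, sampleMillis apart, and returns the id of the
     * thread that consumed the most CPU in between (-1 if CPU time is unavailable).
     */
    static long busiestThreadId(long sampleMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = mx.getThreadCpuTime(ids[i]); // -1 if the thread is no longer alive
        }
        try {
            Thread.sleep(sampleMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        long bestId = -1;
        long bestDelta = -1;
        for (int i = 0; i < ids.length; i++) {
            long after = mx.getThreadCpuTime(ids[i]);
            if (before[i] < 0 || after < 0) {
                continue; // the thread died during sampling
            }
            long delta = after - before[i];
            if (delta > bestDelta) {
                bestDelta = delta;
                bestId = ids[i];
            }
        }
        return bestId;
    }

    /** Formats one thread's current stack, roughly what `thread <id>` prints. */
    static String stackOf(long threadId) {
        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(threadId, Integer.MAX_VALUE);
        if (info == null) {
            return "thread " + threadId + " is not alive";
        }
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(info.getThreadName()).append("\" id=").append(threadId)
          .append(' ').append(info.getThreadState()).append('\n');
        for (StackTraceElement frame : info.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long id = busiestThreadId(500);
        System.out.println(id < 0 ? "CPU time not available" : stackOf(id));
    }
}
```

Sampling a delta rather than sorting by total CPU time matters: a long-lived thread can have a large total while being idle right now.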
- Restart the affected machines one at a time and check whether that resolves the problem.
- At the same time, on the last machine (before restarting it), run jmap -dump:format=b,file=heap.hprof <pid> to save a snapshot of the Java heap; thread stacks can be saved at the same moment with jstack <pid>.
- If restarting does not restore service, roll back to the previous version so online business keeps running normally.
- Copy the saved dump file to a local machine.
- Open the dump file in Java VisualVM (shipped with the JDK as jvisualvm up to JDK 8, a separate download since JDK 9).
- VisualVM's graphical interface shows the classes recorded in the dump, the instances of each class, and their field values, so the root cause can be analyzed offline and fixed once it is understood.
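When running jmap on the host is inconvenient, HotSpot-based JDKs also expose heap dumps programmatically through `com.sun.management.HotSpotDiagnosticMXBean.dumpHeap`. The sketch below assumes a HotSpot JVM; `HeapDumper` is a hypothetical helper, and the output file name must end in `.hprof` and must not already exist. The resulting file can be loaded into VisualVM in the same way as a jmap dump.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {

    /**
     * Writes a heap dump of the current JVM to file, the in-process counterpart of
     * jmap -dump:live,format=b,file=<file> <pid>. The file name must end in ".hprof"
     * and the file must not exist yet, otherwise dumpHeap throws.
     */
    static Path dumpHeap(Path file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // liveOnly = true dumps only reachable objects, which usually shrinks the file.
        bean.dumpHeap(file.toAbsolutePath().toString(), liveOnly);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path out = dumpHeap(dir.resolve("heap-" + ProcessHandle.current().pid() + ".hprof"), true);
        // Copy this file to a workstation and open it in VisualVM, as with a jmap dump.
        System.out.println("wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

Dumping only live objects keeps the file smaller and easier to copy off the server, at the cost of losing garbage that might still hint at what was allocated.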