Troubleshooting process for online problems
2022-08-05 06:13:00 【sick caterpillar】
Troubleshooting
- This post sorts out troubleshooting approaches for the common kinds of online (production) problems.
Business problems
- Most online problems are caused by business logic. When most requests in the online environment succeed but some requests, or one particular user, fail, how do we troubleshoot?
- A microservice system generally has a distributed tracing system and an ELK log system, so we can locate the point of failure through the monitoring platform:
- Collect the exception logs
- From there, trace the logs to obtain the affected user's request information:
- Use the watch command of Arthas to monitor the failing interface, take the corresponding parameters from the logs, and replay the online user's request with Dubbo's invoke command, so as to reproduce the problem and then fix it
Non-business problems
- Arthas is a good tool for locating problems online, and it is easy to install
- When troubleshooting non-business problems, first check the machine's core resources: CPU, memory, threads, and so on.
- The dashboard command shows this information for the service and refreshes it every few seconds.
- Thread area: thread id, name, state, CPU usage, whether the thread is a daemon thread, etc.
- Memory area: heap memory, Eden space, Survivor space, old generation, method area
- Machine status
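To make the dashboard fields above more concrete, here is a minimal sketch that reads similar data from inside a JVM through the standard java.lang.management beans. It is an illustration only, not how Arthas itself is implemented; the class name JvmSnapshot and its method names are made up for this example, and it reports cumulative CPU time per thread rather than the CPU usage percentage that dashboard displays.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Arthas.
final class JvmSnapshot {

    /** One line per live thread: id, name, state, cumulative CPU time, daemon flag. */
    static List<String> threadLines() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable = mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled();
        List<String> lines = new ArrayList<>();
        // getAllStackTraces() is used here only to enumerate the live Thread objects.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            // getThreadCpuTime returns nanoseconds, or -1 if the thread has already exited.
            long cpuNanos = cpuAvailable ? mx.getThreadCpuTime(t.getId()) : -1L;
            long cpuMillis = cpuNanos < 0 ? -1L : cpuNanos / 1_000_000L;
            lines.add(String.format("id=%d name=%s state=%s cpuMs=%d daemon=%b",
                    t.getId(), t.getName(), t.getState(), cpuMillis, t.isDaemon()));
        }
        return lines;
    }

    /** One line per memory pool (Eden, Survivor, old generation, Metaspace, ... depending on the GC). */
    static List<String> memoryPoolLines() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the pool has no defined maximum.
            long maxKb = usage.getMax() < 0 ? -1L : usage.getMax() / 1024L;
            lines.add(String.format("%s (%s): usedKB=%d maxKB=%d",
                    pool.getName(), pool.getType(), usage.getUsed() / 1024L, maxKb));
        }
        return lines;
    }

    public static void main(String[] args) {
        threadLines().forEach(System.out::println);
        memoryPoolLines().forEach(System.out::println);
    }
}
```

The pool names printed depend on the garbage collector in use, so the output will not match the dashboard labels exactly.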
From the thread area above we can get the id of the thread we are interested in.
We can then query that thread's execution stack with the thread command (thread thread_id), without even taking a dump.
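The same idea, looking up one thread's stack by id without a full dump, can be sketched in plain Java with ThreadMXBean. This is only an in-process illustration under the assumption that the code runs inside the JVM being inspected; Arthas attaches from outside, and the ThreadStack class below is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of Arthas.
final class ThreadStack {

    /** Returns the name, state and stack frames of the thread with the given id. */
    static String stackOf(long threadId) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Integer.MAX_VALUE asks for the entire stack; null means the thread is not alive.
        ThreadInfo info = mx.getThreadInfo(threadId, Integer.MAX_VALUE);
        if (info == null) {
            return "thread " + threadId + " not found (it may have exited)";
        }
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(info.getThreadName()).append('"')
          .append(" id=").append(info.getThreadId())
          .append(" state=").append(info.getThreadState())
          .append('\n');
        for (StackTraceElement frame : info.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Demo: print the stack of the current thread.
        System.out.println(stackOf(Thread.currentThread().getId()));
    }
}
```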
There is also the jad command, which decompiles a class so that its source can be inspected online, which is convenient for troubleshooting.
However, during most online incidents there is no time to investigate on the spot; a production system leaves little room for online diagnosis.
I proceed as follows:
- Restart the problematic machines one by one to see if that fixes the problem
- Before restarting the last machine, run the jmap -dump command on it to save a snapshot of the Java heap
- If the machines do not recover after restarting, roll back to the previous version to keep online business running normally
- Copy the saved dump file to a local machine
- Open the dump file in the VisualVM tool that comes with the JDK
- VisualVM shows, through its visual interface, the classes recorded in the dump file, the instances of each class, and their contents, so the problem can be analyzed offline and fixed once the specific cause is found.
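The heap snapshot step in the list above uses jmap from outside the process. As a swapped-in alternative for illustration, a HotSpot JVM can also write the same kind of .hprof file from inside the process through HotSpotDiagnosticMXBean. The sketch below assumes a HotSpot-based JDK; the HeapDumper class and the file naming scheme are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; assumes a HotSpot-based JVM.
final class HeapDumper {

    /**
     * Writes a heap dump of the current JVM into the given directory and returns its path.
     * liveObjectsOnly = true dumps only reachable objects.
     */
    static Path dumpHeap(Path directory, boolean liveObjectsOnly) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap refuses to overwrite an existing file, so use a timestamped name.
        Path target = directory.resolve("heap-" + System.currentTimeMillis() + ".hprof");
        try {
            diagnostics.dumpHeap(target.toAbsolutePath().toString(), liveObjectsOnly);
        } catch (IOException e) {
            throw new UncheckedIOException("heap dump failed: " + target, e);
        }
        return target;
    }

    public static void main(String[] args) {
        Path dump = dumpHeap(Paths.get(System.getProperty("java.io.tmpdir")), true);
        System.out.println("heap dump written to " + dump);
    }
}
```

The resulting file can be opened in VisualVM in the same way as a dump produced by jmap. Writing a dump pauses the application while the heap is walked, which is one reason to take it on a machine that is about to be restarted anyway.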