当前位置:网站首页>[stonedb fault diagnosis] system resource bottleneck diagnosis
[stonedb fault diagnosis] system resource bottleneck diagnosis
2022-07-27 22:07:00 【51CTO】
When there is a bottleneck in the resources of the operating system , Not only the application services on the operating system are affected , Moreover, executing simple commands in the operating system may not return results . Before the operating system is completely rammed , You can use related commands to CPU、 Memory 、IO And the use of network resources , Then analyze and confirm whether these resources are reasonably utilized , Is there a bottleneck .
CPUtop、vmstat Can be checked CPU Usage situation , but top The results are more comprehensive .top The returned result has two layers , The upper layer is the statistical information of system performance , The lower level is the process statistics , The default in accordance with the CPU Sort by usage . top An example of the return result is as follows :
first line
10:12:21: Current system time up 5 days: The number of running days since the last system startup 4 user: Number of users logging in to the system load average: In the past 1 minute 、5 minute 、15 minute , The average value of the system load
The second line
total: Total number of system processes running: Number of running processes sleeping: Number of dormant processes stopped: The number of processes in the stopped state zombie: Number of processes in zombie state
The third line
us: User process occupation CPU Percent of sy: The system process occupies CPU Percent of ni: The priority is occupied by the changed process CPU Percent of id: Free CPU Percentage occupied wa:IO Waiting for occupation CPU Percent of hi: Hardware interrupt occupation CPU Percent of si: Software interrupt occupancy CPU Percent of st: Virtualized environments occupy CPU Percent of We need to focus on CPU The usage rate of , When us When the value is higher , Indicates the user process consumption CPU More time , If it takes longer than 50% when , Application services should be optimized as soon as possible . When sy When the value is higher , Description system process consumption CPU More time , For example, it may be the unreasonable configuration of the operating system or the emergence of the operating system Bug. When wa When the value is higher , Explain the system IO Waiting is more serious , For example, there may be a lot of randomness IO visit ,IO Bandwidth bottleneck .
In the fourth row
total: Total physical memory size , Unit is M free: Free memory size used: Size of memory used buff/cache: Cached memory size
The fifth row
total:Swap size free: Idle Swap size used: Used Swap size avail Mem: Cached Swap size
Process list
PID: Process id USER: The owner of the process PR: Priority of the process , The smaller the value, the more priority is given to execution NI: process nice value , A positive value indicates that the priority of the process is reduced , A negative value means to increase the priority of the process ,nice The value range is (-20,19), By default , Process nice The value is 0 VIRT: Virtual memory size occupied by the process RES: The physical memory size occupied by the process SHR: The size of shared memory occupied by the process S: Process status , among S Indicating dormancy ,R Indicates running ,Z Indicates a dead state ,N Indicates that the process priority value is negative %CPU: process CPU Usage rate %MEM: Process memory usage TIME+: After the process starts, it occupies CPU The total time of , I.e. occupation CPU Cumulative value of service time COMMAND: Process start command name appear CPU High usage diagnostic methods : 1) Find the function called
notes :xxx by top -H Return to the most consumed CPU The process of . 2) Find out the consumption CPU Of SQL
notes :xxx by pidstat Return to the most consumed CPU The thread of .
Memorytop、vmstat、free Can check the memory usage . free An example of the return result is as follows :
total: Total physical memory size ,total = used + free + buff/cache used: Size of memory used free: Free memory size shared: Shared memory size buff/cache: Cache memory size available: Available physical memory size ,available = free + buff/cache There are diagnostic methods for high memory utilization : 1) Check if the configuration is reasonable , for example : Operating system physical memory 128G, And assign to the database instance 110G, Because operating system processes and other applications also need memory , It's easy to run out of memory ; 2) Check whether the number of concurrent connections is too high ,read_buffer_size、read_rnd_buffer_size、sort_buffer_size、thread_stack、join_buffer_size、binlog_cache_size All are session Grade , The more connections , The more memory you need , Therefore, these parameters cannot be set too large ; 3) Check whether there is unreasonable join, for example : When multiple tables are associated , The result set of the driving table in the execution plan is relatively large , It needs to be executed repeatedly , Easy to cause memory leaks ; 4) Check whether there are too many open files and table_open_cache Whether the setting is reasonable , When accessing a table , The table will be put into the cache table_open_cache, The purpose is to visit faster next time , But if table_open_cache Set too large , And there are many open tables , It consumes a lot of memory .
IOiostat、dstat、pidstat Can be checked IO Usage situation . iostat An example of the return result is as follows :
rrqm/s: Every second merge Read operand of wrqm/s: Every second merge Write operand of r/s: Read every second IO frequency w/s: Write every second IO frequency rkB/s: Read every second IO size , Unit is KB wkB/s: Write every second IO size , Unit is KB avgrq-sz: Average request size , The unit is sector (512B) avgqu-sz: The average number of requests active in the driver request queue and in the device await: Average IO response time , Including the waiting in the driver request queue and the device IO response time r_await: Every read operation IO response time w_await: Every write operation IO response time svctm: Disk device IO Mean response time %util: The device is busy processing IO Percentage of requests ( Usage rate ), How busy the disk is r/s + w/s:IOPS appear IO High usage diagnostic methods : 1) Find the most used disk device
2) Find out the occupation IO High application
3) Find out the occupation IO High thread
4) Find out the occupation IO high SQL
notes :xxx by pidstat Return to the most consumed IO The thread of .
This article by the blog one article many sends the platform OpenWrite Release !
边栏推荐
- [question 23] Sudoku game with rotation | DFS (Beijing Institute of Technology / Beijing Institute of Technology / programming methods and practice / primary school)
- 8000 word explanation of OBSA principle and application practice
- Enumeration and annotation
- Huawei establishes global ecological development department: fully promote HMS global ecological construction
- Will the United States prohibit all Chinese enterprises from purchasing American chips? Trump responded like this
- In addition to "adding machines", in fact, your micro service can be optimized like this
- 零钱通项目(两个版本)含思路详解
- 2021-11-05 understand main method syntax, code block and final keyword
- ThreadLocal principle and source code analysis (click in step by step, don't recite, learn ideas)
- 【StoneDB故障诊断】系统资源瓶颈诊断
猜你喜欢

MySQL execution process and order

Project analysis (from technology to project and product)

Regular expression exercise

STM32 project Sharing -- mqtt intelligent access control system (including app control)

Nine days later, we are together to focus on the new development of audio and video and mystery technology

MySQL执行过程及执行顺序

Simple use of enum
![[question 24] logic closed loop (Beijing Institute of Technology / Beijing University of Technology / programming methods and practice / primary school)](/img/c4/71a9933a3a1fdd14f84a41b640f5b5.jpg)
[question 24] logic closed loop (Beijing Institute of Technology / Beijing University of Technology / programming methods and practice / primary school)

Pytoch distributed training

项目分析(从技术到项目、产品)
随机推荐
学完4种 Redis 集群方案要多久?我一口气给你说完
What is modcount in the source code? What's the effect
8000 word explanation of OBSA principle and application practice
Mask automatic update description file (mask description file)
@Can component be used on the same class as @bean?
First knowledge of esp8266 (I) -- access point and wireless terminal mode
Software test interview question: please say who is the best person to complete these tests, and what is the test?
Software test interview question: does software acceptance test include formal acceptance test, alpha test and beta test?
Inertial navigation principle (VII) -imu error classification (II) -allan variance analysis method +imu test + calibration introduction
In depth understanding of recursive method calls (including instance maze problem, tower of Hanoi, monkey eating peach, fiboracci, factorial))
Monitor the running of server jar and restart script
项目分析(从技术到项目、产品)
8000 word explanation of OBSA principle and application practice
Software test interview questions: the steps to write test cases by drawing cause and effect diagrams are___ And transforming the cause and effect diagram into a state diagram in five steps. What are
[question 21] idiom Solitaire (Beijing Institute of Technology / Beijing University of Technology / programming methods and practice / primary school)
B站崩了,如果我们是那晚负责修复的开发人员
Software testing interview question: what aspects should be considered when designing test cases, that is, which aspects should different test cases be tested for?
[question 22] dungeons and Warriors (Beijing Institute of Technology / Beijing Institute of Technology / programming methods and practice / primary school)
Project analysis (from technology to project and product)
@The difference between Autowired annotation and @resource annotation