GaussDB(DWS) intelligent database monitoring and O&M service - node monitoring metrics

2022-06-23 19:38:00 Huawei Cloud

GaussDB(DWS) uses DMS as its intelligent database operation and maintenance (O&M) system. DMS covers the three core O&M processes: monitoring, analysis, and handling. This article focuses on how the DMS service monitors the hardware metrics of cluster hosts.

After you create a cluster in GaussDB(DWS), the cluster appears on the cluster management page. Select the monitoring panel from the cluster's operation options to enter the DMS service.

DMS provides a range of database monitoring and tool functions. This article focuses on the node monitoring metrics within the monitoring function.

Node monitoring in a database cluster covers four areas: CPU, memory, disk, and network. The overview page shows the current status metrics of each host:

CPU usage: Read from /proc/stat. The SYS (kernel mode) and USER (user mode) percentages are added to get the current CPU usage. This metric reflects the CPU pressure on the node.
Memory usage: Read from /proc/meminfo. Free memory and cache are subtracted from total memory, and the result is expressed as a share of total memory. This metric reflects the memory usage of the node.
Average disk usage: Obtained by reading the node's disk mount information to determine how much disk capacity is in use.
Disk I/O: Obtained with the iostat command. This metric reflects the current disk I/O traffic on the node.
TCP retransmission rate: Obtained from the network protocol statistics in /proc/net/snmp. This metric reflects the node's network quality to some extent.
Network I/O: Obtained from /proc/net/dev, which reports the traffic of each network interface on the node. This metric reflects the network traffic pressure on the node.
Status: Determined by checking connectivity between the node and the cluster's CCN node.
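The memory usage calculation described above can be sketched as follows. This is a minimal illustration, not the DMS implementation: the function names are hypothetical, and counting Buffers together with Cached as "cache" is an assumption, since the article only says that free memory and cache are subtracted from the total.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_usage_percent(info):
    """Used memory = total - free - cache, as a percentage of total.

    Assumption: "cache" is taken to be Buffers + Cached.
    """
    total = info["MemTotal"]
    reclaimable = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return 100.0 * (total - reclaimable) / total


# Illustrative sample, not real measurements.
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:         4000000 kB
Buffers:          500000 kB
Cached:          3500000 kB
"""

sample_usage = memory_usage_percent(parse_meminfo(SAMPLE_MEMINFO))
```

On a Linux host, the same functions can be fed the contents of /proc/meminfo instead of the sample text.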

Hover over a metric to view more detailed values. For CPU, for example, you can see the share of CPU time spent in user mode, in system mode, idle, and waiting for I/O.
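A breakdown like this can be derived from two samples of the aggregate "cpu" line in /proc/stat. The sketch below is an assumption about how such a calculation might look, with hypothetical function names. It follows the article in treating CPU usage as the sum of the user and system shares.

```python
# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate CPU counters (in jiffies) from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_breakdown(before, after):
    """Percentage of CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def cpu_usage_percent(breakdown):
    """CPU usage as described in the article: user share plus system share."""
    return breakdown["user"] + breakdown["system"]


# Illustrative samples, not real measurements.
SAMPLE_BEFORE = "cpu  1000 0 500 8000 500 0 0 0\n"
SAMPLE_AFTER = "cpu  1300 0 600 8500 600 0 0 0\n"

sample_breakdown = cpu_breakdown(parse_cpu_line(SAMPLE_BEFORE),
                                 parse_cpu_line(SAMPLE_AFTER))
```

The counters in /proc/stat are cumulative since boot, which is why the percentages are computed from the difference between two samples rather than from a single read.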

The node monitoring page also provides more detailed information about disk and network activity. For example, the disk view collects and displays I/O status metrics for every disk on every node.
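Per-disk I/O figures of this kind can be approximated from the kernel counters in /proc/diskstats, which is also where iostat gets its data. The following is a rough sketch under that assumption. The function names and sample values are hypothetical, and the article does not say which exact iostat fields DMS displays.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Extract per-device cumulative counters from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue
        stats[parts[2]] = {
            "sectors_read": int(parts[5]),
            "sectors_written": int(parts[9]),
            "io_ms": int(parts[12]),  # milliseconds spent doing I/O
        }
    return stats


def disk_io_rates(before, after, interval_s):
    """Read/write throughput (KB/s) and utilization (%) per device."""
    rates = {}
    for name, cur in after.items():
        prev = before.get(name)
        if prev is None:
            continue  # device appeared between samples
        read = cur["sectors_read"] - prev["sectors_read"]
        written = cur["sectors_written"] - prev["sectors_written"]
        busy_ms = cur["io_ms"] - prev["io_ms"]
        rates[name] = {
            "read_kb_s": read * SECTOR_BYTES / 1024 / interval_s,
            "write_kb_s": written * SECTOR_BYTES / 1024 / interval_s,
            "util_percent": 100.0 * busy_ms / (interval_s * 1000),
        }
    return rates


# Illustrative samples taken one second apart, not real measurements.
SAMPLE_BEFORE = "   8       0 sda 1000 0 20000 500 2000 0 40000 1500 0 1200 2000\n"
SAMPLE_AFTER = "   8       0 sda 1100 0 24000 550 2300 0 50000 1700 0 1700 2250\n"

sample_rates = disk_io_rates(parse_diskstats(SAMPLE_BEFORE),
                             parse_diskstats(SAMPLE_AFTER), interval_s=1.0)
```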

The DMS service monitors database nodes in four areas: CPU, memory, disk, and network. How do these metrics reflect the current state of the database, and how can they be used to find potential problems?

CPU metrics: CPU usage reflects the workload currently running on the cluster. More workload means more computation and higher node CPU usage, so CPU usage is expected to be high during peak business hours. Two typical problem scenarios for CPU metrics:

CPU usage stays high: Check the services running on the cluster for unreasonable SQL statements or table index design, and check on the node whether a GaussDB process is continuously consuming CPU resources.
CPU usage differs significantly between nodes: Check whether the cluster workload is distributed unevenly. For CN nodes, check whether load balancing is missing, which can put excessive pressure on a single node.
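The second scenario, a node whose CPU usage stands out from the rest, can be illustrated with a simple comparison against the cluster average. This is only a sketch: the function name, the node names, and the 20-point threshold are assumptions made for the example and are not DMS settings.

```python
from statistics import mean


def find_skewed_nodes(usage_by_node, threshold=20.0):
    """Return nodes whose usage deviates from the cluster mean.

    usage_by_node maps a node name to its CPU usage in percent.
    threshold is the allowed deviation in percentage points (hypothetical).
    The result maps each flagged node to its deviation from the mean.
    """
    average = mean(usage_by_node.values())
    return {
        node: usage - average
        for node, usage in usage_by_node.items()
        if abs(usage - average) > threshold
    }


# Hypothetical readings: one CN node carries most of the load.
SAMPLE_USAGE = {"cn-1": 92.0, "cn-2": 35.0, "cn-3": 38.0, "dn-1": 35.0}

skewed = find_skewed_nodes(SAMPLE_USAGE)
```

The same comparison applies to memory or disk usage per node, since the article describes uneven distribution as a warning sign for all three.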

Memory metrics: Memory usage reflects the memory consumed by the cluster while it runs. The more data the workload involves, the more memory a node consumes. Typical problem scenarios for memory metrics:

Memory usage keeps growing slowly: Check for a memory leak.
Memory usage differs significantly between nodes: Check whether the cluster workload is distributed unevenly.

Disk metrics: Disk metrics reflect how much disk space the cluster data occupies while the cluster runs. Typical problem scenarios for disk metrics:

Disk usage grows rapidly in a short time: Check whether the growth matches the business workload, whether the database has too many dirty pages, and whether temporary files are occupying a large amount of disk space.
Disk I/O wait time on a node is too long: Check the disk status and whether the disk is a slow disk.
Disk usage differs significantly between nodes: Check whether the cluster workload is distributed unevenly and causing data skew.
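To make the first disk scenario concrete, the sketch below computes the average usage across a set of mount points and the growth rate between two readings. It uses Python's standard shutil.disk_usage. The function names are hypothetical, and which mount points to include is left to the caller, since the article does not say how DMS selects them.

```python
import shutil


def disk_usage_percent(mount_point):
    """Used capacity of one mounted filesystem, as a percentage."""
    usage = shutil.disk_usage(mount_point)
    return 100.0 * usage.used / usage.total


def average_disk_usage(mount_points):
    """Average usage percentage across the given mount points."""
    values = [disk_usage_percent(m) for m in mount_points]
    return sum(values) / len(values)


def growth_points_per_hour(earlier_percent, later_percent, elapsed_s):
    """Growth in percentage points per hour between two readings."""
    return (later_percent - earlier_percent) * 3600 / elapsed_s


# Hypothetical readings: usage rose from 40% to 45% in 30 minutes.
sample_growth = growth_points_per_hour(40.0, 45.0, elapsed_s=1800)
```

A growth rate that is far above what the business load would explain is the cue to look at dirty pages and temporary files.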

Network metrics: Network metrics reflect the network traffic of each node while the cluster runs. Typical problem scenarios for network metrics:

TCP retransmission rate is high: Check the cluster network for congestion or poor network quality.
Packet loss count is too high: Check the cluster workload and the network to determine whether business pressure is causing packet loss on the network card, or whether the network itself is at fault.
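Both network checks can be illustrated with the files named earlier in the article. The sketch below derives a retransmission rate from the OutSegs and RetransSegs counters in /proc/net/snmp and reads per-interface byte and drop counters from /proc/net/dev. The function names are hypothetical, and defining the rate as retransmitted segments over sent segments is an assumption about how such a rate is typically computed, not a statement of the DMS formula.

```python
def parse_tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp text as a dict."""
    tcp_lines = [l.split() for l in snmp_text.splitlines() if l.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("expected a Tcp header line and a Tcp value line")
    header, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))


def tcp_retrans_rate(before, after):
    """Retransmitted segments as a percentage of segments sent (assumed formula)."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return 100.0 * retrans / sent if sent > 0 else 0.0


def parse_net_dev(dev_text):
    """Return per-interface byte and drop counters from /proc/net/dev text."""
    stats = {}
    for line in dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the two header lines contain no colon
        f = rest.split()
        stats[name.strip()] = {
            "rx_bytes": int(f[0]), "rx_drop": int(f[3]),
            "tx_bytes": int(f[8]), "tx_drop": int(f[11]),
        }
    return stats


# Illustrative samples, not real measurements.
TCP_HEADER = ("Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
              "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs "
              "InErrs OutRsts InCsumErrors\n")
SNMP_BEFORE = TCP_HEADER + "Tcp: 1 200 120000 -1 100 50 2 1 10 90000 100000 500 0 5 0\n"
SNMP_AFTER = TCP_HEADER + "Tcp: 1 200 120000 -1 110 55 2 1 12 108000 120000 700 0 5 0\n"
NET_DEV = (
    "Inter-|   Receive                        |  Transmit\n"
    " face |bytes packets errs drop fifo frame compressed multicast"
    "|bytes packets errs drop fifo colls carrier compressed\n"
    "  eth0: 1000 10 0 2 0 0 0 0 2000 20 0 1 0 0 0 0\n"
)

sample_rate = tcp_retrans_rate(parse_tcp_counters(SNMP_BEFORE),
                               parse_tcp_counters(SNMP_AFTER))
sample_dev = parse_net_dev(NET_DEV)
```

Comparing the drop counters with the retransmission rate helps separate the two causes in the list above: drops concentrated on one interface point to the network card, while a high retransmission rate with no local drops points to the network itself.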

Copyright notice
This article was written by [Huawei Cloud]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/174/202206231834259247.html