Basic information of Prometheus (I)
2022-07-03 03:02:00 【Sanhuang egg】
One、Introduction to Prometheus
Borg is Google's internal large-scale cluster system and the forerunner of Kubernetes. Its monitoring system is Borgmon, and Prometheus is a clone of Borgmon, so it fits K8s monitoring very well and is well suited to containers.
Prometheus is an open source combination of monitoring, alerting, and a time series database. Collected samples are stored in memory as time series (TSDB, time series database) and are periodically persisted to disk. A time series database is neither a SQL database nor, in the usual sense, a NoSQL database.
1、Prometheus characteristics
- Custom multidimensional data model (time series data consists of a metric name and a set of key/value labels)
- Very efficient storage: an average sample takes about 3.5 bytes. For example, 3.2 million time series, sampled every 30 seconds and kept for 60 days, are quoted as taking about 228 GB of disk space
- Flexible and powerful query language across multiple dimensions (PromQL)
- No dependence on distributed storage; a single server node can work on its own
- Collects time series data with an HTTP-based pull model
- Time series data can also be pushed through a Pushgateway
- The targets to scrape are found through service discovery or static configuration
- Supports a variety of visual charts and dashboards
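The storage figures in the list above can be sanity-checked with a back-of-the-envelope calculation. The sketch below simply multiplies series count, samples per series, and bytes per sample; the function name is hypothetical, and real usage also depends on compression, indexes, and churn. With the inputs quoted above, this plain product comes to roughly 1.9 TB rather than 228 GB, so the quoted total presumably rests on different inputs, and both numbers should be treated as rough.

```python
def estimate_disk_bytes(series: int, scrape_interval_s: int,
                        retention_days: int, bytes_per_sample: float) -> float:
    """Rough estimate: series x samples per series x bytes per sample."""
    # Number of samples one series accumulates over the retention period.
    samples_per_series = retention_days * 24 * 3600 // scrape_interval_s
    return series * samples_per_series * bytes_per_sample


if __name__ == "__main__":
    # Inputs taken from the feature list: 3.2 million series,
    # one sample every 30 s, 60 days of retention, 3.5 bytes per sample.
    total = estimate_disk_bytes(3_200_000, 30, 60, 3.5)
    print(f"{total / 1e9:.0f} GB")  # prints "1935 GB"
```

The same function can be used to see how retention or scrape interval changes the estimate, for example by halving `retention_days`.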
2、Applicable scenarios
Prometheus records any purely numeric time series well. It suits machine-centric monitoring as well as the monitoring of highly dynamic service-oriented architectures. In a microservices environment (K8s), its support for multidimensional data collection and querying is a particular strength.
Prometheus is designed for reliability: it is the system you turn to during an outage so that you can diagnose problems quickly.
Each Prometheus server is standalone and does not depend on network storage or other remote services. When Prometheus goes down, it first logs the reason for the failure, so users can troubleshoot from the log and restart Prometheus directly.
3、Scenarios where it does not apply
Prometheus is not suitable where high accuracy is required.
For example, in a billing system that needs 100% accuracy, the data Prometheus collects may not be detailed and complete enough.
Two、Introduction to common monitoring systems
1、Several common monitoring services
Cacti
Cacti (English for "cactus") is a network traffic monitoring and graphical analysis tool built on PHP, MySQL, SNMP, and RRDtool. It fetches data with snmpget and draws graphs with RRDtool, but users do not need to understand RRDtool's complex parameters. It provides powerful data and user management: you can specify which tree structures, host devices, and graphs each user may view, combine it with LDAP for user authentication, and define custom templates. It does a good job of displaying and monitoring historical data. Cacti focuses on network traffic (it is less likely to come up in interviews).
By adding templates, the monitoring setup for different devices becomes reusable. It also supports custom graphs and has strong computation features (overlaying data).
Nagios
Nagios is a free, open source network monitoring tool that can effectively monitor the state of Windows, Linux, and Unix hosts, network devices such as switches and routers, and printers. When a system or service enters an abnormal state, it sends an email or SMS alert to notify the site operations staff promptly, and it sends a recovery notification by email or SMS once the state returns to normal. The main feature of Nagios is monitoring and alerting: alerting is its strongest capability, and it supports many alert methods. Its drawbacks are that it has no powerful data collection mechanism and its graphs are crude. As the number of monitored hosts grows, adding hosts becomes cumbersome: configuration is entirely text-file based, with no web-based management or configuration, which makes mistakes easy and maintenance difficult.
Zabbix
Zabbix is an enterprise-grade open source solution, with a web interface, for distributed system monitoring and network monitoring. Zabbix can monitor a wide range of network parameters to keep server systems running safely, and it provides a powerful notification mechanism that lets operations staff quickly locate and resolve problems. Zabbix consists of two parts: zabbix server and the optional component zabbix agent. zabbix server can monitor remote server and network status and collect data via SNMP, zabbix agent, ping, port monitoring, and other methods. It runs on platforms such as Linux, Solaris, HP-UX, AIX, FreeBSD, OpenBSD, and OS X.
Prometheus
Borg is the forerunner of Kubernetes.
Borgmon is Borg's monitoring system, and its clone is Prometheus (written in Go). Prometheus is therefore a very good fit for the K8s architecture.
As a data monitoring solution, it is backed by a large community: 6,300 contributors from more than 700 companies, 13,500 code commits, and 7,200 pull requests.
Prometheus has the following characteristics:
SQL, NoSQL, TSDB (time series data; if pressed, a TSDB also counts as a non-relational database)
Example: the change in CPU usage of a node within 1h
- Multidimensional data model (time series built from key/value pairs)
With the key on the horizontal axis and the value on the vertical axis, the data forms a time series, and multiple time series together form a trend chart. (The original figure was a simple sketch of this: the horizontal axis is the key (time axis) and the vertical axis is the value (value axis). At a given time, one key can correspond to one or more values; each such point is a piece of time data. The data changes over time, and multiple pieces of time data form a time series, which can be thought of as a curve.)
[Figure: sketch of a time series, with time on the horizontal axis and the value on the vertical axis]
- Flexible query and aggregation language: PromQL
- A query first determines the time, then determines the host
- Provides local storage and distributed storage
- Collects time series data with a pull model over HTTP and HTTPS (pull: Prometheus fetches the data from its targets. Time series: a data value recorded at each point in time, produced continuously. The horizontal axis marks the time and the vertical axis the data value; the values change dynamically over a period, and all the points joined by lines form a large line chart)
- Push mode is available through Pushgateway (an optional Prometheus middleware). Push is used only for scripts, one-off or short-lived tasks, or for targets that do not expose metrics at layer 7
- Target machines can be found through dynamic service discovery or static configuration (Consul provides automatic discovery and scale-in). Static configuration is convenient but one-off: it does not update automatically when servers are added to the monitored group or regarded as terminated. Dynamic discovery picks up new servers in real time
- Supports a variety of charts and data displays (collected data can be presented in many forms)
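The data model in the list above (a metric name plus key/value labels identifying a series, and timestamped values forming the series) can be sketched in a few lines. This is an illustration only, not Prometheus's internal storage: the metric name `node_cpu_usage`, the labels, and the helper names are hypothetical, and label-value escaping is left out. `render_sample` produces a line in the shape of the text format that a pull target serves over HTTP.

```python
from typing import Dict, List, Tuple

# A series is identified by its metric name plus its (sorted) label set.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]
# Samples are (unix_timestamp, value) pairs appended in time order.
Store = Dict[SeriesKey, List[Tuple[int, float]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build a hashable identity for a series; label order does not matter."""
    return name, tuple(sorted(labels.items()))


def append_sample(store: Store, name: str, labels: Dict[str, str],
                  timestamp: int, value: float) -> None:
    """Add one sample to the series it belongs to, creating it if needed."""
    store.setdefault(series_key(name, labels), []).append((timestamp, value))


def render_sample(name: str, labels: Dict[str, str], value: float) -> str:
    """Render one sample as name{label="value",...} value (no escaping)."""
    if not labels:
        return f"{name} {value}"
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{body}}} {value}"


if __name__ == "__main__":
    store: Store = {}
    labels = {"instance": "host1", "mode": "user"}
    append_sample(store, "node_cpu_usage", labels, 1656817320, 0.42)
    append_sample(store, "node_cpu_usage", labels, 1656817350, 0.47)
    print(render_sample("node_cpu_usage", labels, 0.47))
```

Two samples with the same name and labels land in the same series, while changing any label value starts a new series; that is what makes queries across dimensions possible.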
Three、Design ideas for an operations monitoring platform
- Data collection module
  - The objects to collect data from, and the way the data is collected
- Data extraction module (for Prometheus's TSDB, the query language is PromQL)
  - Extract the data that is needed from the large volume collected, and discard the rest
- Monitoring and alerting module (a Boolean expression decides whether an alert is needed, e.g. PromQL: CPU usage > 80%)
  - Evaluate whether the expression holds: alert if it does, otherwise keep monitoring
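The judge-then-alert step of the alerting module can be sketched as follows. This is a minimal illustration in Python, not how Prometheus evaluates its rules: the function names and sample values are hypothetical, and the 80% threshold is the one from the example expression.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[int, float]  # (unix_timestamp, cpu_usage_percent)


def alert_needed(cpu_usage_percent: float, threshold_percent: float = 80.0) -> bool:
    """The Boolean expression: true when usage is strictly above the threshold."""
    return cpu_usage_percent > threshold_percent


def firing_samples(samples: Iterable[Sample],
                   threshold_percent: float = 80.0) -> List[Sample]:
    """Walk the samples in order and keep those for which an alert is needed.

    Samples that do not satisfy the expression are skipped, which
    corresponds to "keep monitoring".
    """
    return [(ts, value) for ts, value in samples
            if alert_needed(value, threshold_percent)]


if __name__ == "__main__":
    observed = [(0, 75.0), (30, 80.0), (60, 91.5)]
    for ts, value in firing_samples(observed):
        print(f"ALERT at t={ts}s: CPU usage {value}% is above 80%")
```

A value of exactly 80 does not fire here because the comparison is strict, matching the `>` in the example expression.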
It can be subdivided into 6 layers:
Layer 6: User presentation and management layer: unified user management, centralized monitoring, centralized maintenance
Layer 5: Alarm event generation layer: records alarm events in real time and produces analysis charts (trend analysis, visualization)
Layer 4: Alarm rule configuration layer: alarm rule settings, alarm threshold settings
Layer 3: Data extraction layer: periodically collects data into the monitoring module
Layer 2: Data presentation layer: displays data as curves (dynamic display of time series data)
Layer 1: Data collection layer: multi-channel monitoring data
Four、Prometheus monitoring system
1、Monitoring system (monitoring indicators)
System layer monitoring (three categories or the following four categories are both fine)
- CPU, load, memory, swap, disk I/O, processes, etc.
- Network monitoring: network devices, workload, network latency, packet loss rate, etc.
Middleware and basic application monitoring ( No choice )
Message middleware: Kafka, RocketMQ, RabbitMQ, and other message brokers (Redis used as middleware)
Web (application) servers: Tomcat, WebLogic, Apache, PHP, the Spring family
Databases / cache databases: MySQL, PostgreSQL, MongoDB, ES, Redis, etc.
What needs to be monitored in Redis:
- System layer monitoring of the Redis server
- Redis service status
- RDB and AOF log monitoring
Logs: in sentinel mode, the sentinels share cluster information and generate logs that directly contain the sentinel information of other nodes and the Redis information
Number of keys
Key hit counts / hit frequency
Maximum number of connections of the system and of Redis
System: ulimit -a
Redis: log in with redis-cli, then run config get maxclients to see the maximum number of connections
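The two limits checked above interact: the Redis setting cannot be honoured if the operating system allows fewer open files. The sketch below compares them under two assumptions that are not from this article: that Redis keeps about 32 file descriptors for internal use (as described in the Redis client-handling documentation), and that the `maxclients` value has already been read with `config get maxclients`. Note that `resource.getrlimit` reports the limit of the Python process itself, which matches the Redis server's limit only if both are started under the same settings; the `resource` module is available on Unix-like systems only.

```python
import resource

# Assumption: descriptors Redis reserves for internal use (see Redis docs).
RESERVED_FDS = 32


def soft_nofile_limit() -> int:
    """Soft open-files limit of the current process (what `ulimit -n` shows)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def effective_maxclients(configured_maxclients: int, nofile_limit: int,
                         reserved_fds: int = RESERVED_FDS) -> int:
    """Client connections that fit under the open-files limit.

    An unlimited open-files limit leaves the configured value unchanged.
    """
    if nofile_limit == resource.RLIM_INFINITY:
        return configured_maxclients
    return max(0, min(configured_maxclients, nofile_limit - reserved_fds))


if __name__ == "__main__":
    configured = 10000  # hypothetical value returned by `config get maxclients`
    limit = soft_nofile_limit()
    print(f"open-files limit: {limit}, "
          f"usable client connections: {effective_maxclients(configured, limit)}")
```

If the usable figure is lower than the configured one, the open-files limit is the value to raise before increasing `maxclients`.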
Monitoring indicators: how do you choose which indicators to monitor?
Data flow:
Monitor the connections between services: the data flow of services
Monitor the interfacing between ports, modules, and APIs: the data flow of the business
Monitor the interfacing between ports and devices: network-level traffic / data flow
Directions for selecting indicators
- Host-level indicators: generally the server indicators that affect application state, such as hardware, CPU, network, I/O, kernel, disk, maximum number of open files, file descriptors, sockets, etc.
- Network indicators: within the architecture, the internal and external networks, network traffic data, plus latency, packet loss, efficiency and performance, queues, ports, and socket data indicators
- Business-level indicators: the route business data takes until it is finally processed, the direction of API data flows, and custom data flow monitoring (business measurement, user experience, stability/security; these are the indicators the operations system needs to monitor)
- Application-layer indicators: for example, when monitoring MySQL: the number of tables, the records in the main table, the number of SELECT statements, the number of INSERTs, slow queries, the error log, the master-slave log, memory used by the service, deadlock state, number of open threads, sockets, file descriptors. Other applications are monitored along the same lines, with some differences.
Application layer monitoring
- Used to measure the state and performance of application code
- Monitoring classification: black box monitoring, white box monitoring
- Black box monitoring: probe-based monitoring from the outside; it does not actively intervene in or affect the data
- White box monitoring: introspection metrics exposed by the application itself, waiting to be pulled (e.g. cAdvisor)
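A black box probe, as described above, only observes a service from the outside. A minimal sketch of such a probe in Python is a TCP connection check; the function name, host, port, and timeout are illustrative assumptions, and a real probe would usually also check the protocol response (for example an HTTP status code).

```python
import socket


def tcp_probe(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Black box check: True if a TCP connection to host:port can be opened.

    The probe sends no application data, so it does not change the
    state of the service being checked.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Hypothetical target: a Redis server on the default port of this host.
    status = "up" if tcp_probe("127.0.0.1", 6379) else "down"
    print(f"redis port is {status}")
```

White box monitoring is the opposite view: the application publishes its own internal counters and the monitoring server pulls them.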
Business layer monitoring
- Used to measure the value of the application, for example the sales volume of an e-commerce business, ops, DAU (daily active users), conversion rate, etc. Business interfaces: number of logins, number of registrations, order volume, search volume, and payment volume.