Learn about Prometheus from 0 to 1

2022-06-11 16:28:00 jerry_ dyy

Introduction to the monitoring platform:

A unified monitoring platform is made up of seven roles: monitoring sources, data collection, data storage, data analysis, data presentation, an early-warning center, and a CMDB (enterprise software and hardware asset management).

  • Monitoring sources:

By level, monitoring sources fall roughly into three layers: the business application layer, the middleware layer, and the infrastructure layer. The business application layer mainly covers application software, the enterprise message bus, and so on. The middleware layer covers databases, caches, the configuration center, and other system software. The infrastructure layer mainly covers physical machines, virtual machines, containers, network devices, storage devices, and so on.

  • Data collection:

Because the data sources are so diverse, data collection is not an easy task. The collected data can be divided into business metrics, application metrics, system software metrics, and system metrics. Application metrics include availability, exceptions, throughput, response time, number of transactions currently waiting, resource utilization, request volume, log size, performance, queue depth, thread count, number of service calls, traffic volume, and service availability. Business metrics include large-value transactions, transaction regions, transaction details, request counts, response time, and response counts. System metrics include CPU load, memory load, disk load, network I/O, disk I/O, TCP connection count, and process count.

In terms of collection method, data is generally gathered in three ways: through interfaces, through a client agent, or by active capture over a network protocol (HTTP, SNMP, and so on).

  • Data storage:

The collected data is usually stored in a file system (such as HDFS), an indexing system (such as Elasticsearch), a metrics store (such as InfluxDB), a message queue (such as Kafka, to hold or buffer messages temporarily), or a database (such as MySQL).

  • Data analysis:

The collected data then has to be processed. Processing falls into two types: real-time processing and batch processing. The techniques include Map/Reduce computation, full-log retrieval, stream computing, and metric calculation. The key is to choose the calculation method that fits each scenario.

  • Data presentation:

The processing results are displayed as charts. In the multi-screen era, cross-device support is essential.

  • Early warning:

If a problem is found during data processing, it calls for anomaly analysis and risk estimation, followed by event triggering or alerting.

  • CMDB (enterprise software and hardware asset management):

The CMDB is a very important part of a unified monitoring platform. Although there are many kinds of monitoring sources, most of them are related to one another. For example, an application runs in a runtime environment, its normal operation depends on the network and storage devices, and it may also depend on other applications (business dependencies). If any link in this chain fails, the application becomes unavailable. Besides storing hardware and software assets, the CMDB also stores these relationships between assets. When an asset fails, the relationships make it possible to find out quickly which other assets are affected and then resolve the problems one by one.
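The impact lookup described above can be sketched as a graph traversal. This is a minimal illustration, not how any particular CMDB product works: the asset names and the `DEPENDENTS` mapping are hypothetical, and the relationships are assumed to be stored as "asset -> assets that depend on it".

```python
from collections import deque

# Hypothetical asset relationships: each key is an asset, each value lists the
# assets that depend on it directly.
DEPENDENTS = {
    "storage-01": ["mysql-01"],
    "switch-01": ["mysql-01", "app-order"],
    "mysql-01": ["app-order"],
    "app-order": ["app-portal"],
}


def affected_assets(failed, dependents):
    """Return every asset that directly or indirectly depends on `failed`."""
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(affected_assets("storage-01", DEPENDENTS))
```

A breadth-first search is used here only because it is short and visits each asset once; any traversal over the stored relationships would serve the same purpose.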

Why choose Prometheus?

Prometheus vs. Zabbix:

| Aspect | Prometheus | Zabbix |
| --- | --- | --- |
| Release year | 2016 | 2012 |
| Development language | Go | C + PHP |
| Performance | Supports scale on the order of tens of thousands of monitored nodes | Upper limit of about 10,000 monitored nodes |
| Community support | Weaker than Zabbix's, but growing day by day | Widely used and more mature; answers to most problems can be found |
| Container support | Supports native Swarm clusters as well as monitoring of Kubernetes container clusters; currently the best solution for container monitoring | Appeared before containers existed, so its container support is comparatively poor |
| Enterprise use | The best choice for essentially any enterprise that uses Kubernetes and containers | Holds a clear advantage in traditional monitoring, especially server-related monitoring |
| Deployment difficulty | Only one core server component; starts with a single command | Multiple systems and a variety of ways to collect monitoring information |

Architecture:

 

  • Prometheus Server:

Prometheus Server is the core component of Prometheus. It is responsible for collecting, storing, and querying monitoring data. Prometheus Server can manage monitoring targets through static configuration, or manage them dynamically through Service Discovery, and it pulls data from those targets. The collected data also needs to be stored: Prometheus Server is itself a time-series database, and it stores the collected monitoring data on local disk as time series. For querying and analysis, Prometheus Server provides its own query language, PromQL. In addition, Prometheus Server supports federation, which lets one Prometheus Server pull data from other Prometheus Server instances.

  • Exporters:

An Exporter exposes an endpoint for collecting monitoring data as an HTTP service. Prometheus Server obtains the monitoring data by accessing the endpoint that the Exporter provides. Exporters fall into two classes:

Direct collection: the monitored software has built-in support for Prometheus. Examples include cAdvisor, Kubernetes, Etcd, and Gokit, all of which directly expose an endpoint with monitoring data for Prometheus.

Indirect collection: the monitoring target does not support Prometheus directly, so a collection program for that target has to be written with the Client Library that Prometheus provides. Examples include MySQL Exporter, JMX Exporter, and Consul Exporter.
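To make the idea of "exposing an endpoint over HTTP" concrete, here is a minimal sketch of an exporter written with only the Python standard library. It is an illustration of the text exposition format, not a replacement for the official Client Library mentioned above. The metric name `demo_queue_depth`, the `queue` label, the fixed value 3, and port 8000 are all made up for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(queue_depth):
    """Render one gauge in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_queue_depth Current number of queued jobs.",
        "# TYPE demo_queue_depth gauge",
        f'demo_queue_depth{{queue="orders"}} {queue_depth}',
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered metrics on /metrics and 404 on everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # A real exporter would read the value from the monitored system here.
        body = render_metrics(queue_depth=3).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given (hypothetical) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(render_metrics(3), end="")
```

Calling `serve()` starts the endpoint; a scrape job whose target is `localhost:8000` could then collect the gauge like any other exporter.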

  • AlertManager:

Prometheus Server supports alert rules written in PromQL. When the condition defined by a rule is met, an alert is generated. After AlertManager receives alerts from Prometheus Server, it deduplicates them, groups them, and routes them to the matching receiver, which sends the notification. Common receivers include email, PagerDuty, and webhook.
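The deduplication and grouping steps can be pictured with a small sketch. This is a simplified model and not AlertManager's actual implementation: it assumes that two alerts with an identical label set count as the same alert, and that groups are formed from the values of the labels listed in `group_by`. The alert names and instances are sample data.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Drop duplicate alerts, then bucket the rest by the `group_by` labels."""
    unique = {}
    for alert in alerts:
        # Two alerts with an identical label set are treated as the same alert.
        fingerprint = tuple(sorted(alert["labels"].items()))
        unique[fingerprint] = alert

    groups = defaultdict(list)
    for alert in unique.values():
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"labels": {"alertname": "InstanceDown", "instance": "linux_208"}},
    {"labels": {"alertname": "InstanceDown", "instance": "linux_208"}},  # duplicate
    {"labels": {"alertname": "InstanceDown", "instance": "mysql_208"}},
    {"labels": {"alertname": "HighMemory", "instance": "linux_208"}},
]
groups = group_alerts(alerts, group_by=["alertname"])
for key, members in groups.items():
    print(key, len(members))
```

With `group_by=["alertname"]`, the two distinct InstanceDown alerts end up in one group and would be delivered in a single notification.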

  • PushGateway:

Prometheus collects data by having Prometheus Server pull it from Exporters. When the network environment does not allow Prometheus Server and an Exporter to communicate directly, PushGateway can act as a relay. Monitoring data from the internal network is actively pushed to PushGateway, and Prometheus Server then pulls it from PushGateway in the same way it pulls from an Exporter.
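As a rough sketch of the push side, the snippet below builds the HTTP request that a job could send to PushGateway. It assumes a PushGateway listening on `localhost:9091`; the job name `nightly_backup` and the metric `backup_duration_seconds` are hypothetical. The request is only constructed, not sent, so the example runs without a gateway.

```python
import urllib.request


def build_push_request(gateway, job, metric_name, value):
    """Build (but do not send) an HTTP request that pushes one sample."""
    # Pushed metrics are grouped under /metrics/job/<job name>.
    url = f"http://{gateway}/metrics/job/{job}"
    # The body uses the same text format that exporters expose.
    body = f"{metric_name} {value}\n".encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


request = build_push_request(
    "localhost:9091", "nightly_backup", "backup_duration_seconds", 42.5
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Job names that contain characters such as slashes or spaces would need extra encoding, which this sketch does not handle.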

  • Prometheus workflow:

1. Prometheus Server periodically pulls metrics from the configured jobs or exporters, pulls metrics that were pushed to Pushgateway, or pulls metrics from other Prometheus Servers.

2. Prometheus Server stores the collected metrics locally and evaluates the defined alert rules, recording new time series or pushing alerts to Alertmanager.

3. Alertmanager processes the alerts it receives according to its configuration file and sends out notifications.

4. In the Web UI or Grafana, the collected data is queried from Prometheus Server through PromQL and visualized.
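Step 4 goes through the Prometheus HTTP API, which the following sketch illustrates. It builds an instant-query URL for the `/api/v1/query` endpoint and parses a response of the shape that endpoint returns. The server address and the `sample_response` contents are made-up sample data, so nothing is fetched over the network.

```python
import json
import urllib.parse


def build_query_url(server, promql):
    """Build the URL of an instant query against the Prometheus HTTP API."""
    return f"http://{server}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_instances(response_text):
    """Return the instance labels whose sample value is 0 in an `up` query result."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query failed")
    return [
        sample["metric"].get("instance", "")
        for sample in payload["data"]["result"]
        # Each sample value is a [timestamp, "value as string"] pair.
        if float(sample["value"][1]) == 0
    ]


# Hand-written sample response for the query `up`.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "linux_208", "job": "node_exporter"},
             "value": [1654935000.0, "0"]},
            {"metric": {"__name__": "up", "instance": "mysql_208", "job": "mysqld_exporter"},
             "value": [1654935000.0, "1"]},
        ],
    },
})

print(build_query_url("localhost:9090", "up == 0"))
print(down_instances(sample_response))
```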

Installation and deployment:

Install Prometheus:

Getting started | Prometheus

Download | Prometheus

 

Decompress the archive with tar -zxvf prometheus-*.tar.gz.

Enter the unpacked prometheus-* directory and configure prometheus.yml:

# my global config
global:
  scrape_interval: 15s # Scrape targets every 15 seconds. The default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"] # Alertmanager address

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "prometheus.rules.yml" # Rules file
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  # In this job Prometheus acts as its own exporter and monitors itself.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["172.16.0.213:9090"]
        labels:
          instance: prometheus

  # mysqld_exporter job: monitors MySQL
  - job_name: "mysqld_exporter"
    static_configs:
      - targets: ["localhost:9104"]
        labels:
          instance: mysql_208

  # node_exporter job: monitors the Linux host
  - job_name: "node_exporter"
    static_configs:
      - targets: ["localhost:9100"]
        labels:
          instance: linux_208

  # Monitors the visitor management system (a JVM process)
  - job_name: "vms"
    metrics_path: /vms/actuator/prometheus
    static_configs:
      - targets: ["localhost:10109"]
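To see how a job's settings turn into the address that gets scraped, here is a small sketch. It only mirrors the documented defaults that the comments in the config mention (`metrics_path` defaults to `/metrics`, `scheme` defaults to `http`); the helper name `scrape_url` is made up for the illustration.

```python
def scrape_url(target, metrics_path="/metrics", scheme="http"):
    """Combine scheme, target, and metrics path into the URL that is scraped."""
    return f"{scheme}://{target}{metrics_path}"


# The node_exporter job relies on the defaults.
print(scrape_url("localhost:9100"))
# The vms job overrides metrics_path.
print(scrape_url("localhost:10109", metrics_path="/vms/actuator/prometheus"))
```

Opening these URLs in a browser or with a command-line HTTP client is a quick way to confirm that a target really exposes metrics before adding it as a job.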

Configure prometheus.rules.yml:

groups:
# Each rule pushes an alert to Alertmanager when its condition is met.
- name: InstanceDown
  rules:
  - alert: "Service process down"
    expr: up == 0 # Alert rule written as a PromQL expression
    for: 5m # Push the alert to Alertmanager only if up == 0 holds continuously for 5 minutes
    labels:
      severity: critical # Alert severity
    annotations: # Alert content
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
- name: MemoryRule
  rules:
  - alert: "High memory usage"
    # Used memory as a percentage of total: 1 - (free + buffers + cached) / total
    expr: (1 - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100 > 80
    for: 5m # Push the alert to Alertmanager only if memory usage stays above 80% for 5 minutes
    labels:
      severity: "High" # Alert severity
    annotations: # Alert content
      summary: "Service name: {{ $labels.alertname }}"
      description: "Memory usage alert: {{ $value }}"
      value: "{{ $value }}"
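The effect of `for: 5m` is easiest to see as a small state machine. The sketch below is a simplified model and not Prometheus's actual code: it assumes an alert is "pending" from the first evaluation where the expression is true, becomes "firing" once it has been true for at least the `for` duration, and resets as soon as one evaluation is false. The one-minute evaluation spacing in the sample data is chosen for readability.

```python
def alert_states(samples, for_seconds):
    """Return the alert state after each (timestamp_seconds, condition_is_true) sample."""
    states = []
    active_since = None
    for timestamp, is_true in samples:
        if not is_true:
            # One false evaluation resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        if timestamp - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# The condition (for example up == 0) becomes true at t=60 and stays true.
samples = [(0, False), (60, True), (120, True), (180, True),
           (240, True), (300, True), (360, True)]
print(alert_states(samples, for_seconds=300))
```

In this model only the "firing" state would result in an alert being pushed to Alertmanager, which is why a short blip does not trigger an email.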

This is the alert email received in the mailbox:

After configuration, start Prometheus.

Access address: localhost:9090

Install Grafana:

With Grafana | Grafana documentation

Download Grafana | Grafana Labs

Download, decompress, and start Grafana.

Access address: 172.16.0.213:3000

Install AlertManager:

Download | Prometheus

Download and decompress AlertManager, then configure the alertmanager.yml file:

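global:
  smtp_smarthost: 'smtp.qq.com:465' # SMTP server and port
  smtp_from: '1737***[email protected]'
  smtp_auth_username: '1737***[email protected]'
  smtp_auth_password: 'idhcsx***qoe***'
  smtp_require_tls: false
route:
  receiver: 'mail-dyy' # Default receiver
  group_wait: 1s # How long to wait before sending the first notification for a new group
  group_interval: 5s # How long to wait before notifying about new alerts added to a group
  repeat_interval: 1h # How long to wait before re-sending a notification that was already sent
  group_by: ['alertname'] # Alerts with the same alertname go into one group
receivers:
- name: 'mail-dyy'
  email_configs:
  - to: '1737***[email protected]'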
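One way to check the email route without waiting for a real failure is to hand Alertmanager a test alert directly. The sketch below builds such a request for the `/api/v2/alerts` endpoint. It assumes Alertmanager is listening on `localhost:9093` as in the configuration above; the alert name `TestAlert` and the label values are hypothetical. The request is constructed but not sent.

```python
import json
import urllib.request


def build_test_alert(alertmanager, alertname, instance, summary):
    """Build (but do not send) a request that posts one alert to Alertmanager."""
    alerts = [
        {
            "labels": {
                "alertname": alertname,
                "instance": instance,
                "severity": "critical",
            },
            "annotations": {"summary": summary},
        }
    ]
    return urllib.request.Request(
        f"http://{alertmanager}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_test_alert(
    "localhost:9093", "TestAlert", "linux_208", "Test notification"
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

If the route and SMTP settings are correct, sending this request should result in an email to the configured receiver after `group_wait` has passed.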

Future extensions:

The monitoring platform grows as the application platform grows. So far we have built a minimal closed loop consisting of prometheus, alertmanager, grafana, node_exporter, mysqld_exporter, and so on, which can monitor Linux servers, MySQL, and a specific Java project. It still needs a lot of horizontal expansion, for example to monitor MQ, Redis, Nginx, and so on. Two things matter most for this expansion: the Exporter and the Grafana template. The metrics that the Exporter exposes must match the PromQL used inside the Grafana template. If no ready-made template matches, you need to create one by hand.
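The "Exporter metrics must match the template's PromQL" check can be done mechanically. The following sketch extracts the metric names from an exporter's text output and reports which of the names a dashboard needs are missing. The metric names and the `exporter_output` sample are illustrative only, and the list of required names is assumed to have been read off the dashboard's queries by hand.

```python
import re

# Metric names start with a letter, underscore, or colon.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def exposed_metric_names(exposition_text):
    """Collect the metric names found in text-format exporter output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and HELP/TYPE comments.
        match = METRIC_NAME.match(line)
        if match:
            names.add(match.group(0))
    return names


def missing_metrics(required, exposition_text):
    """Return the required metric names that the exporter does not expose."""
    return sorted(set(required) - exposed_metric_names(exposition_text))


# Illustrative exporter output and dashboard requirements.
exporter_output = """\
# HELP redis_up Whether the last scrape succeeded.
# TYPE redis_up gauge
redis_up 1
redis_connected_clients{instance="redis_208"} 12
"""
required = ["redis_up", "redis_connected_clients", "redis_memory_used_bytes"]
print(missing_metrics(required, exporter_output))
```

An empty result suggests the template should render; any name in the list points to a panel that would show no data.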

Exporter:

Download | Prometheus

 

Grafana templates:

Dashboards | Grafana Labs

Search for the template you need by name:

 

Get the template ID:

 

In the Grafana UI, choose to import a dashboard and enter the ID above to import the template:

 
