Build Prometheus automatic monitoring and alarm system from scratch
2022-07-26 20:36:00 【Brother Xing plays with the clouds】
Building a Prometheus monitoring and alerting system from scratch
What is Prometheus?
Prometheus is an open-source monitoring and alerting system with a built-in time series database (TSDB), originally developed at SoundCloud. It is written in Go and is an open-source counterpart of Google's Borgmon monitoring system. In 2016 the Cloud Native Computing Foundation (CNCF), which Google helped launch under the Linux Foundation, accepted Prometheus as its second hosted project. Prometheus is very active in the open-source community. Compared with Heapster (a Kubernetes subproject for collecting cluster performance data), Prometheus is more complete and more comprehensive in functionality, and its performance is sufficient to support clusters of tens of thousands of machines.
Prometheus features
- A multidimensional data model.
- A flexible query language.
- No reliance on distributed storage; each server node is autonomous.
- Time series are collected with a pull model over HTTP.
- Time series can also be pushed through an intermediary gateway.
- Targets are discovered through service discovery or static configuration.
- Support for multiple graphing and dashboard front ends, such as Grafana.
Official website: https://prometheus.io/
Architecture diagram
Basic principle
Prometheus works by periodically scraping the state of monitored components over HTTP. Any component can be monitored as long as it exposes a suitable HTTP endpoint; no SDK or other integration work is required. This makes Prometheus a good fit for virtualized environments such as VMs, Docker, and Kubernetes. The HTTP endpoint that exposes a monitored component's metrics is called an exporter. Exporters are already available for most components commonly used by Internet companies, such as Varnish, HAProxy, Nginx, MySQL, and Linux system information (disk, memory, CPU, network, and so on).
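To make the exporter idea concrete, here is a minimal sketch in Python (standard library only) of a component that exposes metrics over HTTP in the Prometheus text format. The metric name demo_requests_total, the port 8000, and the helper names are assumptions made up for illustration; a real service would normally use one of the official client libraries instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter; a real component would report its own state.
METRICS = {"demo_requests_total": 0}


def render_metrics(metrics):
    """Render {name: value} as Prometheus text format, treating each as a counter."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        METRICS["demo_requests_total"] += 1
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and then requesting /metrics on the chosen port should return the plain-text metrics that a Prometheus server could scrape.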
Workflow
- The Prometheus daemon periodically scrapes metrics from its targets. Each target must expose an HTTP endpoint for Prometheus to scrape. Targets can be specified through the configuration file, text files, ZooKeeper, Consul, DNS SRV lookup, and other mechanisms. Prometheus monitors by pulling: the server either pulls data directly from a target or pulls data that was pushed to an intermediary gateway.
- Prometheus stores all scraped data locally, cleans and aggregates it according to rules, and records the results as new time series.
- Prometheus exposes the collected data for visualization through PromQL and other APIs. Many charting options are supported, for example Grafana, the bundled PromDash, and Prometheus's own template engine. Prometheus also provides an HTTP API for queries, so you can produce custom output.
- Pushgateway lets clients actively push metrics to it, and Prometheus simply scrapes the gateway on its regular schedule.
- Alertmanager is a component independent of Prometheus. Alerts are driven by Prometheus query expressions, which makes alerting very flexible.
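The HTTP API mentioned in the list above can be called from any language. The following is a small sketch in Python of an instant query against the /api/v1/query endpoint; the helper names are hypothetical, and extract_samples assumes the query returns an instant vector.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build the URL of an instant query against /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_samples(response_body):
    """Return (labels, value) pairs from an instant-vector JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def instant_query(base_url, promql):
    """Run the query over HTTP; this needs a reachable Prometheus server."""
    with urlopen(build_query_url(base_url, promql), timeout=5) as resp:
        return extract_samples(resp.read())
```

For example, instant_query("http://localhost:9090", "up") would list every scraped target together with 1 (up) or 0 (down), assuming a server is listening at that address.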
The three components
- Server: mainly responsible for collecting and storing data, and provides the PromQL query language.
- Alertmanager: the alert manager, used to send alerts.
- Push Gateway: an intermediary gateway that lets short-lived jobs push their metrics.
What this tutorial covers
- 1. Installing Prometheus Server
- 2. Exposing metrics endpoints with golang and node-exporter
- 3. Using pushgateway
- 4. Using grafana
- 5. Using alertmanager
Installation preparation
My server's IP is 10.211.55.25. Log in and create the required directories:
mkdir -p /home/chenqionghe/promethues
mkdir -p /home/chenqionghe/promethues/server
mkdir -p /home/chenqionghe/promethues/client
touch /home/chenqionghe/promethues/server/rules.yml
chmod 777 /home/chenqionghe/promethues/server/rules.yml
Let's start with the three components.
One. Install Prometheus Server
We install it with Docker. First create the configuration file /home/chenqionghe/promethues/server/prometheus.yml (the path mounted into the container below) and change its permissions to 777; otherwise, changes made to the file on the host may not be synchronized into the container.
global:
  scrape_interval: 15s # Default scrape interval: scrape targets every 15 seconds.
  external_labels:
    monitor: 'codelab-monitor'
# Scrape target configuration
scrape_configs:
  # Every time series scraped by this job automatically gets the label {job="prometheus"}
  - job_name: 'prometheus'
    scrape_interval: 5s # Override the global scrape interval: 5 seconds instead of 15
    static_configs:
      - targets: ['localhost:9090']
Run:
docker rm -f prometheus
docker run --name=prometheus -d \
-p 9090:9090 \
-v /home/chenqionghe/promethues/server/prometheus.yml:/etc/prometheus/prometheus.yml \
-v /home/chenqionghe/promethues/server/rules.yml:/etc/prometheus/rules.yml \
prom/prometheus:v2.7.2 \
--config.file=/etc/prometheus/prometheus.yml \
--web.enable-lifecycle
Passing --web.enable-lifecycle at startup enables remote hot reloading of the configuration file. The reload command is curl -X POST http://localhost:9090/-/reload
Visit http://10.211.55.25:9090 and you will see the following interface
Visit http://10.211.55.25:9090/metrics
Because we configured port 9090 as a target, Prometheus scrapes its own /metrics endpoint by default. The monitored data is already visible under the Graph tab.
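The /metrics endpoint returns plain text, one sample per line, with comment lines starting with #. As a rough sketch (not how Prometheus itself parses the format), the following Python function turns such text into a dictionary. It assumes samples carry no trailing timestamps; the function name is hypothetical.

```python
def parse_metrics(text):
    """Parse Prometheus text format into {series: value}, ignoring comments.

    Assumes lines look like 'name{labels} value' with no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last whitespace-separated field on the line.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples
```

This is handy for quick checks in scripts, for example fetching the page with curl and confirming that an expected series is present.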
Two. Install clients that provide metrics endpoints
1. Provide metrics through the golang client
mkdir -p /home/chenqionghe/promethues/client/golang/src
cd !$
export GOPATH=/home/chenqionghe/promethues/client/golang/
# Clone the project
git clone https://github.com/prometheus/client_golang.git
# Fetch third-party packages that cannot be downloaded without bypassing the firewall
mkdir -p $GOPATH/src/golang.org/x/
cd !$
git clone https://github.com/golang/net.git
git clone https://github.com/golang/sys.git
git clone https://github.com/golang/tools.git
# Install the required packages
go get -u -v github.com/prometheus/client_golang/prometheus
# Build
cd $GOPATH/src/client_golang/examples/random
go build -o random main.go
Run 3 example metrics endpoints
./random -listen-address=:8080 &
./random -listen-address=:8081 &
./random -listen-address=:8082 &
2. Provide metrics through node exporter
docker run -d \
--name=node-exporter \
-p 9100:9100 \
prom/node-exporter
Then add the golang and node-exporter endpoints to prometheus.yml and reload the configuration with curl -X POST http://localhost:9090/-/reload
global:
  scrape_interval: 15s # Default scrape interval: scrape targets every 15 seconds.
  external_labels:
    monitor: 'codelab-monitor'
rule_files:
  #- 'prometheus.rules'
# Scrape target configuration
scrape_configs:
  # Every time series scraped by this job automatically gets the label {job="prometheus"}
  - job_name: 'prometheus'
    scrape_interval: 5s # Override the global scrape interval: 5 seconds instead of 15
    static_configs:
      - targets: ['localhost:9090']
      - targets: ['10.211.55.25:8080', '10.211.55.25:8081', '10.211.55.25:8082']
        labels:
          group: 'client-golang'
      - targets: ['10.211.55.25:9100']
        labels:
          group: 'client-node-exporter'
You can see that the endpoints are all up
Prometheus also has a wide variety of exporters; they are worth exploring if you are interested.
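One detail worth checking in static_configs: Prometheus expects each target as host:port, with the scheme and path set separately through the scheme and metrics_path options of the scrape job. The sketch below, a hypothetical helper rather than anything Prometheus ships, normalizes addresses that were copied from a browser with a scheme or path attached.

```python
from urllib.parse import urlsplit


def normalize_target(target):
    """Return 'host:port' for an address that may carry a scheme or a path."""
    # urlsplit only recognizes the host part when a '//' prefix is present.
    parts = urlsplit(target if "://" in target else f"//{target}")
    if not parts.hostname:
        raise ValueError(f"no host found in target: {target!r}")
    return parts.netloc


def normalize_targets(targets):
    """Normalize a list of addresses for use under static_configs."""
    return [normalize_target(t) for t in targets]
```

Running the addresses through normalize_targets before pasting them into prometheus.yml avoids targets that fail to scrape because of a stray http:// prefix.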
Three. Install pushgateway
Pushgateway lets ephemeral and batch jobs expose their metrics to Prometheus. Such jobs may not live long enough to be scraped, so they push their metrics to the Pushgateway instead. Prometheus collects data with a pull model, as the 5-second scrape interval we just configured shows. Some data does not suit this model, and for that data the Push Gateway service can be used. It works like a cache: when a job finishes collecting data it uploads the data here, and Prometheus pulls it later. Let's try it. First start the Push Gateway:
mkdir -p /home/chenqionghe/promethues/pushgateway
cd !$
docker run -d -p 9091:9091 --name pushgateway prom/pushgateway
Visit http://10.211.55.25:9091 to confirm that pushgateway is running
Next we can push data to pushgateway. Prometheus provides SDKs for many languages, but the simplest way is through the shell
- Push a single metric
echo "cqh_metric 3.14" | curl --data-binary @- http://Ubuntu-linux:9091/metrics/job/cqh
- Push multiple metrics
cat <<EOF | curl --data-binary @- http://10.211.55.25:9091/metrics/job/cqh/instance/test
# Gym membership price
muscle_metric{label="gym"} 8800
# The three big lifts, in kg
bench_press 100
dead_lift 160
deep_squal 160
EOF
Then add pushgateway to prometheus.yml and reload the configuration. You can now search for the metrics that were just pushed
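The shell pushes above can also be issued from a program. Here is a minimal Python sketch, using only the standard library, that builds the same URL and body as the curl commands and sends them with a POST. The function names are hypothetical, and the sketch assumes job and instance names contain no characters that need URL escaping.

```python
from urllib.request import Request, urlopen


def build_push(gateway, job, metrics, instance=None):
    """Return (url, body) for a Pushgateway push; metrics maps name -> value."""
    url = f"{gateway.rstrip('/')}/metrics/job/{job}"
    if instance:
        url += f"/instance/{instance}"
    # One 'name value' sample per line, each terminated by a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return url, body.encode("utf-8")


def push(gateway, job, metrics, instance=None):
    """POST the metrics; this needs a reachable Pushgateway."""
    url, body = build_push(gateway, job, metrics, instance)
    with urlopen(Request(url, data=body, method="POST"), timeout=5) as resp:
        return resp.status
```

For example, push("http://10.211.55.25:9091", "cqh", {"dead_lift": 160}, instance="test") would send one sample to the gateway started earlier, assuming it is reachable from where the script runs.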
Four. Install Grafana for visualization
Grafana is an open-source application for visualizing large volumes of measurement data. It provides a powerful and elegant way to create, share, and browse data. Dashboards display metric data from your different data sources. Grafana is most commonly used for Internet infrastructure and application analytics, but it is also useful in other areas such as industrial sensors, home automation, and process control. Grafana supports pluggable panels and extensible data sources; currently supported sources include Graphite, InfluxDB, OpenTSDB, Elasticsearch, and Prometheus.
We install it with Docker
docker run -d -p 3000:3000 --name grafana grafana/grafana
The default login username and password are both admin. The interface after logging in looks like this
Let's add a data source
Fill in the address of Prometheus
Import the Prometheus template
Open the menu in the upper left corner and select the imported template; you will see that various graphs are already there
Let's add a chart of our own
Specify the chart and the metric keywords you want to see, then save at the top right
You will see the following data
So far we have automated data collection and display. Next, let's look at how Prometheus alerts automatically
Five. Install Alertmanager
Alerting in Prometheus consists of two separate parts. Alerting rules in the Prometheus server send alerts to Alertmanager, and Alertmanager then manages those alerts, including silencing, inhibition, aggregation, and sending notifications through channels such as email, PagerDuty, and HipChat. The main steps for setting up alerts and notifications are:
- Create and configure Alertmanager
- Start the Prometheus server configured with the Alertmanager address so that it can connect to Alertmanager. Prometheus 1.x used the -alertmanager.url flag for this; the 2.x image used here takes the address from the alerting section of prometheus.yml.
Create and configure Alertmanager
mkdir -p /home/chenqionghe/promethues/alertmanager
cd !$
Create the configuration file alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['cqh']
  group_wait: 10s # How long to wait before sending the first notification for a group
  group_interval: 10s # Interval between notifications for the same group
  repeat_interval: 1m # Interval before a notification is repeated
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://10.211.55.2:8888/open/test'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
This configures the web.hook receiver: when the server notifies Alertmanager, the webhook http://10.211.55.2:8888/open/test is called automatically
Now run Alertmanager
docker rm -f alertmanager
docker run -d -p 9093:9093 \
--name alertmanager \
-v /home/chenqionghe/promethues/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
prom/alertmanager
Visit http://10.211.55.25:9093
Next, modify the Server configuration to add the alerting rules and the Alertmanager address. Edit the rules in /home/chenqionghe/promethues/server/rules.yml
groups:
  - name: cqh
    rules:
      - alert: cqh test
        expr: dead_lift > 150
        for: 1m
        labels:
          status: warning
        annotations:
          summary: "{{$labels.instance}}: Deadlift over the limit! lightweight baby!!!"
          description: "{{$labels.instance}}: Deadlift over the limit! lightweight baby!!!"
This rule means: if the deadlift stays above 150 kg for one minute, an alert fires. Then modify the Prometheus configuration to add Alertmanager
global:
  scrape_interval: 15s # Default scrape interval: scrape targets every 15 seconds.
  external_labels:
    monitor: 'codelab-monitor'
rule_files:
  - /etc/prometheus/rules.yml
# Scrape target configuration
scrape_configs:
  # Every time series scraped by this job automatically gets the label {job="prometheus"}
  - job_name: 'prometheus'
    scrape_interval: 5s # Override the global scrape interval: 5 seconds instead of 15
    static_configs:
      - targets: ['localhost:9090']
      - targets: ['10.211.55.25:8080', '10.211.55.25:8081', '10.211.55.25:8082']
        labels:
          group: 'client-golang'
      - targets: ['10.211.55.25:9100']
        labels:
          group: 'client-node-exporter'
      - targets: ['10.211.55.25:9091']
        labels:
          group: 'pushgateway'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["10.211.55.25:9093"]
Reload the Prometheus configuration and the rules take effect. Now let's look at how the data changes in Grafana
Then open the Alerts page in Prometheus. You will see the alert go from green to yellow to red, at which point the alert has fired
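The green, yellow, and red colors correspond to the inactive, pending, and firing states of an alerting rule. The following Python sketch simulates how a rule like dead_lift > 150 with for: 1m moves between those states. It is a simplified model for illustration, not the actual Prometheus implementation; the function name, sample timestamps, and default parameters are assumptions.

```python
def alert_states(samples, threshold=150, hold_seconds=60):
    """Yield (timestamp, state) for each (timestamp, value) rule evaluation.

    inactive: the condition is false.
    pending:  the condition is true but has not yet held for hold_seconds.
    firing:   the condition has been true for at least hold_seconds.
    """
    active_since = None
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts  # the condition just became true
            state = "firing" if ts - active_since >= hold_seconds else "pending"
        else:
            active_since = None  # any false evaluation resets the timer
            state = "inactive"
        yield ts, state
```

With evaluations at 0, 15, 30, 75, and 90 seconds and values 100, 160, 160, 160, and 140, the states are inactive, pending, pending, firing, and inactive: the alert only fires once the condition has held for the full minute, and a single value below the threshold resets it.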
Now let's look at the webhook endpoint that receives the alerts. Mine is written in golang; after receiving the data, it forwards the body content as an alert to DingTalk
The alert received in DingTalk looks like this
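The author's golang receiver is not shown, so here is a hypothetical stand-in in Python (standard library only) to illustrate the idea. Alertmanager POSTs a JSON document containing an alerts list, where each alert has a status, labels, and annotations. The sketch turns that document into one text line per alert and prints it; forwarding to a chat service such as DingTalk is left out, and port 8888 simply mirrors the webhook URL configured earlier.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alerts(body):
    """Turn an Alertmanager webhook JSON body into one text line per alert."""
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        name = alert.get("labels", {}).get("alertname", "unknown")
        summary = alert.get("annotations", {}).get("summary", "")
        lines.append(f"[{alert.get('status', 'unknown')}] {name}: {summary}")
    return lines


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        for line in summarize_alerts(self.rfile.read(length)):
            print(line)  # a real receiver would forward this to a chat service
        self.send_response(200)
        self.end_headers()


def serve(port=8888):
    """Listen for Alertmanager notifications until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() on the host named in webhook_configs would print a line such as "[firing] cqh test: ..." each time Alertmanager delivers a notification.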
That completes this introduction to building automatic Prometheus monitoring and alerting from scratch. It is a one-stop service: automatic scraping of endpoints, automatic alerting, and elegant charts. What are you waiting for? Go and try it!