A hidden pitfall that makes thanos ruler send duplicate alerts when deployed on kubernetes
2022-06-28 07:15:00 【nangonghen】
1 Overview
1.1 Environment
thanos ruler and alertmanager are both deployed in a kubernetes cluster. The versions are:
a. kubernetes cluster: v1.18.5
b. thanos ruler: v0.11.0
c. alertmanager: v0.20.0
The thanos ruler YAML manifest, in brief:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-rule
  name: thanos-rule
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-rule
  serviceName: thanos-rules
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-rule
    spec:
      containers:
      - image: registry.cn-shenzhen.aliyuncs.com/gzlj/thanos-reloader:v0.1
        imagePullPolicy: Always
        name: reloader
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - rule
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --rule-file=/etc/thanos/rules/*rules.yaml
        - --data-dir=/var/thanos/rule
        - --label=rule_replica="$(NAME)"
        # Note the --alert.label-drop line below: its value is wrapped in double quotes
        - --alert.label-drop="rule_replica"
        - --query=dnssrv+_http._tcp.thanos-query.monitoring.svc.cluster.local
        - --alertmanagers.url=http://alertmanager-main.monitoring.svc.cluster.local:9093
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: quay.mirrors.ustc.edu.cn/thanos/thanos:v0.11.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 24
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-rule
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 18
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        volumeMounts:
        - mountPath: /var/thanos/rule
          name: data
        - mountPath: /etc/thanos/rules
          name: thanos-rules
      restartPolicy: Always
      serviceAccount: thanos-rules
      serviceAccountName: thanos-rules
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: thanos-rules
        name: thanos-rules
      - emptyDir: {}
        name: data
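The key part is shown in the screenshot below.
1.2 Symptom
alertmanager receives duplicate alerts. The only difference between the two copies is the value of the custom label rule_replica, as shown in the figure:
2 Solution
I tried switching the thanos ruler image to a different version (v0.15.0), but the symptom persisted.
Just as I was about to give up, I changed the thanos ruler startup argument --alert.label-drop="rule_replica" to --alert.label-drop=rule_replica. Removing the double quotes was the only change, and alertmanager stopped receiving duplicate alerts.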
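The post does not say why the quotes matter. One likely explanation, offered here as an assumption rather than something the author verified: kubernetes passes each entry of `args` to the container process directly, without a shell, and in YAML a double quote in the middle of an unquoted scalar is an ordinary character. The ruler would therefore receive the label name `"rule_replica"` with the quotes included, which matches no label on the alert. (The `--label` flag is different, because thanos itself expects the `name="value"` syntax there.) The Python sketch below contrasts what a shell would hand to the program with what an exec-style argument list hands over. The alert name `HighCPU` is made up for illustration.

```python
import shlex

# The argument exactly as written in the StatefulSet `args` list.
raw_arg = '--alert.label-drop="rule_replica"'


def flag_value(arg: str) -> str:
    """Return the text after the first '=' of a --flag=value argument."""
    return arg.split("=", 1)[1]


# A POSIX shell removes the quotes before the program ever sees the argument.
via_shell = shlex.split(raw_arg)[0]

# Kubernetes `args` entries are exec'd without a shell, so nothing is stripped.
via_exec = raw_arg

shell_value = flag_value(via_shell)  # rule_replica
exec_value = flag_value(via_exec)    # "rule_replica", quotes included

# Hypothetical alert as produced by the first replica (pod thanos-rule-0).
alert_labels = {"alertname": "HighCPU", "rule_replica": "thanos-rule-0"}

print(shell_value in alert_labels)  # the unquoted name matches a label
print(exec_value in alert_labels)   # the quoted name matches nothing
```

Under this reading, the quoted form is harmless when typed into a shell, which may be why it looks correct at first glance in a manifest.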
3 Behavior after the fix
thanos ruler now drops the rule_replica label from each alert before sending it to alertmanager, so alertmanager holds a single copy of the alert instead of the previous two.
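To make the before and after concrete, here is a simplified Python model of the deduplication. It assumes only that alertmanager treats alerts with identical label sets as the same alert; it is not alertmanager's actual code. The helper names and the `HighCPU` and `severity` labels are hypothetical.

```python
def drop_labels(labels: dict, drop: set) -> dict:
    """Return a copy of the label set without the labels named in `drop`."""
    return {k: v for k, v in labels.items() if k not in drop}


def identity(labels: dict) -> tuple:
    """Model an alert's identity as its sorted label pairs."""
    return tuple(sorted(labels.items()))


def distinct_alerts(alerts: list, drop: set) -> set:
    """Identities that remain once each replica has dropped its labels."""
    return {identity(drop_labels(a, drop)) for a in alerts}


# The same rule firing on both ruler replicas (pods thanos-rule-0 and thanos-rule-1).
alerts = [
    {"alertname": "HighCPU", "severity": "warning", "rule_replica": "thanos-rule-0"},
    {"alertname": "HighCPU", "severity": "warning", "rule_replica": "thanos-rule-1"},
]

# Quoted flag value: the name never matches, so both copies stay distinct.
before = distinct_alerts(alerts, drop={'"rule_replica"'})

# Unquoted flag value: rule_replica is removed and the copies collapse into one.
after = distinct_alerts(alerts, drop={"rule_replica"})

print(len(before), len(after))
```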