Filling a monitoring gap: monitoring Eureka instance status with Zabbix
2022-07-06 06:55:00 【Uncle MUNE loves operation and maintenance】
Background
Before introducing the monitoring itself, let's first look at two components used in our microservice architecture.
Eureka
Eureka is the service registry of Spring Cloud; it mainly provides service registration and discovery.
Eureka uses a client/server (C/S) architecture and consists of the following two components:
Eureka Server
The Eureka service registry, which mainly provides service registration. When a microservice starts, it registers itself with Eureka Server. Eureka Server maintains a list of available services that stores information about every service registered with it, and these services can be viewed directly in the Eureka Server management console.
Eureka Client
The Eureka client usually refers to the individual microservices in the system and is mainly used to interact with Eureka Server. After a microservice starts, its Eureka Client sends heartbeats to Eureka Server (every 30 seconds by default). If Eureka Server receives no heartbeat from a client for several heartbeat periods, it removes the client from the list of available services (after 90 seconds by default).
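The relationship between the two defaults quoted above (a 30-second heartbeat and a 90-second removal threshold) can be sketched as follows. This is a simplified model for illustration only, not Eureka's actual implementation: the function and constant names are hypothetical, and the real server has further mechanisms, such as its eviction timer and self-preservation mode, that are ignored here.

```python
import time

HEARTBEAT_INTERVAL_S = 30   # default heartbeat period quoted in the text
LEASE_EXPIRATION_S = 90     # default removal threshold quoted in the text


def is_lease_expired(last_heartbeat, now=None, expiration=LEASE_EXPIRATION_S):
    """Return True if no heartbeat arrived within `expiration` seconds."""
    if now is None:
        now = time.time()
    return (now - last_heartbeat) > expiration


def missed_heartbeats(last_heartbeat, now, interval=HEARTBEAT_INTERVAL_S):
    """Number of whole heartbeat periods that have passed since the last heartbeat."""
    return int((now - last_heartbeat) // interval)
```

With these defaults, a client is removed only after roughly three consecutive heartbeats have been missed, which is why a client that silently stops sending heartbeats can stay unnoticed for a while.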
Apollo
Apollo is a reliable distributed configuration management center created by Ctrip's framework R&D department. It centrally manages configuration for different environments and clusters, pushes configuration changes to applications in real time, and provides standardized permission and process governance features, which makes it well suited to microservice configuration management.
By contrast, the Spring Cloud configuration center (Spring Cloud Config) pulls system configuration parameters from sources such as git and svn repositories, and Spring Cloud Config Client consumes them. Its configuration files must follow a strict syntax: even an extra space can leave the client unable to read the corresponding parameter, and such a mistake is easy to overlook.
Because of these shortcomings of the Spring Cloud configuration center, we have been replacing it with Apollo step by step, which makes managing configuration files across environments easier for both operations and development.
Problem
Apollo lets configuration changes take effect in real time (hot release): after a user modifies and publishes a configuration in Apollo, the client receives the latest configuration in near real time (about 1 second) and notifies the application.
Apollo's hot release meets our requirement that applications notice configuration property changes in real time. In practice, however, when a Eureka Client applies an automatic configuration update as a full refresh, the health check status on Eureka Server during the heartbeat cycle looks like this:
- The client's service discovery status is "UNKNOWN".
- The client process keeps running normally in the background but cannot send a normal heartbeat to Eureka Server.
Because the service discovery status on Eureka Server is abnormal, the client cannot serve external requests at that point. If operations staff do not check the status of each client in the Eureka management console in time, a production incident will follow.
Note: each client corresponds to one instance, so from here on we call it an instance.
Requirement
In this situation, although health checks are already in place for every instance, no alert fires because the instance process itself keeps running normally. That is a gap in our monitoring, so we need Zabbix to monitor the status of Eureka instances in order to achieve full coverage of application monitoring.
Approach
Eureka Server groups registered services by application, and each application can have multiple instances. We therefore use Zabbix auto-discovery (low-level discovery): the Eureka API returns all of the grouping information, so monitoring items do not have to be added by hand each time.
Because Zabbix monitoring items cannot be duplicated, we name each item application name/instance IP address to distinguish between instances. This requires that the same application is never deployed more than once on a single server; otherwise the item names would collide.
Note: using InstanceId to distinguish instances would actually be more reasonable. However, InstanceId is often not used in a standardized way, and if it contains the IP, host name and so on, its length may cause unnecessary trouble.
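The uniqueness constraint behind this naming rule can be checked before anything reaches Zabbix. The sketch below is illustrative: `item_name` and `find_duplicate_names` are hypothetical helpers, and the input is assumed to be a list of dicts carrying the `app` and `hostName` fields that the Eureka API returns for each instance.

```python
from collections import Counter


def item_name(app, host):
    """Build the `application name/instance IP address` item name."""
    return "%s/%s" % (app, host)


def find_duplicate_names(instances):
    """Return the item names that occur more than once, sorted."""
    counts = Counter(item_name(i["app"], i["hostName"]) for i in instances)
    return sorted(name for name, n in counts.items() if n > 1)


# Two instances of TEST1 on the same server produce a colliding name.
instances = [
    {"app": "TEST1", "hostName": "192.168.3.10"},
    {"app": "TEST1", "hostName": "192.168.3.10"},
    {"app": "TEST1", "hostName": "192.168.3.11"},
]
print(find_duplicate_names(instances))
```

Running a check like this against the registry would show whether the "one deployment per server" assumption actually holds in a given environment.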
Eureka API
# Get all applications registered in Eureka
http://192.168.3.123:1180/eureka/apps
# Get all instances of one application
http://192.168.3.123:1180/eureka/apps/<application name>/
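To make the shape of the first endpoint's JSON response concrete, here is a minimal sketch that walks a payload without touching the network. The `SAMPLE` payload is hypothetical and heavily trimmed (real responses carry many more fields); only the nesting `applications` → `application` → `instance` and the `app`, `hostName` and `status` fields are taken from how the script in this article reads the response. `iter_instances` is an illustrative helper name.

```python
import json

# Hypothetical, trimmed stand-in for the /eureka/apps JSON response.
SAMPLE = json.loads("""
{"applications": {"application": [
  {"instance": [
    {"app": "TEST1", "hostName": "192.168.3.10", "status": "UP"},
    {"app": "TEST1", "hostName": "192.168.3.11", "status": "UP"}
  ]}
]}}
""")


def iter_instances(payload):
    """Yield (app, hostName, status) for every instance in the payload."""
    for app in payload["applications"]["application"]:
        for inst in app["instance"]:
            yield inst["app"], inst["hostName"], inst["status"]


for app, host, status in iter_instances(SAMPLE):
    print(app, host, status)
```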
Implementation
Because the data returned by the Eureka API has to be parsed, we use Python to parse the JSON.
Instance auto-discovery
# Run the script to auto-discover each application name and instance IP address
python eureka-instance.py discovery
{
"data":[
{
"{#APP}":"TEST1",
"{#HOSTNAME}":"192.168.3.10"
},
{
"{#APP}":"TEST1",
"{#HOSTNAME}":"192.168.3.11"
}
]
}
With {#APP} and {#HOSTNAME}, we can build the item names that follow the naming rule.
# Combined item names
TEST1/192.168.3.10
TEST1/192.168.3.11
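The way discovery macros turn into concrete names and keys can be sketched as a plain string substitution. This only imitates what Zabbix does internally when it builds items from a prototype; `expand` and `expand_all` are hypothetical helpers, and the prototype key `instance_status[{#APP},{#HOSTNAME}]` is an assumption inferred from the item key shown in the alert messages later in this article.

```python
import json
import re

# Discovery macros look like {#NAME}: uppercase letters, digits, "_" and ".".
MACRO = re.compile(r"\{#[A-Z0-9_.]+\}")


def expand(prototype, row):
    """Replace every {#MACRO} in `prototype` with its value from `row`."""
    return MACRO.sub(lambda m: row[m.group(0)], prototype)


def expand_all(prototype, discovery_json):
    """Expand `prototype` once per row of a discovery JSON document."""
    rows = json.loads(discovery_json)["data"]
    return [expand(prototype, row) for row in rows]


DISCOVERY = """
{"data": [
  {"{#APP}": "TEST1", "{#HOSTNAME}": "192.168.3.10"},
  {"{#APP}": "TEST1", "{#HOSTNAME}": "192.168.3.11"}
]}
"""

print(expand_all("{#APP}/{#HOSTNAME}", DISCOVERY))
print(expand_all("instance_status[{#APP},{#HOSTNAME}]", DISCOVERY))
```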
Getting the item status
With the auto-discovered data, we can go on to query the status of each monitored item.
# 1. Get the status of instance 192.168.3.10
python eureka-instance.py status TEST1 192.168.3.10
# Output
UP
# 2. Get the status of instance 192.168.3.11
python eureka-instance.py status TEST1 192.168.3.11
# Output
UP
For every instance, an alert is raised whenever the result is anything other than "UP".
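The alert rule itself is small enough to write down. The sketch below is an illustration in Python rather than the Zabbix trigger configuration, which this article does not show; `should_alert` is a hypothetical name. It also treats empty output as a problem, on the assumption that a status query which matches no instance prints nothing.

```python
def should_alert(status):
    """Return True for anything other than an exact "UP", including empty output."""
    return status.strip() != "UP"


for value in ["UP\n", "DOWN", "UNKNOWN", ""]:
    print(repr(value), should_alert(value))
```

Comparing against the single healthy value, instead of listing the unhealthy ones, means that a status nobody anticipated still raises an alert.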
The final script
#!/usr/local/miniconda/bin/python
# -*- coding: utf-8 -*-
# comment:
# 1. Zabbix auto-discovery of Eureka instances
# 2. Monitor the status of each instance and alert on it
import json
import sys

import requests

# Ask Eureka for JSON; otherwise it returns XML
headers = {'Accept': 'text/html, application/xhtml+xml, application/json;q=0.9, */*;q=0.8'}


def instance_discovery():
    """Print every application/instance pair as Zabbix discovery JSON."""
    app_list = []
    url = "http://192.168.3.123:1180/eureka/apps/"
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            for app in response.json()["applications"]["application"]:
                for instance in app['instance']:
                    # Build a new dict per instance so entries share no state
                    app_list.append({
                        '{#APP}': instance['app'],
                        '{#HOSTNAME}': instance['hostName'],
                    })
            # Serialize to JSON
            discovery_app_info = {"data": app_list}
            print(json.dumps(discovery_app_info, sort_keys=True, indent=4, separators=(',', ':')))
    except Exception as e:
        print(e)


def instance_status():
    """Print the status of the instance whose hostName matches sys.argv[3]."""
    if len(sys.argv) == 4:
        try:
            url = "http://192.168.3.123:1180/eureka/apps/%s/" % (sys.argv[2])
            response = requests.get(url, headers=headers)
            if response.status_code == 200:
                for instance in response.json()["application"]["instance"]:
                    if sys.argv[3] == instance["hostName"]:
                        print(instance["status"])
        except Exception as e:
            print(e)
    else:
        print("Usage: python eureka-instance.py status app hostName")


if __name__ == '__main__':
    # Check that a sub-command was given instead of failing with IndexError
    if len(sys.argv) > 1 and sys.argv[1] == 'discovery':
        instance_discovery()
    elif len(sys.argv) > 1 and sys.argv[1] == 'status':
        instance_status()
    else:
        print("Usage: python eureka-instance.py [discovery]|[status app hostName]")
Integrating with Zabbix
1. Configuration file
vim eureka.conf
UserParameter=instance_discovery,/usr/local/miniconda/bin/python /etc/zabbix/monitor_scripts/eureka-instance.py discovery
UserParameter=instance_status[*],/usr/local/miniconda/bin/python /etc/zabbix/monitor_scripts/eureka-instance.py status "$1" "$2"
2. Auto-discovery
3. Item configuration
4. Alert messages
# 1. Status is DOWN: alert raised
Alarm host: middleware_eureka_192.168.3.123
Host IP: 192.168.3.123
Host group: middleware_eureka
Alarm time: 2022.06.01 14:58:23
Recovery time: 2022.06.01 15:13:24
Alarm level: High
Alarm message: Eureka/TEST1/192.168.3.10: status is DOWN
Alarm item: instance_status[TEST1,192.168.3.10]
Problem details:
TEST1/192.168.3.10: DOWN
Current state:
Alert raised
# 2. Status is UP: alert recovered
Alarm host: middleware_eureka_192.168.3.123
Host IP: 192.168.3.123
Host group: middleware_eureka
Alarm time: 2022.06.01 14:58:23
Recovery time: 2022.06.01 15:13:24
Alarm level: High
Alarm message: Eureka/TEST1/192.168.3.10: status is DOWN
Alarm item: instance_status[TEST1,192.168.3.10]
Problem details:
TEST1/192.168.3.10: UP
Current state:
Alert recovered: UP