Filling a monitoring gap: monitoring Eureka instance status with Zabbix
2022-07-06 06:55:00 【Uncle MUNE loves operation and maintenance】
Background
Before introducing the monitoring itself, let's first look at two components used in our microservice architecture.
Eureka
Eureka is the registry for Spring Cloud. It mainly provides service registration and discovery.
Eureka uses a client/server (CS) architecture and consists of the following two components:
Eureka Server
The Eureka service registry, which mainly provides the service registration function. When a microservice starts, it registers itself with Eureka Server. Eureka Server maintains a list of available services that stores information about every service registered with it, and these services can be viewed directly in the Eureka Server management interface.
Eureka Client
The Eureka client, which usually refers to the individual microservices in the system and is mainly used to interact with Eureka Server. After a microservice application starts, its Eureka Client sends heartbeats to Eureka Server (every 30 seconds by default). If Eureka Server receives no heartbeat from a Eureka Client over several heartbeat cycles (90 seconds by default), it removes that client from the list of available services.
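The heartbeat and eviction timing described above can be illustrated with a small simulation. This is only a sketch, not Eureka's actual implementation: the Registry class and its methods are hypothetical names, and only the 30-second and 90-second defaults come from the text.

```python
HEARTBEAT_INTERVAL = 30  # seconds, default heartbeat period from the text
LEASE_EXPIRATION = 90    # seconds without a heartbeat before removal


class Registry:
    """Toy registry (hypothetical, not Eureka's real code)."""

    def __init__(self):
        self.last_heartbeat = {}  # instance name -> time of last heartbeat

    def heartbeat(self, instance, now):
        # Registering and renewing are the same operation in this sketch.
        self.last_heartbeat[instance] = now

    def evict_expired(self, now):
        # Drop every instance whose last heartbeat is older than the limit.
        expired = [name for name, seen in self.last_heartbeat.items()
                   if now - seen > LEASE_EXPIRATION]
        for name in expired:
            del self.last_heartbeat[name]
        return expired

    def available(self):
        return sorted(self.last_heartbeat)


registry = Registry()
registry.heartbeat("TEST1/192.168.3.10", now=0)
registry.heartbeat("TEST1/192.168.3.11", now=0)

# 192.168.3.10 keeps sending heartbeats, 192.168.3.11 goes silent.
for t in range(HEARTBEAT_INTERVAL, 121, HEARTBEAT_INTERVAL):
    registry.heartbeat("TEST1/192.168.3.10", now=t)

evicted = registry.evict_expired(now=120)
print(evicted)               # instances removed from the list
print(registry.available())  # instances still available
```

In this run the silent instance is evicted once more than 90 seconds have passed since its last heartbeat, while the instance that keeps renewing stays in the list.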
Apollo
Apollo is a reliable distributed configuration management center that originated in Ctrip's framework R&D department. It centralizes the management of configuration for different environments and clusters, pushes configuration changes to applications in real time, and provides standardized permission and process governance, which makes it well suited to microservice configuration management.
Spring Cloud Config, by contrast, integrates with Git repositories, SVN repositories and other configuration sources to obtain system configuration parameters in a unified way, which the Spring Cloud Config Client then consumes. Its configuration files must follow a strict syntax: even an extra space can prevent the client from reading the corresponding parameter, and such mistakes are easy to overlook.
Because of these shortcomings of Spring Cloud Config, we have been replacing it with Apollo step by step, which makes managing configuration files across environments easier for both operations and development.
Problem
Apollo makes configuration changes take effect in real time (hot release): after a user modifies and publishes a configuration in Apollo, the client receives the latest configuration in real time (within 1 second) and notifies the application.
Apollo's hot release meets our requirement that applications notice configuration changes in real time. In practice, however, if a Eureka Client performs a full refresh when it automatically updates its configuration, the health check status on Eureka Server during the heartbeat cycle ends up as follows:
- The client's service discovery status is "UNKNOWN"
- The client is still running normally in the background, but cannot send a normal heartbeat to Eureka Server
Because the service discovery status on Eureka Server is abnormal, the client cannot serve external requests normally at this point. If operations staff do not check the status of each client in the Eureka management interface in time, a production incident follows.
Note: each client corresponds to one instance, so from here on we refer to it as an instance.
Requirement
In the situation above, although instances are already covered by health checks, no alarm is raised because the instance itself keeps running normally. This shows that our monitoring still has a gap, so we need Zabbix to monitor Eureka instance status in order to achieve full coverage of application monitoring.
Approach
Eureka Server groups registered services by application, and each application has multiple instances under it. We therefore use Zabbix auto discovery: all of the grouping information can be obtained through the Eureka API, so monitoring items do not have to be added by hand each time.
Because Zabbix item names cannot be repeated, we name each item application name/instance IP address to distinguish the instances. This requires that an application is not deployed more than once on the same server; otherwise the item names collide.
Note: using InstanceId to distinguish instances would actually be more reasonable, but InstanceId values are often not standardized. If they include the IP address, host name and so on, the resulting string can become long enough to cause unnecessary trouble.
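The uniqueness constraint can be checked before discovery data reaches Zabbix. The following is a minimal sketch, assuming discovery rows shaped like the discovery output of the script in this article; find_duplicate_items is a hypothetical helper and is not part of the original script.

```python
from collections import Counter


def find_duplicate_items(rows):
    """Return item names (app/host) that occur more than once."""
    names = ["%s/%s" % (row["{#APP}"], row["{#HOSTNAME}"]) for row in rows]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


rows = [
    {"{#APP}": "TEST1", "{#HOSTNAME}": "192.168.3.10"},
    {"{#APP}": "TEST1", "{#HOSTNAME}": "192.168.3.11"},
    # A second TEST1 instance on the same server: the name collides.
    {"{#APP}": "TEST1", "{#HOSTNAME}": "192.168.3.10"},
]
duplicates = find_duplicate_items(rows)
print(duplicates)
```

A non-empty result means two instances would map to the same item name, which is exactly the case the naming rule cannot handle.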
Eureka API
# Get all applications registered in Eureka
http://192.168.3.123:1180/eureka/apps
# Get all instances of one application
http://192.168.3.123:1180/eureka/apps/<application name>/
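To make the structure of the first endpoint's response easier to follow, here is a sketch that walks a sample payload. The payload is illustrative and trimmed to the keys the monitoring script in this article reads (real responses carry many more fields), and list_instances is a hypothetical helper name.

```python
import json

# Illustrative payload, trimmed to the keys the monitoring script reads.
SAMPLE = """
{
  "applications": {
    "application": [
      {
        "name": "TEST1",
        "instance": [
          {"app": "TEST1", "hostName": "192.168.3.10", "status": "UP"},
          {"app": "TEST1", "hostName": "192.168.3.11", "status": "DOWN"}
        ]
      }
    ]
  }
}
"""


def list_instances(payload):
    """Flatten the applications tree into (app, hostName, status) tuples."""
    result = []
    for app in payload["applications"]["application"]:
        for instance in app["instance"]:
            result.append(
                (instance["app"], instance["hostName"], instance["status"])
            )
    return result


instances = list_instances(json.loads(SAMPLE))
for app, host, status in instances:
    print(app, host, status)
```

The per-application endpoint is read the same way by the script, except that the top-level key is "application" and it holds a single application object.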
Implementation
Because the data returned by the Eureka API has to be parsed, we use Python to parse the JSON.
Instance auto discovery
# Run the script to discover application names and instance IP addresses
python eureka-instance.py discovery
{
    "data":[
        {
            "{#APP}":"TEST1",
            "{#HOSTNAME}":"192.168.3.10"
        },
        {
            "{#APP}":"TEST1",
            "{#HOSTNAME}":"192.168.3.11"
        }
    ]
}
By combining {#APP} and {#HOSTNAME}, we get item names that follow the naming rule.
# Resulting item names
TEST1/192.168.3.10
TEST1/192.168.3.11
Getting item status
With the discovered data, we can go on to fetch the status of each monitored item.
# 1. Get the status of instance 192.168.3.10
python eureka-instance.py status TEST1 192.168.3.10
# Output
UP
# 2. Get the status of instance 192.168.3.11
python eureka-instance.py status TEST1 192.168.3.11
# Output
UP
Whatever the instance, an alarm is raised whenever the result is anything other than "UP".
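The alerting rule can be stated as a one-line check. In the real setup the comparison is made by a Zabbix trigger, not by Python, so this is only a sketch, and should_alarm is a hypothetical name. One consequence of "anything other than UP" is worth noting: the status script prints nothing when no instance matches the requested host name and prints the exception text when the request fails, and this sketch treats both cases as alarms.

```python
def should_alarm(script_output):
    """Return True unless the status script printed exactly UP."""
    return script_output.strip() != "UP"


for output in ["UP\n", "DOWN\n", "UNKNOWN\n", ""]:
    print(repr(output), should_alarm(output))
```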
The final script
#!/usr/local/miniconda/bin/python
# -*- coding: utf-8 -*-
# Purpose:
# 1. Zabbix auto discovery of Eureka instances
# 2. Monitor instance status and alert on it
import json
import sys

import requests

EUREKA_APPS_URL = "http://192.168.3.123:1180/eureka/apps/"
# Ask for JSON; without this header the API returns XML
headers = {'Accept': 'text/html, application/xhtml+xml, application/json;q=0.9, */*;q=0.8'}


def instance_discovery():
    """Print every registered instance as Zabbix discovery JSON."""
    app_list = []
    try:
        response = requests.get(EUREKA_APPS_URL, headers=headers)
        if response.status_code == 200:
            for app in response.json()["applications"]["application"]:
                for instance in app['instance']:
                    # A new dict per instance, so no deep copy is needed
                    app_list.append({
                        '{#APP}': instance['app'],
                        '{#HOSTNAME}': instance['hostName'],
                    })
            # Serialize to JSON
            discovery_app_info = {"data": app_list}
            print(json.dumps(discovery_app_info, sort_keys=True, indent=4, separators=(',', ':')))
    except Exception as e:
        print(e)


def instance_status():
    """Print the status of application sys.argv[2] on host sys.argv[3]."""
    if len(sys.argv) == 4:
        try:
            url = "%s%s/" % (EUREKA_APPS_URL, sys.argv[2])
            response = requests.get(url, headers=headers)
            if response.status_code == 200:
                for instance in response.json()["application"]["instance"]:
                    if sys.argv[3] == instance["hostName"]:
                        print(instance["status"])
        except Exception as e:
            print(e)
    else:
        print("Usage: python eureka-instance.py status app hostName")


if __name__ == '__main__':
    # Guard against a missing sub-command, which would raise IndexError
    command = sys.argv[1] if len(sys.argv) > 1 else ''
    if command == 'discovery':
        instance_discovery()
    elif command == 'status':
        instance_status()
    else:
        print("Usage: python eureka-instance.py [discovery]|[status app hostName]")
Integrating with Zabbix
1. Configuration file
vim eureka.conf
UserParameter=instance_discovery,/usr/local/miniconda/bin/python /etc/zabbix/monitor_scripts/eureka-instance.py discovery
UserParameter=instance_status[*],/usr/local/miniconda/bin/python /etc/zabbix/monitor_scripts/eureka-instance.py status "$1" "$2"
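The [*] in the second UserParameter makes it a flexible parameter: the values given in the item key's square brackets are substituted for $1 and $2 in the command. The sketch below imitates that substitution for a key such as instance_status[TEST1,192.168.3.10], the form that appears in the alert output in this article. It is a simplified illustration, not Zabbix's own parser: expand is a hypothetical function, and quoted parameters or nested brackets are not handled.

```python
import re

COMMAND = ('/usr/local/miniconda/bin/python '
           '/etc/zabbix/monitor_scripts/eureka-instance.py status "$1" "$2"')


def expand(item_key, command=COMMAND):
    """Substitute $1..$9 in command with the parameters of item_key.

    Simplified: parameters are split on commas only.
    """
    match = re.match(r'^[^\[]+\[(.*)\]$', item_key)
    params = match.group(1).split(',') if match else []

    def replace(m):
        index = int(m.group(1)) - 1
        return params[index].strip() if index < len(params) else ''

    return re.sub(r'\$([1-9])', replace, command)


expanded = expand('instance_status[TEST1,192.168.3.10]')
print(expanded)
```

With an item prototype whose key uses the discovery macros, for example instance_status[{#APP},{#HOSTNAME}] (an inferred form, since the prototype itself is only shown in the screenshots), each discovered instance ends up calling the script with its own application name and host.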
2. Auto discovery

3. Item configuration

4. Alert messages
# 1. Status is DOWN, alarm raised
Alarm host: middleware_eureka_192.168.3.123
Host IP: 192.168.3.123
Host group: middleware_eureka
Alarm time: 2022.06.01 14:58:23
Recovery time: 2022.06.01 15:13:24
Severity: High
Alarm message: Eureka/TEST1/192.168.3.10: status is DOWN
Alarm item: instance_status[TEST1,192.168.3.10]
Problem details:
TEST1/192.168.3.10: DOWN
Current state:
Alarm raised
# 2. Status is UP, alarm recovered
Alarm host: middleware_eureka_192.168.3.123
Host IP: 192.168.3.123
Host group: middleware_eureka
Alarm time: 2022.06.01 14:58:23
Recovery time: 2022.06.01 15:13:24
Severity: High
Alarm message: Eureka/TEST1/192.168.3.10: status is DOWN
Alarm item: instance_status[TEST1,192.168.3.10]
Problem details:
TEST1/192.168.3.10: UP
Current state:
Alarm recovered: UP