Filling a monitoring gap: using Zabbix to monitor the status of Eureka instances

Before introducing the monitoring setup, let's first look at two components used in our microservice architecture.


Eureka is the service registry in Spring Cloud. Its main job is service registration and discovery.

Eureka uses a C/S (client/server) architecture made up of the following two components:

  • Eureka Server
    The Eureka service registry, which provides service registration. When a microservice starts, it registers itself with the Eureka Server. The Eureka Server maintains a list of available services and stores information about every service registered with it. These services can be viewed directly in the Eureka Server management console.

  • Eureka Client
    The Eureka client, which usually refers to a microservice in the system and is used to interact with the Eureka Server. After the microservice starts, the Eureka Client sends heartbeats to the Eureka Server (every 30 seconds by default). If the Eureka Server does not receive a heartbeat from a client within several heartbeat periods (90 seconds by default), it removes that client from the list of available services.
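The renewal and eviction rule above can be modelled in a few lines. This is a simplified sketch, not Eureka's actual implementation: it assumes the defaults quoted above (30-second heartbeat, 90-second eviction threshold) and ignores details such as Eureka's self-preservation mode. The class `LeaseRegistry` and the instance IDs are hypothetical.

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL = 30  # seconds between client heartbeats (default quoted above)
LEASE_DURATION = 90      # seconds without a heartbeat before eviction (default quoted above)


@dataclass
class LeaseRegistry:
    """Hypothetical, simplified model of a registry's lease bookkeeping."""
    last_heartbeat: dict = field(default_factory=dict)

    def register(self, instance_id: str, now: float) -> None:
        # Registration counts as the first heartbeat.
        self.last_heartbeat[instance_id] = now

    def renew(self, instance_id: str, now: float) -> None:
        # Each heartbeat refreshes the lease.
        self.last_heartbeat[instance_id] = now

    def evict_expired(self, now: float) -> list:
        # Collect first, then delete, so the dict is not mutated while iterating.
        expired = [i for i, t in self.last_heartbeat.items() if now - t > LEASE_DURATION]
        for instance_id in expired:
            del self.last_heartbeat[instance_id]
        return expired

    def available(self) -> list:
        return sorted(self.last_heartbeat)


registry = LeaseRegistry()
registry.register("TEST1:10", now=0)
registry.register("TEST1:11", now=0)
# Instance 10 sends a heartbeat every 30 s; instance 11 goes silent after registering.
for t in range(HEARTBEAT_INTERVAL, 121, HEARTBEAT_INTERVAL):
    registry.renew("TEST1:10", now=t)
print(registry.evict_expired(now=120))  # ['TEST1:11']
print(registry.available())             # ['TEST1:10']
```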


Apollo is a reliable distributed configuration management center developed by Ctrip's framework R&D department. It centrally manages configuration for different environments and clusters, pushes configuration changes to applications in real time, and provides standard permission and process-governance features, which makes it well suited to configuration management for microservices.

The Spring Cloud configuration center (Spring Cloud Config) pulls system configuration parameters from sources such as Git and SVN repositories, and the Spring Cloud Config Client consumes them. Its configuration files must follow a strict syntax: even an extra space can prevent the Client from reading the corresponding parameters, and such problems are easy to overlook.

Because of these shortcomings of the Spring Cloud configuration center, we have been gradually replacing it with Apollo, which makes it easier for operations and development teams to manage configuration files across environments.


With Apollo, configuration changes take effect in real time (hot release): after a user modifies and publishes a configuration in Apollo, the client receives the latest configuration in real time (within about 1 second) and notifies the application.

Apollo's hot release meets our requirement that applications perceive configuration changes in real time. In practice, however, if the Eureka Client performs a full update when its configuration is refreshed automatically, the Eureka Server sees the following state during the heartbeat cycle:

  • the client's service discovery status is "UNKNOWN";
  • the client keeps running normally in the background but cannot send normal heartbeats to the Eureka Server.

Because its service discovery status on the Eureka Server is abnormal, the instance cannot serve external requests normally. If operations staff do not check the status of each client in the Eureka console in time, a production incident can result.

Note: each client corresponds to one instance, so from here on we call it an instance.


Although our instances already have health checks, no alarm fires in this situation because the instance process itself keeps running normally. This is a gap in our monitoring, so we use Zabbix to monitor the status of Eureka instances and achieve full coverage of application monitoring.


The Eureka Server groups registered services by application, and each application has multiple instances. We therefore use Zabbix auto-discovery (low-level discovery): the Eureka API returns all of the grouping information, so we do not have to add monitoring items by hand every time.

Because Zabbix monitoring items must be unique, we name each one as application name/instance IP address to distinguish instances. This requires that the same application is never deployed more than once on a single server; otherwise the monitoring items would collide.

Note: using the InstanceId to distinguish instances would actually be more reasonable, but InstanceId values are often not standardized. If they include the IP, hostname and so on, the resulting long strings may cause unnecessary trouble.
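To see why two deployments of the same application on one server would clash, the sketch below builds the application name/IP identifier for each instance and reports identifiers that occur more than once. The helper names and the instance tuples are hypothetical examples; the IP addresses come from the documentation range 192.0.2.0/24.

```python
from collections import Counter


def item_name(app: str, ip: str) -> str:
    # Naming rule described above: application name / instance IP address.
    return f"{app}/{ip}"


def duplicate_names(instances):
    """Return identifiers that would map to more than one Eureka instance."""
    counts = Counter(item_name(app, ip) for app, ip, _port in instances)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical registrations: (application, IP, port).
instances = [
    ("TEST1", "192.0.2.10", 8080),
    ("TEST1", "192.0.2.11", 8080),
    ("TEST1", "192.0.2.11", 8081),  # second copy of TEST1 on the same server
    ("TEST2", "192.0.2.11", 9090),  # a different application on that server is fine
]
print(duplicate_names(instances))  # ['TEST1/192.0.2.11']
```

The port is what actually tells the two TEST1 copies apart, which is why an identifier such as the InstanceId would avoid the restriction.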

Eureka API

# Get all applications registered in Eureka

# Get all instances of a given application
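The two API calls can be sketched with the Python standard library. This sketch assumes the usual Spring Cloud Netflix paths `/eureka/apps` and `/eureka/apps/{appName}` and a hypothetical server address, since the original URLs were not preserved; the helper names are illustrative. The parsing helpers read only the `name`, `instance` and `hostName` fields of Eureka's JSON response.

```python
import json
import urllib.request

# Hypothetical server address; replace with your own Eureka server.
EUREKA_BASE = "http://127.0.0.1:8761/eureka"
# Ask for JSON; without this header Eureka responds with XML.
HEADERS = {"Accept": "application/json"}


def apps_url(base: str = EUREKA_BASE) -> str:
    """URL listing every registered application."""
    return f"{base}/apps"


def app_url(app: str, base: str = EUREKA_BASE) -> str:
    """URL listing every instance of one application."""
    return f"{base}/apps/{app}"


def fetch(url: str) -> dict:
    """GET a Eureka endpoint and decode the JSON body."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def app_names(apps_json: dict) -> list:
    # /apps body: {"applications": {"application": [{"name": ..., "instance": [...]}, ...]}}
    return [app["name"] for app in apps_json["applications"]["application"]]


def instance_hosts(app_json: dict) -> list:
    # /apps/{app} body: {"application": {"name": ..., "instance": [{"hostName": ...}, ...]}}
    return [inst["hostName"] for inst in app_json["application"]["instance"]]
```

Against a running server, `app_names(fetch(apps_url()))` lists the applications and `instance_hosts(fetch(app_url("TEST1")))` lists the instances of TEST1.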

Implementation

Because we need to parse the data returned by the Eureka API, we use Python to parse the JSON.

Instance auto-discovery

# Run the script to discover application names and instance IP addresses
python eureka-instance.py discovery

From the discovered {#APP} and {#HOSTNAME} values, we can build monitoring items that follow the naming rule.

# Combining the macros into a monitoring item
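Zabbix performs the macro substitution itself when it creates items from a discovery rule. The sketch below only illustrates the idea: it builds the same `{"data": [...]}` payload the discovery script prints and expands an item prototype key for each discovered row. The prototype `instance_status[{#APP},{#HOSTNAME}]` is an assumption, chosen to match the `instance_status[*]` UserParameter in the agent configuration; the function names and IPs are hypothetical.

```python
import json


def discovery_payload(pairs) -> str:
    # Same {"data": [...]} shape the discovery script prints for Zabbix.
    data = [{"{#APP}": app, "{#HOSTNAME}": host} for app, host in pairs]
    return json.dumps({"data": data}, sort_keys=True)


def expand_prototype(prototype: str, payload: str) -> list:
    """Substitute each discovered row's macros into an item prototype key."""
    keys = []
    for row in json.loads(payload)["data"]:
        key = prototype
        for macro, value in row.items():
            key = key.replace(macro, value)
        keys.append(key)
    return keys


payload = discovery_payload([("TEST1", "192.0.2.10"), ("TEST1", "192.0.2.11")])
for key in expand_prototype("instance_status[{#APP},{#HOSTNAME}]", payload):
    print(key)
# instance_status[TEST1,192.0.2.10]
# instance_status[TEST1,192.0.2.11]
```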

Getting the status of monitored items

Using the automatically discovered data, we can then query the status of each monitored item.

# 1. Get the status of instance 10
python eureka-instance.py status TEST1
# Output

# 2. Get the status of instance 11
python eureka-instance.py status TEST1
# Output

Whatever the instance, an alarm is raised whenever its status is anything other than "UP".
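The alarm rule reduces to a single string comparison. The sketch below also separates known Eureka statuses (the InstanceStatus values UP, DOWN, STARTING, OUT_OF_SERVICE and UNKNOWN) from unexpected output such as an error message. That distinction is an illustrative addition, not part of the original rule, and the function name `classify` is hypothetical. In Zabbix itself the check lives in a trigger on the `instance_status` item.

```python
# Values of Eureka's InstanceStatus enum.
EUREKA_STATUSES = {"UP", "DOWN", "STARTING", "OUT_OF_SERVICE", "UNKNOWN"}


def classify(status_output: str) -> str:
    """Map the status script's output to an alarm decision."""
    status = status_output.strip()
    if status == "UP":
        return "ok"
    if status in EUREKA_STATUSES:
        return f"alarm: Eureka reports {status}"
    # Empty output, an error message, or any other unexpected text.
    return "alarm: unexpected script output"


for output in ["UP\n", "DOWN\n", "UNKNOWN\n", ""]:
    print(repr(output), "->", classify(output))
```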

The final script

#-*- coding:utf-8 -*-
# 1. Zabbix auto-discovery of Eureka instances
# 2. Monitor the status of each instance and raise an alarm

import json
import sys

import requests

# Hypothetical placeholder: the original Eureka address was not preserved.
# Point this at your Eureka server's /eureka/apps endpoint.
EUREKA_APPS_URL = "http://127.0.0.1:8761/eureka/apps"

# Request JSON; otherwise Eureka returns XML
headers = {'Accept': 'application/json'}

def instance_discovery():
    """Print all (application, hostName) pairs as Zabbix discovery JSON."""
    app_list = []
    try:
        response = requests.get(EUREKA_APPS_URL, headers=headers, timeout=10)
        if response.status_code == 200:
            for app in response.json()["applications"]["application"]:
                for instance in app['instance']:
                    # A new dict per instance, so list entries do not share one object
                    app_list.append({
                        '{#APP}': instance['app'],
                        '{#HOSTNAME}': instance['hostName'],
                    })
        # JSON-serialize in the {"data": [...]} format Zabbix discovery expects
        discovery_app_info = {"data": app_list}
        print(json.dumps(discovery_app_info, sort_keys=True, indent=4, separators=(',', ':')))
    except Exception as e:
        print(e)

def instance_status():
    """Print the Eureka status (UP, DOWN, ...) of one instance of one application."""
    if len(sys.argv) == 4:
        try:
            url = "%s/%s" % (EUREKA_APPS_URL, sys.argv[2])
            response = requests.get(url, headers=headers, timeout=10)
            if response.status_code == 200:
                for instance in response.json()["application"]["instance"]:
                    if sys.argv[3] == instance["hostName"]:
                        print(instance["status"])
                        return
            # Instance not found or request failed: any value other than UP triggers the alarm
            print("NOT_FOUND")
        except Exception as e:
            print(e)
    else:
        print("Usage: python eureka-instance.py status app hostName")

if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == 'discovery':
        instance_discovery()
    elif len(sys.argv) > 1 and sys.argv[1] == 'status':
        instance_status()
    else:
        print("Usage: python eureka-instance.py [discovery]|[status app hostName]")

Integrating with Zabbix

1. Configuration file

vim eureka.conf
UserParameter=instance_discovery,/usr/local/miniconda/bin/python /etc/zabbix/monitor_scripts/eureka-instance.py discovery
UserParameter=instance_status[*],/usr/local/miniconda/bin/python /etc/zabbix/monitor_scripts/eureka-instance.py status "$1" "$2"
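The `[*]` in `instance_status[*]` makes this a flexible UserParameter: the agent replaces `$1`, `$2`, ... in the command with the parameters of the item key. The sketch below imitates that substitution in simplified form. It does not handle quoted parameters or the character checks controlled by `UnsafeUserParameters`. The function name and the example IP are hypothetical.

```python
import re


def expand_user_parameter(command: str, item_key: str) -> str:
    """Simplified model of Zabbix flexible UserParameter substitution ($1..$9)."""
    match = re.fullmatch(r"[^\[]+\[(.*)\]", item_key)
    params = match.group(1).split(",") if match else []

    def substitute(m):
        index = int(m.group(1))
        # In this sketch, a missing positional parameter becomes an empty string.
        return params[index - 1] if index <= len(params) else ""

    return re.sub(r"\$([1-9])", substitute, command)


command = ('/usr/local/miniconda/bin/python /etc/zabbix/monitor_scripts/eureka-instance.py '
           'status "$1" "$2"')
print(expand_user_parameter(command, "instance_status[TEST1,192.0.2.10]"))
```

The resulting command line is exactly the `status app hostName` form the script's usage message expects.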

2. Auto-discovery


3. Item configuration


4. Alarm messages

# 1. Status is DOWN: alarm raised
Alarm host: middleware_eureka_192.168.3.123
Host IP:
Host group: middleware_eureka
Alarm time: 2022.06.01 14:58:23
Recovery time: 2022.06.01 15:13:24
Alarm level: High
Alarm message: Eureka/TEST1/ status is DOWN
Alarm item: instance_status[TEST1,]
Problem details:

Current state:
Alarm raised

# 2. Status is UP: alarm recovered
Alarm host: middleware_eureka_192.168.3.123
Host IP:
Host group: middleware_eureka
Alarm time: 2022.06.01 14:58:23
Recovery time: 2022.06.01 15:13:24
Alarm level: High
Alarm message: Eureka/TEST1/ status is DOWN
Alarm item: instance_status[TEST1,]
Problem details:

Current state:
Alarm recovered: UP


