How to monitor tiny web services
2022-07-29 04:52:00 【nginx】
When I started running web services, I had no real idea how to monitor them, so I want to quickly write down how I ended up doing it.
I'm not going to talk about monitoring large, serious, mission-critical websites, only small and unimportant ones.

The goal: spend very little time on operations
I want the sites to work most of the time, but I also don't want to spend time on ongoing operations.
I was initially quite wary of running servers at all. At my last job I was on a 24/7 on-call rotation for some critical services, and in my mind "being responsible for servers" meant "getting woken up at 2am to fix the servers" and "lots of complicated dashboards".
So for a while I only made static websites, so that I wouldn't have to think about servers.
But eventually I realized that any server I was going to write would be very low stakes: if one goes down for 2 hours occasionally, it's no big deal, and I just need some very simple monitoring to help keep them running.

Not having monitoring is bad
At first I didn't set up any monitoring for my servers at all. The result was very predictable: sometimes a site went down, and I didn't find out until somebody told me!

Step 1: an uptime checker
The first step was to set up an uptime checker. There are many of these out there; the ones I'm using are updown.io and Uptime Robot. I prefer updown's user interface and pricing structure (it's priced per request rather than per month), but Uptime Robot has a more generous free tier.
They:
- check whether the site is up
- email me if it goes down

Step 2: an end-to-end health check
Next, let's talk about what "check whether the site is up" actually means.
At first I just made one of my health check endpoints a function that returned 200 OK no matter what.
This is kind of useful: it tells me that the server is running!
But unsurprisingly I ran into problems, because it didn't check that the API was actually working. Sometimes the health check succeeded even though the rest of the service had gotten into a bad state.
So I updated it to make a real API request and make sure that request succeeds.
All of my services do very few things (the nginx playground has just one endpoint), so it's easy to set up a health check that runs through most of what the service is supposed to do.
Here's what the end-to-end health check handler for the nginx playground looks like. It's very basic: it just makes a POST request (to itself) and checks whether that request succeeds or fails.

import (
	"log"
	"net/http"
	"strings"
)

func healthHandler(w http.ResponseWriter, r *http.Request) {
	// make a request to localhost:8080 with `healthcheckJSON` as the body
	// if it works, return 200
	// if it doesn't, return 500
	// (healthcheckJSON is defined elsewhere in the service; the content type
	// and the string body reader below are assumptions)
	client := http.Client{}
	resp, err := client.Post("http://localhost:8080", "application/json", strings.NewReader(healthcheckJSON))
	if err != nil {
		log.Println(err)
		w.WriteHeader(http.StatusInternalServerError)
		return
	}
	// close the body so the underlying connection can be reused
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Println(resp.StatusCode)
		w.WriteHeader(http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}

Right now most of my health checks run every hour, and some run every 30 minutes.
I run them hourly because updown.io's pricing is per health check, I'm monitoring 18 different URLs, and I want to keep my health check budget pretty minimal, at $5/year.
Taking an hour to find out that one of these sites has gone down seems fine to me: if there is a problem, there's no guarantee I'd get to fixing it quickly anyway.
If it were free to run them more often, I'd probably run them every 5-10 minutes instead.

Step 3: automatically restart if the health check fails
Some of my websites are on fly.io, and fly has a pretty standard feature where I can configure an HTTP health check for a service and have the service restarted if the health check starts failing.
"Restart a lot" is a very useful strategy for papering over bugs that I haven't gotten around to fixing yet. For a while the nginx playground had a process leak where nginx processes weren't getting terminated, so the server kept running out of memory.
With the health check in place, this is what happened every day or so:
- the server ran out of memory
- the health check started failing
- it got restarted
- everything was fine again
- a few hours later, the whole saga repeated
The health checks that decide whether to restart the service run more often: every 5 minutes or so.

This isn't the best way to monitor a large service
This is probably obvious, and I said it at the start, but "write an HTTP health check" is not the best approach for monitoring a large, complex service. I won't go into that here, because it's not what this post is about.

It's been working well so far!
I originally wrote this post three months ago, in April, but I waited until now to publish it to make sure the whole setup was working.
It's made a pretty big difference: before, I had some very silly downtime problems, and over the last few months the sites have been up 99.95% of the time!