How to monitor tiny web services
2022-07-29 04:52:00 【nginx】
At first I had no idea how to monitor my small websites, so I want to quickly write down how I ended up doing it.
I'm not going to talk about how to monitor large, serious, mission-critical websites, only small and unimportant ones.

The goal: spend almost no time on operations
I want the sites to work most of the time, but I also don't want to spend time on ongoing operations.
I was initially very wary of running servers, because at my last job I was on a 24/7 on-call rotation for some critical services, and in my mind "being responsible for a server" meant "getting woken up at 2am to fix the server" and "lots of complicated dashboards".
So for a while I only made static websites, so that I wouldn't have to think about servers.
But eventually I realized that any server I was going to write would be very low stakes: if one occasionally goes down for 2 hours it's no big deal, and I only need some very simple monitoring to help keep them running.

Not having monitoring is bad
At first I didn't set up any monitoring for my servers at all. The result was very predictable: sometimes a site went down, and I didn't find out until someone told me!

Step 1: an uptime checker
The first step was to set up an uptime checker. There are lots of these out there; the ones I'm using are updown.io and Uptime Robot. I prefer updown's user interface and pricing structure (it's priced per request rather than per month), but Uptime Robot has a more generous free tier.
These services:
- check whether the site is up
- email me if it goes down
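To make the two bullets above concrete, here is a minimal sketch in Go of what a single uptime check might look like. It is not how updown.io or Uptime Robot are implemented; the names `isHealthy` and `checkOnce`, the 10-second timeout, and the rule that only 2xx responses count as "up" are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// isHealthy reports whether an HTTP status code counts as "up".
// Treating only 2xx as healthy is an assumption of this sketch.
func isHealthy(status int) bool {
	return status >= 200 && status < 300
}

// checkOnce fetches url once and returns an error if the site looks down.
func checkOnce(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
	if !isHealthy(resp.StatusCode) {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: check <url>")
		return
	}
	// The timeout is a hypothetical value; a hung server should count as down.
	client := &http.Client{Timeout: 10 * time.Second}
	if err := checkOnce(client, os.Args[1]); err != nil {
		// A real uptime service would send an email at this point.
		fmt.Println("DOWN:", err)
		os.Exit(1)
	}
	fmt.Println("UP")
}
```

Running this from a scheduler once an hour would roughly approximate the "check and notify" loop, although a hosted checker has the advantage of running somewhere other than the server it is checking.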

Step 2: an end-to-end health check
Next, let's talk about what "check whether the site is up" actually means.
At first I just made one of my health check endpoints a function that returned 200 OK no matter what.
This is somewhat useful: it told me that the server was up!
But unsurprisingly I ran into problems, because it wasn't checking that the API was actually working. Sometimes the health check succeeded even though the rest of the service had gotten into a bad state.
So I updated it to make a real API request and make sure the request succeeds.
All of my services do very few things (the nginx playground has just one endpoint), so it's easy to set up a health check that runs through most of the actions the service is supposed to perform.
Here is what the end-to-end health check handler for the nginx playground looks like. It's very basic: it just makes a POST request (to itself) and checks whether that request succeeds or fails.
	// needs the "log", "net/http" and "strings" packages
	func healthHandler(w http.ResponseWriter, r *http.Request) {
		// make a request to localhost:8080 with `healthcheckJSON` as the body
		// if it works, return 200
		// if it doesn't, return 500
		client := http.Client{}
		// The arguments of this call were garbled in the source; the URL path,
		// content type and strings.NewReader wrapper are assumed.
		resp, err := client.Post("http://localhost:8080/", "application/json", strings.NewReader(healthcheckJSON))
		if err != nil {
			log.Println(err)
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			log.Println(resp.StatusCode)
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
Right now most of my health checks run every hour, and some run every 30 minutes.
I run them hourly because updown.io charges by the number of health checks, I'm monitoring 18 different URLs, and I want to keep my health check budget minimal, at about $5/year.
Taking an hour to find out that one of these sites is down is fine for me: if there's a problem, there's no guarantee I'd get around to fixing it quickly anyway.
If it were free to run them more often, I'd probably run them every 5-10 minutes instead.
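To see why the check interval matters when pricing is per request, here is a small back-of-the-envelope sketch in Go. It only counts requests; it does not model updown.io's actual prices, and the function name `checksPerMonth` and the 30-day month are assumptions made for illustration.

```go
package main

import "fmt"

// checksPerMonth estimates how many health check requests are made in a
// 30-day month, given the number of monitored URLs and the interval between
// checks in minutes (which must be positive).
func checksPerMonth(urls, intervalMinutes int) int {
	const minutesPerMonth = 30 * 24 * 60
	return urls * (minutesPerMonth / intervalMinutes)
}

func main() {
	// 18 URLs is the number mentioned in the text above.
	for _, interval := range []int{60, 30, 10, 5} {
		fmt.Printf("every %2d min: %d checks/month\n", interval, checksPerMonth(18, interval))
	}
}
```

With 18 URLs, hourly checks come to 12,960 requests a month, while checking every 5 minutes comes to 155,520, twelve times as many.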

Step 3: automatically restart if the health check fails
Some of my websites are on fly.io, and fly has a fairly standard feature where I can configure an HTTP health check for a service and have the service restarted if the health check fails.
"Restart a lot" is a very useful strategy for papering over bugs I haven't fixed yet. For a while the nginx playground had a process leak where nginx processes weren't getting terminated, so the server kept running out of memory.
With the health check in place, this is what happened every day or so:
- the server ran out of memory
- the health check started failing
- the service got restarted
- everything was fine again
- a few hours later the whole saga repeated
The health checks that decide whether to restart the service run more often: about every 5 minutes.
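The restart loop described above can be sketched as a small piece of Go. This is a generic illustration, not fly.io's implementation or configuration: the `restartPolicy` type, its `observe` method, and the idea of waiting for a threshold of consecutive failures before restarting are all assumptions of the sketch.

```go
package main

import "fmt"

// restartPolicy tracks consecutive health check failures.
// threshold is a hypothetical knob: how many failures in a row
// are tolerated before a restart is triggered.
type restartPolicy struct {
	threshold int
	failures  int
}

// observe records one health check result and reports whether the
// service should be restarted now. A healthy result resets the count.
func (p *restartPolicy) observe(healthy bool) bool {
	if healthy {
		p.failures = 0
		return false
	}
	p.failures++
	if p.failures >= p.threshold {
		p.failures = 0
		return true
	}
	return false
}

func main() {
	p := &restartPolicy{threshold: 2}
	// Simulated results of checks run every 5 minutes or so.
	results := []bool{true, false, true, false, false, true}
	for i, healthy := range results {
		if p.observe(healthy) {
			fmt.Printf("check %d: restarting service\n", i)
		} else {
			fmt.Printf("check %d: healthy=%v, no restart\n", i, healthy)
		}
	}
}
```

Setting the threshold to 1 gives the simplest policy, "restart as soon as a check fails"; a higher threshold trades slower recovery for fewer restarts caused by a single slow response.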

This is not the best way to monitor big services
This is probably obvious, and I said it at the beginning, but "write one HTTP health check" is not the best way to monitor a large, complex service. I won't go into that here, because it's not what this post is about.

It has been working well so far!
I originally wrote this post 3 months ago, in April, but I waited until now to publish it to make sure the whole setup works.
It has made a big difference: before, I was having some very silly downtime problems, and over the past few months the sites have been up 99.95% of the time!
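For a sense of scale, an uptime percentage can be converted into an amount of allowed downtime. The Go sketch below does that arithmetic; the function name `allowedDowntimeMinutes` and the 90-day window (standing in for "the past few months") are assumptions made for illustration.

```go
package main

import "fmt"

// allowedDowntimeMinutes returns how many minutes of downtime fit inside
// the given uptime percentage over a window of the given number of days.
func allowedDowntimeMinutes(uptimePercent float64, days int) float64 {
	totalMinutes := float64(days) * 24 * 60
	return totalMinutes * (100 - uptimePercent) / 100
}

func main() {
	// 90 days is an assumed stand-in for "the past few months".
	fmt.Printf("%.1f minutes\n", allowedDowntimeMinutes(99.95, 90))
}
```

Over 90 days, 99.95% uptime corresponds to about 65 minutes of downtime in total, which is in the same range as the hourly check interval used for most of the sites.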