Service online governance
2022-06-30 13:39:00 【51CTO】
Online governance uses the results of quantitative analysis to adjust the running state of online services through corresponding contingency plans, so that the services keep operating normally. Next, we discuss the common plans for online services and how to ensure that they are triggered and adjusted automatically.
The ideal way to locate a fault quickly and stop the loss is to connect fault location with plan execution: when something goes wrong, the system can judge the general type of fault, identify the corresponding plan, and trigger that plan automatically.
Of course, real fault handling involves many considerations. Not every fault can be planned for in advance, but we can classify faults based on historical incidents and prior knowledge and establish a plan for each class.
In addition, a plan should be easy to execute and trigger; if it is inconvenient to execute, the fault is hard to handle in a short time. The most critical questions are when the plan should be triggered and whether it should be executed right now. Stability failures of online services can be broadly grouped by the following causes.
- Failures caused by changes
Changes are the main source of stability failures, and in the broad sense a system has many sources of change. The most common service changes are application changes, configuration changes, and data changes. Beyond service changes, changes to the environment and hardware, such as changes in network bandwidth or data center links, also fall into the broad category of change.
- Failures caused by traffic and capacity changes
This type of failure corresponds to the sudden change in input traffic analyzed earlier under stability assurance. If the service has no adequate response mechanism prepared in advance, such a change creates potential stability hazards.
- Dependency failures
A failure in a dependency affects the upstream services that call it. Dependency failures can be divided into strong dependency failures and weak dependency failures, and each has its own handling method.
- Data center, network, and other hardware and environment failures
Hardware and environment failures are unpredictable, highly random, and incidental. Once they happen, they are often system-level problems with serious consequences.
- Other
For example, failures caused by ID generator overflow.
A fault scenario takes the stability failure categories above and refines them into cases that are easy to identify and judge, such as a sudden increase in inbound traffic, access layer failure, strong dependency failure, and weak dependency failure. The purpose of dividing these scenarios is to narrow down the root cause of the fault (not necessarily the most fundamental cause; the classification is made from the perspective of stopping the loss). Scenario classification is therefore best applied where the fault is easy to judge from observability metrics and where a corresponding plan is easy to formulate.
Degradation, rate limiting, and redundancy traffic switching are relatively clear-cut scenarios: faults can be diagnosed from metrics and linked automatically to a plan. Take a dependency failure as an example. The period-over-period change in the success rate metric determines whether the dependency is abnormal. If the metric shows that there is indeed an anomaly, first query the change management platform to check whether the dependency currently has a related change operation.
If there is a change, advise the contact person for the dependency to roll it back immediately. If there is no change operation, determine whether the current call is a strong or a weak dependency. For a weak dependency, start the automatic degradation plan to degrade the dependency. For a strong dependency, degradation will not solve the problem; instead, a redundancy switching plan prepared in advance can start traffic switching at the service, cluster, or data center level.
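The decision flow for a dependency failure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `DependencyStatus`, `Plan`, `is_abnormal`, and `choose_plan`, as well as the 5-percentage-point drop threshold, are hypothetical and do not come from any particular platform. In practice the success rates would come from the metrics system and the change flag from the change management platform.

```python
from dataclasses import dataclass
from enum import Enum


class Plan(Enum):
    NONE = "no action"
    ROLLBACK = "ask the dependency's contact person to roll back the change"
    DEGRADE = "start the automatic degradation plan"
    SWITCH_TRAFFIC = "start the redundancy switching plan"


@dataclass
class DependencyStatus:
    previous_success_rate: float  # success rate in the previous window, 0.0-1.0
    current_success_rate: float   # success rate in the current window, 0.0-1.0
    has_recent_change: bool       # answer from the change management platform
    is_strong_dependency: bool    # True if callers cannot work without it


def is_abnormal(status: DependencyStatus, max_drop: float = 0.05) -> bool:
    """Period-over-period check: flag a drop larger than max_drop.

    The threshold is an assumed value and would need tuning per service.
    """
    drop = status.previous_success_rate - status.current_success_rate
    return drop > max_drop


def choose_plan(status: DependencyStatus) -> Plan:
    """Map the diagnosis to a plan, in the order described in the text."""
    if not is_abnormal(status):
        return Plan.NONE
    if status.has_recent_change:
        return Plan.ROLLBACK
    if status.is_strong_dependency:
        # Degradation cannot help when callers need this dependency.
        return Plan.SWITCH_TRAFFIC
    return Plan.DEGRADE


if __name__ == "__main__":
    weak = DependencyStatus(0.99, 0.80, has_recent_change=False,
                            is_strong_dependency=False)
    print(choose_plan(weak).value)
```

Keeping the diagnosis (`is_abnormal`) separate from the plan selection (`choose_plan`) makes it easier to rehearse each part on its own and to leave hard-to-judge scenarios to a human.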
Linking metrics to plans aims to move fault location toward automation and intelligence, but it has to be advanced step by step according to the actual situation. For scenarios that are hard to judge, caution is recommended to avoid misjudgment. At the same time, plans should be rehearsed regularly to ensure that their triggers work.