Service online governance
2022-06-30 13:39:00 【51CTO】
Online governance uses the results of quantitative analysis to adjust the running state of online services through corresponding contingency plans, so that the services keep running normally. This article discusses the common plans for online services and how to make sure those plans are triggered and applied automatically.
The ideal way to locate a fault quickly and stop the loss is to connect fault localization with plan execution: when something goes wrong, the system can determine the general type of fault and the corresponding plan, and trigger that plan automatically.
Of course, real fault handling involves many considerations. Not every fault can be planned for in advance, but faults can be classified using historical incidents and prior knowledge, and corresponding plans can be established.
In addition, a plan should be easy to execute and trigger; if it is inconvenient to execute, the fault is hard to handle in a short time. The most critical questions are when the plan should be triggered and whether it should be executed at the current moment. Online service stability failures can be broadly classified by the following causes.
- Failures caused by change
Change is the main source of stability failures. In the broad sense, a system has many sources of change. The most common service changes are application changes, configuration changes, and data changes. Beyond service changes, changes in environment and hardware, such as changes in network bandwidth or data center links, also fall into the broad category of change.
- Failures caused by traffic and capacity changes
This type of failure corresponds to the sudden change in input traffic analyzed in the earlier discussion of stability guarantees. If the service has no adequate response mechanism in place beforehand, it carries a hidden stability risk.
- Dependency failures
A failure in a dependent service affects the upstream services that call it. Dependency failures divide into strong-dependency failures and weak-dependency failures, and each has its own handling method.
- Data center, network, and other hardware and environment failures
Hardware and environment failures are characterized by unpredictability: they are highly random and incidental, and once they occur they are often system-level problems with serious consequences.
- Other
For example, a failure caused by an ID generator overflowing.
Fault scenarios refine the stability failure categories above into scenes that are easy to identify and judge, such as a sudden increase in inlet traffic, access layer failure, strong-dependency service failure, and weak-dependency service failure. The purpose of dividing scenarios this way is to further identify the root cause of the fault (not necessarily the most fundamental cause; the classification is made from the stop-loss perspective). Scenario classification is therefore best applied where the fault is easy to judge from observability metrics and a corresponding plan is easy to formulate.
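The scenario-to-plan mapping can be sketched as a small lookup table. This is only an illustration: the scenario names follow the examples above, while the plan identifiers (`rate_limit`, `traffic_switch`, `degrade`, `manual_investigation`) and the pairing of traffic surges and access layer failures with a particular plan are assumptions, not something the text prescribes.

```python
from enum import Enum


class Scenario(Enum):
    """Fault scenarios named in the text."""
    INLET_TRAFFIC_SURGE = "inlet_traffic_surge"
    ACCESS_LAYER_FAILURE = "access_layer_failure"
    STRONG_DEPENDENCY_FAILURE = "strong_dependency_failure"
    WEAK_DEPENDENCY_FAILURE = "weak_dependency_failure"


# Hypothetical plan identifiers. The first two pairings are assumptions made
# for illustration; the two dependency pairings follow the text.
PLAN_FOR_SCENARIO = {
    Scenario.INLET_TRAFFIC_SURGE: "rate_limit",
    Scenario.ACCESS_LAYER_FAILURE: "traffic_switch",
    Scenario.STRONG_DEPENDENCY_FAILURE: "traffic_switch",
    Scenario.WEAK_DEPENDENCY_FAILURE: "degrade",
}


def plan_for(scenario):
    """Return the plan for a scenario, falling back to manual handling
    when the scenario has no pre-built plan."""
    return PLAN_FOR_SCENARIO.get(scenario, "manual_investigation")
```

Falling back to manual investigation for unrecognized scenarios reflects the point that not every fault can be planned for in advance.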
Scenarios handled by degradation, rate limiting, and redundancy switching are relatively clear-cut: faults can be diagnosed from metrics and linked automatically to a plan. Take a dependent service failure as an example. The period-over-period change in the success rate metric determines whether the dependent service is abnormal. If the metric shows that there is indeed an anomaly, first query the change management platform to check whether the dependent service currently has any related change operations.
If there are changes, recommend that the owner of the dependent service interface roll back the change immediately. If there is no change operation, determine whether the current call to the dependent service is a strong or a weak dependency. For a weak dependency, the automatic degradation plan can be started to degrade the dependent service. For a strong dependency, degradation will not solve the problem; a redundancy switching plan prepared in advance can start traffic switching at the service level, cluster level, or data center level.
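The decision flow for a dependent service failure can be sketched as follows. This is a minimal illustration under stated assumptions: the `DependencyStatus` fields, the 5% relative-drop threshold, and the returned plan names are all hypothetical, and in practice the change flag would come from a query to the change management platform.

```python
from dataclasses import dataclass


@dataclass
class DependencyStatus:
    success_rate_now: float       # success rate in the current window, 0..1
    success_rate_previous: float  # success rate in the previous window, 0..1
    has_recent_change: bool       # answer from the change management platform
    is_strong_dependency: bool    # True if callers cannot work without it


def is_abnormal(status, max_relative_drop=0.05):
    """Period-over-period check: flag a relative drop in success rate
    larger than the (assumed) threshold."""
    if status.success_rate_previous <= 0:
        return False  # no usable baseline to compare against
    drop = (status.success_rate_previous - status.success_rate_now) \
        / status.success_rate_previous
    return drop > max_relative_drop


def choose_plan(status):
    """Walk the decision flow and return a hypothetical plan name."""
    if not is_abnormal(status):
        return "none"
    if status.has_recent_change:
        return "recommend_rollback"  # ask the interface owner to roll back
    if status.is_strong_dependency:
        return "traffic_switch"      # service, cluster, or data center level
    return "degrade"                 # weak dependency: automatic degradation
```

For instance, `choose_plan(DependencyStatus(0.80, 0.99, False, False))` selects the degradation plan, because the success rate dropped sharply, no change is in flight, and the dependency is weak.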
Linking metrics with plans aims to evolve fault localization toward automation and intelligence, but this needs to advance step by step according to actual conditions. For scenarios that are hard to judge, caution is advised to avoid misjudgment. Plans should also be rehearsed regularly to ensure that plan triggering remains effective.