Online service governance
2022-06-30 13:39:00 【51CTO】
Online governance uses the results of quantitative analysis to adjust the running state of online services through corresponding contingency plans, keeping those services operating normally. This article discusses the common plans for online services and how to ensure that they are triggered and applied automatically.
The ideal way to locate a fault quickly and stop the loss is to connect fault location with plan execution: when something goes wrong, the system determines the general type of fault, selects the corresponding plan, and triggers that plan automatically.
Of course, real fault handling involves many considerations. Not every fault can be prepared for in advance, but faults can be classified using historical incidents and prior knowledge, and a plan can be established for each class.
In addition, a plan should be easy to trigger and execute. If it is inconvenient to execute, the fault is hard to handle in a short time. The most critical question is judging when a plan should be triggered and whether it should be executed right now. Stability failures of online services can be broadly attributed to the following causes.
- Failures caused by change
Change is the main source of stability failures, and in the broad sense a system has many sources of change. The most common service changes are application changes, configuration changes, and data changes. Beyond service changes, changes in environment and hardware, such as changes in network bandwidth or in data center links, also fall into this broad category.
- Failures caused by traffic and capacity changes
This type of failure corresponds to the sudden change in input traffic analyzed earlier under stability assurance. If the service has no adequate response mechanism prepared in advance, such a change becomes a potential stability hazard.
- Dependency failures
A failure in a dependency affects the upstream services that call it. Dependency failures are divided into strong-dependency failures and weak-dependency failures, and each has its own handling method.
- Data center, network, and other hardware and environment failures
Hardware and environment failures are unpredictable, highly random, and incidental. When they do happen they are often system-level problems with serious consequences.
- Other
For example, a failure caused by an ID generator overflow.
Fault scenarios refine the stability fault categories above into cases that are easy to identify and judge, such as a sudden increase in inbound traffic, an access layer failure, a strong-dependency service failure, or a weak-dependency service failure. The purpose of dividing faults into scenarios is to further identify the root cause (not necessarily the most fundamental cause, since the classification is made from the perspective of stopping the loss). Scenario classification therefore suits cases where the fault is easy to judge from observability metrics and a corresponding plan is easy to formulate.
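One way to picture the scenario classification is a small registry that maps each scenario to a plan. The sketch below is illustrative only: the names `FaultScenario`, `Plan`, `SCENARIO_PLANS`, and `plan_for` are hypothetical, and the pairings for traffic surge and access layer failure are assumptions rather than something the article states.

```python
from enum import Enum


class FaultScenario(Enum):
    """The four example scenarios named in the article."""
    INBOUND_TRAFFIC_SURGE = "inbound_traffic_surge"
    ACCESS_LAYER_FAILURE = "access_layer_failure"
    STRONG_DEPENDENCY_FAILURE = "strong_dependency_failure"
    WEAK_DEPENDENCY_FAILURE = "weak_dependency_failure"


class Plan(Enum):
    """The plan types the article mentions."""
    RATE_LIMITING = "rate_limiting"
    DEGRADATION = "degradation"
    REDUNDANCY_SWITCHOVER = "redundancy_switchover"


# Hypothetical registry: one prepared plan per recognizable scenario.
SCENARIO_PLANS = {
    # Assumed pairing, not stated in the article.
    FaultScenario.INBOUND_TRAFFIC_SURGE: Plan.RATE_LIMITING,
    # Assumed pairing, not stated in the article.
    FaultScenario.ACCESS_LAYER_FAILURE: Plan.REDUNDANCY_SWITCHOVER,
    # The article pairs strong dependencies with redundancy switchover.
    FaultScenario.STRONG_DEPENDENCY_FAILURE: Plan.REDUNDANCY_SWITCHOVER,
    # The article pairs weak dependencies with degradation.
    FaultScenario.WEAK_DEPENDENCY_FAILURE: Plan.DEGRADATION,
}


def plan_for(scenario: FaultScenario) -> Plan:
    """Return the registered plan; a KeyError means no plan has been prepared."""
    return SCENARIO_PLANS[scenario]
```

Keeping the mapping as data rather than branching logic makes it easy to review which scenarios still lack a prepared plan.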
Scenarios handled by degradation, rate limiting, and redundancy switchover are relatively clear-cut: faults can be diagnosed from metrics and connected automatically to a plan. Take a dependency failure as an example. The period-over-period change in the success-rate metric indicates whether the dependency is abnormal. If the metrics confirm an anomaly, first query the change management platform to see whether the dependency currently has a related change operation in progress.
If there is a change, advise the owner of the dependency's interface to roll it back immediately. If there is no change operation, determine whether the current call is a strong or a weak dependency. For a weak dependency, start the automatic degradation plan to degrade the dependency. For a strong dependency, degradation will not solve the problem; instead, use a redundancy switchover plan prepared in advance and start traffic switching at the service, cluster, or data center level.
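The dependency-failure decision flow described above can be sketched as a single function. This is a minimal sketch under stated assumptions: `DependencySnapshot`, `Action`, `choose_action`, and the `max_drop` threshold of 0.05 are all hypothetical, and in practice the inputs would come from a metrics system and a change management platform.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NO_ACTION = "no_action"
    RECOMMEND_ROLLBACK = "recommend_rollback"
    DEGRADE_DEPENDENCY = "degrade_dependency"
    SWITCH_TRAFFIC = "switch_traffic"


@dataclass
class DependencySnapshot:
    """Hypothetical inputs gathered for one dependency."""
    success_rate_now: float      # success rate in the current window, 0.0 to 1.0
    success_rate_prev: float     # success rate in the previous window
    has_active_change: bool      # answer from the change management platform
    is_strong_dependency: bool   # True if callers cannot work without it


def is_abnormal(snapshot: DependencySnapshot, max_drop: float = 0.05) -> bool:
    """Flag an anomaly when the period-over-period drop exceeds max_drop.

    The 0.05 default is an assumed threshold, not a value from the article.
    """
    drop = snapshot.success_rate_prev - snapshot.success_rate_now
    return drop > max_drop


def choose_action(snapshot: DependencySnapshot, max_drop: float = 0.05) -> Action:
    """Follow the article's order: anomaly, then change check, then dependency type."""
    if not is_abnormal(snapshot, max_drop):
        return Action.NO_ACTION
    if snapshot.has_active_change:
        # A change is in progress: recommend that its owner roll it back.
        return Action.RECOMMEND_ROLLBACK
    if snapshot.is_strong_dependency:
        # Degradation cannot help; switch traffic to redundant capacity.
        return Action.SWITCH_TRAFFIC
    # Weak dependency: degrade it automatically.
    return Action.DEGRADE_DEPENDENCY
```

For instance, `choose_action(DependencySnapshot(0.80, 0.99, False, False))` selects degradation, while the same drop on a strong dependency selects traffic switching.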
Connecting metrics to plans aims to move fault location toward automation and intelligence, but it has to be advanced step by step according to actual conditions. For scenarios that are hard to judge, caution is recommended to avoid misjudgment. Plans should also be rehearsed regularly to make sure their triggers remain effective.
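One possible way to apply that caution is to require several consecutive abnormal observations before a plan fires, and to only notify a human while a scenario is still being validated. The article does not prescribe this technique; `TriggerGuard`, its parameters, and the returned strings are hypothetical.

```python
class TriggerGuard:
    """Debounce plan triggering to reduce the chance of acting on a blip."""

    def __init__(self, required_consecutive: int = 3, dry_run: bool = True):
        # required_consecutive: assumed number of abnormal windows in a row.
        # dry_run: when True, only notify instead of executing the plan.
        self.required_consecutive = required_consecutive
        self.dry_run = dry_run
        self._streak = 0

    def observe(self, abnormal: bool) -> str:
        """Record one metric window and return 'none', 'notify', or 'execute'."""
        if not abnormal:
            # Any normal window resets the streak.
            self._streak = 0
            return "none"
        self._streak += 1
        if self._streak < self.required_consecutive:
            return "none"
        return "notify" if self.dry_run else "execute"
```

Running a scenario in `dry_run` mode first gives a record of when the plan would have fired, which can be compared against real incidents before automatic execution is enabled.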