Microservice link risk analysis
2022-06-30 21:45:00 【51CTO】
Link risk analysis starts from historical link-communication data, analyzes the risks currently present on each link, reduces hidden dangers in link communication, and improves the overall stability of the system. It can answer many questions, such as whether timeout settings are reasonable, whether retry counts are reasonable, whether a service's SLA targets are reasonable, and whether the strong or weak dependencies between services match expectations. A large proportion of failures related to service communication are caused by link risks. Link risk analysis lets these be found and resolved in advance, before they turn into failures.
1. Timeout and SLA risk
A client timeout configuration that does not match the actual latency of calls to the server is a very common link risk. If the upstream service's timeout is too small, some requests that would have returned normally time out instead, hurting the service's SLA and the normal service experience. If the timeout is too large, the upstream service waits too long when the downstream service fails, and in serious cases this can bring down the whole system. The timeout setting is therefore directly tied to system stability: there should be a mechanism that guides how service timeouts are set and that promptly finds risky timeout configurations in the online system.
There are two main types of timeout configuration risk. The first is a timeout that does not match the actual response time. The second is a mismatch between upstream and downstream timeout settings. Take three services A, B, and C, where service A calls service B and service B calls service C: in real systems it is common to find that the timeout for service A calling service B is shorter than the timeout for service B calling service C.
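Both timeout risks can be checked mechanically against historical latency data. The following is a minimal sketch, not the article's implementation: the `Link` record, the use of p99 latency as the "actual time", and the `headroom` factor of 3 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    caller: str
    callee: str
    timeout_ms: float  # configured client timeout (hypothetical field)
    p99_ms: float      # observed p99 latency from historical data (hypothetical field)


def timeout_risks(links, headroom=3.0):
    """Return (kind, caller, callee) tuples for suspicious timeout settings.

    headroom is an assumed tolerance: a timeout larger than headroom * p99
    is reported as too large.
    """
    findings = []
    # Risk type 1: the timeout does not match the observed latency.
    for link in links:
        if link.timeout_ms < link.p99_ms:
            findings.append(("too_small", link.caller, link.callee))
        elif link.timeout_ms > headroom * link.p99_ms:
            findings.append(("too_large", link.caller, link.callee))
    # Risk type 2: the upstream timeout is shorter than the downstream timeout.
    for up in links:
        for down in links:
            if up.callee == down.caller and up.timeout_ms < down.timeout_ms:
                findings.append(("upstream_shorter", up.caller, up.callee))
    return findings


links = [
    Link("A", "B", timeout_ms=100, p99_ms=50),
    Link("B", "C", timeout_ms=300, p99_ms=80),
]
for finding in timeout_risks(links):
    print(finding)
```

In this example the B to C timeout is flagged as too large relative to its p99, and the A to B link is flagged because it gives up before B's own call to C can time out.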
2. Strong/weak dependency and retry risk
In communication between microservices, if a failure of the link causes the entire request to fail, the relationship between the two microservices is generally called a strong dependency; otherwise it is called a weak dependency. Based on the strength of a dependency, we can apply measures such as degradation and circuit breaking.
Strong/weak dependency risk means that the actual relationship on a link does not match what is expected. For example, the link from service A to service B was designed as a weak dependency, but as requirements iterate and business logic changes, that link may inadvertently become a strong dependency. If we still rely on prior knowledge and treat it as a weak dependency, this becomes a major risk point. In particular, when the link fails and we degrade it on the assumption that the dependency is weak, the result can be disastrous and make the whole system unavailable. A mechanism is therefore needed to check the current link relationships for such risk points on a regular basis.
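One possible way to detect such drift is to infer the observed dependency strength from trace history and compare it with the declared one. The sketch below is an assumption-laden illustration: the record layout `(caller, callee, call_failed, request_failed)` and the 0.9 threshold are hypothetical, and real systems often verify dependency strength with fault injection instead.

```python
from collections import defaultdict


def observed_strength(records, threshold=0.9):
    """Classify each link as "strong" or "weak" from historical records.

    records: iterable of (caller, callee, call_failed, request_failed).
    A link is treated as strong when at least `threshold` of its failed
    calls coincided with a failure of the whole request (assumed rule).
    """
    failed_calls = defaultdict(int)
    failed_requests = defaultdict(int)
    for caller, callee, call_failed, request_failed in records:
        if not call_failed:
            continue  # only failed calls say anything about dependency strength
        failed_calls[(caller, callee)] += 1
        if request_failed:
            failed_requests[(caller, callee)] += 1
    return {
        link: "strong" if failed_requests[link] / count >= threshold else "weak"
        for link, count in failed_calls.items()
    }


def dependency_mismatches(declared, records, threshold=0.9):
    """Return the links whose observed strength differs from the declared one."""
    observed = observed_strength(records, threshold)
    return sorted(
        link for link, strength in observed.items()
        if link in declared and declared[link] != strength
    )


declared = {("A", "B"): "weak", ("A", "C"): "weak"}
records = [
    ("A", "B", True, True),
    ("A", "B", True, True),
    ("A", "B", False, False),
    ("A", "C", True, False),
]
print(dependency_mismatches(declared, records))
```

Here the A to B link is declared weak, yet every failed call to B also failed the whole request, so it is reported as a mismatch.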
3. Cluster and topology risk
Cluster and topology problems are a major source of the risks that link risk analysis uncovers. For example, some machines in an online cluster are temporarily taken offline for repair but are not mounted back into the cluster afterwards, so they sit idle. Or service A originally calls service B within the same machine room, the call is temporarily switched to service B in another machine room because of a failure or a traffic-switching drill, and it is never switched back, so service A keeps calling service B across machine rooms, which hurts user experience and system stability. Or a service S is deployed without taking physical location into account and too many of its nodes end up under the same switch: when that switch fails, several nodes become unavailable at once, and the shortage of available nodes leads to a service avalanche.
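The three examples above can each be turned into a simple check over inventory and topology data. This is a rough sketch under assumed inputs: the node names, the `room_of` and `switch_of` mappings, and the 50% `max_share` limit are hypothetical and would come from a CMDB or service registry in practice.

```python
from collections import Counter


def idle_nodes(cluster_nodes, serving_nodes):
    """Nodes that belong to the cluster but are not mounted for traffic."""
    return sorted(set(cluster_nodes) - set(serving_nodes))


def cross_room_links(calls, room_of):
    """Node-level calls whose caller and callee sit in different machine rooms.

    calls: iterable of (caller_node, callee_node).
    """
    return sorted({(a, b) for a, b in calls if room_of[a] != room_of[b]})


def switch_concentration(service_nodes, switch_of, max_share=0.5):
    """Switches that carry more than max_share of a service's nodes."""
    counts = Counter(switch_of[node] for node in service_nodes)
    total = len(service_nodes)
    return {sw: n for sw, n in counts.items() if n / total > max_share}


room_of = {"a1": "room1", "b1": "room1", "b2": "room2"}
switch_of = {"s1": "sw1", "s2": "sw1", "s3": "sw1", "s4": "sw2"}

print(idle_nodes(["n1", "n2", "n3"], ["n1", "n3"]))
print(cross_room_links([("a1", "b1"), ("a1", "b2")], room_of))
print(switch_concentration(["s1", "s2", "s3", "s4"], switch_of))
```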
4. Link call risk
Real-time link topology data is a valuable resource from which many risks at the link call level can gradually be uncovered. For example, if a service currently calls more than 20 downstream services, its fan-out is too large and does not fit microservice design principles well, so it is worth considering whether the service should be split further.
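A fan-out check of this kind is straightforward once the topology is available as caller/callee pairs. The sketch below assumes that edge-list representation; the function name is hypothetical, and the default limit of 20 is the figure mentioned in the text.

```python
def excessive_fan_out(edges, limit=20):
    """Return {service: downstream_count} for services calling more than `limit` others.

    edges: iterable of (caller, callee) pairs taken from the link topology.
    """
    downstream = {}
    for caller, callee in edges:
        downstream.setdefault(caller, set()).add(callee)  # count distinct callees only
    return {svc: len(callees) for svc, callees in downstream.items() if len(callees) > limit}


edges = [("gateway", f"svc{i}") for i in range(21)] + [("svc0", "db")]
print(excessive_fan_out(edges))
```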
In a microservice architecture, a single request whose link is particularly long is likely to have performance problems. The 10 longest links in the global link topology, or the links whose depth exceeds 6, can therefore be listed and fed back to the business teams to see whether architectural adjustments are needed.
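Link depth can be computed with a depth-first search over the call graph. The following sketch makes two assumptions: depth is counted as the number of services on the longest call chain starting at a service, and the topology is acyclic (a cycle raises an error rather than being measured).

```python
def link_depths(edges):
    """Map each service to the number of services on its longest downstream chain."""
    graph = {}
    for caller, callee in edges:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())
    depth = {}
    visiting = set()

    def visit(node):
        if node in depth:
            return depth[node]
        if node in visiting:
            raise ValueError(f"cycle detected at {node}")
        visiting.add(node)
        # A leaf service has depth 1; otherwise add 1 to the deepest callee.
        depth[node] = 1 + max((visit(nxt) for nxt in graph[node]), default=0)
        visiting.discard(node)
        return depth[node]

    for node in graph:
        visit(node)
    return depth


def long_links(edges, max_depth=6, top_n=10):
    """Return (top_n deepest services with depth, services whose depth exceeds max_depth)."""
    ranked = sorted(link_depths(edges).items(), key=lambda item: (-item[1], item[0]))
    too_deep = [svc for svc, d in ranked if d > max_depth]
    return ranked[:top_n], too_deep


chain = [(f"s{i}", f"s{i + 1}") for i in range(7)]  # s0 -> s1 -> ... -> s7
top, too_deep = long_links(chain)
print(top[0], too_deep)
```

The recursion is fine for typical call graphs; a very deep topology would need an iterative traversal instead.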
When splitting and designing microservices, it is not recommended for two microservices to depend on each other. The link topology can be used to find out whether any links currently form a loop. A loop indicates that services depend on each other, and such risks can be fed back to the business teams for rectification.
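Loop detection on the link topology is a standard graph problem. The sketch below uses a depth-first search with three node states and returns one loop if any exists; the edge-list input and the function name are assumptions for illustration.

```python
def find_cycle(edges):
    """Return one cycle as a list of services (first == last), or None if acyclic."""
    graph = {}
    for caller, callee in edges:
        graph.setdefault(caller, []).append(callee)
        graph.setdefault(callee, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # nxt is already on the current path, so the path loops back to it.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle([("A", "B"), ("B", "C"), ("C", "A")]))
print(find_cycle([("A", "B"), ("B", "C")]))
```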
Link risk analysis is the process of discovering risks, abstracting them, and establishing automatic detection mechanisms. In essence it is a systematic effort to manage stability risk at a fine-grained level, and it needs long-term, sustained investment.
Discovering risks is the first step of link risk analysis. To keep discovering new risks in the system, it is suggested to combine risk analysis with stability anti-patterns: ① based on major failures in the system and on typical failure-prone problems accumulated earlier, sort out the stability anti-patterns, that is, the mistakes that are easy to make in stability practice and the patterns that should not appear; ② determine whether these anti-patterns can be detected automatically.
At the same time, to make it easier to detect new risks, a complete risk analysis framework can be established. It includes the current risk status, a closed loop for risk remediation, risk reports, an automatic risk notification mechanism, and so on. A new risk analysis is then developed directly on top of the framework, which amounts to adding a plug-in and can greatly improve the efficiency of risk analysis.
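The plug-in idea can be sketched as a small registry in which each risk analysis is one registered function. Everything here is hypothetical: the `risk_check` decorator, the `topology` dictionary layout, and the `notify` callback stand in for whatever report and notification mechanisms a real framework would provide.

```python
from typing import Callable, Dict, List

RiskCheck = Callable[[dict], List[str]]
_CHECKS: Dict[str, RiskCheck] = {}


def risk_check(name: str):
    """Decorator that registers a function as a risk-analysis plug-in."""
    def register(func: RiskCheck) -> RiskCheck:
        _CHECKS[name] = func
        return func
    return register


@risk_check("fan_out")
def check_fan_out(topology: dict) -> List[str]:
    # Example plug-in: flag services with too many downstream services.
    limit = topology.get("fan_out_limit", 20)
    return [
        f"{svc} calls {len(callees)} downstream services"
        for svc, callees in topology["downstream"].items()
        if len(callees) > limit
    ]


def run_all(topology: dict, notify: Callable[[str], None] = print) -> Dict[str, List[str]]:
    """Run every registered check, notify on each finding, and return a report."""
    report = {}
    for name, check in _CHECKS.items():
        findings = check(topology)
        report[name] = findings
        for finding in findings:
            notify(f"[{name}] {finding}")
    return report


topology = {"downstream": {"gateway": ["a", "b", "c"]}, "fan_out_limit": 2}
run_all(topology)
```

Adding a new risk analysis then means writing one more decorated function; the reporting and notification path stays unchanged.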