
Microservice link risk analysis

2022-06-30 21:45:00 【51CTO】

Link risk analysis starts from the historical data of link communication, analyzes the current risks of each link, reduces hidden dangers in link communication, and improves the overall stability of the system. It can answer many questions: whether timeouts are set reasonably, whether retry counts are set reasonably, whether a service's SLA targets are reasonable, and whether the strength of a service dependency matches expectations. A large proportion of failures related to service communication are caused by link risks. Link risk analysis lets us find and fix them in advance and so avoid failures.

1. Timeout and SLA risk

A very common link risk is a client-side timeout for calling a server that does not match how the call actually behaves. If an upstream service's timeout is too small, some requests that would have returned normally time out, which hurts the service's SLA and the normal service experience. If the timeout is too large, the upstream service waits too long when the downstream service fails, and in serious cases this can bring down the whole system. The timeout setting is therefore directly tied to system stability: there should be a mechanism to guide how services set their timeouts and to promptly detect risky timeout configurations in the online system.

There are two main types of timeout configuration risk. The first is a timeout that does not match the actual response time. The second is a mismatch between upstream and downstream timeout settings. For example, take three services A, B, and C, where service A calls service B and service B calls service C. In real systems it is common to find that the timeout for A calling B is shorter than the timeout for B calling C, so A may give up on a request while B is still waiting for C.
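The two risk types above can be checked mechanically once timeout configuration and latency statistics are available. The following is a minimal sketch, not the article's implementation: the `Call` record, the use of p99 latency as the "actual time", and the `low`/`high` multipliers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    caller: str
    callee: str
    timeout_ms: float  # configured client-side timeout (assumed input)
    p99_ms: float      # observed 99th-percentile latency (assumed input)


def timeout_risks(calls, low=1.5, high=10.0):
    """Return (kind, caller, callee) tuples for both timeout risk types.

    low/high are hypothetical multipliers of p99 that bound a
    "reasonable" timeout; tune them for the system at hand.
    """
    findings = []
    # Type 1: the timeout does not match the observed latency.
    for c in calls:
        if c.timeout_ms < c.p99_ms * low:
            findings.append(("too_small", c.caller, c.callee))
        elif c.timeout_ms > c.p99_ms * high:
            findings.append(("too_large", c.caller, c.callee))
    # Type 2: the upstream timeout is shorter than the downstream one.
    for up in calls:
        for down in calls:
            if up.callee == down.caller and up.timeout_ms < down.timeout_ms:
                findings.append(("upstream_shorter", up.caller, up.callee))
    return findings
```

For the A, B, C example, a 200 ms timeout on A calling B combined with a 500 ms timeout on B calling C would be reported as `("upstream_shorter", "A", "B")`.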

2. Strong/weak dependency or retry risk

In communication between microservices, if a failure on a link causes the entire request to fail, the relationship between the two microservices is generally called a strong dependency. Otherwise it is called a weak dependency. Based on the strength of a dependency, we can apply measures such as degradation and circuit breaking.

Dependency-strength risk means that the actual relationship on a link does not match what is expected. For example, the link from service A to service B is supposed to be a weak dependency, but as requirements iterate and the business logic changes, the link may inadvertently become a strong dependency. If we still rely on prior knowledge and treat it as a weak dependency, that is a major risk point. In particular, when the link fails and operations such as degradation are performed on the assumption of a weak dependency, the result can be disastrous and make the whole system unavailable. A mechanism is therefore needed to regularly detect risk points in the current link relationships.
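One possible way to detect such drift is to compare the declared dependency strength with what trace data shows. The sketch below is an assumption-laden illustration: the observation tuples, the `declared` mapping, and the `threshold` and `min_samples` values are hypothetical, and a real system might prefer fault-injection drills over passive observation.

```python
from collections import defaultdict


def suspect_weak_dependencies(declared, observations,
                              threshold=0.9, min_samples=20):
    """Find links declared "weak" that behave like strong dependencies.

    declared:     {(caller, callee): "weak" | "strong"}
    observations: iterable of (caller, callee, call_failed, request_failed)
    A link is suspect when, among requests where the downstream call
    failed, at least `threshold` of the whole requests failed as well.
    """
    failed = defaultdict(int)
    propagated = defaultdict(int)
    for caller, callee, call_failed, request_failed in observations:
        if not call_failed:
            continue
        failed[(caller, callee)] += 1
        if request_failed:
            propagated[(caller, callee)] += 1

    suspects = []
    for edge, strength in declared.items():
        n = failed[edge]
        if strength != "weak" or n == 0 or n < min_samples:
            continue  # not declared weak, or too few failures to judge
        if propagated[edge] / n >= threshold:
            suspects.append(edge)
    return sorted(suspects)
```

The `min_samples` guard keeps a handful of coincidental failures from flagging a link that is in fact weak.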

3. Cluster or topology risk

Cluster and topology risks are a major source of findings in risk analysis. For example, some machines in an online cluster are taken offline temporarily for repair but are not put back into service afterwards, so they sit idle. Or service A originally calls service B within the same machine room, and the call is temporarily switched to service B in another machine room because of a failure or a traffic-switching drill, but it is never switched back. Service A then keeps calling service B across machine rooms, which affects user experience and system stability. Or a service S is deployed without taking physical location into account, and too many of its nodes end up behind the same switch. When that switch fails, multiple nodes of the service become unavailable at the same time, and the shortage of available nodes leads to a service avalanche.
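Two of the examples above, nodes concentrated behind one switch and calls that stay cross-room, lend themselves to simple checks over deployment metadata. This is a rough sketch under assumed inputs: the tuple layouts and the `max_share` limit are hypothetical and would come from a CMDB or service registry in practice.

```python
from collections import Counter, defaultdict


def switch_concentration(nodes, max_share=0.5):
    """nodes: iterable of (service, host, switch).

    Returns {service: (switch, share)} for services where a single
    switch carries more than max_share of the service's nodes.
    """
    per_service = defaultdict(Counter)
    for service, _host, switch in nodes:
        per_service[service][switch] += 1

    risky = {}
    for service, counts in per_service.items():
        switch, n = counts.most_common(1)[0]
        share = n / sum(counts.values())
        if share > max_share:
            risky[service] = (switch, share)
    return risky


def cross_room_calls(calls):
    """calls: iterable of (caller, caller_room, callee, callee_room).

    Returns the sorted set of (caller, callee) pairs observed calling
    across machine rooms.
    """
    return sorted({(caller, callee)
                   for caller, caller_room, callee, callee_room in calls
                   if caller_room != callee_room})
```

Note that a service with a single node is always reported by `switch_concentration`, which is arguably the right outcome for a stability check.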

4. Link call risk

Real-time link topology data is a treasure trove from which many risks at the link call level can gradually be uncovered. For example, if a service calls more than 20 downstream services, its fan-out is too large, which does not fit microservice design principles, and it is worth considering whether the service should be split further.
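The fan-out rule is easy to express over a list of call edges. A minimal sketch, assuming the topology is available as `(caller, callee)` pairs; the function name is hypothetical and the limit of 20 is taken from the text.

```python
def high_fan_out(edges, limit=20):
    """Return {service: downstream_count} for services whose number of
    distinct downstream services exceeds the limit."""
    downstream = {}
    for caller, callee in edges:
        downstream.setdefault(caller, set()).add(callee)
    return {service: len(callees)
            for service, callees in downstream.items()
            if len(callees) > limit}
```

Repeated calls to the same downstream service are counted once, because the rule concerns the number of distinct dependencies rather than call volume.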

In a microservice architecture, a single request whose link is particularly long tends to have performance problems. The 10 longest links in the global link topology, or the links whose depth exceeds 6, can therefore be listed and fed back to the business teams to see whether architectural adjustments are needed.
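Listing the longest links amounts to finding the longest call chains in the topology graph. The sketch below assumes the topology is a list of `(caller, callee)` edges, measures depth in hops, and starts chains only at entry services (those nobody calls). Results are exact only when the graph has no cycles; back edges are skipped merely so the search terminates.

```python
def longest_chains(edges, top_n=10):
    """Return up to top_n longest call chains, one per entry service,
    as lists of service names, longest first."""
    graph, has_caller = {}, set()
    for caller, callee in edges:
        graph.setdefault(caller, []).append(callee)
        graph.setdefault(callee, [])
        has_caller.add(callee)

    memo = {}

    def longest_from(node, on_path):
        if node in memo:
            return memo[node]
        best = [node]
        on_path.add(node)
        for nxt in graph[node]:
            if nxt in on_path:
                continue  # skip back edges so the search terminates
            candidate = [node] + longest_from(nxt, on_path)
            if len(candidate) > len(best):
                best = candidate
        on_path.discard(node)
        memo[node] = best
        return best

    roots = [n for n in graph if n not in has_caller]
    chains = [longest_from(root, set()) for root in roots]
    chains.sort(key=lambda chain: (-len(chain), chain))
    return chains[:top_n]


def too_deep(chains, max_depth=6):
    """Keep the chains whose depth in hops exceeds max_depth."""
    return [chain for chain in chains if len(chain) - 1 > max_depth]
```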

When splitting and designing microservices, it is not recommended for two microservices to depend on each other. The link topology can be used to check whether any links currently form a cycle. A cycle indicates that services depend on each other, and such risks can be fed back to the business teams for remediation.
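Checking for looped links is a standard cycle search on a directed graph. The following sketch uses a depth-first search with three node states and returns the first cycle it finds; the edge-list input format and the function name are assumptions.

```python
def find_cycle(edges):
    """Return one dependency cycle as a list of services, with the first
    service repeated at the end, or None if the graph is acyclic."""
    graph = {}
    for caller, callee in edges:
        graph.setdefault(caller, []).append(callee)
        graph.setdefault(callee, [])

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = dict.fromkeys(graph, UNVISITED)
    path = []

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == IN_PROGRESS:
                # nxt is already on the current path: a cycle is closed.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None
```

The recursive form is enough for an illustration; a very deep topology would call for an iterative search to stay within Python's recursion limit.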

Link risk analysis is a process of discovering risks, abstracting them, and establishing automatic detection mechanisms. In essence it is a systematic effort to manage stability risk at a fine-grained level, and it requires long-term, sustained investment.

Discovering risks is the first step of link risk analysis. To keep discovering new risks in the system, it is suggested to combine risk analysis with stability anti-patterns: ① based on major past failures in the system and the typical failure-prone problems accumulated so far, compile a set of stability anti-patterns, that is, patterns that are easy mistakes in stability practice and should not appear; ② determine whether these anti-patterns can be detected automatically.

To make new risks easier to detect, a complete risk analysis framework can also be established. It includes the current risk status, a closed loop for risk remediation, risk reports, an automatic risk notification mechanism, and so on. A new risk analysis is then developed directly on the framework, which amounts to adding a plug-in and can greatly improve the efficiency of risk analysis.
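The plug-in idea can be pictured as a registry of check functions that share one reporting and notification path. This is only a sketch of one possible shape: the `risk_check` decorator, the `run_all` function, and the layout of the `topology` dictionary are all hypothetical and not taken from the article.

```python
from typing import Callable, Dict, List

Check = Callable[[dict], List[str]]
_REGISTRY: Dict[str, Check] = {}


def risk_check(name: str):
    """Decorator that registers a function as a risk analysis plug-in."""
    def register(func: Check) -> Check:
        _REGISTRY[name] = func
        return func
    return register


@risk_check("fan_out")
def fan_out_check(topology: dict) -> List[str]:
    # Example plug-in: flag services with too many downstream services.
    limit = topology.get("fan_out_limit", 20)
    return [f"{service} calls {len(callees)} downstream services"
            for service, callees in topology["downstream"].items()
            if len(callees) > limit]


def run_all(topology: dict, notify=print) -> Dict[str, List[str]]:
    """Run every registered check, notify on each finding, and return a
    report keyed by check name."""
    report = {}
    for name, check in _REGISTRY.items():
        findings = check(topology)
        report[name] = findings
        for finding in findings:
            notify(f"[{name}] {finding}")
    return report
```

With this shape, adding a new analysis means writing one decorated function, while the report and notification steps stay unchanged.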
