Author: Mingshao
What is chaos engineering?
System architecture has evolved from stand-alone to distributed and now to cloud native. As its complexity grows, locating problems becomes harder. With faults liable to occur at any time, is there a way out of this dilemma?
Chaos engineering is the discipline of running experiments on a distributed system. By proactively injecting faults, it exposes weaknesses in the system ahead of time, drives architectural improvement, and ultimately makes the business resilient, so that failures are avoided in the online production environment.

Take cloud native architecture as an example of why chaos engineering can solve problems in a system architecture. The principles of cloud native architecture map onto the principles of chaos engineering. Consider the servitization principle: its core concern is service governance, that is, determining whether the dependencies between upstream and downstream services are strong or weak. With chaos engineering, you can route a request to a specific machine, then narrow it down to a specific application on that machine, continually minimizing the blast radius. By injecting faults between applications and checking whether the upstream and downstream services still behave normally, you can determine whether each dependency is strong or weak.
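The strong/weak dependency check described above can be sketched in a few lines of Python. This is only an illustration under assumed names: `checkout`, `payment`, and `recommendations` are hypothetical services, and the fault is simulated by raising `ConnectionError` rather than injected by a real tool.

```python
from typing import Callable, Dict, List


def checkout(deps: Dict[str, Callable[[], str]]) -> str:
    """Hypothetical upstream service that calls two downstream services."""
    payment = deps["payment"]()  # no fallback: a failure here propagates
    try:
        recs = deps["recommendations"]()  # optional: degrade gracefully
    except ConnectionError:
        recs = "none"
    return f"paid={payment} recs={recs}"


def healthy(name: str) -> Callable[[], str]:
    """Build a downstream call that always succeeds."""
    return lambda: f"{name}-ok"


def faulty() -> str:
    """Simulated injected fault: the downstream call fails."""
    raise ConnectionError("injected fault")


def classify_dependencies(service, dep_names: List[str]) -> Dict[str, str]:
    """Fault one dependency at a time and see whether the upstream survives."""
    result = {}
    for target in dep_names:
        deps = {n: (faulty if n == target else healthy(n)) for n in dep_names}
        try:
            service(deps)
            result[target] = "weak"    # upstream still answers
        except Exception:
            result[target] = "strong"  # upstream fails with it
    return result
```

Faulting one dependency at a time keeps the blast radius small and makes the result attributable to a single downstream service.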

The goal of chaos engineering is a resilient architecture, which has two parts: resilient systems and resilient organizations. A resilient system has redundancy, scalability, immutable infrastructure, stateless applications, and protection against cascading failures. A resilient organization has efficient delivery, failure plans, and an emergency response mechanism. Even a highly resilient system can fail unexpectedly, so a resilient organization makes up for what the resilient system lacks. Together, through chaos engineering, they build the ultimate resilient architecture.

Chaos engineering proactively injects faults to expose weaknesses in the system ahead of time, drive architectural improvement, and ultimately achieve business resilience. The business value of introducing it differs for people in different roles.

How to implement chaos engineering
How can an enterprise or a business implement chaos engineering? Is there a tool or platform that helps it get adopted quickly?
ChaosBlade is a chaos experiment execution tool that follows the chaos experiment model. It offers a rich set of scenarios and is simple to use. It supports multiple platforms and language environments, including the Linux, Kubernetes, and Docker platforms and applications written in Java, NodeJS, C++, and Golang, with more than 200 scenarios and more than 3,000 parameters. It is a fault injection tool that runs on the target side, but used on its own it runs into problems once it is rolled out across a business.
So on top of ChaosBlade, a platform layer is also needed to manage the chaos engineering execution tool and to orchestrate drills.
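To make the idea of an experiment model concrete, here is a minimal Python sketch that describes an experiment as data and renders it as a command line. The `Experiment` class and its field names are hypothetical. The rendered shape `blade create <target> <action> --flag value` and the flags in the test are assumptions based on commonly published ChaosBlade examples, so check them against the ChaosBlade documentation before relying on them. The sketch only builds the argument list and never runs it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Experiment:
    """Hypothetical description of one fault injection experiment."""
    target: str   # what is attacked, for example "cpu"
    action: str   # what is done to it, for example "load"
    flags: Dict[str, str] = field(default_factory=dict)  # tuning parameters

    def to_argv(self) -> List[str]:
        """Render the experiment as an argument list (assumed CLI shape)."""
        argv = ["blade", "create", self.target, self.action]
        for key, value in self.flags.items():
            argv += [f"--{key}", str(value)]
        return argv
```

Keeping the experiment as plain data is what lets a platform layer store, review, and replay drills instead of relying on ad hoc shell history.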

ChaosBlade-Box is an open source, cloud native chaos engineering console for multiple clusters, languages, and environments.
The overall architecture of the open source platform and the injection tool consists mainly of several modules.

The new version of ChaosBlade-Box is a cloud native chaos engineering platform for multiple clusters, environments, and languages. It supports switching between Chinese and English. It supports global namespaces, so the same user can set up different global namespaces as needed, such as a test space, a sandbox space, and an online space. It provides automated tool deployment, which simplifies installation and improves execution efficiency. The platform supports probe installation and drills in different environments, such as hosts and Kubernetes; in a Kubernetes environment, drills can target the Node, Pod, and Container dimensions. In Kubernetes, the platform automatically collects Pod-related data and manages it centrally under application management. This simplifies lookup: users no longer need to go to the cluster to find the Pod name or Container name of the application to be drilled. It also supports one-click migration to the enterprise edition, synchronizing drill data from the community edition to the enterprise edition on demand.

A drill on the new ChaosBlade-Box platform supports two kinds of process orchestration: sequential execution and stage execution. Sequential execution means that multiple drill scenarios take effect one after another; stage execution means that multiple drill scenarios take effect at the same time. Several safety strategies ensure that a drill is recovered, such as manual stop and automatic stop. Automatic stop is configured by setting a timeout parameter when the drill is configured, so even if the platform loses contact with the probe (Agent) and a manual stop is impossible, the fault is recovered automatically when the timeout expires.
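The two orchestration modes and the timeout-based automatic stop can be sketched as follows. This is a simplified model, not the ChaosBlade-Box implementation: `Scenario`, `sequential_plan`, `stage_plan`, and `Agent` are hypothetical names, and time is passed in as plain integers (seconds) so that the example runs without sleeping.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Scenario:
    name: str
    duration: int  # seconds the fault stays active


def sequential_plan(scenarios: List[Scenario]) -> List[Tuple[str, int, int]]:
    """Sequential execution: each scenario starts when the previous one ends."""
    plan, start = [], 0
    for s in scenarios:
        plan.append((s.name, start, start + s.duration))
        start += s.duration
    return plan


def stage_plan(scenarios: List[Scenario]) -> List[Tuple[str, int, int]]:
    """Stage execution: all scenarios take effect at the same time."""
    return [(s.name, 0, s.duration) for s in scenarios]


class Agent:
    """Hypothetical probe that recovers faults by itself once they time out."""

    def __init__(self) -> None:
        self.active: Dict[str, int] = {}  # fault name -> deadline

    def inject(self, name: str, now: int, timeout: int) -> None:
        self.active[name] = now + timeout

    def manual_stop(self, name: str) -> None:
        self.active.pop(name, None)

    def tick(self, now: int) -> List[str]:
        """Recover every fault whose deadline has passed; needs no platform."""
        expired = [n for n, deadline in self.active.items() if now >= deadline]
        for n in expired:
            del self.active[n]
        return expired
```

Because the deadline is stored on the probe side, recovery does not depend on the platform being reachable when the timeout expires.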


What are the advantages of the new version?
Compared with the old version, the front-end interface is now unified with the enterprise edition, which lowers the cost of switching between the two, and it offers more complete Chinese/English switching as well as global namespace switching. The back end provides a smoother drill process and more complete application management, strengthens control over probes, and supports one-click migration to the enterprise edition. The probe itself has been enhanced: it provides a more complete API, supports deployment in multiple environments so that it can serve as the drill channel in each of them, supports automatic installation and uninstallation, and collects and reports data to simplify drills.

Related links
Middleware Developer Conference (speech PDFs available for download):
https://developer.aliyun.com/topic/middleware/developer/summit
Original site copyright notice
This article was created by [InfoQ]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/182/202207011833206461.html