Disaster recovery series (IV): building disaster recovery for the business application layer

2022-06-24 04:46:00 Kaiyuan

The business application layer is the traffic hub of the whole system. If a core service has a single point of failure or weak self-healing ability, business stability suffers badly. Disaster recovery is not free, however. When core and non-core modules are tightly coupled, for example, not every service actually needs disaster recovery from a resource-cost point of view, yet separating them takes extra engineering effort. Latency-sensitive services that cannot accept cross-region delay need even more effort to rework both architecture and business logic. This article therefore looks at disaster recovery for the application layer from the cloud platform's perspective, covering the dimensions to consider in solution design, the complexity involved, and customer cases on the cloud.

1. Overview of application disaster recovery

1.1 Application deployment

  • Does the application meet the requirements for cross-region / cross-zone deployment?

First ask whether the application-layer call chain can accept cross-region latency. If it cannot, the only option is set-based (unitized) deployment for disaster recovery, which requires a strong middleware team to build a data synchronization system.

If the call chain can accept cross-region latency, the usual approach is to observe a pilot service first and then build up disaster recovery capability in small steps. In practice, business teams can rarely state exactly how much latency they tolerate, and often the only cross-region hop they can accept is at the data layer. Two suggestions follow:

1) Keep each application's internal call chain chimney-style (a vertical silo), so that it completes within a single zone.

2) For application-layer data reads and writes, accept cross-region writes at most, and serve reads from the nearest copy.
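
Suggestion 2 can be sketched as a small routing rule. This is a minimal illustration rather than any product's API: the `Endpoint` type, the `pick_endpoint` function, and the zone names are all hypothetical, and it assumes one writable primary plus read-only replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    zone: str
    role: str  # "primary" accepts writes; "replica" serves reads only


def pick_endpoint(endpoints, caller_zone, operation):
    """Choose the data endpoint for an application instance in caller_zone."""
    primary = next(e for e in endpoints if e.role == "primary")
    if operation == "write":
        # Writes always go to the single primary, even across zones.
        return primary
    # Reads prefer a copy in the caller's own zone ("read nearby").
    for endpoint in endpoints:
        if endpoint.zone == caller_zone:
            return endpoint
    # No local copy exists: fall back to the primary.
    return primary


# Hypothetical layout: primary in zone-a, read replica in zone-b.
ENDPOINTS = [Endpoint("zone-a", "primary"), Endpoint("zone-b", "replica")]
```

An instance in `zone-b` would write to `zone-a` but read from its local replica, so only the write path pays the cross-zone latency.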

  • What should the deployment scale be in each availability zone?

Even when the application layer can accept cross-region latency, the degree of acceptance varies from one cloud customer to another.

1) The business fully accepts cross-region latency: deploy at the same scale in each availability zone (1:1), with each zone carrying 50% of the traffic.

2) The business cannot fully accept cross-region latency and compromises for the sake of disaster recovery: deploy the two availability zones at a 5:1 ratio. The primary zone carries the main services so that most traffic never crosses zones, and the small remainder serves only as hot standby. When a failure occurs, the standby is expanded temporarily to restore the business. This model places special demands on rapid scale-out and on reserved resources.
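
The scale-out demand behind the two ratios can be worked out with a few lines of arithmetic. The sketch below assumes, for simplicity, that the total deployed capacity is sized exactly for total traffic (no headroom); the function names are illustrative only.

```python
from fractions import Fraction


def expansion_factor(primary_units, standby_units):
    """Factor by which the surviving zone must grow to carry all traffic
    after the primary zone fails (assumes no spare headroom)."""
    if standby_units <= 0:
        raise ValueError("the standby zone needs at least one unit")
    return Fraction(primary_units + standby_units, standby_units)


def extra_units_needed(primary_units, standby_units):
    """Units to add in the surviving zone: exactly the capacity that was lost."""
    return primary_units
```

Under this assumption a 1:1 deployment has to double the surviving zone, while a 5:1 deployment has to grow the standby zone to 6 times its size. With 50 instances in the primary zone and 10 in the standby zone, that means reserving resources for 50 more instances, which is why the 5:1 model depends so heavily on fast expansion.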

  • Where to start with complex business applications?

Sorting out every service takes a lot of manpower, and omissions are still likely. A few principles help: go from the known to the unknown, from simple to complex, and from canary (gray) release to full rollout, iterating so that the services are mapped effectively.

1.2 Application scheduling

Application scheduling covers north-south and east-west traffic. The main question here is how to reschedule traffic quickly to restore the business when something goes wrong.

  • North-south traffic.

1) Access-layer traffic, from CLB/nginx/ingress and so on. The access-layer load balancer or nginx brings traffic into the application layer through ports 443/80.

2) Middleware traffic: redis/ckafka/es and so on. Typically, the application layer locates the corresponding middleware through DNS or the registry and then writes or reads data.

3) Data-side traffic: databases such as mysql/tdsql. Typically, the application layer locates the corresponding database through DNS or the registry and then writes or reads data.

  • East-west traffic. This is mainly the call chain between applications.

Applications usually call one another through an API gateway, intranet DNS, or a registry.

  • System stability. This mainly means the stability of the scheduling system and the configuration system.

Disaster recovery switchover depends heavily on the stability of the scheduling system and the configuration system. Stability here covers both the systems' own disaster tolerance and their performance. During a large-scale failure, a flood of configuration change requests hits these systems, and their ability to withstand that peak is the foundation of the whole disaster recovery plan.
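
One way to picture "rescheduling quickly" for north-south traffic is a per-zone weight table, as a DNS or load-balancing layer might keep. The following is a minimal sketch under that assumption; `drain_zone` and the zone names are hypothetical and do not correspond to a specific product feature.

```python
def drain_zone(weights, failed_zone):
    """Return new weights with failed_zone set to 0 and its share spread
    proportionally over the zones that were already taking traffic."""
    remaining = {
        zone: weight
        for zone, weight in weights.items()
        if zone != failed_zone and weight > 0
    }
    if not remaining:
        raise ValueError("no healthy zone left to take traffic")
    total = sum(weights.values())
    total_remaining = sum(remaining.values())
    new_weights = {zone: 0.0 for zone in weights}
    for zone, weight in remaining.items():
        # Keep the relative split among survivors, scaled to the full total.
        new_weights[zone] = total * weight / total_remaining
    return new_weights
```

In a real switchover the new weights would be pushed through the scheduling and configuration systems, which is exactly where the request peak described above lands.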

2. Application disaster recovery complexity

When assessing application-layer disaster recovery, consider mainly the following two aspects:

  • Which nodes execute tasks.

You need to identify which nodes execute core business, because each arrangement introduces a different kind of complexity. For example:

Scenario 1: application-layer services run on a single machine. If the machine fails, the business is affected directly, and recovery depends heavily on how quickly the machine can be restored.

Scenario 2: application-layer services run on multiple machines to reduce the dependence on any one machine. A load balancing cluster is added in front, and health checks detect machine status in real time. This adds the complexity of keeping the load balancer stable, and business recovery now depends heavily on it.

Scenario 3: to improve the application layer's ability to scale out while reducing the dependence on a single load balancing cluster, the application layer is deployed across two availability zones, which further improves business stability.
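
The health check in scenario 2 is often implemented with consecutive-failure and consecutive-success thresholds, so that one dropped probe does not remove a machine. The sketch below illustrates that idea only; `HealthTracker` and its default thresholds are assumptions, not the behavior of a particular load balancer.

```python
from collections import defaultdict


class HealthTracker:
    """Tracks probe results and decides which backends may receive traffic."""

    def __init__(self, fail_threshold=3, rise_threshold=2):
        self.fail_threshold = fail_threshold  # consecutive failures to mark down
        self.rise_threshold = rise_threshold  # consecutive passes to mark up again
        self._fails = defaultdict(int)
        self._passes = defaultdict(int)
        self._down = set()

    def record(self, backend, ok):
        """Record one probe result for a backend."""
        if ok:
            self._fails[backend] = 0
            self._passes[backend] += 1
            if backend in self._down and self._passes[backend] >= self.rise_threshold:
                self._down.discard(backend)
        else:
            self._passes[backend] = 0
            self._fails[backend] += 1
            if self._fails[backend] >= self.fail_threshold:
                self._down.add(backend)

    def healthy(self, backends):
        """Return the backends that are currently allowed to take traffic."""
        return [b for b in backends if b not in self._down]
```

The thresholds trade detection speed against flapping: lower values remove a bad machine sooner but react more to transient errors.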

Figure: node complexity

  • How to handle task execution exceptions.

There are usually two cases: when an application-layer task fails, the request is simply issued again; if there is a task manager, the failed queue task is reissued.

The handling method affects the recovery plan. If requests are simply retried, no special treatment is needed; if a task manager is involved, the failed tasks must be resent before the business recovers.
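
The difference between the two cases can be sketched as follows. Both `call_with_retry` and `TaskManager` are hypothetical, in-memory illustrations; a real task manager would persist its queue.

```python
from collections import deque


def call_with_retry(request, max_attempts=3):
    """Case 1: the caller simply issues the request again on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return request()
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
    raise last_error


class TaskManager:
    """Case 2: failed tasks are kept and must be explicitly resent."""

    def __init__(self):
        self.pending = deque()
        self.failed = []

    def submit(self, name, func):
        self.pending.append((name, func))

    def run_pending(self):
        """Run every pending task once; return the names that succeeded."""
        done = []
        while self.pending:
            name, func = self.pending.popleft()
            try:
                func()
                done.append(name)
            except Exception:
                self.failed.append((name, func))
        return done

    def requeue_failed(self):
        """Recovery step: move failed tasks back into the pending queue."""
        count = len(self.failed)
        self.pending.extend(self.failed)
        self.failed.clear()
        return count
```

With plain retries, recovery happens inside the call path. With a task manager, recovery needs an explicit `requeue_failed()` step after the fault is cleared, so that step has to be part of the disaster recovery runbook.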

3. Classic cases on the cloud

3.1 Application-layer traffic (each zone carries 50% of a single application)

The application-layer call chain should complete within the same zone, avoiding cross-zone calls wherever possible. Only database writes may cross zones; reads are served from the nearest copy.

Figure: intra-city active-active

3.2 Application-layer business disaster recovery (one zone carries 100% of a single application)

The application-layer call chain should complete within the same zone; traffic between regions is used only to synchronize data. Consider the following scenario:

1) Resource deployment: a full set of the game services is built in each of Guangzhou and Shanghai, and different game services are resolved to different regions through DNS.

2) Business traffic: normally, the traffic of a single game service is carried in a single region, and the other region is a cold standby.

3) Data reads and writes: redis/cdb reads and writes both stay within a single region, and the databases are synchronized bidirectionally between regions. Tencent Cloud DTS already supports bidirectional synchronization for MySQL; for details, see https://cloud.tencent.com/document/product/571/59386.

PS: bidirectional synchronization for Tencent Cloud redis is expected to go online in November; stay tuned.
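
The per-service DNS routing in this case can be pictured as a table that maps each game service to an active region and a cold-standby region. The sketch below is illustrative only: `ServiceRoute`, `resolve`, `fail_over_region`, and the service names used with them are hypothetical, and a real switch would be a DNS record change.

```python
from dataclasses import dataclass


@dataclass
class ServiceRoute:
    active: str   # region currently serving 100% of this service's traffic
    standby: str  # cold-standby region kept in sync through data replication


def resolve(routes, service):
    """Return the region a service currently resolves to."""
    return routes[service].active


def fail_over_region(routes, failed_region):
    """Swap active and standby for every service hosted in failed_region.

    Returns the names of the services that were switched.
    """
    switched = []
    for name, route in routes.items():
        if route.active == failed_region:
            route.active, route.standby = route.standby, route.active
            switched.append(name)
    return switched
```

Services that were already active in the healthy region are untouched. The switch is only safe because the databases are kept synchronized in both directions, so the standby region already holds the data.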

Figure: remote disaster recovery