How to perform disaster recovery for a Kubernetes cluster? (22)
2022-06-12 21:56:00 【wzlinux】
Kubernetes hides the complex details of container orchestration and lets us focus on the application itself, without worrying much about how it is deployed and maintained. Kubernetes also supports running multiple replicas, which helps keep our business services highly available. The cluster itself needs to be highly available too; for that, see the official documentation on creating highly available clusters with kubeadm.
But that is not enough to let us rest easy. While Kubernetes orchestrates and schedules containers for us, it also stores a lot of critical data: the cluster's own key data, secrets, business configuration, business data, and so on. When running Kubernetes, disaster recovery is essential to guard against data loss caused by operator error (for example, a large-scale accidental deletion), natural disasters, unrecoverable disk damage, network failures, or a power outage in the data center. In severe cases the entire cluster may become unavailable.
So when using Kubernetes, we should prepare for disaster recovery so that the cluster can be restored, that is, rolled back to an earlier stable state.
What needs to be backed up in Kubernetes
Before backing up a Kubernetes cluster, we first need to know what to back up.
Let's start from the overall Kubernetes architecture and look at the components of the cluster:

As the architecture diagram shows, a Kubernetes cluster can be divided into Master nodes (left) and Node nodes (right).
The Master nodes run the Etcd cluster and the main components of the Kubernetes control plane, such as kube-apiserver, kube-controller-manager, kube-scheduler, and (optionally) cloud-controller-manager.
Of these components, all except Etcd are stateless services. As long as the Etcd data is intact, whatever happens to the other components can be fixed by restarting them or creating new instances, with no lasting impact. So on the Master side we only need to back up the data in Etcd.
That covers the Master nodes; now let's look at the Node nodes.
The Node nodes run kubelet, kube-proxy, and other components. Kubelet is responsible for maintaining each container instance and the storage the containers use. To make sure data is stored persistently, I recommend keeping business-critical data in PVs (Persistent Volumes). For that reason, we also need to back up PVs.
If a node fails, we can add a new node to the cluster to replace it.
Now that we have gone through the official Kubernetes architecture, let's look at how to back up the Etcd data and PVs.
Backing up and restoring Etcd data
Etcd also provides official backup documentation, which you can read if you are interested. Here I have summarized some practical operations that you can use as a reference for manual backup and recovery. The certificate paths and endpoint addresses in the commands need to be changed to match your cluster's parameters.
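As a minimal sketch of such a manual backup and restore (not the author's original commands), the two operations can be wrapped in small shell functions around etcdctl. The endpoint and certificate paths below are assumed kubeadm defaults, and the restore shown is the simplest single-member form; a multi-member cluster needs additional flags described in the Etcd documentation.

```shell
#!/usr/bin/env sh
set -eu

# Assumed kubeadm defaults; override them to match your own cluster.
ETCD_ENDPOINTS="${ETCD_ENDPOINTS:-https://127.0.0.1:2379}"
ETCD_CACERT="${ETCD_CACERT:-/etc/kubernetes/pki/etcd/ca.crt}"
ETCD_CERT="${ETCD_CERT:-/etc/kubernetes/pki/etcd/server.crt}"
ETCD_KEY="${ETCD_KEY:-/etc/kubernetes/pki/etcd/server.key}"

# Save a snapshot of the running Etcd member to the given file.
etcd_backup() {
  snapshot_file="$1"
  ETCDCTL_API=3 etcdctl \
    --endpoints="$ETCD_ENDPOINTS" \
    --cacert="$ETCD_CACERT" \
    --cert="$ETCD_CERT" \
    --key="$ETCD_KEY" \
    snapshot save "$snapshot_file"
}

# Restore a snapshot into a new data directory (it must not exist yet).
# Etcd then has to be pointed at this directory and restarted.
etcd_restore() {
  snapshot_file="$1"
  data_dir="$2"
  ETCDCTL_API=3 etcdctl snapshot restore "$snapshot_file" --data-dir="$data_dir"
}

# Example usage on a control-plane node (paths are hypothetical):
#   etcd_backup /var/backups/etcd/snapshot.db
#   etcd_restore /var/backups/etcd/snapshot.db /var/lib/etcd-restored
```

Newer Etcd releases move the restore subcommand to etcdutl, so check which tool your version expects before relying on this sketch.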
These backups require running the commands manually. If your Etcd cluster runs inside the Kubernetes cluster, you can use a scheduled job (CronJob) to back up the Etcd data locally, automatically and periodically (for example, once every minute). CronJob itself is covered in a separate chapter later.
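One possible shape for such a CronJob is sketched below; it is an illustration, not the author's original YAML. The function only prints the manifest so it can be reviewed before being piped to kubectl. The image tag, the control-plane node label, the host paths, and the name etcd-backup are all assumptions for a kubeadm-style cluster; older clusters may use the node-role.kubernetes.io/master label instead.

```shell
#!/usr/bin/env sh
set -eu

# Print a CronJob manifest that snapshots Etcd once every minute.
# All names, paths, labels, and the image tag are assumptions.
render_etcd_backup_cronjob() {
  cat <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          restartPolicy: OnFailure
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          containers:
            - name: etcd-backup
              image: registry.k8s.io/etcd:3.5.9-0
              command:
                - etcdctl
                - --endpoints=https://127.0.0.1:2379
                - --cacert=/etc/kubernetes/pki/etcd/ca.crt
                - --cert=/etc/kubernetes/pki/etcd/server.crt
                - --key=/etc/kubernetes/pki/etcd/server.key
                - snapshot
                - save
                - /backup/etcd-snapshot.db
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
                type: Directory
            - name: backup
              hostPath:
                path: /var/backups/etcd
                type: DirectoryOrCreate
EOF
}

# Example usage:
#   render_etcd_backup_cronjob | kubectl apply -f -
```

In this sketch every run writes to the same file name, so only the latest snapshot is kept on the node; keeping timestamped copies or shipping them off the node would need extra steps.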
Backing up PV data
For PVs, backup is more troublesome. Kubernetes itself does not provide storage; it relies on various storage plugins to manage and use storage. So for storage backups, and PV backups in particular, we have to rely on each cloud provider's API to take snapshots.
However, the Etcd and PV backup operations described above are not very convenient. I recommend backing up Kubernetes with Velero instead. Velero is powerful yet easy to operate, and it can help you do the following 3 things:
- Back up and restore a Kubernetes cluster.
- Migrate a cluster.
- Replicate the configuration and objects of a cluster, for example to other development and test clusters.
Velero can also back up an individual Namespace, which is very convenient if you only want to back up certain key services and their data.
With that said, let's see how Velero backs up Kubernetes.
Backing up Kubernetes with Velero
This is the Velero architecture diagram:

Velero consists of two parts:
- A command-line client that runs locally and lets you carry out the Etcd and PV backup operations from the command line; you can back up Kubernetes with it much as you would operate on Kubernetes resources with kubectl.
- A service running in the Kubernetes cluster (BackupController), responsible for performing the actual backup and recovery operations.
Let's walk through the process:
- The local Velero client sends a backup command, such as the one in the figure: velero backup create test-project-s2i --include-namespaces test. This command creates a Backup object through the APIServer.
- BackupController watches for this Backup object and validates it, for example by checking the parameter definitions.
- BackupController queries the APIServer for the relevant data and starts the backup.
- BackupController uploads the queried data to the remote object store.
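The client side of this flow can be sketched as a few thin shell wrappers around the velero CLI. The backup name and namespace come from the command in the article; the function names are hypothetical helpers introduced only for illustration.

```shell
#!/usr/bin/env sh
set -eu

# Ask Velero to back up a single namespace. The CLI creates a Backup
# object through the API server; the in-cluster controller does the work.
velero_backup_namespace() {
  backup_name="$1"
  namespace="$2"
  velero backup create "$backup_name" --include-namespaces "$namespace"
}

# Show the phase and details of an existing backup.
velero_backup_status() {
  velero backup describe "$1"
}

# Create a restore from an existing backup.
velero_restore_from() {
  velero restore create --from-backup "$1"
}

# Example usage, mirroring the command in the article:
#   velero_backup_namespace test-project-s2i test
#   velero_backup_status test-project-s2i
#   velero_restore_from test-project-s2i
```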
Velero installs a number of CRDs (Custom Resource Definitions) and their controllers in Kubernetes, and operations such as backup and recovery are carried out through them. Backing up and restoring the cluster is therefore, in essence, a matter of operating on these CRDs. BackupController decides what to do based on the CRD objects.
Velero supports two CRDs for back-end storage: BackupStorageLocation and VolumeSnapshotLocation.
- BackupStorageLocation defines where the Kubernetes cluster resource data is stored, that is, the cluster object data rather than the PVC and PV data. The support list shows the back-end storage services currently supported officially and by third parties, mainly S3-compatible storage such as AWS S3, Alibaba Cloud OSS, and Minio.
- VolumeSnapshotLocation is mainly used to take snapshots of PVs. The snapshot capability is usually provided by services such as Amazon EBS Volumes, Azure Managed Disks, and Google Persistent Disks, and you can choose a cloud vendor's service according to your needs. Alternatively, you can use the dedicated backup tool Restic to back up PV data to Azure Files, Alibaba Cloud OSS, and so on. Alibaba Cloud already provides a Velero-based plugin.
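To make the two storage CRDs more concrete, the sketch below prints one object of each kind for the AWS provider. The bucket name, region, object name default, and the velero namespace are placeholder assumptions; other providers use different config keys, so treat this as an illustration rather than a ready-made configuration.

```shell
#!/usr/bin/env sh
set -eu

# Print a BackupStorageLocation and a VolumeSnapshotLocation manifest.
# Arguments: bucket name and region (both placeholders chosen by you).
render_velero_locations() {
  bucket="$1"
  region="$2"
  cat <<EOF
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: ${bucket}
  config:
    region: ${region}
---
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  config:
    region: ${region}
EOF
}

# Example usage:
#   render_velero_locations my-velero-backups us-east-1 | kubectl apply -f -
```

The first object tells Velero where cluster object data goes; the second tells it where PV snapshots are taken.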
In addition, BackupController creates other CRDs while it works, mainly for internal logic. See the Alibaba Cloud documentation for further study.
If you don't have Alibaba Cloud OSS, or your cluster is an offline internal cluster, you can also deploy Minio yourself as the object storage service in place of Alibaba Cloud OSS. See the official documentation for detailed installation and configuration.
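As a rough sketch of pointing Velero at a self-hosted Minio, the install command can be wrapped as below. Minio speaks the S3 API, so the AWS provider plugin is used. The plugin image tag, the Minio URL, the bucket name, and the credentials file are assumptions that must match your own deployment and Velero version.

```shell
#!/usr/bin/env sh
set -eu

# Assumed plugin image; pick the tag that matches your Velero version.
VELERO_AWS_PLUGIN_IMAGE="${VELERO_AWS_PLUGIN_IMAGE:-velero/velero-plugin-for-aws:v1.2.1}"

# Install Velero with Minio as the S3-compatible backup location.
# Volume snapshots are disabled because Minio only stores objects.
velero_install_with_minio() {
  minio_url="$1"
  bucket="$2"
  credentials_file="$3"
  velero install \
    --provider aws \
    --plugins "$VELERO_AWS_PLUGIN_IMAGE" \
    --bucket "$bucket" \
    --secret-file "$credentials_file" \
    --use-volume-snapshots=false \
    --backup-location-config "region=minio,s3ForcePathStyle=true,s3Url=$minio_url"
}

# Example usage (all three values are hypothetical):
#   velero_install_with_minio http://minio.velero.svc:9000 velero ./credentials-velero
```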
Summary
In a distributed world, it is hard to guarantee that nothing will go wrong. As you deploy more and more business services in a Kubernetes cluster, disaster recovery for the cluster and its data becomes essential. In July of this year, GitHub, the code hosting platform we all use, suffered a Kubernetes failure that led to a serious outage lasting 4 and a half hours. So I suggest that for critical business data, you remember to back up frequently.