How to perform disaster backup and recovery for a Kubernetes cluster? (22)
2022-06-12 21:56:00 【wzlinux】
Kubernetes hides the complex details of container orchestration, so we can focus on the application itself without paying too much attention to how it is deployed and maintained. In addition, Kubernetes supports multiple replicas, which helps keep our services highly available. The cluster itself also needs to be highly available; for that, you can refer to the official document on using kubeadm to create a highly available cluster.
But this is not enough to let us rest easy. While Kubernetes orchestrates and schedules containers for us, it also stores a lot of critical data, such as the cluster's own key data, secrets, business configuration and business data. When using Kubernetes, disaster backup is essential to guard against data loss caused by operational errors (for example, large-scale accidental deletion), natural disasters, unrecoverable disk damage, network anomalies or a power failure in the machine room. In severe cases the entire cluster may even become unavailable.
So when using Kubernetes, we should keep disaster backups so that the cluster can be recovered and rolled back to an earlier stable state.
What needs to be backed up in Kubernetes
Before backing up a Kubernetes cluster, we first need to know what to back up.
Let's start from the overall Kubernetes architecture and look at the components of the whole cluster:
As the architecture diagram shows, a Kubernetes cluster can be divided into the Master node (left) and the Node nodes (right).
The Master node runs the Etcd cluster and the main components of the Kubernetes control plane, such as kube-apiserver, kube-controller-manager, kube-scheduler and (optionally) cloud-controller-manager.
Of these components, all except Etcd are stateless services. As long as the Etcd data is intact, whatever happens to the other components can be fixed by restarting them or creating new instances, with no lasting impact. So on the Master side, we only need to back up the data in Etcd.
That covers the Master node; now let's look at the Node nodes.
Each Node runs kubelet, kube-proxy and so on. Kubelet is responsible for maintaining each container instance and the storage the container uses. To make sure data is stored persistently, I suggest keeping business-critical data in a PV (Persistent Volume). For that reason, we also need to back up PVs.
If a node fails, we can add a new node to the cluster to replace the faulty one.
Having gone through the official Kubernetes architecture, let's see how to back up the data in Etcd and in PVs.
Backing up and restoring Etcd data
The Etcd project also provides official backup documentation, which you can read if you are interested. Here I have summarized some practical operations that you can use as a reference for manual backup and recovery. The certificate paths and the endpoint address on the command line need to be changed to match your cluster. The actual commands are as follows:
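A minimal sketch of a manual backup and restore with etcdctl. The endpoint and the certificate paths are assumptions based on a kubeadm-style layout, and the function names are only illustrative; these are not necessarily the author's exact commands.

```shell
# Sketch only: manual Etcd snapshot backup and restore with etcdctl (v3 API).
# The endpoint and certificate paths are assumed kubeadm-style defaults;
# override them through the environment to match your cluster.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_CACERT="${ETCD_CACERT:-/etc/kubernetes/pki/etcd/ca.crt}"
ETCD_CERT="${ETCD_CERT:-/etc/kubernetes/pki/etcd/server.crt}"
ETCD_KEY="${ETCD_KEY:-/etc/kubernetes/pki/etcd/server.key}"

# Save a snapshot of the running Etcd member to the file given as $1.
etcd_backup() {
  ETCDCTL_API=3 etcdctl \
    --endpoints="$ETCD_ENDPOINT" \
    --cacert="$ETCD_CACERT" \
    --cert="$ETCD_CERT" \
    --key="$ETCD_KEY" \
    snapshot save "$1"
}

# Print summary information about the snapshot file $1 as a table.
etcd_snapshot_status() {
  ETCDCTL_API=3 etcdctl snapshot status "$1" --write-out=table
}

# Restore snapshot file $1 into a fresh data directory $2.
# (From etcd v3.5 on, `etcdutl snapshot restore` is the preferred tool.)
etcd_restore() {
  ETCDCTL_API=3 etcdctl snapshot restore "$1" --data-dir="$2"
}
```

For example, `etcd_backup /var/backups/etcd/snapshot.db` takes a snapshot, and `etcd_restore /var/backups/etcd/snapshot.db /var/lib/etcd-restored` unpacks it into a new data directory that Etcd can then be pointed at. Both paths are placeholders.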
These backups require you to run the commands manually. If your Etcd cluster runs inside the Kubernetes cluster, you can use a scheduled job (CronJob) to back up the Etcd data locally, automatically and periodically (the YAML file below backs up Etcd every minute). CronJob itself will be covered in a separate chapter later. The automatic backup code is as follows:
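A sketch of what such a CronJob could look like, written as a shell function that prints the manifest. Everything specific here is an assumption rather than the author's original YAML: the image tag, the `kube-system` namespace, the control-plane node label, the kubeadm certificate paths and the host directory `/var/backups/etcd`. The container calls etcdctl directly (no shell in the image is assumed), so each run is expected to replace the same snapshot file.

```shell
# Sketch only: print a CronJob manifest that snapshots Etcd on a schedule.
# Image tag, node label, certificate paths and backup directory are assumptions.
ETCD_IMAGE="${ETCD_IMAGE:-registry.k8s.io/etcd:3.5.9-0}"
DEFAULT_SCHEDULE='*/1 * * * *'   # every minute, as described in the text

etcd_backup_cronjob_manifest() {
  schedule="${1:-$DEFAULT_SCHEDULE}"
  cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "${schedule}"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true            # reach Etcd on 127.0.0.1 of the node
          restartPolicy: OnFailure
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - operator: Exists           # tolerate control-plane taints
          containers:
          - name: etcd-backup
            image: ${ETCD_IMAGE}
            env:
            - name: ETCDCTL_API
              value: "3"
            command:
            - etcdctl
            - --endpoints=https://127.0.0.1:2379
            - --cacert=/etc/kubernetes/pki/etcd/ca.crt
            - --cert=/etc/kubernetes/pki/etcd/server.crt
            - --key=/etc/kubernetes/pki/etcd/server.key
            - snapshot
            - save
            - /backup/etcd-snapshot.db
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup
              mountPath: /backup
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
              type: Directory
          - name: backup
            hostPath:
              path: /var/backups/etcd
              type: DirectoryOrCreate
EOF
}
```

The manifest could be applied with `etcd_backup_cronjob_manifest | kubectl apply -f -`. Passing a different cron expression, for example `etcd_backup_cronjob_manifest '0 * * * *'`, changes the schedule to hourly. On clusters older than Kubernetes 1.21 the CronJob API version would be `batch/v1beta1` instead.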
Backing up PV data
Backing up PVs is more troublesome. Kubernetes does not provide storage itself; it relies on various storage plug-ins to manage and use storage. So for storage backups, and PV backups in particular, we need to rely on the APIs of the various cloud providers to take snapshots.
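As one illustration of taking a snapshot through a cloud provider's API, the sketch below looks up the disk behind a PV and snapshots it with the AWS CLI. It assumes the PV was provisioned by the AWS EBS CSI driver (so `.spec.csi.volumeHandle` holds the EBS volume ID) and that `kubectl` and `aws` are installed and authenticated; other providers and drivers need their own equivalents. The function name is hypothetical.

```shell
# Sketch only: snapshot the cloud disk behind a PV through the provider's API.
# Assumes an AWS EBS CSI-provisioned PV; $1 is the PV name.
snapshot_pv_ebs() {
  pv_name="$1"
  # For EBS CSI volumes the volume handle is the EBS volume ID (vol-...).
  volume_id="$(kubectl get pv "$pv_name" -o jsonpath='{.spec.csi.volumeHandle}')"
  if [ -z "$volume_id" ]; then
    echo "PV $pv_name has no CSI volume handle" >&2
    return 1
  fi
  aws ec2 create-snapshot \
    --volume-id "$volume_id" \
    --description "Backup of Kubernetes PV $pv_name"
}
```

Calling `snapshot_pv_ebs <pv-name>` starts an EBS snapshot of that volume. The snapshot is crash-consistent at best, so applications with in-memory state may need to be quiesced first.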
However, the backup operations for Etcd and PVs described above are not very convenient. I recommend using Velero to back up Kubernetes. Velero is powerful yet easy to operate, and it can help you do the following three things:
- Back up and restore a Kubernetes cluster.
- Migrate a cluster.
- Copy the configuration and objects of a cluster, for example to other development and test clusters.
Velero can also back up an individual Namespace, which is very convenient if you only want to back up some key services and their data.
With that said, let's see how Velero backs up Kubernetes.
Using Velero to back up Kubernetes
This is Velero's architecture diagram:
Velero is made up of two parts:
- A command-line client that you run locally to back up the cluster data held in Etcd as well as PVs; you can back up Kubernetes with it in much the same way as you operate Kubernetes resources with kubectl.
- A service running in the Kubernetes cluster (BackupController), which is responsible for performing the actual backup and restore operations.
Let's take a look at the specific process:
- Send a backup command from the local Velero client, such as the one in the figure: velero backup create test-project-s2i --include-namespaces test. This command creates a Backup object through the APIServer.
- BackupController watches for the new Backup object and validates it, for example by checking the parameters it defines.
- BackupController starts the backup by querying the APIServer for the relevant data.
- BackupController backs up the queried data to the remote object store.
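The steps above can be driven from the client roughly as follows. This is a sketch that assumes the `velero` CLI is installed and the Velero server is already running in the cluster; the function names are illustrative, while the subcommands and flags are standard Velero CLI options.

```shell
# Sketch only: back up one namespace and restore from that backup with Velero.

# Create a backup named $1 containing only namespace $2, wait for it to
# finish, then print a detailed report of what was backed up.
velero_backup_namespace() {
  backup_name="$1"
  namespace="$2"
  velero backup create "$backup_name" --include-namespaces "$namespace" --wait
  velero backup describe "$backup_name" --details
}

# Restore everything contained in the existing backup named $1.
velero_restore_from_backup() {
  velero restore create --from-backup "$1"
}
```

With the names from the example command, `velero_backup_namespace test-project-s2i test` creates the backup and `velero_restore_from_backup test-project-s2i` restores it. `velero backup logs test-project-s2i` shows the server-side log if a backup does not complete.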
Velero installs many CRDs (Custom Resource Definitions) and related controllers in Kubernetes, and operations such as backup and restore are carried out through them. So backing up and restoring the cluster is, in essence, operating on these custom resources. BackupController decides what to do based on them.
Velero supports two CRDs for back-end storage, namely BackupStorageLocation and VolumeSnapshotLocation.
- BackupStorageLocation mainly defines where the Kubernetes cluster resources are stored, that is, the cluster object data rather than the data in PVCs and PVs. The support list shows the back-end storage services currently supported officially and by third parties, which are mostly S3-compatible stores such as AWS S3, Alibaba Cloud OSS and Minio.
- VolumeSnapshotLocation is mainly used to take snapshots of PVs. The snapshot function is usually provided by Amazon EBS Volumes, Azure Managed Disks, Google Persistent Disks and so on, and you can choose the service of whichever cloud vendor suits your needs. Alternatively, you can use the dedicated backup tool Restic to back up PV data to Azure Files, Alibaba Cloud OSS and similar stores. Alibaba Cloud also provides a plug-in based on Velero.
In addition, BackupController creates other CRDs while it works, mainly for internal logic processing. You can refer to the Alibaba Cloud documentation for further study.
If you don't have Alibaba Cloud OSS, or your cluster is an offline internal cluster, you can also set up Minio yourself and use it as the object storage service in place of Alibaba Cloud OSS. You can refer to the official documentation for detailed installation and configuration.
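As a rough sketch of pointing Velero at a self-hosted Minio, the function below wraps `velero install` with the AWS (S3-compatible) object store plug-in. The plug-in version, bucket name, Minio URL and credentials file are all assumptions to adapt; the credentials file is expected to be in AWS shared-credentials format holding the Minio access key and secret key. Volume snapshots are disabled because Minio only stores the object data.

```shell
# Sketch only: install Velero with a Minio bucket as the BackupStorageLocation.
# The plug-in image tag is an assumption; pick one matching your Velero version.
VELERO_AWS_PLUGIN="${VELERO_AWS_PLUGIN:-velero/velero-plugin-for-aws:v1.2.1}"

# $1 = bucket name, $2 = Minio S3 endpoint URL, $3 = credentials file path.
velero_install_minio() {
  bucket="$1"
  s3_url="$2"
  credentials_file="$3"
  velero install \
    --provider aws \
    --plugins "$VELERO_AWS_PLUGIN" \
    --bucket "$bucket" \
    --secret-file "$credentials_file" \
    --use-volume-snapshots=false \
    --backup-location-config "region=minio,s3ForcePathStyle=true,s3Url=$s3_url"
}
```

For example, `velero_install_minio velero http://minio.velero.svc:9000 ./credentials-velero` targets a Minio service inside the cluster (the URL and file name are placeholders). Afterwards, `velero backup-location get` should list the configured location.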
Summary
In a distributed world, it is hard to guarantee that nothing will ever go wrong. As you deploy more and more services in a Kubernetes cluster, disaster backup for the cluster and its data becomes essential. In July of this year, GitHub, the code hosting platform many of us use, suffered a Kubernetes failure that led to a serious outage lasting four and a half hours. So for critical business data, I suggest you remember to back up frequently.