Use BR to back up TiDB cluster data to S3-compatible storage

2022-07-06 08:02:00 【Tianxiang shop】

This article describes how to back up the data of a TiDB cluster running in an AWS Kubernetes environment to AWS storage.

The backup method described here is implemented with the Custom Resource Definitions (CRDs) of TiDB Operator. Under the hood it uses BR to read the cluster data and then uploads that data to AWS storage. BR, short for Backup & Restore, is the command-line tool for distributed backup and restore of TiDB cluster data.

Usage scenarios

If your backup needs match the following, consider using BR to back up TiDB cluster data to S3-compatible storage, either as an ad-hoc backup or as a scheduled full backup:

- The amount of data to back up is large (more than 1 TB) and the backup must be fast.
- You need to back up the data directly as SST files (key-value pairs).

For other backup requirements, refer to the introduction to backup and restore and choose a suitable backup method.

Note

- BR supports only TiDB v3.1 and later.
- Data backed up with BR can be restored only to TiDB, not to other databases.

Ad-hoc backup

Ad-hoc backup supports both full backup and incremental backup.

To run an ad-hoc backup, create a Backup custom resource (CR) object that describes the backup. Once the Backup object is created, TiDB Operator carries out the backup according to it. If an error occurs during the backup, the process is not retried automatically and you must handle the failure manually.

This article assumes that you are backing up the TiDB cluster demo1, which is deployed in the Kubernetes namespace test1. The procedure is as follows.

Step 1: Prepare the ad-hoc backup environment

1. Download the file backup-rbac.yaml and run the following command to create the RBAC resources required for backup in the test1 namespace:

   kubectl apply -f backup-rbac.yaml -n test1

2. Grant access to the remote storage.
   - If you back up to Amazon S3, there are three ways to grant permissions. See the AWS account authorization documentation.
   - If you back up to other S3-compatible storage, such as Ceph or MinIO, grant permissions with an AccessKey and SecretKey. See the documentation on authorization by AccessKey and SecretKey.

3. If your TiDB version is earlier than v4.0.8, also complete the following steps. For TiDB v4.0.8 and later, skip them.
   1. Make sure that you have the SELECT and UPDATE privileges on the mysql.tidb table of the cluster being backed up. These privileges are used to adjust the GC time before and after the backup.
   2. Create the secret backup-demo1-tidb-secret, which stores the password of the user that accesses the TiDB cluster:

      kubectl create secret generic backup-demo1-tidb-secret --from-literal=password=${password} --namespace=test1
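For readers who prefer to keep the secret as a manifest rather than create it imperatively, the following is a minimal sketch of the equivalent object. It assumes that `kubectl create secret generic ... --from-literal=password=...` yields an Opaque Secret with a base64-encoded `password` key, which is standard Kubernetes behavior; the function name `tidb_password_secret` and the placeholder password are illustrative only.

```python
import base64
import json


def tidb_password_secret(name: str, namespace: str, password: str) -> dict:
    """Build a Secret manifest holding the TiDB user's password.

    Secret values under `data` must be base64-encoded strings.
    """
    encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"password": encoded},
    }


if __name__ == "__main__":
    # "example-password" is a placeholder, not a real credential.
    manifest = tidb_password_secret(
        "backup-demo1-tidb-secret", "test1", "example-password"
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. Avoid committing the generated manifest to version control, since base64 is an encoding and not encryption.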

Step 2: Back up data to S3-compatible storage

Depending on the authorization method you chose for remote storage access in the previous step, use the corresponding method below to back up data to S3-compatible storage:

Method 1: If you grant permissions with an accessKey and secretKey, create the Backup CR as follows to back up the cluster data:

kubectl apply -f backup-aws-s3.yaml

The content of backup-aws-s3.yaml is as follows:

---
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: demo1-backup-s3
  namespace: test1
spec:
  backupType: full
  br:
    cluster: demo1
    clusterNamespace: test1
    # logLevel: info
    # statusAddr: ${status_addr}
    # concurrency: 4
    # rateLimit: 0
    # timeAgo: ${time}
    # checksum: true
    # sendCredToTikv: true
    # options:
    # - --lastbackupts=420134118382108673
  # Only needed for TiDB Operator < v1.1.10 or TiDB < v4.0.8
  from:
    host: ${tidb_host}
    port: ${tidb_port}
    user: ${tidb_user}
    secretName: backup-demo1-tidb-secret
  s3:
    provider: aws
    secretName: s3-secret
    region: us-west-1
    bucket: my-bucket
    prefix: my-folder

Method 2: If you grant permissions by binding IAM to the Pod, create the Backup CR as follows to back up the cluster data:

kubectl apply -f backup-aws-s3.yaml

The content of backup-aws-s3.yaml is as follows:

---
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: demo1-backup-s3
  namespace: test1
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/user
spec:
  backupType: full
  br:
    cluster: demo1
    sendCredToTikv: false
    clusterNamespace: test1
    # logLevel: info
    # statusAddr: ${status_addr}
    # concurrency: 4
    # rateLimit: 0
    # timeAgo: ${time}
    # checksum: true
    # options:
    # - --lastbackupts=420134118382108673
  # Only needed for TiDB Operator < v1.1.10 or TiDB < v4.0.8
  from:
    host: ${tidb_host}
    port: ${tidb_port}
    user: ${tidb_user}
    secretName: backup-demo1-tidb-secret
  s3:
    provider: aws
    region: us-west-1
    bucket: my-bucket
    prefix: my-folder

Method 3: If you grant permissions by binding IAM to the ServiceAccount, create the Backup CR as follows to back up the cluster data:

kubectl apply -f backup-aws-s3.yaml

The content of backup-aws-s3.yaml is as follows:

---
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: demo1-backup-s3
  namespace: test1
spec:
  backupType: full
  serviceAccount: tidb-backup-manager
  br:
    cluster: demo1
    sendCredToTikv: false
    clusterNamespace: test1
    # logLevel: info
    # statusAddr: ${status_addr}
    # concurrency: 4
    # rateLimit: 0
    # timeAgo: ${time}
    # checksum: true
    # options:
    # - --lastbackupts=420134118382108673
  # Only needed for TiDB Operator < v1.1.10 or TiDB < v4.0.8
  from:
    host: ${tidb_host}
    port: ${tidb_port}
    user: ${tidb_user}
    secretName: backup-demo1-tidb-secret
  s3:
    provider: aws
    region: us-west-1
    bucket: my-bucket
    prefix: my-folder
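The three manifests above share most of their content and differ only in how credentials reach BR. The sketch below makes those differences explicit by generating each variant from one base. It is an illustration, not part of TiDB Operator: the function name `backup_manifest` and the method labels `access-key`, `iam-pod`, and `iam-service-account` are made up here, and the optional `from` block (needed only for TiDB Operator earlier than v1.1.10 or TiDB earlier than v4.0.8) is left out.

```python
import copy
import json

# Fields common to all three authorization methods, taken from the examples above.
BASE_BACKUP = {
    "apiVersion": "pingcap.com/v1alpha1",
    "kind": "Backup",
    "metadata": {"name": "demo1-backup-s3", "namespace": "test1"},
    "spec": {
        "backupType": "full",
        "br": {"cluster": "demo1", "clusterNamespace": "test1"},
        "s3": {
            "provider": "aws",
            "region": "us-west-1",
            "bucket": "my-bucket",
            "prefix": "my-folder",
        },
    },
}


def backup_manifest(
    auth: str, role_arn: str = "arn:aws:iam::123456789012:role/user"
) -> dict:
    """Return a Backup manifest for one of the three authorization methods."""
    manifest = copy.deepcopy(BASE_BACKUP)
    spec = manifest["spec"]
    if auth == "access-key":
        # Method 1: credentials come from a Kubernetes secret.
        spec["s3"]["secretName"] = "s3-secret"
    elif auth == "iam-pod":
        # Method 2: the IAM role is attached through a Pod annotation.
        manifest["metadata"]["annotations"] = {"iam.amazonaws.com/role": role_arn}
        spec["br"]["sendCredToTikv"] = False
    elif auth == "iam-service-account":
        # Method 3: the IAM role is bound to the ServiceAccount.
        spec["serviceAccount"] = "tidb-backup-manager"
        spec["br"]["sendCredToTikv"] = False
    else:
        raise ValueError(f"unknown authorization method: {auth}")
    return manifest


if __name__ == "__main__":
    print(json.dumps(backup_manifest("iam-service-account"), indent=2))
```

Generating the variants from a single base keeps the bucket, region, and cluster settings in one place, so switching authorization methods cannot silently change the backup target.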

When configuring backup-aws-s3.yaml, note the following:

- Starting from TiDB Operator v1.1.6, to run an incremental backup you only need to specify the timestamp of the previous backup with --lastbackupts in spec.br.options. For the limitations of incremental backup, see the documentation on backing up and restoring with BR.
- The acl, endpoint, and storageClass configuration items for Amazon S3 can be omitted. For the configuration of S3-compatible storage, see the S3 storage field reference.
- Some parameters in .spec.br are optional, such as logLevel and statusAddr. For the full description of the .spec.br fields, see the BR field reference.
- For TiDB v4.0.8 and later, BR adjusts the tikv_gc_life_time parameter automatically, so you do not need to configure the spec.tikvGCLifeTime and spec.from fields.
- For details on the other Backup CR fields, see the Backup CR field reference.
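The value given to --lastbackupts, such as 420134118382108673 in the examples, is a TiDB timestamp (TSO) rather than a Unix time. A TSO packs a physical time in milliseconds into its high bits and an 18-bit logical counter into its low bits. The sketch below decodes such a value so you can check which moment an incremental backup starts from. The helper names are illustrative and are not part of BR.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

LOGICAL_BITS = 18  # low 18 bits hold the logical counter
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_tso(tso: int) -> Tuple[int, int]:
    """Split a TSO into (physical milliseconds since epoch, logical counter)."""
    return tso >> LOGICAL_BITS, tso & ((1 << LOGICAL_BITS) - 1)


def tso_to_datetime(tso: int) -> datetime:
    """Return the UTC wall-clock time encoded in the physical part of a TSO."""
    physical_ms, _ = split_tso(tso)
    return EPOCH + timedelta(milliseconds=physical_ms)


def datetime_to_tso(moment: datetime) -> int:
    """Build a TSO for a timezone-aware datetime, with logical counter 0."""
    physical_ms = (moment - EPOCH) // timedelta(milliseconds=1)
    return physical_ms << LOGICAL_BITS


if __name__ == "__main__":
    last_backup_ts = 420134118382108673  # value from the manifest comments
    print(tso_to_datetime(last_backup_ts).isoformat())
```

Decoding the timestamp before starting an incremental backup is a cheap sanity check that the value really comes from the previous backup and not from an older one.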

After the Backup CR is created, TiDB Operator automatically starts the backup according to it. You can check the backup status with the following command:

kubectl get bk -n test1 -o wide

Backup examples

- Back up all cluster data
- Back up the data of a single database
- Back up the data of a single table
- Back up the data of multiple tables by using the table filter
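The manifests for the four scenarios listed above are not included in this article. As a rough sketch, and assuming that the Backup CR accepts a `spec.tableFilter` list of table-filter rules in the `db.table` form (check the Backup CR field reference for your TiDB Operator version), the scenarios would differ only in that list. The names `db1`, `table1`, and `table2` are hypothetical.

```python
import json

# Assumed table-filter rules for each scenario; db1/table1/table2 are made-up names.
TABLE_FILTERS = {
    "all": ["*.*"],
    "single-database": ["db1.*"],
    "single-table": ["db1.table1"],
    "multiple-tables": ["db1.table1", "db1.table2"],
}


def with_table_filter(spec: dict, scenario: str) -> dict:
    """Return a copy of a Backup spec with the tableFilter for the given scenario."""
    updated = dict(spec)
    updated["tableFilter"] = list(TABLE_FILTERS[scenario])
    return updated


if __name__ == "__main__":
    base_spec = {"backupType": "full"}
    for scenario in TABLE_FILTERS:
        print(scenario, json.dumps(with_table_filter(base_spec, scenario)))
```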

Scheduled full backup

You can set a backup policy to back up the TiDB cluster on a schedule, together with a retention policy that prevents backups from accumulating. A scheduled full backup is described by a BackupSchedule CR object. A full backup is triggered at each scheduled time point. Under the hood, scheduled full backup is implemented through ad-hoc full backup.

Step 1: Prepare the scheduled full backup environment

Same as preparing the ad-hoc backup environment.

Step 2: Back up data to S3-compatible storage on a schedule

Depending on the authorization method you chose when preparing the ad-hoc backup environment, use the corresponding method below to back up data to Amazon S3 storage on a schedule:

Method 1: If you grant permissions with an accessKey and secretKey, create the BackupSchedule CR as follows to enable scheduled full backup of the TiDB cluster:

kubectl apply -f backup-scheduler-aws-s3.yaml

The content of backup-scheduler-aws-s3.yaml is as follows:

---
apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: demo1-backup-schedule-s3
  namespace: test1
spec:
  #maxBackups: 5
  #pause: true
  maxReservedTime: "3h"
  schedule: "*/2 * * * *"
  backupTemplate:
    backupType: full
    br:
      cluster: demo1
      clusterNamespace: test1
      # logLevel: info
      # statusAddr: ${status_addr}
      # concurrency: 4
      # rateLimit: 0
      # timeAgo: ${time}
      # checksum: true
      # sendCredToTikv: true
    # Only needed for TiDB Operator < v1.1.10 or TiDB < v4.0.8
    from:
      host: ${tidb_host}
      port: ${tidb_port}
      user: ${tidb_user}
      secretName: backup-demo1-tidb-secret
    s3:
      provider: aws
      secretName: s3-secret
      region: us-west-1
      bucket: my-bucket
      prefix: my-folder

Method 2: If you grant permissions by binding IAM to the Pod, create the BackupSchedule CR as follows to enable scheduled full backup of the TiDB cluster:

kubectl apply -f backup-scheduler-aws-s3.yaml

The content of backup-scheduler-aws-s3.yaml is as follows:

---
apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: demo1-backup-schedule-s3
  namespace: test1
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/user
spec:
  #maxBackups: 5
  #pause: true
  maxReservedTime: "3h"
  schedule: "*/2 * * * *"
  backupTemplate:
    backupType: full
    br:
      cluster: demo1
      sendCredToTikv: false
      clusterNamespace: test1
      # logLevel: info
      # statusAddr: ${status_addr}
      # concurrency: 4
      # rateLimit: 0
      # timeAgo: ${time}
      # checksum: true
    # Only needed for TiDB Operator < v1.1.10 or TiDB < v4.0.8
    from:
      host: ${tidb_host}
      port: ${tidb_port}
      user: ${tidb_user}
      secretName: backup-demo1-tidb-secret
    s3:
      provider: aws
      region: us-west-1
      bucket: my-bucket
      prefix: my-folder

Method 3: If you grant permissions by binding IAM to the ServiceAccount, create the BackupSchedule CR as follows to enable scheduled full backup of the TiDB cluster:

kubectl apply -f backup-scheduler-aws-s3.yaml

The content of backup-scheduler-aws-s3.yaml is as follows:

---
apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: demo1-backup-schedule-s3
  namespace: test1
spec:
  #maxBackups: 5
  #pause: true
  maxReservedTime: "3h"
  schedule: "*/2 * * * *"
  serviceAccount: tidb-backup-manager
  backupTemplate:
    backupType: full
    br:
      cluster: demo1
      sendCredToTikv: false
      clusterNamespace: test1
      # logLevel: info
      # statusAddr: ${status_addr}
      # concurrency: 4
      # rateLimit: 0
      # timeAgo: ${time}
      # checksum: true
    # Only needed for TiDB Operator < v1.1.10 or TiDB < v4.0.8
    from:
      host: ${tidb_host}
      port: ${tidb_port}
      user: ${tidb_user}
      secretName: backup-demo1-tidb-secret
    s3:
      provider: aws
      region: us-west-1
      bucket: my-bucket
      prefix: my-folder

As the backup-scheduler-aws-s3.yaml examples above show, the BackupSchedule configuration consists of two parts: the configuration items unique to BackupSchedule, and backupTemplate.

- For the configuration items unique to BackupSchedule, see the BackupSchedule CR field reference.
- backupTemplate specifies the cluster and remote storage configuration. Its fields are the same as those of spec in the Backup CR. See the Backup CR field reference.
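The example schedule "*/2 * * * *" triggers a backup every two minutes, and maxReservedTime: "3h" keeps each backup for three hours. The sketch below estimates how many backups coexist under such a policy and which ones fall outside the retention window. It is a simplified model for capacity planning, not the operator's actual cleanup logic; the function names are illustrative, and the interval is passed in directly rather than parsed from the cron expression.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def backups_in_window(interval: timedelta, max_reserved: timedelta) -> int:
    """Approximate number of backups retained at steady state."""
    return max_reserved // interval


def expired(
    backup_times: List[datetime], now: datetime, max_reserved: timedelta
) -> List[datetime]:
    """Return the backup times that are older than the retention window."""
    return [t for t in backup_times if now - t > max_reserved]


if __name__ == "__main__":
    # Every 2 minutes, kept for 3 hours, as in the example manifests.
    print(backups_in_window(timedelta(minutes=2), timedelta(hours=3)))

    now = datetime(2022, 7, 6, 8, 0, tzinfo=timezone.utc)
    times = [now - timedelta(hours=4), now - timedelta(hours=1)]
    print(expired(times, now, timedelta(hours=3)))
```

With the example values the first function returns 90, which suggests that a two-minute schedule is suitable for testing only. A production schedule would normally use a much longer interval so that the retained full backups fit within the storage budget.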

After the scheduled full backup is created, you can check its status with the following command:

kubectl get bks -n test1 -o wide

To list all the backups that belong to the scheduled full backup, run:

kubectl get bk -l tidb.pingcap.com/backup-schedule=demo1-backup-schedule-s3 -n test1