Use Dumpling to back up TiDB cluster data to S3-compatible storage
2022-07-06 08:03:00 【Tianxiang shop】
This document describes how to back up the data of a TiDB cluster on Kubernetes to S3-compatible storage. In this document, "backup" always refers to full backup (that is, Ad-hoc full backup and scheduled full backup).
The backup method described in this document is implemented through the CustomResourceDefinition (CRD) of TiDB Operator (v1.1 and above). At the bottom layer, the Dumpling tool is used to take a logical backup of the cluster, and the backup data is then uploaded to S3-compatible storage.
Dumpling is a data export tool. It exports data stored in TiDB or MySQL as SQL or CSV files, and can be used for a logical full backup or full export.
Usage scenarios
If you need to back up TiDB cluster data to S3-compatible storage with an Ad-hoc full backup or a scheduled full backup, and you have any of the following requirements, consider the backup scheme introduced in this document:
- Export data in SQL or CSV format
- Limit the memory usage of a single SQL statement
- Export a snapshot of TiDB historical data
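The last scenario relies on Dumpling's `--snapshot` option, which accepts a TSO or a datetime string. A TiDB TSO packs a millisecond physical timestamp above 18 logical bits; the helper below is an illustrative sketch (not part of Dumpling) for deriving such a TSO from a point in time:

```python
from datetime import datetime, timezone

LOGICAL_BITS = 18  # a TSO keeps 18 logical bits below the physical ms timestamp

def datetime_to_tso(dt):
    """Convert a timezone-aware datetime to a TiDB TSO (logical part = 0)."""
    physical_ms = int(dt.timestamp() * 1000)
    return physical_ms << LOGICAL_BITS

def tso_to_datetime(tso):
    """Recover the physical timestamp encoded in a TSO."""
    return datetime.fromtimestamp((tso >> LOGICAL_BITS) / 1000, tz=timezone.utc)

# usable as: dumpling ... --snapshot <snapshot>
snapshot = datetime_to_tso(datetime(2022, 7, 6, 8, 0, tzinfo=timezone.utc))
```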
Ad-hoc full backup
An Ad-hoc full backup is described by creating a custom `Backup` custom resource (CR) object. TiDB Operator completes the specific backup process according to this `Backup` object. If an error occurs during the backup, the program does not retry automatically, and the failure needs to be handled manually.
Among S3-compatible storage services, Ceph and Amazon S3 have been tested and work normally. The following examples show how to back up the data of a TiDB cluster to these two kinds of storage. The examples assume that you back up the TiDB cluster `demo1` deployed in the Kubernetes namespace `tidb-cluster`; the specific steps are as follows.
Prerequisites
Before using Dumpling to back up TiDB cluster data to S3, make sure you have the following privileges on the databases to be backed up:
- `SELECT` and `UPDATE` privileges on the `mysql.tidb` table: before and after the backup, the `Backup` CR needs a database account with these privileges to adjust the GC time.
- Global privileges: `SELECT`, `RELOAD`, `LOCK TABLES`, and `REPLICATION CLIENT`.
The following is an example of creating a backup user:

```sql
CREATE USER 'backup'@'%' IDENTIFIED BY '...';
GRANT SELECT, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'backup'@'%';
GRANT UPDATE, SELECT ON mysql.tidb TO 'backup'@'%';
```
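As a quick sanity check, the global privilege requirements above can be compared against what an account actually holds. A minimal illustrative sketch (the privilege names come from this document; parsing real `SHOW GRANTS` output is out of scope):

```python
# Global privileges the Backup CR requires, as listed above.
REQUIRED_GLOBAL = {"SELECT", "RELOAD", "LOCK TABLES", "REPLICATION CLIENT"}

def missing_privileges(granted):
    """Return required global privileges that the account does not hold yet."""
    return REQUIRED_GLOBAL - set(granted)

# After the GRANT statements above, nothing should be missing:
print(missing_privileges({"SELECT", "RELOAD", "LOCK TABLES", "REPLICATION CLIENT"}))  # set()
```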
Step 1: Prepare the Ad-hoc full backup environment
Execute the following command to create the role-based access control (RBAC) resources defined in backup-rbac.yaml in the `tidb-cluster` namespace:

```shell
kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.6/manifests/backup/backup-rbac.yaml -n tidb-cluster
```
Grant permissions to access the remote storage.

If you use Amazon S3 to back up the cluster, you can grant permissions in any of three ways; see AWS account authorization for authorizing access to S3-compatible remote storage. When using Ceph as the backend storage for a backup test, authorization is done with an AccessKey and SecretKey; see Grant permissions via AccessKey and SecretKey.
Create the `backup-demo1-tidb-secret` secret, which stores the root account and password used to access the TiDB cluster:

```shell
kubectl create secret generic backup-demo1-tidb-secret --from-literal=password=${password} --namespace=tidb-cluster
```
Step 2: Back up data to S3-compatible storage
Note

Because of an issue in `rclone`, if the backup is stored in Amazon S3 and AWS-KMS encryption is enabled on Amazon S3, you need to add the following `spec.s3.options` configuration to the YAML file to ensure a successful backup:

```yaml
spec:
  ...
  s3:
    ...
    options:
    - --ignore-checksum
```
This section lists several storage access methods. Use only the method that fits your situation.
- Back up to Amazon S3 by importing an AccessKey and SecretKey
- Back up to Ceph by importing an AccessKey and SecretKey
- Back up to Amazon S3 by binding an IAM role to the Pod
- Back up to Amazon S3 by binding an IAM role to a ServiceAccount
Method 1: Create the `Backup` CR to back up the data to Amazon S3 via an AccessKey and SecretKey:

```shell
kubectl apply -f backup-s3.yaml
```

The content of `backup-s3.yaml` is as follows:

```yaml
---
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: demo1-backup-s3
  namespace: tidb-cluster
spec:
  from:
    host: ${tidb_host}
    port: ${tidb_port}
    user: ${tidb_user}
    secretName: backup-demo1-tidb-secret
  s3:
    provider: aws
    secretName: s3-secret
    region: ${region}
    bucket: ${bucket}
    # prefix: ${prefix}
    # storageClass: STANDARD_IA
    # acl: private
    # endpoint:
  # dumpling:
  #  options:
  #  - --threads=16
  #  - --rows=10000
  #  tableFilter:
  #  - "test.*"
  # storageClassName: local-storage
  storageSize: 10Gi
```
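The `bucket` and optional `prefix` fields under `spec.s3` determine where the exported data lands in the bucket. The helper below only illustrates how such fields might combine into an S3 URL; the exact object layout is decided by TiDB Operator and is not specified here:

```python
def s3_destination(bucket, name, prefix=""):
    """Compose an illustrative s3:// URL from Backup CR style fields."""
    parts = [p.strip("/") for p in (bucket, prefix, name) if p]
    return "s3://" + "/".join(parts)

# With prefix commented out in the CR, the dump lands at the bucket root:
print(s3_destination("my-bucket", "demo1-backup-s3"))                    # s3://my-bucket/demo1-backup-s3
print(s3_destination("my-bucket", "demo1-backup-s3", prefix="backups"))  # s3://my-bucket/backups/demo1-backup-s3
```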
Method 2: Create the `Backup` CR to back up the data to Ceph via an AccessKey and SecretKey:

```shell
kubectl apply -f backup-s3.yaml
```

The content of `backup-s3.yaml` is as follows:

```yaml
---
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: demo1-backup-s3
  namespace: tidb-cluster
spec:
  from:
    host: ${tidb_host}
    port: ${tidb_port}
    user: ${tidb_user}
    secretName: backup-demo1-tidb-secret
  s3:
    provider: ceph
    secretName: s3-secret
    endpoint: ${endpoint}
    # prefix: ${prefix}
    bucket: ${bucket}
  # dumpling:
  #  options:
  #  - --threads=16
  #  - --rows=10000
  #  tableFilter:
  #  - "test.*"
  # storageClassName: local-storage
  storageSize: 10Gi
```
Method 3: Create the `Backup` CR to back up the data to Amazon S3 via an IAM role bound to the Pod:

```shell
kubectl apply -f backup-s3.yaml
```

The content of `backup-s3.yaml` is as follows:

```yaml
---
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: demo1-backup-s3
  namespace: tidb-cluster
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/user
spec:
  backupType: full
  from:
    host: ${tidb_host}
    port: ${tidb_port}
    user: ${tidb_user}
    secretName: backup-demo1-tidb-secret
  s3:
    provider: aws
    region: ${region}
    bucket: ${bucket}
    # prefix: ${prefix}
    # storageClass: STANDARD_IA
    # acl: private
    # endpoint:
  # dumpling:
  #  options:
  #  - --threads=16
  #  - --rows=10000
  #  tableFilter:
  #  - "test.*"
  # storageClassName: local-storage
  storageSize: 10Gi
```
Method 4: Create the `Backup` CR to back up the data to Amazon S3 via an IAM role bound to a ServiceAccount:

```shell
kubectl apply -f backup-s3.yaml
```

The content of `backup-s3.yaml` is as follows:

```yaml
---
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: demo1-backup-s3
  namespace: tidb-cluster
spec:
  backupType: full
  serviceAccount: tidb-backup-manager
  from:
    host: ${tidb_host}
    port: ${tidb_port}
    user: ${tidb_user}
    secretName: backup-demo1-tidb-secret
  s3:
    provider: aws
    region: ${region}
    bucket: ${bucket}
    # prefix: ${prefix}
    # storageClass: STANDARD_IA
    # acl: private
    # endpoint:
  # dumpling:
  #  options:
  #  - --threads=16
  #  - --rows=10000
  #  tableFilter:
  #  - "test.*"
  # storageClassName: local-storage
  storageSize: 10Gi
```
The above examples export all data of the TiDB cluster and back it up to Amazon S3 and Ceph respectively. The `acl`, `endpoint`, and `storageClass` configuration items of Amazon S3 can be omitted. Other S3-compatible storage can use a configuration similar to that of Amazon S3; as in the Ceph example above, fields that are not needed can be omitted. For more S3-compatible storage configuration, see the S3 storage fields reference.
In the examples above, `.spec.dumpling` holds the Dumpling-related configuration: you can specify Dumpling's run-time options in the `options` field (see the Dumpling documentation for details). By default this field needs no configuration. When no Dumpling configuration is specified, the `options` field defaults to the following values:
```yaml
options:
- --threads=16
- --rows=10000
```
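`--rows=10000` splits each table into chunks of at most 10,000 rows so that the 16 worker threads can export them concurrently. A back-of-the-envelope sketch of the resulting chunk count (illustrative only; Dumpling's actual splitting strategy may differ):

```python
import math

def chunk_count(total_rows, rows_per_chunk=10000):
    """Number of export chunks for a table under --rows=rows_per_chunk."""
    return max(1, math.ceil(total_rows / rows_per_chunk))

# A 1,250,000-row table yields 125 chunks, shared by the 16 worker threads:
print(chunk_count(1_250_000))  # 125
```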
For detailed explanations of more `Backup` CR fields, see the Backup CR fields reference.
After creating the `Backup` CR, you can check the backup status with the following command:

```shell
kubectl get bk -n tidb-cluster -owide
```
To get detailed information about a backup job, use the following command, replacing `$backup_job_name` with the backup job name from the output of the previous command:

```shell
kubectl describe bk -n tidb-cluster $backup_job_name
```
If you want to run an Ad-hoc backup again, you need to delete the finished `Backup` CR and create a new one.
Scheduled full backup
You can set a backup policy to back up the TiDB cluster on a schedule, together with a retention policy to avoid accumulating too many backups. A scheduled full backup is described by a custom `BackupSchedule` CR object: a full backup is triggered at each scheduled time point, and under the hood scheduled full backups are implemented with Ad-hoc full backups. The following are the specific steps to create a scheduled full backup.
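The `schedule` field of `BackupSchedule` uses standard cron syntax; the examples below use `*/2 * * * *`, i.e. a backup every two minutes. The sketch below evaluates only a simplified cron minute field to illustrate the step syntax (real cron parsing covers all five fields):

```python
def minute_matches(minute, field):
    """Match a minute value against a simplified cron minute field."""
    if field == "*":
        return True
    if field.startswith("*/"):                 # step syntax, e.g. */2
        return minute % int(field[2:]) == 0
    return minute == int(field)

# "*/2" fires on every even minute: 0, 2, 4, ..., 58
fires = [m for m in range(60) if minute_matches(m, "*/2")]
print(len(fires))  # 30
```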
Step 1: Prepare the scheduled full backup environment
Same as the preparation for the Ad-hoc full backup environment.
Step 2: Regularly back up data in full to S3-compatible storage
Note

Because of an issue in `rclone`, if the backup is stored in Amazon S3 and AWS-KMS encryption is enabled on Amazon S3, you need to add the following `spec.backupTemplate.s3.options` configuration to the YAML file to ensure a successful backup:

```yaml
spec:
  ...
  backupTemplate:
    ...
    s3:
      ...
      options:
      - --ignore-checksum
```
Method 1: Create the `BackupSchedule` CR to enable scheduled full backups of the TiDB cluster, and back up the data to Amazon S3 via an AccessKey and SecretKey:

```shell
kubectl apply -f backup-schedule-s3.yaml
```

The content of `backup-schedule-s3.yaml` is as follows:

```yaml
---
apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: demo1-backup-schedule-s3
  namespace: tidb-cluster
spec:
  #maxBackups: 5
  #pause: true
  maxReservedTime: "3h"
  schedule: "*/2 * * * *"
  backupTemplate:
    from:
      host: ${tidb_host}
      port: ${tidb_port}
      user: ${tidb_user}
      secretName: backup-demo1-tidb-secret
    s3:
      provider: aws
      secretName: s3-secret
      region: ${region}
      bucket: ${bucket}
      # prefix: ${prefix}
      # storageClass: STANDARD_IA
      # acl: private
      # endpoint:
    # dumpling:
    #  options:
    #  - --threads=16
    #  - --rows=10000
    #  tableFilter:
    #  - "test.*"
    # storageClassName: local-storage
    storageSize: 10Gi
```
Method 2: Create the `BackupSchedule` CR to enable scheduled full backups of the TiDB cluster, and back up the data to Ceph via an AccessKey and SecretKey:

```shell
kubectl apply -f backup-schedule-s3.yaml
```

The content of `backup-schedule-s3.yaml` is as follows:

```yaml
---
apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: demo1-backup-schedule-ceph
  namespace: tidb-cluster
spec:
  #maxBackups: 5
  #pause: true
  maxReservedTime: "3h"
  schedule: "*/2 * * * *"
  backupTemplate:
    from:
      host: ${tidb_host}
      port: ${tidb_port}
      user: ${tidb_user}
      secretName: backup-demo1-tidb-secret
    s3:
      provider: ceph
      secretName: s3-secret
      endpoint: ${endpoint}
      bucket: ${bucket}
      # prefix: ${prefix}
    # dumpling:
    #  options:
    #  - --threads=16
    #  - --rows=10000
    #  tableFilter:
    #  - "test.*"
    # storageClassName: local-storage
    storageSize: 10Gi
```
Method 3: Create the `BackupSchedule` CR to enable scheduled full backups of the TiDB cluster, and back up the data to Amazon S3 via an IAM role bound to the Pod:

```shell
kubectl apply -f backup-schedule-s3.yaml
```

The content of `backup-schedule-s3.yaml` is as follows:

```yaml
---
apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: demo1-backup-schedule-s3
  namespace: tidb-cluster
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/user
spec:
  #maxBackups: 5
  #pause: true
  maxReservedTime: "3h"
  schedule: "*/2 * * * *"
  backupTemplate:
    from:
      host: ${tidb_host}
      port: ${tidb_port}
      user: ${tidb_user}
      secretName: backup-demo1-tidb-secret
    s3:
      provider: aws
      region: ${region}
      bucket: ${bucket}
      # prefix: ${prefix}
      # storageClass: STANDARD_IA
      # acl: private
      # endpoint:
    # dumpling:
    #  options:
    #  - --threads=16
    #  - --rows=10000
    #  tableFilter:
    #  - "test.*"
    # storageClassName: local-storage
    storageSize: 10Gi
```
Method 4: Create the `BackupSchedule` CR to enable scheduled full backups of the TiDB cluster, and back up the data to Amazon S3 via an IAM role bound to a ServiceAccount:

```shell
kubectl apply -f backup-schedule-s3.yaml
```

The content of `backup-schedule-s3.yaml` is as follows:

```yaml
---
apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: demo1-backup-schedule-s3
  namespace: tidb-cluster
spec:
  #maxBackups: 5
  #pause: true
  maxReservedTime: "3h"
  schedule: "*/2 * * * *"
  serviceAccount: tidb-backup-manager
  backupTemplate:
    from:
      host: ${tidb_host}
      port: ${tidb_port}
      user: ${tidb_user}
      secretName: backup-demo1-tidb-secret
    s3:
      provider: aws
      region: ${region}
      bucket: ${bucket}
      # prefix: ${prefix}
      # storageClass: STANDARD_IA
      # acl: private
      # endpoint:
    # dumpling:
    #  options:
    #  - --threads=16
    #  - --rows=10000
    #  tableFilter:
    #  - "test.*"
    # storageClassName: local-storage
    storageSize: 10Gi
```
After the scheduled full backup is created, you can check its status with the following command:

```shell
kubectl get bks -n tidb-cluster -owide
```

To check all backups created by the scheduled full backup:

```shell
kubectl get bk -l tidb.pingcap.com/backup-schedule=demo1-backup-schedule-s3 -n tidb-cluster
```
As the examples above show, the `backupSchedule` configuration consists of two parts: the configuration unique to `backupSchedule`, and `backupTemplate`. `backupTemplate` specifies the configuration related to the cluster and the remote storage; its fields are the same as those under `spec` in the `Backup` CR (see the Backup CR fields reference). For the configuration items unique to `backupSchedule`, see the BackupSchedule CR fields reference.
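Among the `backupSchedule`-specific items, `maxReservedTime` and `maxBackups` bound how long scheduled backups are kept. The sketch below illustrates the retention decision for a `maxReservedTime: "3h"` window (the actual garbage collection is performed by TiDB Operator; this only models the policy):

```python
from datetime import datetime, timedelta, timezone

def prune(backup_times, max_reserved, now):
    """Split backup timestamps into (kept, expired) under a retention window."""
    cutoff = now - max_reserved
    kept = [t for t in backup_times if t >= cutoff]
    expired = [t for t in backup_times if t < cutoff]
    return kept, expired

now = datetime(2022, 7, 6, 12, 0, tzinfo=timezone.utc)
times = [now - timedelta(hours=h) for h in (1, 2, 4, 6)]
# The 1h- and 2h-old backups survive a 3h window; the 4h and 6h ones expire:
kept, expired = prune(times, timedelta(hours=3), now)
print(len(kept), len(expired))  # 2 2
```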
Note

TiDB Operator creates a PVC that is shared by Ad-hoc full backups and scheduled full backups. The backup data is first written to the PV and then uploaded to the remote storage. If you want to delete this PVC after the backup finishes, refer to Delete resources: delete the backup Pod first, and then delete the PVC.

If the backup data is successfully uploaded to the remote storage, TiDB Operator automatically deletes the local backup file; if the upload fails, the local file is retained.
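The cleanup rule above (delete the local dump only after a successful upload) can be sketched as follows; the `upload` callable here is hypothetical, and TiDB Operator's real implementation differs:

```python
import os

def upload_and_cleanup(local_path, upload):
    """Upload a local backup file; delete it only if the upload succeeds."""
    try:
        upload(local_path)
    except Exception:
        return False          # upload failed: keep the local file on the PV
    os.remove(local_path)     # upload succeeded: free the PV space
    return True
```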
Delete the completed Backup CR
After the backup is complete, you may need to delete the completed `Backup` CR. For how to do this, see Delete the completed Backup CR.