Migrate Data from One TiDB Cluster to Another TiDB Cluster
2022-07-06 08:01:00 [Tianxiang shop]
This document describes how to migrate data from one TiDB cluster to another. You might do this in the following scenarios:
- Splitting databases: the original TiDB cluster has grown too large, or you want to keep the several services it hosts from affecting each other, so some tables are moved to another TiDB cluster.
- Relocating the database: physically moving the database, for example to a different data center.
- Upgrading: in scenarios with strict data-correctness requirements, data can be migrated to a TiDB cluster running a newer version to keep the data safe.
This article simulates the whole migration process in four steps:
- Set up the environment
- Migrate full data
- Migrate incremental data
- Switch the business over smoothly
Step 1: Set up the environment
Deploy the clusters.

Use tiup playground to quickly deploy the upstream and downstream test clusters. For more deployment options, see the tiup official documentation.
# Create an upstream cluster
tiup --tag upstream playground --host 0.0.0.0 --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# Create a downstream cluster
tiup --tag downstream playground --host 0.0.0.0 --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# View the cluster status
tiup status
Initialize the data.

The test cluster creates the test database by default, so you can use the sysbench tool to generate test data there, simulating the historical data of a real cluster.
sysbench oltp_write_only --config-file=./tidb-config --tables=10 --table-size=10000 prepare
Running the oltp_write_only script with sysbench here generates 10 tables in the test database, each containing 10,000 rows of initial data. The tidb-config file is as follows:
mysql-host=172.16.6.122   # Replace with the actual upstream cluster IP
mysql-port=4000
mysql-user=root
mysql-password=
db-driver=mysql           # Set the database driver to mysql
mysql-db=test             # Set the test database to test
report-interval=10        # Report statistics every 10 seconds
threads=10                # Set the number of worker threads to 10
time=0                    # Total execution time of the script; 0 means unlimited
rate=100                  # Cap the average transaction rate at 100 TPS
Simulate the business load.

During data migration in a real production cluster, the original cluster usually keeps receiving new business writes. In this article you can use sysbench to simulate a continuous write load: the following command uses 10 workers to write continuously to the sbtest1, sbtest2, and sbtest3 tables, with the total TPS capped at 100.
sysbench oltp_write_only --config-file=./tidb-config --tables=3 run
Prepare external storage.

During the full backup, both the upstream and downstream clusters need access to the backup files, so it is recommended to keep them in external storage. In this article, MinIO is used to simulate an S3-compatible storage service:
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
# Configure the access-key and secret-access-key used to access minio
export HOST_IP='172.16.6.122'   # Replace with the actual upstream cluster IP
export MINIO_ROOT_USER='minio'
export MINIO_ROOT_PASSWORD='miniostorage'
# Create the data directory; backup is the bucket name
mkdir -p data/backup
# Start minio, exposing port 6060
./minio server ./data --address :6060 &
The commands above start a single-node minio server to simulate the S3 service. Its parameters are:
- Endpoint: http://${HOST_IP}:6060/
- Access-key: minio
- Secret-access-key: miniostorage
- Bucket: backup
The corresponding access link is:
s3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true
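The access link above packs the bucket, credentials, and endpoint into a single URI. If you script the migration, the URI can be assembled from its parts; a minimal sketch (the function name and parameter order are illustrative, not part of BR or TiCDC):

```python
def build_storage_uri(bucket: str, access_key: str, secret_key: str, endpoint: str) -> str:
    """Assemble an S3-compatible external storage URI in the form used above."""
    # Joined by hand rather than with urlencode() so the endpoint's "://"
    # stays unescaped, matching the literal URI shown above.
    params = {
        "access-key": access_key,
        "secret-access-key": secret_key,
        "endpoint": endpoint,
        "force-path-style": "true",
    }
    query = "&".join(f"{k}={v}" for k, v in params.items())
    return f"s3://{bucket}?{query}"

print(build_storage_uri("backup", "minio", "miniostorage", "http://172.16.6.122:6060"))
```

With the MinIO parameters from this article, this reproduces the access link shown above.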
Step 2: Migrate full data
After setting up the test environment, you can use the backup and restore functions of the BR tool to migrate the full data. BR can be used in several ways; this article uses the SQL statements BACKUP and RESTORE.
Note

If the upstream and downstream cluster versions differ, you should check BR compatibility. This article assumes that both clusters run the same version.
close GC.
To ensure that newly written data is not lost during incremental migration , Before starting a backup , You need to turn off garbage collection in the upstream cluster (GC) Mechanism , To ensure that the system no longer cleans up historical data .
MySQL [test]> SET GLOBAL tidb_gc_enable=FALSE;
Query OK, 0 rows affected (0.01 sec)

MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
|                       0 |
+-------------------------+
1 row in set (0.00 sec)
Note

In a production cluster, disabling GC and running a backup will reduce the cluster's read performance to some extent. It is recommended to back up during off-peak hours and to set an appropriate RATE_LIMIT to limit the impact of the backup on online business.
Back up the data.

Execute the BACKUP statement in the upstream cluster:
MySQL [(none)]> BACKUP DATABASE * TO 's3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true' RATE_LIMIT = 120 MB/SECOND;
+---------------+----------+--------------------+---------------------+---------------------+
| Destination   | Size     | BackupTS           | Queue Time          | Execution Time      |
+---------------+----------+--------------------+---------------------+---------------------+
| s3://backup   | 10315858 | 431434047157698561 | 2022-02-25 19:57:59 | 2022-02-25 19:57:59 |
+---------------+----------+--------------------+---------------------+---------------------+
1 row in set (2.11 sec)
After the backup statement is submitted successfully, TiDB returns metadata about the backup. The field to note is BackupTS: data is backed up as of this point in time. Later in this article, BackupTS serves as both the data-validation snapshot and the starting point of TiCDC's incremental scan.
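A BackupTS is a TiDB TSO: a 64-bit integer whose low 18 bits are a logical counter and whose remaining high bits are a physical Unix timestamp in milliseconds. To sanity-check the backup point, you can decode it to wall-clock time; a sketch using the BackupTS from the output above (the helper name is illustrative):

```python
from datetime import datetime, timezone

def tso_to_datetime(tso: int) -> datetime:
    """Decode a TiDB TSO: the low 18 bits are a logical counter,
    the high bits a Unix timestamp in milliseconds."""
    physical_ms = tso >> 18
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)

backup_ts = 431434047157698561  # BackupTS returned by the BACKUP statement above
print(tso_to_datetime(backup_ts))  # 2022-02-25 in UTC, consistent with the backup time
```

The decoded time should match the Queue Time / Execution Time shown in the BACKUP output (modulo time zone).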
Restore the data.

Execute the RESTORE statement in the downstream cluster:
mysql> RESTORE DATABASE * FROM 's3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true';
+--------------+-----------+--------------------+---------------------+---------------------+
| Destination  | Size      | BackupTS           | Queue Time          | Execution Time      |
+--------------+-----------+--------------------+---------------------+---------------------+
| s3://backup  | 10315858  | 431434141450371074 | 2022-02-25 20:03:59 | 2022-02-25 20:03:59 |
+--------------+-----------+--------------------+---------------------+---------------------+
1 row in set (41.85 sec)
(Optional) Validate the data.

With the sync-diff-inspector tool you can verify that upstream and downstream data are consistent at a given point in time. From the output of the backup and restore commands above, the upstream cluster was backed up at 431434047157698561 and the downstream cluster finished restoring at 431434141450371074.
sync_diff_inspector -C ./config.yaml
For how to configure sync-diff-inspector, see its configuration file description. In this article, the corresponding configuration is:
# Diff Configuration.

######################### Datasource config #########################
[data-sources]
[data-sources.upstream]
host = "172.16.6.122"   # Replace with the actual upstream cluster IP
port = 4000
user = "root"
password = ""
snapshot = "431434047157698561"   # Set to the actual backup time (the BackupTS from the "Back up the data" step)

[data-sources.downstream]
host = "172.16.6.125"   # Replace with the actual downstream cluster IP
port = 4000
user = "root"
password = ""

######################### Task config #########################
[task]
output-dir = "./output"
source-instances = ["upstream"]
target-instance = "downstream"
target-check-tables = ["*.*"]
Step 3: Migrate incremental data
Deploy TiCDC.

After the full data migration finishes, deploy and configure a TiCDC cluster to replicate incremental data; see the TiCDC deployment documentation. When creating the test cluster we already started a TiCDC node, so you can configure the changefeed directly.
Create the replication task.

In the upstream cluster, run the following command to create a replication link from the upstream to the downstream cluster:
tiup cdc cli changefeed create --pd=http://172.16.6.122:2379 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="upstream-to-downstream" --start-ts="431434047157698561"
In the command above:
- --pd: the address of the actual upstream cluster
- --sink-uri: the downstream address of the replication task
- --changefeed-id: the replication task ID; its format must match the regular expression ^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
- --start-ts: the starting point of TiCDC replication; set it to the actual backup time (the BackupTS from the "Back up the data" step)
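Before creating the task, you can check a candidate ID against the pattern quoted above; a minimal sketch (the helper name is illustrative, not part of the tiup CLI):

```python
import re

# The pattern TiCDC requires for --changefeed-id, as quoted above
CHANGEFEED_ID_RE = re.compile(r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$")

def is_valid_changefeed_id(changefeed_id: str) -> bool:
    """Return True if the ID is alphanumeric groups separated by single hyphens."""
    return CHANGEFEED_ID_RE.match(changefeed_id) is not None

print(is_valid_changefeed_id("upstream-to-downstream"))   # True
print(is_valid_changefeed_id("upstream_to_downstream"))   # False: underscore not allowed
```

Note that spaces, underscores, and leading or trailing hyphens all fail the pattern.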
For more changefeed configuration options, see the replication task configuration file description.
Re open GC.
TiCDC Can guarantee GC Only the synchronized historical data is recycled . therefore , Create a cluster from upstream to downstream changefeed after , You can execute the following commands to restore the garbage collection function of the cluster . Please refer to TiCDC GC safepoint The complete behavior of .
MySQL [test]> SET GLOBAL tidb_gc_enable=TRUE;
Query OK, 0 rows affected (0.01 sec)

MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.00 sec)
Step 4: Switch the business over smoothly
After the TiCDC replication link between the two clusters is created, writes to the original cluster are replicated to the new cluster with very low latency, and you can gradually shift read traffic to the new cluster. Observe for a while; if the new cluster is stable, you can move write traffic to it as well, in three steps:
Stop writes to the upstream cluster. After confirming that the upstream data has been fully replicated downstream, stop the changefeed from the upstream to the downstream cluster.
# Stop the changefeed from the old cluster to the new cluster
tiup cdc cli changefeed pause -c "upstream-to-downstream" --pd=http://172.16.6.122:2379

# View the changefeed state
tiup cdc cli changefeed list
[
  {
    "id": "upstream-to-downstream",
    "summary": {
      "state": "stopped",                       # Confirm the state here is stopped
      "tso": 431747241184329729,
      "checkpoint": "2022-03-11 15:50:20.387",  # Confirm this time is later than the stop time
      "error": null
    }
  }
]
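The two conditions in the comments above (the state is stopped, and the checkpoint is later than the moment writes were stopped) can be checked by a script that parses the JSON printed by `tiup cdc cli changefeed list`. A sketch, assuming the output format shown above (the function name is illustrative):

```python
import json
from datetime import datetime

def changefeed_drained(list_output: str, changefeed_id: str, stopped_at: datetime) -> bool:
    """True if the changefeed is stopped and its checkpoint has passed stopped_at."""
    for feed in json.loads(list_output):
        if feed["id"] == changefeed_id:
            summary = feed["summary"]
            checkpoint = datetime.strptime(summary["checkpoint"], "%Y-%m-%d %H:%M:%S.%f")
            return summary["state"] == "stopped" and checkpoint > stopped_at
    return False  # changefeed not found

# Sample output matching the listing above (annotations removed so it is valid JSON)
sample = """[{"id": "upstream-to-downstream",
              "summary": {"state": "stopped",
                          "tso": 431747241184329729,
                          "checkpoint": "2022-03-11 15:50:20.387",
                          "error": null}}]"""
print(changefeed_drained(sample, "upstream-to-downstream", datetime(2022, 3, 11, 15, 50, 0)))  # True
```

Only proceed to the next step once this kind of check passes, otherwise the last writes to the old cluster may not have reached the new one.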
Create a changefeed from the downstream to the upstream cluster. At this point the upstream and downstream data are consistent and no new data is being written, so you can leave start-ts unspecified; it defaults to the current time:
tiup cdc cli changefeed create --pd=http://172.16.6.125:2379 --sink-uri="mysql://root:@172.16.6.122:4000" --changefeed-id="downstream-to-upstream"
Migrate the write business to the downstream cluster. After observing for a while and confirming that the new cluster is stable, you can retire the original cluster.