Migrate Data from One TiDB Cluster to Another TiDB Cluster
This document describes how to migrate data from one TiDB cluster to another TiDB cluster. You can do so in the following scenarios:
- Splitting databases: the volume of the original TiDB cluster is too large, or you want to prevent the services carried by the original cluster from affecting each other, so some tables in the original TiDB cluster are moved to another TiDB cluster.
- Relocating databases: the physical location of the database changes, for example, when the data center is replaced.
- Upgrading: in scenarios with strict requirements on data correctness, you can migrate data to a higher-version TiDB cluster to ensure data safety.
This article simulates the whole migration process, which consists of the following four steps:
- Set up the environment
- Migrate full data
- Migrate incremental data
- Switch services smoothly
Step 1: Set up the environment
Deploy the clusters.
Use tiup playground to quickly deploy the upstream and downstream test clusters. For more deployment information, refer to the official tiup documentation.
# Create the upstream cluster
tiup --tag upstream playground --host 0.0.0.0 --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# Create the downstream cluster
tiup --tag downstream playground --host 0.0.0.0 --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# View the cluster status
tiup status
Initialize the data.
The test cluster creates a test database by default, so you can use the sysbench tool to generate test data that simulates the historical data in a real cluster.
sysbench oltp_write_only --config-file=./tidb-config --tables=10 --table-size=10000 prepare
Running the oltp_write_only script with sysbench here generates 10 tables in the test database, each containing 10,000 rows of initial data. The tidb-config file is as follows:
mysql-host=172.16.6.122 # Replace with the actual upstream cluster IP
mysql-port=4000
mysql-user=root
mysql-password=
db-driver=mysql         # Set the database driver to mysql
mysql-db=test           # Set the test database to test
report-interval=10      # Set the periodic statistics interval to 10 seconds
threads=10              # Set the number of worker threads to 10
time=0                  # Set the total execution time of the script; 0 means unlimited
rate=100                # Set the average transaction rate to tps = 100
Simulate the service workload.
When data is migrated in a real production cluster, the original cluster usually keeps receiving new business writes. In this article, you can use the sysbench tool to simulate such a continuous write workload. The following command uses 10 workers to continuously write data into the three tables sbtest1, sbtest2, and sbtest3, with the total TPS limited to 100.
sysbench oltp_write_only --config-file=./tidb-config --tables=3 run
Prepare external storage.
During the full data backup, both the upstream and downstream clusters need to access the backup files, so it is recommended to use external storage for them. In this article, MinIO is used to simulate an S3-compatible storage service:
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
# Configure the access-key and secret-access-key used to access minio
export HOST_IP='172.16.6.122' # Replace with the actual upstream cluster IP
export MINIO_ROOT_USER='minio'
export MINIO_ROOT_PASSWORD='miniostorage'
# Create the data directory, where backup is the bucket name
mkdir -p data/backup
# Start minio and expose port 6060
./minio server ./data --address :6060 &
The preceding commands start a single-node MinIO server to simulate the S3 service. The relevant parameters are:
- Endpoint: http://${HOST_IP}:6060/
- Access-key: minio
- Secret-access-key: miniostorage
- Bucket: backup
The corresponding access link is:
s3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true
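Optionally, before moving on, you can confirm that the simulated storage service is reachable. The following is a minimal sketch that relies on MinIO's documented liveness endpoint:

# Check that the simulated S3 service is up; MinIO returns HTTP 200 on its liveness probe
curl -f http://${HOST_IP}:6060/minio/health/live && echo "minio is up"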
Step 2: Migrate full data
After setting up the test environment, you can use the backup and restore functions of the BR tool to migrate the full data. BR can be used in several ways; in this article, the SQL statements BACKUP and RESTORE are used.
Note
If the versions of the upstream and downstream clusters are different, check the compatibility of BR. This article assumes that the upstream and downstream clusters have the same version.
Disable GC.
To ensure that newly written data is not lost during the incremental migration, you need to disable the garbage collection (GC) mechanism of the upstream cluster before starting the backup, so that history data is not cleaned up.
MySQL [test]> SET GLOBAL tidb_gc_enable=FALSE;
Query OK, 0 rows affected (0.01 sec)
MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
|                       0 |
+-------------------------+
1 row in set (0.00 sec)
Note
In a production cluster, disabling GC and performing a backup reduce the read performance of the cluster to some extent. It is recommended to back up during off-peak hours, and to set an appropriate RATE_LIMIT to limit the impact of the backup on online services.
Back up data.
Execute the BACKUP statement in the upstream cluster to back up data:
MySQL [(none)]> BACKUP DATABASE * TO 's3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true' RATE_LIMIT = 120 MB/SECOND;
+---------------+----------+--------------------+---------------------+---------------------+
| Destination   | Size     | BackupTS           | Queue Time          | Execution Time      |
+---------------+----------+--------------------+---------------------+---------------------+
| s3://backup   | 10315858 | 431434047157698561 | 2022-02-25 19:57:59 | 2022-02-25 19:57:59 |
+---------------+----------+--------------------+---------------------+---------------------+
1 row in set (2.11 sec)
After the backup statement is submitted successfully, TiDB returns meta information about the backup data. Pay attention to BackupTS: data written before this timestamp is included in the backup. In this article, BackupTS is used both as the snapshot for the data check and as the starting point of TiCDC incremental scanning.
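Because the same timestamp is reused twice below, it can be convenient to record it once. This is a sketch: the BACKUP_TS variable name is our own convention, and the value must be replaced by the BackupTS from your own backup output.

# Record the BackupTS returned by the BACKUP statement for later reuse
# (the sync-diff snapshot and the TiCDC --start-ts)
export BACKUP_TS=431434047157698561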
Restore the data.
Execute the RESTORE statement in the downstream cluster to restore the data:
mysql> RESTORE DATABASE * FROM 's3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true';
+--------------+----------+--------------------+---------------------+---------------------+
| Destination  | Size     | BackupTS           | Queue Time          | Execution Time      |
+--------------+----------+--------------------+---------------------+---------------------+
| s3://backup  | 10315858 | 431434141450371074 | 2022-02-25 20:03:59 | 2022-02-25 20:03:59 |
+--------------+----------+--------------------+---------------------+---------------------+
1 row in set (41.85 sec)
(Optional) Check the data.
With the sync-diff-inspector tool, you can check the consistency of the upstream and downstream data at a certain point in time. From the output of the preceding backup and restore commands, the upstream cluster was backed up at 431434047157698561, and the downstream cluster finished restoring the data at 431434141450371074.
sync_diff_inspector -C ./config.yaml
For how to configure sync-diff-inspector, refer to the configuration file description. In this article, the configuration is as follows:
# Diff Configuration.
######################### Datasource config #########################
[data-sources]
[data-sources.upstream]
host = "172.16.6.122" # Replace with the actual upstream cluster IP
port = 4000
user = "root"
password = ""
snapshot = "431434047157698561" # Set to the actual backup timestamp (see the BackupTS in the "Back up data" section)
[data-sources.downstream]
host = "172.16.6.125" # Replace with the actual downstream cluster IP
port = 4000
user = "root"
password = ""
######################### Task config #########################
[task]
output-dir = "./output"
source-instances = ["upstream"]
target-instance = "downstream"
target-check-tables = ["*.*"]
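When the check finishes, sync-diff-inspector writes its report into the directory configured by output-dir. A minimal way to read the verdict, assuming the default summary.txt report name:

# Inspect the check report; it states whether the data sets are consistent
cat ./output/summary.txt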
Step 3: Migrate incremental data
Deploy TiCDC.
After the full data migration is complete, you can deploy and configure a TiCDC cluster to replicate incremental data; for details, refer to the TiCDC deployment documentation. When the test clusters were created, a TiCDC node was already started, so you can configure the changefeed directly.
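Before creating the changefeed, you can optionally confirm that the TiCDC node started by the playground is alive. A minimal sketch, assuming the upstream PD address used throughout this article:

# List the TiCDC capture processes registered in PD
tiup cdc cli capture list --pd=http://172.16.6.122:2379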
Create a replication task.
Execute the following command in the upstream cluster to create a changefeed that replicates from the upstream cluster to the downstream cluster:
tiup cdc cli changefeed create --pd=http://172.16.6.122:2379 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="upstream-to-downstream" --start-ts="431434047157698561"
In the preceding command:
- --pd: the PD address of the actual upstream cluster
- --sink-uri: the downstream address of the replication task
- --changefeed-id: the ID of the replication task; its format must match the regular expression ^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
- --start-ts: the starting point of TiCDC replication; it must be set to the actual backup timestamp (that is, the BackupTS in the "Back up data" section of Step 2)
For more changefeed configuration options, refer to the replication task configuration file description.
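For example, if only part of the data needs to be replicated, you can pass a configuration file when creating the changefeed. The snippet below is a sketch that restricts replication to the tables under the test database; the rule syntax follows TiCDC's table filter:

# Write a minimal changefeed configuration that replicates only the test database
cat > changefeed.toml <<'EOF'
[filter]
rules = ['test.*']
EOF
# Pass it when creating the changefeed:
# tiup cdc cli changefeed create ... --config changefeed.toml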
Enable GC again.
TiCDC ensures that GC only collects history data that has already been replicated. Therefore, after creating the changefeed from the upstream to the downstream cluster, you can execute the following command to re-enable garbage collection for the cluster. For details, refer to the complete behavior of the TiCDC GC safepoint.
MySQL [test]> SET GLOBAL tidb_gc_enable=TRUE;
Query OK, 0 rows affected (0.01 sec)
MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.00 sec)
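If you want to see the safepoint that protects the not-yet-replicated data, you can query PD. This is a sketch; the exact tiup ctl invocation may depend on the installed tiup version:

# List the service GC safepoints registered in PD; a ticdc entry should appear
tiup ctl pd -u http://172.16.6.122:2379 service-gc-safepoint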
Step 4: Switch services smoothly
After the TiCDC replication link between the upstream and downstream clusters is created, data written to the original cluster is replicated to the new cluster with very low latency. At this point, you can gradually migrate read traffic to the new cluster. Observe for a period of time, and if the new cluster runs stably, you can also switch write traffic to the new cluster in the following three steps:
Stop write services in the upstream cluster. After confirming that the upstream data has been fully replicated to the downstream, stop the changefeed from the upstream cluster to the downstream cluster.
# Stop the changefeed from the old cluster to the new cluster
tiup cdc cli changefeed pause -c "upstream-to-downstream" --pd=http://172.16.6.122:2379
# View the changefeed state
tiup cdc cli changefeed list
[
  {
    "id": "upstream-to-downstream",
    "summary": {
      "state": "stopped",  # Confirm that the state here is stopped
      "tso": 431747241184329729,
      "checkpoint": "2022-03-11 15:50:20.387",  # Confirm that this time is later than the stop time
      "error": null
    }
  }
]
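One way to confirm that the downstream has fully caught up is to compare the changefeed checkpoint with the upstream cluster's current TSO. A sketch; in TiDB, SHOW MASTER STATUS returns the current TSO in the Position column:

# Get the current TSO of the upstream cluster
mysql -h 172.16.6.122 -P 4000 -u root -e "SHOW MASTER STATUS;"
# Query the changefeed; its checkpoint TSO should not be smaller than the TSO above
tiup cdc cli changefeed query -c "upstream-to-downstream" --pd=http://172.16.6.122:2379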
Create a changefeed from the downstream cluster to the upstream cluster. At this point the upstream and downstream data are consistent and no new data is being written, so you can leave start-ts unspecified; it defaults to the current time:
tiup cdc cli changefeed create --pd=http://172.16.6.125:2379 --sink-uri="mysql://root:@172.16.6.122:4000" --changefeed-id="downstream-to-upstream"
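You can then confirm that the reverse changefeed is running by listing the changefeeds against the downstream PD:

# The new changefeed should be listed with state "normal"
tiup cdc cli changefeed list --pd=http://172.16.6.125:2379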
Migrate write services to the downstream cluster. After observing for a period of time and confirming that the new cluster runs stably, you can decommission the original cluster.