Migrate Data from One TiDB Cluster to Another TiDB Cluster
This document describes how to migrate data from one TiDB cluster to another TiDB cluster. You can do so in the following scenarios:
- Splitting databases: the volume of the original TiDB cluster is too large, or you want to prevent the services carried by the original cluster from affecting each other, so some tables in the original TiDB cluster are moved to another TiDB cluster.
- Relocating databases: the physical location of the database changes, for example, when the data center is replaced.
- Upgrading: in scenarios with strict requirements on data correctness, you can migrate data to a higher-version TiDB cluster to ensure data safety.
This article simulates the whole migration process, which consists of the following four steps:
- Set up the environment
- Migrate full data
- Migrate incremental data
- Switch services smoothly
Step 1: Set up the environment
Deploy the clusters.
Use tiup playground to quickly deploy the upstream and downstream test clusters. For more deployment information, refer to the official tiup documentation.
# Create the upstream cluster
tiup --tag upstream playground --host 0.0.0.0 --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# Create the downstream cluster
tiup --tag downstream playground --host 0.0.0.0 --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# View the cluster status
tiup status
Initialize the data.
The test cluster creates a test database by default, so you can use the sysbench tool to generate test data that simulates the historical data in a real cluster.
sysbench oltp_write_only --config-file=./tidb-config --tables=10 --table-size=10000 prepare
Running the oltp_write_only script with sysbench here generates 10 tables in the test database, each containing 10,000 rows of initial data. The tidb-config file is as follows:
mysql-host=172.16.6.122 # Replace with the actual upstream cluster IP
mysql-port=4000
mysql-user=root
mysql-password=
db-driver=mysql         # Set the database driver to mysql
mysql-db=test           # Set the test database to test
report-interval=10      # Set the periodic statistics interval to 10 seconds
threads=10              # Set the number of worker threads to 10
time=0                  # Set the total execution time of the script; 0 means unlimited
rate=100                # Set the average transaction rate to tps = 100
Simulate the service workload.
When data is migrated in a real production cluster, the original cluster usually keeps receiving new business writes. In this article, you can use the sysbench tool to simulate such a continuous write workload. The following command uses 10 workers to continuously write data into the three tables sbtest1, sbtest2, and sbtest3, with the total TPS limited to 100.
sysbench oltp_write_only --config-file=./tidb-config --tables=3 run
Prepare external storage.
During the full data backup, both the upstream and downstream clusters need to access the backup files, so it is recommended to use external storage for them. In this article, MinIO is used to simulate an S3-compatible storage service:
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
# Configure the access-key and secret-access-key used to access minio
export HOST_IP='172.16.6.122' # Replace with the actual upstream cluster IP
export MINIO_ROOT_USER='minio'
export MINIO_ROOT_PASSWORD='miniostorage'
# Create the data directory, where backup is the bucket name
mkdir -p data/backup
# Start minio and expose port 6060
./minio server ./data --address :6060 &
The preceding commands start a single-node MinIO server to simulate the S3 service. The relevant parameters are:
- Endpoint: http://${HOST_IP}:6060/
- Access-key: minio
- Secret-access-key: miniostorage
- Bucket: backup
The corresponding access link is:
s3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true
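Optionally, before moving on, you can confirm that the simulated storage service is reachable. The following is a minimal sketch that relies on MinIO's documented liveness endpoint:

# Check that the simulated S3 service is up; MinIO returns HTTP 200 on its liveness probe
curl -f http://${HOST_IP}:6060/minio/health/live && echo "minio is up"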
Step 2: Migrate full data
After setting up the test environment, you can use the backup and restore functions of the BR tool to migrate the full data. BR can be used in several ways; in this article, the SQL statements BACKUP and RESTORE are used.
Note
If the versions of the upstream and downstream clusters are different, check the compatibility of BR. This article assumes that the upstream and downstream clusters have the same version.
Disable GC.
To ensure that newly written data is not lost during the incremental migration, you need to disable the garbage collection (GC) mechanism of the upstream cluster before starting the backup, so that history data is not cleaned up.
MySQL [test]> SET GLOBAL tidb_gc_enable=FALSE;
Query OK, 0 rows affected (0.01 sec)
MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
|                       0 |
+-------------------------+
1 row in set (0.00 sec)
Note
In a production cluster, disabling GC and performing a backup reduce the read performance of the cluster to some extent. It is recommended to back up during off-peak hours, and to set an appropriate RATE_LIMIT to limit the impact of the backup on online services.
Back up data.
Execute the BACKUP statement in the upstream cluster to back up data:
MySQL [(none)]> BACKUP DATABASE * TO 's3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true' RATE_LIMIT = 120 MB/SECOND;
+---------------+----------+--------------------+---------------------+---------------------+
| Destination   | Size     | BackupTS           | Queue Time          | Execution Time      |
+---------------+----------+--------------------+---------------------+---------------------+
| s3://backup   | 10315858 | 431434047157698561 | 2022-02-25 19:57:59 | 2022-02-25 19:57:59 |
+---------------+----------+--------------------+---------------------+---------------------+
1 row in set (2.11 sec)
After the backup statement is submitted successfully, TiDB returns meta information about the backup data. Pay attention to BackupTS: data written before this timestamp is included in the backup. In this article, BackupTS is used both as the snapshot for the data check and as the starting point of TiCDC incremental scanning.
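Because the same timestamp is reused twice below, it can be convenient to record it once. This is a sketch: the BACKUP_TS variable name is our own convention, and the value must be replaced by the BackupTS from your own backup output.

# Record the BackupTS returned by the BACKUP statement for later reuse
# (the sync-diff snapshot and the TiCDC --start-ts)
export BACKUP_TS=431434047157698561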
Restore the data.
Execute the RESTORE statement in the downstream cluster to restore the data:
mysql> RESTORE DATABASE * FROM 's3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true';
+--------------+----------+--------------------+---------------------+---------------------+
| Destination  | Size     | BackupTS           | Queue Time          | Execution Time      |
+--------------+----------+--------------------+---------------------+---------------------+
| s3://backup  | 10315858 | 431434141450371074 | 2022-02-25 20:03:59 | 2022-02-25 20:03:59 |
+--------------+----------+--------------------+---------------------+---------------------+
1 row in set (41.85 sec)
(Optional) Check the data.
With the sync-diff-inspector tool, you can check the consistency of the upstream and downstream data at a certain point in time. From the output of the preceding backup and restore commands, the upstream cluster was backed up at 431434047157698561, and the downstream cluster finished restoring the data at 431434141450371074.
sync_diff_inspector -C ./config.yaml
For how to configure sync-diff-inspector, refer to the configuration file description. In this article, the configuration is as follows:
# Diff Configuration.
######################### Datasource config #########################
[data-sources]
[data-sources.upstream]
host = "172.16.6.122" # Replace with the actual upstream cluster IP
port = 4000
user = "root"
password = ""
snapshot = "431434047157698561" # Set to the actual backup timestamp (see the BackupTS in the "Back up data" section)
[data-sources.downstream]
host = "172.16.6.125" # Replace with the actual downstream cluster IP
port = 4000
user = "root"
password = ""
######################### Task config #########################
[task]
output-dir = "./output"
source-instances = ["upstream"]
target-instance = "downstream"
target-check-tables = ["*.*"]
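When the check finishes, sync-diff-inspector writes its report into the directory configured by output-dir. A minimal way to read the verdict, assuming the default summary.txt report name:

# Inspect the check report; it states whether the data sets are consistent
cat ./output/summary.txt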
Step 3: Migrate incremental data
Deploy TiCDC.
After the full data migration is complete, you can deploy and configure a TiCDC cluster to replicate incremental data; for details, refer to the TiCDC deployment documentation. When the test clusters were created, a TiCDC node was already started, so you can configure the changefeed directly.
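Before creating the changefeed, you can optionally confirm that the TiCDC node started by the playground is alive. A minimal sketch, assuming the upstream PD address used throughout this article:

# List the TiCDC capture processes registered in PD
tiup cdc cli capture list --pd=http://172.16.6.122:2379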
Create a replication task.
Execute the following command in the upstream cluster to create a changefeed that replicates from the upstream cluster to the downstream cluster:
tiup cdc cli changefeed create --pd=http://172.16.6.122:2379 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="upstream-to-downstream" --start-ts="431434047157698561"
In the preceding command:
- --pd: the PD address of the actual upstream cluster
- --sink-uri: the downstream address of the replication task
- --changefeed-id: the ID of the replication task; its format must match the regular expression ^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
- --start-ts: the starting point of TiCDC replication; it must be set to the actual backup timestamp (that is, the BackupTS in the "Back up data" section of Step 2)
For more changefeed configuration options, refer to the replication task configuration file description.
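For example, if only part of the data needs to be replicated, you can pass a configuration file when creating the changefeed. The snippet below is a sketch that restricts replication to the tables under the test database; the rule syntax follows TiCDC's table filter:

# Write a minimal changefeed configuration that replicates only the test database
cat > changefeed.toml <<'EOF'
[filter]
rules = ['test.*']
EOF
# Pass it when creating the changefeed:
# tiup cdc cli changefeed create ... --config changefeed.toml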
Enable GC again.
TiCDC ensures that GC only collects history data that has already been replicated. Therefore, after creating the changefeed from the upstream to the downstream cluster, you can execute the following command to re-enable garbage collection for the cluster. For details, refer to the complete behavior of the TiCDC GC safepoint.
MySQL [test]> SET GLOBAL tidb_gc_enable=TRUE;
Query OK, 0 rows affected (0.01 sec)
MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.00 sec)
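If you want to see the safepoint that protects the not-yet-replicated data, you can query PD. This is a sketch; the exact tiup ctl invocation may depend on the installed tiup version:

# List the service GC safepoints registered in PD; a ticdc entry should appear
tiup ctl pd -u http://172.16.6.122:2379 service-gc-safepoint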
Step 4: Switch services smoothly
After the TiCDC replication link between the upstream and downstream clusters is created, data written to the original cluster is replicated to the new cluster with very low latency. At this point, you can gradually migrate read traffic to the new cluster. Observe for a period of time, and if the new cluster runs stably, you can also switch write traffic to the new cluster in the following three steps:
Stop write services in the upstream cluster. After confirming that the upstream data has been fully replicated to the downstream, stop the changefeed from the upstream cluster to the downstream cluster.
# Stop the changefeed from the old cluster to the new cluster
tiup cdc cli changefeed pause -c "upstream-to-downstream" --pd=http://172.16.6.122:2379
# View the changefeed state
tiup cdc cli changefeed list
[
  {
    "id": "upstream-to-downstream",
    "summary": {
      "state": "stopped",  # Confirm that the state here is stopped
      "tso": 431747241184329729,
      "checkpoint": "2022-03-11 15:50:20.387",  # Confirm that this time is later than the stop time
      "error": null
    }
  }
]
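One way to confirm that the downstream has fully caught up is to compare the changefeed checkpoint with the upstream cluster's current TSO. A sketch; in TiDB, SHOW MASTER STATUS returns the current TSO in the Position column:

# Get the current TSO of the upstream cluster
mysql -h 172.16.6.122 -P 4000 -u root -e "SHOW MASTER STATUS;"
# Query the changefeed; its checkpoint TSO should not be smaller than the TSO above
tiup cdc cli changefeed query -c "upstream-to-downstream" --pd=http://172.16.6.122:2379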
Create a changefeed from the downstream cluster to the upstream cluster. At this point the upstream and downstream data are consistent and no new data is being written, so you can leave start-ts unspecified; it defaults to the current time:
tiup cdc cli changefeed create --pd=http://172.16.6.125:2379 --sink-uri="mysql://root:@172.16.6.122:4000" --changefeed-id="downstream-to-upstream"
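You can then confirm that the reverse changefeed is running by listing the changefeeds against the downstream PD:

# The new changefeed should be listed with state "normal"
tiup cdc cli changefeed list --pd=http://172.16.6.125:2379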
Migrate write services to the downstream cluster. After observing for a period of time and confirming that the new cluster runs stably, you can decommission the original cluster.