Migrate Data from One TiDB Cluster to Another TiDB Cluster
2022-07-06 08:01:00 【Tianxiang shop】
This document describes how to migrate data from one TiDB cluster to another TiDB cluster. You may want to do this in the following scenarios:
- Splitting out databases: the original TiDB cluster has grown too large, or you want to keep the several services the cluster carries from affecting one another, so you move some of its tables to another TiDB cluster.
- Relocating a database: physically moving the database, for example to a different data center.
- Upgrading: in scenarios with strict data-correctness requirements, you can migrate the data to a higher-version TiDB cluster to keep it safe.
This article simulates the whole migration process, which consists of the following four steps:
- Set up the environment
- Migrate full data
- Migrate incremental data
- Switch services smoothly
Step 1: Set up the environment
Deploy the clusters.
Use tiup playground to quickly deploy the upstream and downstream test clusters. For more deployment options, see the official TiUP documentation.
```shell
# Create the upstream cluster
tiup --tag upstream playground --host 0.0.0.0 --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# Create the downstream cluster
tiup --tag downstream playground --host 0.0.0.0 --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# Check the cluster status
tiup status
```

Initialize the data.
The test cluster contains a test database by default, so you can use the sysbench tool to generate test data that simulates historical data in a real cluster.
```shell
sysbench oltp_write_only --config-file=./tidb-config --tables=10 --table-size=10000 prepare
```

Running the oltp_write_only script through sysbench here creates 10 tables in the test database, each containing 10,000 rows of initial data. The tidb-config file is as follows:
```ini
mysql-host=172.16.6.122 # Replace with the actual upstream cluster IP
mysql-port=4000
mysql-user=root
mysql-password=
db-driver=mysql    # Set the database driver to mysql
mysql-db=test      # Set the test database to test
report-interval=10 # Report periodic statistics every 10 seconds
threads=10         # Set the number of worker threads to 10
time=0             # Total execution time of the script; 0 means unlimited
rate=100           # Cap the average transaction rate at tps = 100
```

Simulate the business load.
During data migration in a real production cluster, the original cluster usually keeps receiving new business writes. In this article, you can use the sysbench tool to simulate a continuous write load: the following command uses 10 workers to continuously write data into the three tables sbtest1, sbtest2, and sbtest3, with total TPS capped at 100.
```shell
sysbench oltp_write_only --config-file=./tidb-config --tables=3 run
```

Prepare external storage.
During the full backup, both the upstream and downstream clusters need access to the backup files, so it is recommended to use external storage for them. In this article, MinIO simulates an S3-compatible storage service:
```shell
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
# Configure the access-key and secret-access-key used to access minio
export HOST_IP='172.16.6.122' # Replace with the actual upstream cluster IP
export MINIO_ROOT_USER='minio'
export MINIO_ROOT_PASSWORD='miniostorage'
# Create the data directory, where backup is the bucket name
mkdir -p data/backup
# Start minio, exposing port 6060
./minio server ./data --address :6060 &
```

The command line above starts a single-node minio server to simulate the S3 service. Its parameters are:
- Endpoint: http://${HOST_IP}:6060/
- Access-key: minio
- Secret-access-key: miniostorage
- Bucket: backup
The corresponding access link is:

```
s3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true
```
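To avoid typos in the later BACKUP, RESTORE, and changefeed commands, the parameters above can be assembled into the storage URI with a short shell snippet (the IP is the example value used throughout this article):

```shell
# Build the external-storage URI from the MinIO parameters above.
export HOST_IP='172.16.6.122'   # Replace with the actual upstream cluster IP
BACKUP_URI="s3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true"
echo "$BACKUP_URI"
```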
Step 2: Migrate the full data
With the test environment set up, you can use the backup and restore functions of the BR tool to migrate the full data. BR can be used in several ways; this article uses the SQL statements BACKUP and RESTORE.
Note

When the upstream and downstream cluster versions differ, you should check BR compatibility. This article assumes that the upstream and downstream clusters run the same version.
Disable GC.
To ensure that newly written data is not lost during the incremental migration, you need to disable the upstream cluster's garbage collection (GC) mechanism before starting the backup, so that the system no longer cleans up historical data.
```sql
MySQL [test]> SET GLOBAL tidb_gc_enable=FALSE;
Query OK, 0 rows affected (0.01 sec)

MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
|                       0 |
+-------------------------+
1 row in set (0.00 sec)
```

Note
In a production cluster, disabling GC and running the backup will reduce the cluster's read performance to some extent. It is recommended to back up during off-peak hours and to set an appropriate RATE_LIMIT to limit the impact of the backup on online traffic.
Back up the data.
Run the BACKUP statement in the upstream cluster to back up the data:
```sql
MySQL [(none)]> BACKUP DATABASE * TO 's3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true' RATE_LIMIT = 120 MB/SECOND;
+---------------+----------+--------------------+---------------------+---------------------+
| Destination   | Size     | BackupTS           | Queue Time          | Execution Time      |
+---------------+----------+--------------------+---------------------+---------------------+
| s3://backup   | 10315858 | 431434047157698561 | 2022-02-25 19:57:59 | 2022-02-25 19:57:59 |
+---------------+----------+--------------------+---------------------+---------------------+
1 row in set (2.11 sec)
```

After the backup statement is submitted successfully, TiDB returns meta information about the backup. Pay attention to BackupTS: data written before this point in time is included in the backup. In this article, BackupTS serves both as the snapshot for data checking and as the start point of TiCDC's incremental scan.
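BackupTS is a TiDB TSO: the high bits carry a physical timestamp in milliseconds since the Unix epoch, and the low 18 bits are a logical counter. The following sketch (using GNU date; on macOS, substitute `date -u -r` for `-d @`) converts the BackupTS above back to wall-clock time:

```shell
# Decode a TiDB TSO into a wall-clock timestamp.
TSO=431434047157698561          # the BackupTS returned above
PHYSICAL_MS=$((TSO >> 18))      # drop the 18-bit logical counter
date -u -d @$((PHYSICAL_MS / 1000)) +"%Y-%m-%d %H:%M:%S"
# → 2022-02-25 11:57:59 (UTC; matches the 19:57:59 UTC+8 time shown above)
```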
Restore the data.
Run the RESTORE statement in the downstream cluster to restore the data:
```sql
mysql> RESTORE DATABASE * FROM 's3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true';
+--------------+----------+--------------------+---------------------+---------------------+
| Destination  | Size     | BackupTS           | Queue Time          | Execution Time      |
+--------------+----------+--------------------+---------------------+---------------------+
| s3://backup  | 10315858 | 431434141450371074 | 2022-02-25 20:03:59 | 2022-02-25 20:03:59 |
+--------------+----------+--------------------+---------------------+---------------------+
1 row in set (41.85 sec)
```

(Optional) Check the data.
With the sync-diff-inspector tool, you can verify that the upstream and downstream data are consistent at a given point in time. From the output of the backup and restore commands above, the upstream cluster was backed up at snapshot 431434047157698561, and the downstream cluster finished restoring at snapshot 431434141450371074.
```shell
sync_diff_inspector -C ./config.yaml
```

For how to configure sync-diff-inspector, see its configuration file description. The configuration used in this article is:
```toml
# Diff Configuration.

######################### Datasource config #########################
[data-sources]
[data-sources.upstream]
host = "172.16.6.122" # Replace with the actual upstream cluster IP
port = 4000
user = "root"
password = ""
snapshot = "431434047157698561" # Set to the actual backup snapshot (the BackupTS from the "Back up the data" section)

[data-sources.downstream]
host = "172.16.6.125" # Replace with the actual downstream cluster IP
port = 4000
user = "root"
password = ""

######################### Task config #########################
[task]
output-dir = "./output"
source-instances = ["upstream"]
target-instance = "downstream"
target-check-tables = ["*.*"]
```
Step 3: Migrate incremental data
Deploy TiCDC.
After the full data migration, you can deploy and configure a TiCDC cluster to replicate incremental data; for details, see the TiCDC deployment documentation. When the test clusters were created, a TiCDC node was already started, so you can proceed directly to the changefeed configuration.
Create the replication task.
In the upstream cluster, run the following command to create a replication link from the upstream cluster to the downstream cluster:
```shell
tiup cdc cli changefeed create --pd=http://172.16.6.122:2379 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="upstream-to-downstream" --start-ts="431434047157698561"
```

In the command above:
- --pd: the PD address of the actual upstream cluster
- --sink-uri: the downstream address of the replication task
- --changefeed-id: the ID of the replication task; it must match the regular expression ^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
- --start-ts: the starting point of TiCDC replication; set it to the actual backup point in time (the BackupTS from the "Back up the data" section)
For more changefeed configuration options, see the replication task configuration file description.
Re-enable GC.
TiCDC guarantees that GC reclaims only history that has already been replicated. Therefore, after creating the upstream-to-downstream changefeed, you can run the following command to re-enable the cluster's garbage collection. For details, see the documentation on the complete behavior of the TiCDC GC safepoint.
```sql
MySQL [test]> SET GLOBAL tidb_gc_enable=TRUE;
Query OK, 0 rows affected (0.01 sec)

MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.00 sec)
```
Step 4: Switch services smoothly
Once TiCDC has created the upstream-to-downstream replication link, writes to the original cluster are replicated to the new cluster with very low latency, and you can gradually shift read traffic to the new cluster. Observe for a while; if the new cluster is stable, move the write traffic over as well. This involves three steps:
Stop write traffic to the upstream cluster. After confirming that all upstream data has been replicated downstream, stop the upstream-to-downstream changefeed.
```shell
# Stop the changefeed from the old cluster to the new cluster
tiup cdc cli changefeed pause -c "upstream-to-downstream" --pd=http://172.16.6.122:2379
# Check the changefeed state
tiup cdc cli changefeed list
[
  {
    "id": "upstream-to-downstream",
    "summary": {
      "state": "stopped",                      # Confirm that the state here is stopped
      "tso": 431747241184329729,
      "checkpoint": "2022-03-11 15:50:20.387", # Confirm that this time is later than the time writes stopped
      "error": null
    }
  }
]
```

Create a changefeed from the downstream cluster to the upstream cluster. The upstream and downstream data are now consistent and no new data is being written, so you can leave start-ts unspecified; it defaults to the current time:
```shell
tiup cdc cli changefeed create --pd=http://172.16.6.125:2379 --sink-uri="mysql://root:@172.16.6.122:4000" --changefeed-id="downstream-to-upstream"
```

Move the write traffic to the downstream cluster. After observing for a while, once the new cluster is stable, you can discard the original cluster.
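In the first step above, "the checkpoint is later than the time writes stopped" can be checked mechanically. A minimal sketch (using GNU date; both timestamps are hypothetical stand-ins for the checkpoint from `changefeed list` and your recorded stop time):

```shell
# Compare the changefeed checkpoint with the moment writes stopped.
CHECKPOINT="2022-03-11 15:50:20"   # from `tiup cdc cli changefeed list`
WRITE_STOP="2022-03-11 15:49:00"   # hypothetical: when upstream writes were stopped
if [ "$(date -d "$CHECKPOINT" +%s)" -ge "$(date -d "$WRITE_STOP" +%s)" ]; then
  echo "all writes replicated; safe to pause the changefeed"
else
  echo "wait: checkpoint has not caught up yet"
fi
```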