
Migrate Data from One TiDB Cluster to Another TiDB Cluster


This document describes how to migrate data from one TiDB cluster to another TiDB cluster. You can do so in the following scenarios:

  • Database splitting: the original TiDB cluster has grown too large, or you want to prevent the several services it hosts from interfering with each other, so you move some tables from the original TiDB cluster to another TiDB cluster.
  • Database relocation: physically relocating the database, for example, moving it to a different data center.
  • Upgrade: in scenarios with strict requirements on data correctness, you can migrate data to a higher-version TiDB cluster to ensure data safety.

This document simulates the whole migration process, which consists of the following four steps:

  1. Set up the environment
  2. Migrate full data
  3. Migrate incremental data
  4. Switch services smoothly

Step 1: Set up the environment

  1. Deploy the clusters.

    Use tiup playground to quickly deploy the upstream and downstream test clusters. For more deployment information, see the official tiup documentation.

    # Create an upstream cluster
    tiup --tag upstream playground --host 0.0.0.0 --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
    # Create a downstream cluster
    tiup --tag downstream playground --host 0.0.0.0 --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
    # View the cluster status
    tiup status
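
    Before moving on, you can optionally confirm that the upstream cluster accepts connections. The following check is a minimal sketch and not part of the original procedure; it assumes the upstream TiDB listens on the default port 4000 on the local host (when two playgrounds run on the same machine, the second one is assigned different ports, which tiup playground prints at startup).

    # Verify connectivity to the upstream cluster (adjust host/port as needed)
    mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT tidb_version();"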

  2. Initialize data.

    The test clusters contain a test database by default, so you can use the sysbench tool to generate test data that simulates historical data in a real cluster.

    sysbench oltp_write_only --config-file=./tidb-config --tables=10 --table-size=10000 prepare

    The command above runs the sysbench oltp_write_only script, which creates 10 tables in the test database, each containing 10,000 rows of initial data. The tidb-config file is as follows:

    mysql-host=172.16.6.122 # Replace with the actual upstream cluster IP
    mysql-port=4000
    mysql-user=root
    mysql-password=
    db-driver=mysql # Set the database driver to mysql
    mysql-db=test # Set the test database to test
    report-interval=10 # Report statistics every 10 seconds
    threads=10 # Set the number of worker threads to 10
    time=0 # Total execution time of the script; 0 means unlimited
    rate=100 # Set the average transaction rate to 100 TPS
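
    To double-check that the initial data has been generated, you can count the rows in one of the generated tables. This is only an optional sanity check; sbtest1 is one of the 10 tables created by sysbench, and the host matches the tidb-config above.

    mysql -h 172.16.6.122 -P 4000 -u root -e "SELECT COUNT(*) FROM test.sbtest1;"
    # Expected result: 10000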

  3. Simulate the service workload.

    During data migration in a real production cluster, the original cluster usually continues to receive new business writes. In this document, you can use the sysbench tool to simulate a continuous write workload. The following command uses 10 workers to continuously write data to the sbtest1, sbtest2, and sbtest3 tables, with the total TPS limited to 100.

    sysbench oltp_write_only --config-file=./tidb-config --tables=3 run

  4. Prepare external storage.

    During the full data backup, both the upstream and downstream clusters need to access the backup files, so it is recommended to use external storage to store them. In this document, MinIO is used to simulate an S3-compatible storage service:

    wget https://dl.min.io/server/minio/release/linux-amd64/minio
    chmod +x minio
    # Configure the access-key and access-secret-id used to access MinIO
    export HOST_IP='172.16.6.122' # Replace with the actual upstream cluster IP
    export MINIO_ROOT_USER='minio'
    export MINIO_ROOT_PASSWORD='miniostorage'
    # Create the data directory, where backup is the bucket name
    mkdir -p data/backup
    # Start MinIO, exposing port 6060
    ./minio server ./data --address :6060 &

    The above commands start a single-node MinIO server to simulate the S3 service, with the access credentials, bucket name, and port set by the variables in the command above. The corresponding access link is:

    s3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true
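
    Optionally, you can verify that the MinIO service is reachable from both clusters before starting the backup. This sketch uses MinIO's built-in health endpoint and assumes the port 6060 configured above:

    # Returns HTTP 200 if the MinIO server is up
    curl -I http://${HOST_IP}:6060/minio/health/live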

Step 2: Migrate full data

After setting up the test environment, you can use the backup and restore functions of the BR tool to migrate the full data. BR can be used in several ways; in this document, the SQL statements BACKUP and RESTORE are used.

Note

If the versions of the upstream and downstream clusters are different, check the BR tool compatibility. This document assumes that the upstream and downstream clusters are of the same version.

  1. Disable GC.

    To ensure that newly written data is not lost during the incremental migration, you need to disable the garbage collection (GC) mechanism of the upstream cluster before starting the backup, so that the system no longer cleans up historical data.

    MySQL [test]> SET GLOBAL tidb_gc_enable=FALSE;
    Query OK, 0 rows affected (0.01 sec)

    MySQL [test]> SELECT @@global.tidb_gc_enable;
    +-------------------------+
    | @@global.tidb_gc_enable |
    +-------------------------+
    |                       0 |
    +-------------------------+
    1 row in set (0.00 sec)

    Note

    In production clusters, disabling GC and performing backup operations will reduce the read performance of the cluster to some extent. It is recommended to back up during off-peak hours and set an appropriate RATE_LIMIT to limit the impact of the backup on online services.
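
    If you want to confirm the GC settings before backing up, you can also query the GC-related records that TiDB keeps in the mysql.tidb system table. This is an optional check that is not part of the original procedure; the host is the upstream IP used throughout this document:

    mysql -h 172.16.6.122 -P 4000 -u root -e "SELECT variable_name, variable_value FROM mysql.tidb WHERE variable_name IN ('tikv_gc_life_time', 'tikv_gc_safe_point');"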

  2. Back up data.

    Execute the BACKUP statement in the upstream cluster to back up data:

    MySQL [(none)]> BACKUP DATABASE * TO 's3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true' RATE_LIMIT = 120 MB/SECOND;
    +---------------+----------+--------------------+---------------------+---------------------+
    | Destination   | Size     | BackupTS           | Queue Time          | Execution Time      |
    +---------------+----------+--------------------+---------------------+---------------------+
    | s3://backup   | 10315858 | 431434047157698561 | 2022-02-25 19:57:59 | 2022-02-25 19:57:59 |
    +---------------+----------+--------------------+---------------------+---------------------+
    1 row in set (2.11 sec)

    After the backup statement is executed successfully, TiDB returns metadata about the backup. Pay particular attention to BackupTS, which means that all data written before this timestamp is backed up. In the following steps, this document uses BackupTS as both the end point of data validation and the start point of TiCDC incremental scanning.
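
    If you want to see the physical time that a BackupTS corresponds to, TiDB provides the TIDB_PARSE_TSO() function to convert a TSO into a readable timestamp. A quick check, using the BackupTS above:

    mysql -h 172.16.6.122 -P 4000 -u root -e "SELECT TIDB_PARSE_TSO(431434047157698561);"
    # The result is the wall-clock time encoded in the TSO; for this example it should match the backup execution time shown above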

  3. Restore data.

    Execute the RESTORE statement in the downstream cluster to restore data:

    mysql> RESTORE DATABASE * FROM 's3://backup?access-key=minio&secret-access-key=miniostorage&endpoint=http://${HOST_IP}:6060&force-path-style=true';
    +--------------+-----------+--------------------+---------------------+---------------------+
    | Destination  | Size      | BackupTS           | Queue Time          | Execution Time      |
    +--------------+-----------+--------------------+---------------------+---------------------+
    | s3://backup  | 10315858  | 431434141450371074 | 2022-02-25 20:03:59 | 2022-02-25 20:03:59 |
    +--------------+-----------+--------------------+---------------------+---------------------+
    1 row in set (41.85 sec)

  4. (Optional) Validate data.

    You can use the sync-diff-inspector tool to validate the consistency of the upstream and downstream data at a certain point in time. From the output of the backup and restore commands above, the upstream cluster is backed up at TSO 431434047157698561, and the downstream cluster finishes restoring data at TSO 431434141450371074.

    sync_diff_inspector -C ./config.yaml

    For how to configure sync-diff-inspector, see the configuration file description. In this document, the configuration is as follows:

    # Diff Configuration.
    ######################### Datasource config #########################
    [data-sources]
    [data-sources.upstream]
    host = "172.16.6.122" # Replace with the actual upstream cluster IP
    port = 4000
    user = "root"
    password = ""
    snapshot = "431434047157698561" # Set to the actual backup time point (the BackupTS in the "Back up data" section)
    [data-sources.downstream]
    host = "172.16.6.125" # Replace with the actual downstream cluster IP
    port = 4000
    user = "root"
    password = ""
    ######################### Task config #########################
    [task]
    output-dir = "./output"
    source-instances = ["upstream"]
    target-instance = "downstream"
    target-check-tables = ["*.*"]
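
    After sync_diff_inspector finishes, its report is written to the directory specified by output-dir. As a rough sketch, assuming the default report file name used by sync-diff-inspector, you can check whether the comparison passed like this:

    # The summary report lives in the configured output-dir
    cat ./output/summary.txt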

Step 3: Migrate incremental data

  1. Deploy TiCDC.

    After the full data migration is completed, you can deploy and configure a TiCDC cluster to replicate incremental data; see the TiCDC deployment documentation for details. When creating the test clusters, a TiCDC node was already started, so you can configure the changefeed directly.

  2. Create a replication task.

    Execute the following command in the upstream cluster to create a replication link from the upstream to the downstream cluster:

    tiup cdc cli changefeed create --pd=http://172.16.6.122:2379 --sink-uri="mysql://root:@172.16.6.125:4000" --changefeed-id="upstream-to-downstream" --start-ts="431434047157698561"

    In the above command:

    • --pd: the PD address of the actual upstream cluster
    • --sink-uri: the downstream address of the replication task
    • --changefeed-id: the ID of the replication task; it must match the regular expression ^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
    • --start-ts: the start point of TiCDC replication; set it to the actual backup time point (the BackupTS mentioned in the "Back up data" section of Step 2)

    For more changefeed configurations, see the replication task configuration file description.
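
    After the changefeed is created, you can optionally confirm that it is running. A minimal check, assuming the same PD address as above:

    tiup cdc cli changefeed list --pd=http://172.16.6.122:2379
    # The "state" of upstream-to-downstream should be "normal"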

  3. Re-enable GC.

    TiCDC ensures that GC collects only history data that has already been replicated. Therefore, after the changefeed from the upstream to the downstream cluster is created, you can execute the following command to re-enable garbage collection for the cluster. For details, see the documentation on the complete behavior of the TiCDC GC safepoint.

    MySQL [test]> SET GLOBAL tidb_gc_enable=TRUE;
    Query OK, 0 rows affected (0.01 sec)

    MySQL [test]> SELECT @@global.tidb_gc_enable;
    +-------------------------+
    | @@global.tidb_gc_enable |
    +-------------------------+
    |                       1 |
    +-------------------------+
    1 row in set (0.00 sec)

Step 4: Switch services smoothly

After the upstream-to-downstream replication link is created with TiCDC, data written to the original cluster is replicated to the new cluster with very low latency. At this point, you can gradually migrate read traffic to the new cluster. Observe for a period of time; if the new cluster runs stably, you can switch write traffic to the new cluster as well, in the following three steps:

  1. Stop the write services of the upstream cluster. Confirm that all upstream data has been replicated to the downstream (see the check sketched after the command output below), then stop the upstream-to-downstream changefeed.

    # Stop the changefeed from the old cluster to the new cluster
    tiup cdc cli changefeed pause -c "upstream-to-downstream" --pd=http://172.16.6.122:2379
    # View the changefeed status
    tiup cdc cli changefeed list
    [
      {
        "id": "upstream-to-downstream",
        "summary": {
          "state": "stopped",  # Confirm that the state here is stopped
          "tso": 431747241184329729,
          "checkpoint": "2022-03-11 15:50:20.387",  # Confirm that this time is later than the time when writes were stopped
          "error": null
        }
      }
    ]
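
    One way to confirm that the downstream has fully caught up is to compare the changefeed checkpoint with the current TSO of the upstream cluster after writes have stopped. The following sketch assumes the upstream IP used throughout this document; in TiDB, SHOW MASTER STATUS returns the current TSO in the Position column:

    # Current TSO of the upstream cluster
    mysql -h 172.16.6.122 -P 4000 -u root -e "SHOW MASTER STATUS;"
    # Detailed changefeed status, including its checkpoint
    tiup cdc cli changefeed query -c "upstream-to-downstream" --pd=http://172.16.6.122:2379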

  2. Create a changefeed from the downstream to the upstream cluster. At this point, the upstream and downstream data are consistent and no new data is being written, so you can leave start-ts unspecified; it defaults to the current time:

    tiup cdc cli changefeed create --pd=http://172.16.6.125:2379 --sink-uri="mysql://root:@172.16.6.122:4000" --changefeed-id="downstream-to-upstream"

  3. Migrate the write services to the downstream cluster. After observing for a period of time and confirming that the new cluster runs stably, you can decommission the original cluster.
