
TiDB ecosystem tools (backup, migration, import/export): an overview

2022-07-06 03:11:00 [Charlotteck]

1. Dumpling

Dumpling is a data export tool. It exports data stored in TiDB or MySQL in SQL or CSV format and can be used for logical full backups. Dumpling also supports exporting data to Amazon S3.

2. TiDB Lightning

TiDB Lightning has two main use cases:

Quickly importing large amounts of new data.
Restoring all backup data.

Currently, TiDB Lightning supports:

Importing data sources in the Dumpling, CSV, or Amazon Aurora Parquet output formats.
Reading data from local disk or from Amazon S3 cloud storage.

3. Data Migration (DM)

TiDB Data Migration (DM) is a convenient data migration tool that supports full data migration and incremental data replication from MySQL-compatible databases (MySQL, MariaDB, Aurora MySQL) to TiDB. Using DM simplifies the data migration process and reduces its operation and maintenance cost.

4. Backup & Restore (BR)

BR, short for Backup & Restore, is a command-line tool for distributed backup and restore of TiDB cluster data.

Compared with Dumpling, BR is better suited to scenarios with large data volumes.

Besides regular backup and restore, BR can also be used for large-scale data migration, provided that compatibility is ensured.

The official BR documentation covers how BR works, the recommended deployment configuration, usage restrictions, and several ways to use it. How BR works and its usage restrictions are summarized below.

How it works: BR sends backup or restore commands to each TiKV node. After receiving a command, the TiKV node performs the corresponding backup or restore operation.

During a backup or restore, each TiKV node has a corresponding backup path. The backup files that TiKV generates during a backup are saved to this path, and during a restore the corresponding backup files are read from this path.

Backup file types

The following three types of files are generated under the backup path:

SST files: store the data backed up by TiKV
backupmeta file: stores the metadata of this backup, including the number of backup files and the key range, size, and SHA-256 hash of each backup file
backup.lock file: prevents multiple backups to the same directory
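Because backupmeta records a SHA-256 hash for each backup file, a file can be checked for corruption by recomputing its hash. The following is a minimal Python sketch of that check. It assumes the expected hash has already been extracted from backupmeta as a hex string; parsing backupmeta itself is not shown, and the function names are hypothetical rather than part of BR.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Stream the file so a large SST file is never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup_file(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a value assumed to come from backupmeta."""
    return sha256_of_file(path) == expected_hex.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sst = Path(tmp) / "example.sst"
        sst.write_bytes(b"example payload")
        recorded = hashlib.sha256(b"example payload").hexdigest()
        print(verify_backup_file(sst, recorded))  # True
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, which matters when a backup contains many large SST files.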

SST file naming format

SST files are named in the format storeID_regionID_regionEpoch_keyHash_cf. The fields are:

storeID: the TiKV node (store) ID
regionID: the Region ID
regionEpoch: the Region version number
keyHash: the SHA-256 hash of the range's startKey, which keeps the name unique
cf: the RocksDB column family (default or write)
Source: Backup & Restore tool BR overview | PingCAP Docs
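The naming format can be read back mechanically. This is a minimal Python sketch that splits a name into the five fields described above. It assumes exactly five underscore-separated fields, an integer epoch, and an optional ".sst" suffix; if a given BR version names its files differently, the parser would need adjusting. The SstName type and parse_sst_name function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SstName:
    store_id: int      # TiKV node (store) ID
    region_id: int     # Region ID
    region_epoch: int  # Region version number
    key_hash: str      # hash of the range's startKey, as written in the name
    cf: str            # RocksDB column family: "default" or "write"


def parse_sst_name(filename: str) -> SstName:
    """Split a name laid out as storeID_regionID_regionEpoch_keyHash_cf[.sst]."""
    stem = filename[: -len(".sst")] if filename.endswith(".sst") else filename
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields, got {len(parts)}: {filename!r}")
    store_id, region_id, region_epoch, key_hash, cf = parts
    if cf not in ("default", "write"):
        raise ValueError(f"unexpected column family {cf!r} in {filename!r}")
    return SstName(int(store_id), int(region_id), int(region_epoch), key_hash, cf)


print(parse_sst_name("1_2_5_9f86d081_write.sst"))
# SstName(store_id=1, region_id=2, region_epoch=5, key_hash='9f86d081', cf='write')
```

Grouping parsed names by store_id, for example, would show how many files each TiKV node wrote to its backup path.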

Usage restrictions

The following restrictions apply when using BR for backup and restore:

When BR restores data to a cluster that is upstream of TiCDC or Drainer, the restored data cannot be replicated downstream by TiCDC or Drainer.
BR can only operate between clusters that have the same value of new_collations_enabled_on_first_bootstrap. This is because BR backs up only KV data: if the backup cluster and the restore cluster use different collations, data validation fails. Therefore, before restoring, make sure that the result of select VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME='new_collation_enabled'; on the restore cluster matches the result obtained at backup time.

5. TiDB Binlog

TiDB Binlog is a tool that collects TiDB binlogs and provides near-real-time backup and replication.

TiDB Binlog supports the following scenarios:

Data replication: replicate TiDB cluster data to other databases
Real-time backup and restore: back up TiDB cluster data, which can also be used for recovery when the TiDB cluster fails

A TiDB Binlog cluster consists mainly of two components, Pump and Drainer, plus the binlogctl tool:

Pump

Pump records the binlogs produced by TiDB in real time, sorts them by transaction commit time, and provides them to Drainer for consumption.

Drainer

Drainer collects binlogs from every Pump and merges them, converts them into SQL or data in a specified format, and finally replicates them downstream.
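Since each Pump already hands out binlogs sorted by commit time, Drainer's merge step is essentially a k-way merge of sorted streams. The following is a minimal Python sketch using heapq.merge. It assumes every stream is finite and complete; a real Drainer must also decide when it is safe to emit an entry (no Pump can still produce an earlier one), which is not shown. The Binlog record and its fields are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class Binlog(NamedTuple):
    commit_ts: int  # transaction commit timestamp (TSO)
    pump: str       # hypothetical: which Pump produced the entry
    sql: str        # hypothetical payload


def merge_pump_streams(streams: Iterable[Iterable[Binlog]]) -> Iterator[Binlog]:
    """K-way merge of per-Pump streams that are each already sorted by commit_ts."""
    # heapq.merge consumes the inputs lazily and holds one pending item per stream.
    return heapq.merge(*streams, key=lambda b: b.commit_ts)


pump_a: List[Binlog] = [Binlog(101, "pump-a", "insert t1"), Binlog(105, "pump-a", "update t1")]
pump_b: List[Binlog] = [Binlog(102, "pump-b", "delete t2"), Binlog(104, "pump-b", "insert t2")]
print([b.commit_ts for b in merge_pump_streams([pump_a, pump_b])])  # [101, 102, 104, 105]
```

A heap-based merge costs O(log k) per entry for k Pumps, so adding Pumps to scale horizontally adds little merge overhead.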

binlogctl tool

binlogctl is an operations tool that ships with TiDB Binlog. It provides the following functions:

Get the current TSO of the TiDB cluster
View Pump/Drainer status
Modify Pump/Drainer status
Pause or take Pump/Drainer offline
Source: TiDB Binlog overview | PingCAP Docs
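The TSO that binlogctl reports is a single integer, but it packs two parts: the high bits hold a physical Unix timestamp in milliseconds, and the low 18 bits hold a logical counter. The following minimal Python sketch composes and decodes a TSO under that layout. The function names are hypothetical; only the bit layout is taken from TiDB's timestamp format.

```python
from datetime import datetime, timezone
from typing import Tuple

LOGICAL_BITS = 18
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1


def compose_tso(physical_ms: int, logical: int) -> int:
    """Pack a millisecond Unix timestamp and a logical counter into one TSO."""
    if not 0 <= logical <= LOGICAL_MASK:
        raise ValueError("logical part must fit in 18 bits")
    return (physical_ms << LOGICAL_BITS) | logical


def decode_tso(tso: int) -> Tuple[datetime, int]:
    """Split a TSO into a UTC wall-clock time and its logical counter."""
    physical_ms = tso >> LOGICAL_BITS
    logical = tso & LOGICAL_MASK
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc), logical


tso = compose_tso(1_600_000_000_000, 7)
print(tso)  # 419430400000000007
when, logical = decode_tso(tso)
print(when.isoformat(), logical)  # 2020-09-13T12:26:40+00:00 7
```

Decoding a TSO this way is handy when a TSO value has to be matched to a wall-clock time, for example when reading positions reported by binlogctl.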

Main features

Multiple Pumps form a cluster that can scale horizontally.
TiDB distributes binlogs to the Pumps through its built-in Pump Client.
Pump stores binlogs and provides them to Drainer in order.
Drainer reads binlogs from the Pumps, merges and sorts them, and sends them downstream.
Drainer supports the relay log feature, which keeps the downstream cluster in a consistent state.

6. TiCDC

TiCDC is a tool that replicates TiDB incremental data by pulling TiKV change logs. It can restore data to a state consistent with any upstream TSO, and it provides an open data protocol (TiCDC Open Protocol) that lets other systems subscribe to data changes.

System roles

TiKV CDC component: outputs only key-value (KV) change logs.
- Assembles KV change logs from its internal logic.
- Provides an interface for outputting KV change logs; the data sent includes real-time change logs and incremental-scan change logs.
capture: the TiCDC runtime process. Multiple captures form a TiCDC cluster, which is responsible for replicating KV change logs.
- Each capture pulls a portion of the KV change logs.
- Sorts the one or more KV change logs it has pulled.
- Restores transactions to the downstream, or outputs them according to the TiCDC Open Protocol.
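A capture's "sort, then restore transactions" step can be pictured as grouping row-level KV changes by the transaction they belong to and releasing only the transactions known to be complete. This minimal Python sketch assumes each change carries its transaction's start_ts and commit_ts, and uses a resolved ts in the sense TiCDC uses it: all changes with commit_ts at or below it have been received. The KvChange record and the function are hypothetical, not TiCDC's internal types.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class KvChange(NamedTuple):
    start_ts: int
    commit_ts: int
    key: bytes
    value: bytes


def assemble_transactions(changes: List[KvChange], resolved_ts: int) -> List[List[KvChange]]:
    """Group changes whose commit_ts <= resolved_ts into transactions, oldest commit first."""
    txns: Dict[Tuple[int, int], List[KvChange]] = defaultdict(list)
    for change in changes:
        # Changes above resolved_ts may belong to transactions not yet fully
        # received, so this call skips them; a caller would keep them for later.
        if change.commit_ts <= resolved_ts:
            txns[(change.commit_ts, change.start_ts)].append(change)
    # Sorting by (commit_ts, start_ts) gives a deterministic commit order.
    return [txns[k] for k in sorted(txns)]


changes = [
    KvChange(10, 12, b"a", b"1"),
    KvChange(5, 8, b"b", b"2"),
    KvChange(10, 12, b"c", b"3"),
    KvChange(20, 30, b"d", b"4"),
]
for txn in assemble_transactions(changes, resolved_ts=15):
    print(txn[0].commit_ts, [c.key for c in txn])
# 8 [b'b']
# 12 [b'a', b'c']
```

Holding back everything above the resolved ts is what lets the output be emitted in commit order without later discovering an earlier transaction.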

Replication features

This section introduces TiCDC's replication features.

Sink support

Currently, the TiCDC sink module supports replicating data to the following downstream systems:

Databases compatible with the MySQL protocol, with eventual consistency support.
Kafka, using the TiCDC Open Protocol, with three possible consistency guarantees: row-level ordering, eventual consistency, or strict transactional consistency.

Replication order and consistency guarantees

Replication order

TiCDC guarantees that every DDL/DML statement is output at least once.
During a TiKV or TiCDC cluster failure, TiCDC may send the same DDL/DML more than once. For duplicate DDL/DML:
- The MySQL sink can re-execute DDL. A DDL that is reentrant downstream (for example, truncate table) executes successfully. A DDL that is not reentrant downstream (for example, create table) fails, and TiCDC ignores the error and continues replicating.
- The Kafka sink sends duplicate messages, but the duplicates do not violate the Resolved Ts constraint, so users can filter them out on the Kafka consumer side.
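Because the Kafka sink is at-least-once, a consumer typically drops changes it has already applied. This minimal Python sketch identifies a change by (commit_ts, table, key) and skips repeats. The RowEvent record and its fields are hypothetical and do not reproduce the TiCDC Open Protocol message layout.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class RowEvent(NamedTuple):
    commit_ts: int
    table: str
    key: str    # primary-key value of the changed row (hypothetical field)
    value: str


def drop_duplicates(events: Iterable[RowEvent]) -> Iterator[RowEvent]:
    """Yield each (commit_ts, table, key) change only the first time it is seen."""
    seen: Set[Tuple[int, str, str]] = set()
    for event in events:
        identity = (event.commit_ts, event.table, event.key)
        if identity in seen:
            continue  # a resent copy of a change that was already delivered
        seen.add(identity)
        yield event


stream = [
    RowEvent(100, "t1", "1", "a"),
    RowEvent(101, "t1", "2", "b"),
    RowEvent(100, "t1", "1", "a"),  # duplicate sent again after a failure
]
print(len(list(drop_duplicates(stream))))  # 2
```

The seen set grows without bound here; a long-running consumer would need an eviction policy for old identities, which this sketch leaves open.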

Replication consistency

MySQL sink
- TiCDC does not split single-table transactions, and guarantees the atomicity of single-table transactions.
- TiCDC does not guarantee that downstream transactions execute in exactly the same order as upstream transactions.
- TiCDC splits cross-table transactions by table, and does not guarantee the atomicity of cross-table transactions.
- TiCDC guarantees that the update order of a single row is consistent with the upstream.
Kafka sink
- TiCDC provides different data distribution strategies, which distribute data to different Kafka partitions by table, primary key, ts, and so on.
- Depending on how the consumer is implemented for each distribution strategy, different levels of consistency can be achieved, including row-level ordering, eventual consistency, or cross-table transactional consistency.
- TiCDC does not provide a Kafka consumer implementation; it provides only the TiCDC Open Protocol, and users implement the Kafka consumer themselves.
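The consistency a consumer can achieve depends on which changes share a partition, because Kafka preserves order only within a partition. This minimal Python sketch maps a row change to a partition under three strategies mirroring the ones named above. The strategy labels and the use of crc32 are assumptions for illustration; TiCDC's own dispatcher names and hash function are not reproduced here.

```python
import zlib


def choose_partition(strategy: str, table: str, primary_key: str,
                     commit_ts: int, num_partitions: int) -> int:
    """Map a row change to a Kafka partition under a hypothetical dispatch strategy."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if strategy == "table":
        basis = table                     # every row of a table lands in one partition
    elif strategy == "primary-key":
        basis = f"{table}:{primary_key}"  # every version of one row lands in one partition
    elif strategy == "ts":
        basis = str(commit_ts)            # changes with the same commit_ts land together
    else:
        raise ValueError(f"unknown strategy {strategy!r}")
    # crc32 is stable across processes, unlike Python's salted built-in hash().
    return zlib.crc32(basis.encode("utf-8")) % num_partitions
```

Routing by table keeps each table's changes in order within one partition at the cost of hot partitions for busy tables; routing by primary key spreads load more evenly while still keeping the updates to any single row in order.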

Replication restrictions

When using TiCDC for replication, note the following restrictions and unsupported scenarios.

Valid index requirement

TiCDC can replicate only tables that have at least one valid index. A valid index is defined as follows:

A primary key (PRIMARY KEY) is a valid index.
A unique index (UNIQUE INDEX) is a valid index if it meets both of the following conditions:
- Every column in the index is explicitly defined as NOT NULL in the table schema.
- The index contains no virtual generated columns (VIRTUAL GENERATED COLUMNS).
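The valid-index rules above translate directly into a predicate over a table's schema. This minimal Python sketch uses a hypothetical in-memory schema model (Column, Index); a real check would first have to read the schema from the database, which is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Column:
    name: str
    not_null: bool
    virtual_generated: bool = False


@dataclass
class Index:
    columns: List[str]
    primary: bool = False
    unique: bool = False


def is_valid_index(index: Index, columns: Dict[str, Column]) -> bool:
    """Apply the valid-index rules: a primary key, or a qualifying unique index."""
    if index.primary:
        return True
    if not index.unique:
        return False
    # Every indexed column must be NOT NULL and must not be a virtual generated column.
    return all(
        columns[name].not_null and not columns[name].virtual_generated
        for name in index.columns
    )


def has_valid_index(columns: List[Column], indexes: List[Index]) -> bool:
    """A table is replicable by TiCDC (under these rules) if any index is valid."""
    by_name = {c.name: c for c in columns}
    return any(is_valid_index(idx, by_name) for idx in indexes)


cols = [Column("id", not_null=False), Column("email", not_null=True)]
print(has_valid_index(cols, [Index(["id"], unique=True)]))     # False: id is nullable
print(has_valid_index(cols, [Index(["email"], unique=True)]))  # True
```

The NOT NULL requirement exists because a unique index still allows multiple rows with NULL in the indexed column, so such an index cannot identify a row unambiguously.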

Starting from v4.0.8, TiCDC can replicate tables without a valid index if the task configuration is modified, but the data consistency guarantee is weaker. For usage details and precautions, see Synchronize tables without valid indexes.

Currently unsupported scenarios

TiCDC currently does not support the following scenarios:

TiKV clusters that use RawKV on their own.
The CREATE SEQUENCE DDL operation and the SEQUENCE function in TiDB. When the upstream TiDB uses SEQUENCE, TiCDC ignores the SEQUENCE DDL operations and functions executed upstream, but DML operations that use SEQUENCE functions are replicated correctly.
Scenarios with large upstream transactions are only partially supported; see


Copyright notice
This article was written by [Charlotteck]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202132332239992.html