
TiDB ecosystem tools (backup, migration, import/export): an overview

2022-07-06 03:11:00 Charlotteck

1. Dumpling

The data export tool Dumpling exports data stored in TiDB or MySQL as SQL or CSV files, for logical full backup. Dumpling also supports exporting data to Amazon S3.
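As a sketch of typical usage (the host, port, and output directory below are illustrative placeholders, not from the original article), Dumpling can be invoked like this:

```shell
# A minimal sketch, assuming a TiDB server reachable at 127.0.0.1:4000.
# --filetype accepts "sql" or "csv"; -t sets the number of export threads;
# -o accepts a local directory or an s3:// URI for Amazon S3.
dumpling \
  -u root -h 127.0.0.1 -P 4000 \
  --filetype sql \
  -t 8 \
  -o /data/dumpling-export
```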

2. Lightning

TiDB Lightning has two main usage scenarios:

  • Quickly importing large amounts of new data.
  • Restoring all data from a backup.

TiDB Lightning currently supports importing files exported by Dumpling as well as CSV files.
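A minimal sketch of a local-backend import, assuming the files were exported by Dumpling; every host, port, and path here is an illustrative placeholder:

```shell
# Illustrative tidb-lightning.toml: the "local" backend sorts KV pairs on
# local disk, then ingests them into TiKV directly.
cat > tidb-lightning.toml <<'EOF'
[tikv-importer]
backend = "local"
sorted-kv-dir = "/mnt/ssd/sorted-kv-dir"

[mydumper]
# Directory holding the Dumpling or CSV files to import.
data-source-dir = "/data/dumpling-export"

[tidb]
host = "172.16.31.1"
port = 4000
user = "root"
status-port = 10080
pd-addr = "172.16.31.4:2379"
EOF

# Run the import.
tidb-lightning --config tidb-lightning.toml
```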

3. Data Migration (DM)

TiDB Data Migration (DM) is a convenient data migration tool that supports full data migration and incremental data replication from MySQL-protocol-compatible databases (MySQL, MariaDB, Aurora MySQL) into TiDB. Using DM helps simplify the migration process and reduce its operation and maintenance cost.
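A minimal sketch of starting a DM task with dmctl; the addresses, credentials, and source ID are illustrative placeholders, and the upstream MySQL source is assumed to have been registered already (via dmctl's operate-source create):

```shell
# Illustrative task file: full migration followed by incremental
# replication ("all" mode) from one upstream MySQL source into TiDB.
cat > task.yaml <<'EOF'
name: "mysql-to-tidb"
task-mode: all

target-database:
  host: "172.16.31.1"
  port: 4000
  user: "root"
  password: ""

mysql-instances:
  - source-id: "mysql-01"
EOF

# Start the task through a DM-master node.
tiup dmctl --master-addr 172.16.31.10:8261 start-task task.yaml
```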

4. Backup & Restore (BR)

BR, whose full name is Backup & Restore, is a command-line tool for distributed backup and restore of TiDB cluster data. Compared with Dumpling, BR is better suited to scenarios with large volumes of data.

Besides regular backup and restore, BR can also be used for large-scale data migration, provided compatibility requirements are met.

This section introduces how BR works, its recommended deployment configuration, usage restrictions, and several ways to use it.

How it works: BR issues the backup or restore command to every TiKV node; after receiving the command, each TiKV node performs the corresponding backup or restore operation.

During a backup or restore, each TiKV node has a corresponding backup path. The backup files generated during backup are saved under this path, and during restore the corresponding backup files are read from it. A sketch of both commands follows.
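In this sketch the PD address and storage URI are illustrative; the storage path must be reachable by every TiKV node, for example shared storage such as S3:

```shell
# Full backup: BR sends the backup command to every TiKV node, and each
# node writes its SST files plus the backupmeta under --storage.
tiup br backup full \
  --pd "172.16.31.4:2379" \
  --storage "s3://backup-bucket/2022-07-06/" \
  --log-file backup-full.log

# Full restore from the same path, e.g. into another cluster.
tiup br restore full \
  --pd "172.16.32.4:2379" \
  --storage "s3://backup-bucket/2022-07-06/" \
  --log-file restore-full.log
```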

Backup file types

The following files are generated under the backup path:

  • SST files: store the TiKV data being backed up
  • backupmeta file: stores the metadata of this backup, including the number of backup files, the key range of each file, the size of each file, and the hash (sha256) of each file
  • backup.lock file: prevents multiple backups from writing to the same directory

SST File naming format

SST files are named in the format  storeID_regionID_regionEpoch_keyHash_cf . The fields are as follows:

  • storeID: the ID of the TiKV node
  • regionID: the ID of the Region being backed up
  • regionEpoch: the version number of the Region
  • keyHash: the hash (sha256) of the start key of the range, which guarantees a unique file name
  • cf: the RocksDB Column Family (default or write)

Usage restrictions

BR backup and restore are subject to the following restrictions:

  • When BR restores data to an upstream cluster of TiCDC / Drainer, the restored data cannot be replicated to the downstream by TiCDC / Drainer.
  • BR only supports operating between clusters whose  new_collations_enabled_on_first_bootstrap  switch has the same value, because BR backs up only KV data. If the backup cluster and the restore cluster use different collations, the data checksum will fail. Therefore, before restoring, make sure the result of  select VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME='new_collation_enabled';  on the target cluster matches the result at backup time. A sketch of this check follows.
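A hedged sketch of that pre-restore check (host names are placeholders):

```shell
# Compare new_collation_enabled between the backed-up cluster and the
# cluster to restore into; proceed only when the two values match.
CHECK_SQL="select VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME='new_collation_enabled';"
SRC=$(mysql -h tidb-src -P 4000 -u root -N -e "$CHECK_SQL")
DST=$(mysql -h tidb-dst -P 4000 -u root -N -e "$CHECK_SQL")
if [ "$SRC" = "$DST" ]; then
  echo "collation switch matches ($SRC); safe to restore"
else
  echo "mismatch: source=$SRC target=$DST; do not restore" >&2
fi
```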

5. Binlog

TiDB Binlog is a commercial tool that collects the binlog of TiDB and provides near-real-time backup and replication.

TiDB Binlog supports the following scenarios:

  • Data replication: replicate data from a TiDB cluster to other databases
  • Real-time backup and restore: back up TiDB cluster data, and recover the TiDB cluster in case of failure

A TiDB Binlog cluster consists mainly of two components, Pump and Drainer, together with the binlogctl tool:

Pump

Pump records the binlog produced by TiDB in real time, sorts the binlog by transaction commit time, and provides it to Drainer for consumption.

Drainer

Drainer collects binlog from each Pump and merges it, converts the binlog into SQL or data in a specified format, and finally replicates it to the downstream.

The binlogctl tool

binlogctl is an operations and maintenance tool that ships with TiDB Binlog. It can:

  • obtain the current tso of the TiDB cluster
  • check the state of Pump / Drainer nodes
  • modify the state of Pump / Drainer nodes
  • pause or take Pump / Drainer nodes offline

A sketch of common invocations follows.
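In this sketch the PD URL and node ID are illustrative placeholders:

```shell
# Query the state of all Pump nodes registered in PD.
binlogctl -pd-urls=http://172.16.31.4:2379 -cmd pumps

# Query the state of all Drainer nodes.
binlogctl -pd-urls=http://172.16.31.4:2379 -cmd drainers

# Pause a Pump node by its node ID.
binlogctl -pd-urls=http://172.16.31.4:2379 -cmd pause-pump -node-id ip-172-16-31-5:8250
```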

Main features

  • Multiple Pumps form a cluster that can scale out horizontally.
  • TiDB distributes binlog to each Pump through the built-in Pump Client.
  • Pump stores binlog and provides it to Drainer in order.
  • Drainer reads binlog from each Pump, merges and sorts it, and sends it to the downstream.
  • Drainer supports the relay log feature; the relay log ensures consistency of the downstream cluster.

6. TiCDC

TiCDC is a tool for replicating TiDB's incremental data by pulling change logs from TiKV. It can restore data to a state consistent with any TSO in the upstream, and it provides the TiCDC Open Protocol (an open data protocol) so that other systems can subscribe to data changes.
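A minimal sketch of creating a replication task (changefeed) toward a MySQL-compatible downstream; addresses and credentials are illustrative placeholders:

```shell
# Create a changefeed from the TiDB cluster served by this PD to a
# downstream MySQL-protocol database.
cdc cli changefeed create \
  --pd=http://172.16.31.4:2379 \
  --sink-uri="mysql://root:password@172.16.32.1:3306/" \
  --changefeed-id="simple-replication-task"

# Check the task's state and checkpoint.
cdc cli changefeed list --pd=http://172.16.31.4:2379
```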

System roles

  • TiKV CDC component: outputs only key-value (KV) change logs.
    • Assembles KV change logs internally.
    • Provides an interface for outputting KV change logs; the data sent includes real-time change logs and incremental-scan change logs.
  • capture: the TiCDC running process; multiple captures form a TiCDC cluster responsible for replicating KV change logs.
    • Each capture pulls a portion of the KV change logs.
    • Sorts the one or more KV change log streams it pulls.
    • Restores transactions downstream, or outputs them according to the TiCDC Open Protocol.

Replication features

This section introduces TiCDC's replication features.

Sink support

Currently, the TiCDC sink module supports replicating data to the following downstreams:

  • MySQL-protocol-compatible databases, with eventual consistency guarantees.
  • Kafka, using the TiCDC Open Protocol output; row-level ordering, eventual consistency, or strict transactional consistency can be achieved (a changefeed sketch follows this list).
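As a sketch of a Kafka sink (broker, topic, and URI parameters are illustrative; on some older TiCDC versions the Open Protocol value in the URI is spelled "default" rather than "open-protocol"):

```shell
# Replicate to Kafka using the TiCDC Open Protocol; the consistency level
# achieved beyond this depends on the consumer-side implementation.
cdc cli changefeed create \
  --pd=http://172.16.31.4:2379 \
  --sink-uri="kafka://172.16.33.1:9092/cdc-topic?protocol=open-protocol&partition-num=6&max-message-bytes=67108864&replication-factor=1" \
  --changefeed-id="kafka-replication-task"
```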

Replication order and consistency guarantees

Replication order

  • TiCDC outputs every DDL/DML at least once.
  • During TiKV / TiCDC cluster failures, TiCDC may send the same DDL/DML repeatedly. For duplicate DDL/DML:
    • The MySQL sink executes duplicates: re-entrant DDL (for example, truncate table) executes successfully downstream; non-re-entrant DDL (for example, create table) fails downstream, and TiCDC ignores the error and continues replication.
    • The Kafka sink may send duplicate messages, but duplicates do not break the Resolved Ts constraint, so users can filter them out on the Kafka consumer side.

Replication consistency

  • MySQL sink
    • TiCDC does not split single-table transactions, and guarantees the atomicity of single-table transactions.
    • TiCDC does not guarantee that downstream transactions execute in exactly the same order as upstream transactions.
    • TiCDC splits cross-table transactions by table, and does not guarantee the atomicity of cross-table transactions.
    • TiCDC guarantees that updates to a single row apply in the same order as upstream.
  • Kafka sink
    • TiCDC provides several data-distribution strategies: data can be distributed to different Kafka partitions by table, by primary key, by ts, and so on.
    • Different distribution strategies combined with different consumer implementations achieve different consistency levels, including row-level ordering, eventual consistency, and cross-table transaction consistency.
    • TiCDC does not provide a Kafka consumer; it provides only the TiCDC open data protocol, with which users can implement their own consumers.

Replication restrictions

Note the following restrictions and unsupported scenarios when using TiCDC for replication.

Valid index requirements

TiCDC can only replicate tables that have at least one valid index. A valid index is defined as follows (a SQL sketch follows the list):

  • A primary key (PRIMARY KEY) is a valid index.
  • A unique index (UNIQUE INDEX) is a valid index if it meets both conditions:
    • Every column of the index is explicitly defined as NOT NULL in the table schema.
    • The index contains no virtual generated columns (VIRTUAL GENERATED COLUMNS).
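A SQL sketch (table and column names are invented for illustration) of one table TiCDC can replicate and one it cannot, run here through the mysql client:

```shell
# sync_ok has a valid index: a UNIQUE index whose columns are all NOT NULL
# and contain no virtual generated column. sync_skip has neither a primary
# key nor a qualifying unique index, so TiCDC cannot replicate it by default.
mysql -h 172.16.31.1 -P 4000 -u root <<'EOF'
CREATE TABLE sync_ok (
    id   BIGINT NOT NULL,
    note VARCHAR(64),
    UNIQUE KEY uk_id (id)
);
CREATE TABLE sync_skip (
    id   BIGINT,
    note VARCHAR(64)
);
EOF
```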

Starting from v4.0.8, TiCDC can replicate tables that have no valid index if the task configuration is modified accordingly, but the data consistency guarantee is weakened. For usage details and caveats, see Replicating tables without a valid index.

Temporarily unsupported scenarios

TiCDC currently does not support the following scenarios:

  • TiKV clusters that use RawKV alone are not supported.
  • The DDL operation of creating a SEQUENCE in TiDB and the SEQUENCE function are not yet supported. When the upstream TiDB uses SEQUENCE, TiCDC ignores the SEQUENCE DDL operations/functions executed upstream, but DML operations that use SEQUENCE functions replicate correctly (see the sketch after this list).
  • Scenarios with large transactions in the upstream are only partially supported; see the TiCDC documentation for details.
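A sketch of the SEQUENCE behavior described above (object names are illustrative): the SEQUENCE DDL below is ignored by TiCDC, while the row written by the INSERT replicates with the concrete value produced upstream:

```shell
mysql -h 172.16.31.1 -P 4000 -u root <<'EOF'
-- Ignored by TiCDC: SEQUENCE DDL is not replicated downstream.
CREATE SEQUENCE seq1;
-- Ordinary table DDL replicates as usual.
CREATE TABLE t (id BIGINT PRIMARY KEY);
-- Replicates correctly: the row carries the value NEXTVAL(seq1)
-- produced upstream, not the function call itself.
INSERT INTO t (id) VALUES (NEXTVAL(seq1));
EOF
```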


Copyright notice

This article was created by [Charlotteck]. Please include a link to the original when reposting:
https://yzsam.com/2022/02/202202132332239992.html