TiDB ecosystem tools (backup, migration, import/export): an overview
2022-07-06 03:11:00 【Charlotteck】
1. Dumpling

2. Lightning
TiDB Lightning has two main usage scenarios:
- Quickly importing a large amount of new data.
- Restoring all backup data.
Currently, TiDB Lightning supports:
- Importing data sources in Dumpling, CSV, or Amazon Aurora Parquet output formats.
- Reading data from a local disk or from Amazon S3 cloud storage.

3. Data Migration (DM)

4. Backup & Restore (BR)
Besides regular backup and restore, BR can also be used for large-scale data migration, provided that compatibility is ensured.
This section describes how BR works, the recommended deployment configuration, the usage restrictions, and several ways to use it.
How it works: BR sends the backup or restore command to each TiKV node. After receiving the command, TiKV performs the corresponding backup or restore operation.
During a backup or restore, each TiKV node has a corresponding backup path. The backup files that TiKV generates during a backup are saved to this path, and during a restore the corresponding backup files are read from the same path.

Backup file types
The following types of files are generated under the backup path:
- SST file: stores the data that TiKV backs up
- backupmeta file: stores the metadata of this backup, including the number of backup files, the key range of each backup file, the backup file sizes, and the hash (sha256) value of each backup file
- backup.lock file: prevents multiple backups from being written to the same directory
SST file naming format
SST files are named in the format storeID_regionID_regionEpoch_keyHash_cf. The fields are explained as follows:
- storeID: the TiKV node number
- regionID: the Region number
- regionEpoch: the Region version number
- keyHash: the hash (sha256) value of the Range startKey, which ensures uniqueness
- cf: the RocksDB ColumnFamily (default or write by default)
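The naming format above can be illustrated with a small parser. This is only a sketch based on the five fields listed here: the function name, the example file name, the optional `.sst` suffix, and the assumption that the first three fields are integers are all made up for illustration and are not taken from BR itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTName:
    store_id: int      # TiKV node number
    region_id: int     # Region number
    region_epoch: int  # Region version number (assumed to be an integer)
    key_hash: str      # sha256 hash of the range start key
    cf: str            # RocksDB column family, e.g. "default" or "write"


def parse_sst_name(name: str) -> SSTName:
    """Split a name of the form storeID_regionID_regionEpoch_keyHash_cf."""
    # Assumption: an optional ".sst" extension may follow the cf field.
    stem = name[:-4] if name.endswith(".sst") else name
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields, got {len(parts)}")
    store_id, region_id, region_epoch, key_hash, cf = parts
    return SSTName(int(store_id), int(region_id), int(region_epoch), key_hash, cf)


# Hypothetical file name, invented for this example.
example = parse_sst_name("1_42_7_9f86d081884c7d65_write.sst")
print(example.store_id, example.region_id, example.cf)
```

Keeping the hash as a string avoids any assumption about its length or encoding.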
Usage restrictions
The following restrictions apply when using BR for backup and restore:
- When BR restores data to the upstream cluster of TiCDC / Drainer, the restored data cannot be synchronized to the downstream by TiCDC / Drainer.
- BR only supports operations between clusters that have the same new_collations_enabled_on_first_bootstrap switch value. This is because BR only backs up KV data. If the backup cluster and the restore cluster use different collation settings, data validation fails. Therefore, before restoring a cluster, make sure that the result of the statement select VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME='new_collation_enabled'; matches the result recorded at backup time.
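The collation check described above could be wrapped in a small pre-restore guard. The sketch below assumes you have already run the quoted statement on both clusters with a MySQL client of your choice and hold the two result values as strings; the function names are hypothetical and the case-insensitive comparison is an assumption.

```python
# The statement quoted in the restriction above; run it on both clusters.
COLLATION_QUERY = (
    "select VARIABLE_VALUE from mysql.tidb "
    "where VARIABLE_NAME='new_collation_enabled';"
)


def collation_settings_match(backup_value: str, restore_value: str) -> bool:
    """Return True when both clusters report the same switch value."""
    # Assumption: values are compared case-insensitively after trimming.
    return backup_value.strip().lower() == restore_value.strip().lower()


def check_before_restore(backup_value: str, restore_value: str) -> None:
    """Raise before a restore is attempted if the switch values differ."""
    if not collation_settings_match(backup_value, restore_value):
        raise RuntimeError(
            "new_collation_enabled differs between backup and restore clusters: "
            f"{backup_value!r} vs {restore_value!r}"
        )


check_before_restore("True", "True")  # passes silently
```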
5. Binlog
TiDB Binlog is a commercial tool that collects the binlog of TiDB and provides near-real-time backup and synchronization.
TiDB Binlog supports the following scenarios:
- Data synchronization: synchronize TiDB cluster data to other databases
- Real-time backup and restore: back up TiDB cluster data, which can also be used for recovery when a TiDB cluster fails

A TiDB Binlog cluster consists mainly of two components, Pump and Drainer, plus the binlogctl tool:
Pump
Pump records the binlog generated by TiDB in real time, sorts it by transaction commit time, and then provides it to Drainer for consumption.
Drainer
Drainer collects binlog from each Pump and merges it, converts the binlog into SQL or data in a specified format, and finally synchronizes it to the downstream.
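The division of labor between Pump and Drainer resembles a k-way merge: each Pump hands over a stream that is already ordered by commit time, and Drainer merges those streams into one ordered sequence. The sketch below illustrates that idea only; the `Binlog` tuple and `merge_pumps` function are hypothetical and do not reflect the actual Drainer implementation.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Binlog(NamedTuple):
    commit_ts: int  # transaction commit timestamp
    payload: str    # placeholder for the binlog content


def merge_pumps(pump_streams: Iterable[Iterable[Binlog]]) -> Iterator[Binlog]:
    """Merge per-Pump streams into one stream ordered by commit_ts."""
    # heapq.merge requires every input stream to be sorted already,
    # which matches the statement that Pump sorts by commit time.
    return heapq.merge(*pump_streams, key=lambda b: b.commit_ts)


pump_a = [Binlog(100, "INSERT a"), Binlog(130, "UPDATE a")]
pump_b = [Binlog(110, "INSERT b"), Binlog(120, "DELETE b")]
merged = list(merge_pumps([pump_a, pump_b]))
print([b.commit_ts for b in merged])  # [100, 110, 120, 130]
```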
binlogctl tool
binlogctl is the operations tool that accompanies TiDB Binlog. It provides the following functions:
- Get the current TSO of the TiDB cluster
- View the Pump/Drainer state
- Modify the Pump/Drainer state
- Pause a Pump/Drainer or take it offline
Main features
- Multiple Pumps form a cluster that can scale out horizontally.
- TiDB distributes binlog to each Pump through the built-in Pump Client.
- Pump stores binlog and provides it to Drainer in order.
- Drainer reads the binlog of each Pump, merges and sorts it, and sends it downstream.
- Drainer supports the relay log feature, which uses the relay log to keep the downstream cluster in a consistent state.
6. TiCDC
TiCDC is a tool that synchronizes TiDB incremental data by pulling TiKV change logs. It can restore data to a state consistent with any upstream TSO, and it provides an open data protocol (TiCDC Open Protocol) so that other systems can subscribe to data changes.

System roles
- TiKV CDC component: outputs only key-value (KV) change logs.
  - Assembles KV change logs with internal logic.
  - Provides an interface for outputting KV change logs. The data sent includes real-time change logs and incremental-scan change logs.
- capture: the TiCDC running process. Multiple captures form a TiCDC cluster that is responsible for synchronizing KV change logs.
  - Each capture pulls a part of the KV change logs.
  - Sorts the one or more KV change logs that it pulls.
  - Restores transactions to the downstream, or outputs them according to the TiCDC Open Protocol.
Introduction to the synchronization function
This section introduces the synchronization function of TiCDC.
Sink support
Currently, the TiCDC sink module supports synchronizing data to the following downstream systems:
- Databases compatible with the MySQL protocol, with eventual consistency support.
- Kafka, with output in the TiCDC Open Protocol, which can provide one of three consistency guarantees: row-level order, eventual consistency, or strict transactional consistency.
Synchronization order guarantee and consistency guarantee
Data synchronization order
- TiCDC guarantees that every DDL/DML is output at least once.
- During a TiKV/TiCDC cluster failure, TiCDC may send the same DDL/DML repeatedly. For repeated DDL/DML:
  - The MySQL sink may execute a DDL repeatedly. A DDL that is reentrant downstream (for example, truncate table) executes successfully. A DDL that is not reentrant downstream (for example, create table) fails, and TiCDC ignores the error and continues synchronizing.
  - The Kafka sink sends duplicate messages, but duplicate messages do not break the Resolved Ts constraint. Users can filter them out on the Kafka consumer side.
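Because delivery is at least once, a Kafka consumer may want to drop resent changes before applying them. The sketch below shows one possible filter, assuming each decoded message can be identified by its table, row key, and commit timestamp. The `RowChange` shape is hypothetical and is not the actual TiCDC Open Protocol layout.

```python
from typing import Hashable, Iterable, Iterator, NamedTuple


class RowChange(NamedTuple):
    """Hypothetical decoded message; not the real Open Protocol layout."""
    table: str
    row_key: Hashable
    commit_ts: int
    op: str


def drop_duplicates(events: Iterable[RowChange]) -> Iterator[RowChange]:
    """Yield each (table, row_key, commit_ts) at most once, in arrival order."""
    seen = set()
    for event in events:
        identity = (event.table, event.row_key, event.commit_ts)
        if identity in seen:
            continue  # a resend of a change that was already delivered
        seen.add(identity)
        yield event


received = [
    RowChange("orders", 1, 100, "insert"),
    RowChange("orders", 1, 100, "insert"),  # resent after a failure
    RowChange("orders", 1, 105, "update"),
]
unique = list(drop_duplicates(received))
print(len(unique))  # 2
```

The `seen` set in this sketch grows without bound; a long-running consumer would need some way to discard old entries, which is left out here.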
Data synchronization consistency
- MySQL sink
  - TiCDC does not split single-table transactions and guarantees the atomicity of single-table transactions.
  - TiCDC does not guarantee that downstream transactions execute in exactly the same order as upstream transactions.
  - TiCDC splits cross-table transactions by table and does not guarantee the atomicity of cross-table transactions.
  - TiCDC guarantees that updates to a single row are applied in the same order as upstream.
- Kafka sink
  - TiCDC provides different data distribution strategies. Data can be distributed to different Kafka partitions by table, primary key, or ts.
  - Depending on the distribution strategy and on how the consumer is implemented, different levels of consistency can be achieved, including row-level order, eventual consistency, or cross-table transactional consistency.
  - TiCDC does not provide a Kafka consumer implementation, only the TiCDC Open Protocol. Users can implement the Kafka consumer side themselves.
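To make the distribution strategies concrete, the sketch below maps a row change to a partition number by hashing a different value for each strategy. The strategy names, the `pick_partition` function, and the use of CRC32 are assumptions made for illustration; TiCDC's own dispatcher may hash differently.

```python
import zlib


def pick_partition(strategy: str, table: str, primary_key: object,
                   commit_ts: int, num_partitions: int) -> int:
    """Choose a Kafka partition for one row change (illustrative only)."""
    if strategy == "table":
        basis = table                     # all rows of a table share a partition
    elif strategy == "primary_key":
        basis = f"{table}:{primary_key}"  # changes to one row share a partition
    elif strategy == "ts":
        basis = str(commit_ts)            # rows spread by commit timestamp
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # CRC32 is used because it is stable across runs, unlike Python's hash().
    return zlib.crc32(basis.encode("utf-8")) % num_partitions


# Two different rows of the same table land in the same partition.
same = (pick_partition("table", "orders", 1, 100, 4)
        == pick_partition("table", "orders", 2, 200, 4))
print(same)  # True
```

With the table strategy, one partition carries every change to a table in order. With the primary-key strategy, only changes to the same row are kept together, which matches the row-level order mentioned above.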
Synchronization restrictions
When synchronizing with TiCDC, note the following restrictions and unsupported scenarios.
Requirements for valid indexes
TiCDC can only synchronize tables that have at least one valid index. A valid index is defined as follows:
- A primary key (PRIMARY KEY) is a valid index.
- A unique index (UNIQUE INDEX) that meets both of the following conditions is a valid index:
  - Every column in the index is explicitly defined as non-nullable (NOT NULL) in the table schema.
  - The index contains no virtual generated columns (VIRTUAL GENERATED COLUMNS).
Starting from version 4.0.8, TiCDC can synchronize tables without a valid index if you modify the task configuration, but the data consistency guarantee is weakened. For specific usage and precautions, refer to "Synchronize tables without valid indexes".
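The valid-index rule can be expressed as a short predicate. The sketch below works on a hypothetical in-memory description of a table (the `Column`, `Index`, and `Table` classes are invented for this example) and does not query TiDB or call TiCDC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str
    not_null: bool = False
    virtual_generated: bool = False


@dataclass
class Index:
    columns: List[str]
    primary: bool = False
    unique: bool = False


@dataclass
class Table:
    columns: List[Column]
    indexes: List[Index]


def is_valid_index(table: Table, index: Index) -> bool:
    """Apply the rule: a primary key, or a unique index whose columns
    are all NOT NULL and none of which is a virtual generated column."""
    if index.primary:
        return True
    if not index.unique:
        return False
    by_name = {c.name: c for c in table.columns}
    return all(
        by_name[n].not_null and not by_name[n].virtual_generated
        for n in index.columns
    )


def has_valid_index(table: Table) -> bool:
    return any(is_valid_index(table, idx) for idx in table.indexes)


# A unique index on a nullable column does not count as valid.
users = Table(
    columns=[Column("id", not_null=True), Column("email")],
    indexes=[Index(["email"], unique=True)],
)
print(has_valid_index(users))  # False
```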
Currently unsupported scenarios
TiCDC does not currently support the following scenarios:
- TiKV clusters that use RawKV alone.
- The DDL operation that creates a SEQUENCE in TiDB, and the SEQUENCE function. When the upstream TiDB uses SEQUENCE, TiCDC ignores the SEQUENCE DDL operations and functions executed upstream, but DML operations that use the SEQUENCE function are synchronized correctly.
- Provide partial support for scenarios with large transactions upstream , See