TiDB ecosystem tools (backup, migration, import/export): an overview
2022-07-06 03:11:00 【Charlotteck】
1. Dumpling
2. Lightning
TiDB Lightning has two main usage scenarios:
- Quickly importing a large amount of new data.
- Restoring all backup data.
Currently, TiDB Lightning supports:
- Importing data sources in the Dumpling, CSV, or Amazon Aurora Parquet output format.
- Reading data from a local disk or from Amazon S3 cloud storage.
3. Data Migration (DM)
4. Backup & Restore (BR)
In addition to regular backup and restore, BR can also be used for large-scale data migration, provided that compatibility is ensured.
This section covers how BR works, the types of backup files it produces, and its usage restrictions.
How it works: BR sends the backup or restore command to each TiKV node. After receiving the command, TiKV performs the corresponding backup or restore operation.
During a backup or restore, each TiKV node has a corresponding backup path. TiKV saves the backup files it generates to this path during backup, and reads the corresponding backup files from this path during restore.
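As a rough illustration of how a full backup is started from the client side, the sketch below only builds the argument list for a `br backup full` invocation and does not run it. It assumes the `br` binary is on the PATH and uses its `--pd` and `--storage` flags; the PD address and the backup path are placeholders, and the helper name `build_br_backup_command` is hypothetical.

```python
import shlex
from typing import List


def build_br_backup_command(pd_addr: str, storage: str) -> List[str]:
    """Build (but do not execute) the argument list for a full BR backup.

    pd_addr: address of a PD node of the cluster to back up (placeholder).
    storage: backup path; every TiKV node writes its files under this path.
    """
    return ["br", "backup", "full", "--pd", pd_addr, "--storage", storage]


if __name__ == "__main__":
    # Placeholder values; replace them with the real PD address and path.
    cmd = build_br_backup_command("127.0.0.1:2379", "local:///tmp/backup")
    print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` later without shell quoting issues.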
Backup file types
The following three types of files are generated under the backup path:
- SST file: stores the data backed up by TiKV
- backupmeta file: stores the metadata of this backup, including the number of backup files, the key range of each backup file, the backup file sizes, and the hash (sha256) value of each backup file
- backup.lock file: prevents multiple backups from being written to the same directory
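The backupmeta file records a sha256 hash for each backup file. The sketch below shows only how a SHA-256 digest of a file can be computed in chunks; how BR encodes the hash inside backupmeta is not described here, so comparing the two is left out, and the helper name `sha256_of_file` is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large SST files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks matters because SST files can be large; the chunk size of 1 MiB is an arbitrary choice.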
SST file naming format
SST files are named in the format storeID_regionID_regionEpoch_keyHash_cf. The fields are explained as follows:
- storeID: TiKV node ID
- regionID: Region ID
- regionEpoch: Region version number
- keyHash: hash (sha256) value of the range startKey, which ensures uniqueness
- cf: RocksDB ColumnFamily (default or write by default)
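The naming format above can be taken apart with a simple split on underscores. This is a minimal sketch, assuming the three ID fields are decimal integers and that the file may carry a `.sst` extension; neither detail is stated in the text, and `parse_sst_name` and `SSTName` are hypothetical names.

```python
from typing import NamedTuple


class SSTName(NamedTuple):
    store_id: int      # TiKV node ID
    region_id: int     # Region ID
    region_epoch: int  # Region version number
    key_hash: str      # sha256 of the range startKey
    cf: str            # RocksDB column family, e.g. "default" or "write"


def parse_sst_name(name: str) -> SSTName:
    """Split storeID_regionID_regionEpoch_keyHash_cf into its five fields."""
    # Assumption: an optional ".sst" extension is stripped before parsing.
    stem = name[:-4] if name.endswith(".sst") else name
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields, got {len(parts)}: {name!r}")
    store_id, region_id, region_epoch, key_hash, cf = parts
    return SSTName(int(store_id), int(region_id), int(region_epoch), key_hash, cf)
```

For example, `parse_sst_name("1_2_3_abcdef_write.sst")` yields store 1, Region 2, epoch 3, hash `abcdef`, and column family `write`.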
Usage restrictions
The following restrictions apply when using BR for backup and restore:
- When BR restores data to the upstream cluster of TiCDC / Drainer, the restored data cannot be replicated to the downstream by TiCDC / Drainer.
- BR supports operations only between clusters that have the same new_collations_enabled_on_first_bootstrap switch value. This is because BR backs up only KV data. If the backup cluster and the restore cluster use different collation settings, data verification fails. Therefore, before restoring, make sure that the result of the statement select VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME='new_collation_enabled'; on the restore cluster matches the result recorded at the time of backup.
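The collation pre-check described above could be automated along these lines. This is a sketch only: the two `query_*` callables are hypothetical stand-ins for whatever MySQL client is used to run the statement on each cluster, and the comparison ignores case and surrounding whitespace as a defensive assumption.

```python
from typing import Callable

# The statement quoted in the restriction above.
COLLATION_QUERY = (
    "select VARIABLE_VALUE from mysql.tidb "
    "where VARIABLE_NAME='new_collation_enabled';"
)


def collation_switch_matches(
    query_backup_cluster: Callable[[str], str],
    query_restore_cluster: Callable[[str], str],
) -> bool:
    """Return True if both clusters report the same new_collation_enabled value.

    Each callable is a hypothetical hook that runs the SQL on one cluster and
    returns the single VARIABLE_VALUE cell as a string.
    """
    backup_value = query_backup_cluster(COLLATION_QUERY).strip().lower()
    restore_value = query_restore_cluster(COLLATION_QUERY).strip().lower()
    return backup_value == restore_value


if __name__ == "__main__":
    # Fake clusters for demonstration; real code would query the databases.
    print(collation_switch_matches(lambda q: "True", lambda q: "False"))
```

In practice the backup-side value has to be recorded when the backup is taken, since the backup cluster may no longer exist at restore time.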
5. Binlog
TiDB Binlog is a commercial tool that collects the binlog of TiDB and provides near-real-time backup and replication.
TiDB Binlog supports the following scenarios:
- Data replication: replicating TiDB cluster data to other databases
- Real-time backup and restore: backing up TiDB cluster data, which can also be used for recovery when the TiDB cluster fails
A TiDB Binlog cluster consists mainly of two components, Pump and Drainer, plus the binlogctl tool:
Pump
Pump records the binlog produced by TiDB in real time, sorts the binlog by transaction commit time, and then provides it to Drainer for consumption.
Drainer
Drainer collects the binlog from each Pump and merges it, converts the binlog into SQL or data in a specified format, and finally replicates it to the downstream.
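The division of work between Pump and Drainer resembles a k-way merge: each Pump hands over a stream already sorted by commit time, and Drainer merges those streams into one ordered stream. The sketch below illustrates that idea with `heapq.merge`; the `Binlog` record and its fields are hypothetical and this is not Drainer's actual implementation.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Binlog(NamedTuple):
    commit_ts: int  # transaction commit timestamp (hypothetical field)
    payload: str    # stand-in for the binlog content


def merge_pump_streams(*streams: Iterable[Binlog]) -> Iterator[Binlog]:
    """Merge per-Pump streams, each already sorted by commit_ts, into one."""
    return heapq.merge(*streams, key=lambda b: b.commit_ts)


if __name__ == "__main__":
    pump_a = [Binlog(1, "a1"), Binlog(4, "a4")]
    pump_b = [Binlog(2, "b2"), Binlog(3, "b3")]
    for binlog in merge_pump_streams(pump_a, pump_b):
        print(binlog.commit_ts, binlog.payload)
```

`heapq.merge` is lazy, so it only needs to hold one pending item per input stream, which is why pre-sorting in each Pump keeps the merge step cheap.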
binlogctl tool
binlogctl is the operations tool that accompanies TiDB Binlog. It has the following functions:
- Getting the current TSO of the TiDB cluster
- Viewing the state of Pump/Drainer
- Modifying the state of Pump/Drainer
- Pausing or taking offline Pump/Drainer
Main features
- Multiple Pumps form a cluster, which can be scaled out horizontally.
- TiDB distributes the binlog to each Pump through the built-in Pump Client.
- Pump stores the binlog and provides it to Drainer in order.
- Drainer reads the binlog from each Pump, then merges and sorts it before sending it to the downstream.
- Drainer supports the relay log feature and uses the relay log to ensure the consistency of the downstream cluster.
6. TiCDC
TiCDC is a tool that replicates the incremental data of TiDB by pulling TiKV change logs. It can restore data to a state consistent with any upstream TSO, and it provides the TiCDC Open Protocol so that other systems can subscribe to data changes.
System roles
- TiKV CDC component: outputs only key-value (KV) change logs.
  - Assembles KV change logs using internal logic.
  - Provides the interface for outputting KV change logs. The data sent includes real-time change logs and incremental-scan change logs.
- capture: the TiCDC running process. Multiple captures form a TiCDC cluster that is responsible for replicating KV change logs.
  - Each capture pulls a part of the KV change logs.
  - Sorts the one or more KV change logs it pulls.
  - Restores transactions to the downstream, or outputs them according to the TiCDC Open Protocol.
Replication features
This section introduces the replication features of TiCDC.
Sink support
Currently, the TiCDC sink module supports replicating data to the following downstream systems:
- Databases compatible with the MySQL protocol, with eventual consistency.
- Kafka, with output in the TiCDC Open Protocol. This can provide one of three consistency guarantees: row-level order, eventual consistency, or strict transactional consistency.
Replication order guarantee and consistency guarantee
Replication order
- TiCDC guarantees that every DDL/DML statement is output at least once.
- During a TiKV/TiCDC cluster failure, TiCDC might send the same DDL/DML statement repeatedly. For repeated DDL/DML statements:
  - MySQL sink can execute DDL statements repeatedly. DDL statements that are reentrant in the downstream (for example, truncate table) execute successfully. DDL statements that are not reentrant in the downstream (for example, create table) fail, and TiCDC ignores the error and continues replicating.
  - Kafka sink sends duplicate messages, but the duplicates do not break the Resolved Ts constraint. Users can filter them on the Kafka consumer side.
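Because delivery is at least once, a Kafka consumer may want to drop duplicates itself. The sketch below shows one simple way to do that by remembering which row changes have already been seen. The message shape (a dict with `table`, `key`, and `commit_ts`) is a hypothetical simplification and not the TiCDC Open Protocol format.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Message = Dict[str, object]


def drop_duplicates(messages: Iterable[Message]) -> Iterator[Message]:
    """Yield each row change once, skipping messages that were already seen.

    A change is identified by (table, key, commit_ts); this identity is an
    assumption made for the sketch.
    """
    seen: Set[Tuple[object, object, object]] = set()
    for msg in messages:
        identity = (msg["table"], msg["key"], msg["commit_ts"])
        if identity in seen:
            continue  # duplicate delivery, ignore it
        seen.add(identity)
        yield msg


if __name__ == "__main__":
    stream = [
        {"table": "t1", "key": "1", "commit_ts": 100},
        {"table": "t1", "key": "1", "commit_ts": 100},  # resent after a failure
        {"table": "t1", "key": "1", "commit_ts": 105},
    ]
    print(len(list(drop_duplicates(stream))))
```

The `seen` set grows without bound in this form; a long-running consumer would need to cap it, for example by discarding entries older than some watermark.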
Replication consistency
- MySQL sink
  - TiCDC does not split single-table transactions and guarantees the atomicity of single-table transactions.
  - TiCDC does not guarantee that the execution order of downstream transactions is exactly the same as that of upstream transactions.
  - TiCDC splits cross-table transactions by table and does not guarantee the atomicity of cross-table transactions.
  - TiCDC guarantees that the update order of a single row is consistent with the upstream.
- Kafka sink
  - TiCDC provides different data distribution strategies. Data can be distributed to different Kafka partitions by table, by primary key, or by ts.
  - Depending on the distribution strategy and on how the consumer is implemented, different levels of consistency can be achieved, including row-level order, eventual consistency, or cross-table transactional consistency.
  - TiCDC does not provide a Kafka consumer implementation. It provides only the TiCDC Open Protocol, with which users can implement their own Kafka consumer.
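To make the three distribution strategies more concrete, the sketch below maps a row change to a partition number by hashing a different value for each strategy. The strategy names, the function `pick_partition`, and the use of CRC32 are all assumptions for illustration; TiCDC's own dispatcher may hash differently.

```python
import zlib


def pick_partition(
    strategy: str,
    table: str,
    primary_key: str,
    commit_ts: int,
    num_partitions: int,
) -> int:
    """Choose a Kafka partition for a row change under a given strategy."""
    if strategy == "table":
        # All changes of one table land in the same partition.
        basis = table
    elif strategy == "primary_key":
        # All changes of one row land in the same partition.
        basis = f"{table}:{primary_key}"
    elif strategy == "ts":
        # Changes are spread by commit timestamp.
        basis = str(commit_ts)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # CRC32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(basis.encode("utf-8")) % num_partitions
```

Distributing by table keeps every change of a table in one partition, while distributing by primary key only keeps the changes of a single row together, which is one way to see why the achievable consistency level depends on the strategy.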
Replication restrictions
When replicating with TiCDC, note the following restrictions and unsupported scenarios.
Requirements for valid indexes
TiCDC can replicate only tables that have at least one valid index. A valid index is defined as follows:
- A primary key (PRIMARY KEY) is a valid index.
- A unique index (UNIQUE INDEX) is a valid index if it meets the following conditions:
  - Every column in the index is explicitly defined as non-nullable (NOT NULL) in the table schema.
  - The index contains no virtual generated columns (VIRTUAL GENERATED COLUMNS).
Starting from version 4.0.8, TiCDC can replicate tables without a valid index if you modify the task configuration, but the data consistency guarantee is weakened. For usage and precautions, refer to "Replicate tables without a valid index".
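The valid-index rule can be expressed as a small predicate. The sketch below models a table schema with two hypothetical dataclasses, `Column` and `Index`, and applies the conditions exactly as listed above; it does not read a real TiDB schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Column:
    name: str
    not_null: bool = False           # explicitly declared NOT NULL
    virtual_generated: bool = False  # VIRTUAL GENERATED COLUMN


@dataclass
class Index:
    columns: List[str]
    primary: bool = False
    unique: bool = False


def is_valid_index(index: Index, columns: Dict[str, Column]) -> bool:
    """Apply the valid-index definition to a single index."""
    if index.primary:
        return True
    if not index.unique:
        return False
    # Every indexed column must be NOT NULL and must not be virtual generated.
    return all(
        columns[name].not_null and not columns[name].virtual_generated
        for name in index.columns
    )


def has_valid_index(indexes: List[Index], columns: Dict[str, Column]) -> bool:
    """A table qualifies if at least one of its indexes is valid."""
    return any(is_valid_index(index, columns) for index in indexes)
```

For instance, a table whose only index is a unique index over a nullable column fails the check, while adding NOT NULL to that column makes it pass.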
Currently unsupported scenarios
TiCDC does not currently support the following scenarios:
- TiKV clusters that use RawKV alone are not supported.
- DDL operations that create a SEQUENCE in TiDB, and the SEQUENCE function, are not yet supported. When the upstream TiDB uses SEQUENCE, TiCDC ignores the SEQUENCE DDL operations and functions executed upstream, but DML operations that use SEQUENCE functions can be replicated correctly.
- Provide partial support for scenarios with large transactions upstream , See