当前位置:网站首页>Sorter evolution of ticdc 6.0 principle
Sorter evolution of ticdc 6.0 principle
2022-07-05 14:04:00 【Tidb community dry goods portal】
author : eastfisher The source of the original :https://tidb.net/blog/2c4c2a44
The background
TiCDC Is a TiDB Incremental data synchronization tool , By pulling upstream TiKV Data change log , TiCDC The data can be parsed into ordered row level change data and output to the downstream . TiCDC Typical application scenarios of include database disaster recovery , Data integration, etc .
TiCDC Handle TiDB Incremental data synchronization , Need to go through CDCKVClient Pull TiKV Change Log, Sorter Data sorting , Mounter After the message format conversion, it passes Sink The process of sending to downstream data sources . Among them Sorter Modules play an important role in ensuring the order of messages .
Processing flow
TiCDC Of CDC The logical unit of the task is Changefeed, The user can go through cdc cli perhaps OpenAPI towards TiCDC Submit Changefeed Mission , TiCDC In the cluster Owner Will handle yes Changefeed Task analysis , Disassemble it into TablePipeline Give it to each Proessor Handle . Processor The interior will be first by Puller By connecting to TiKV Clustered CDC Client Pull KV Change Log (RawKVEntry) And according to OpType Simply convert to PolymorphicEvent, hand Sorter Interface to sort , After the sorting is completed, the Mounter Parse the message , And then to Sink Send to downstream data sources .
Sorter The sorting implementation logic of is encapsulated in EventSorter
Interface .
type EventSorter interface { Run(ctx context.Context) error // Input side , For upstream Actor ( That is to say Puller) call , Disorder CDC Put data in Sorter AddEntry(ctx context.Context, entry *model.PolymorphicEvent) TryAddEntry(ctx context.Context, entry *model.PolymorphicEvent) (bool, error) // Output side , Get ordered CDC data Output() <-chan *model.PolymorphicEvent}
Sorter Module evolution
TiCDC Of Sorter The module has experienced many evolutions , From the original memory based Memory Sorter, Then develop to file based Unified Sorter, Eventually evolved into the present v6.0 Version based on Key-Value Stored DB Sorter.
Memory Sorter
Memory Sorter With two go slice Change unsorted data events and Resolved Events are cached in memory . If you encounter Resolved event , Then it is initiated asynchronously Sort
and Merge
operation .
Sorting operation uses go Standard library sort Fast sorting algorithm in , The collation is defined in ComparePolymorphicEvents
Function , Sort in the following order :
Commited / Resolved TS The smaller ones are in front
Commited / Resolved TS identical , be :
Resolved Events come last
Start TS The smaller ones are in front
Start TS identical , DELETE Events are listed in PUT Before the event
After sorting , from resolvedTsGroup Take the last one as maxResolvedTs, Then start execution merge operation . Merge and sort the events arranged last time and the events arranged this time , If the Commited / Resolved TS Less than maxResolvedTs, Then it is directly sent to the downstream , The remaining events are re cached in memory , Wait for the next Resolved TS The arrival of the event .
because Memory Sorter Use memory entirely to store events waiting to be sorted , When a large amount of data is written upstream , At this time, if the downstream write speed is slow , Lead to Memory Sorter Of Output When messages accumulate in the link , It will cause the data to be in Memory Sorter Accumulated in memory , And in the absence of Back Pressure In the case of mechanisms , Easy to trigger OOM. Besides , TiCDC If there are a large number of incremental scanning links unresolved Data is piled up in Memory Sorter, It is also easy to cause OOM. On the other hand , Memory Sorter yes table Grade , Every Changefeed Each of the TablePipeline I need to create one of them Sorter example , and Sorter Many more will be opened inside goroutine Sort , When there are many tables , goroutine The number will also multiply , to Go Runtime Scheduling brings pressure .
Unified Sorter
Unified Sorter Appearance , To a certain extent solved Memory Sorter The problem of . The Sorter go by the name of Unified
The main reason for this is that the resources required for event sequencing will be managed at the global level , and Memory Sorter The granularity of resources is table Grade .
Unified Sorter At initialization , Will open multiple heapSorter example ( Default 4 individual ), And register to the global heapSorterPool in . Unified Sorter After receiving the... Sent upstream PolymorphicEvent After the event , Different distribution strategies will be implemented according to the message type . about Resolved Type of event , Unified Sorter The event will be broadcast to all heapSorter In the example . And yes DELETE / PUT event , Will take round-robin The policy routes the message to the corresponding heapSorter example .
heapSorter Examples rely on internal heap Sort Events ( Collation and Memory Sorter identical ), When you meet Resolved Event or heap When memory exceeds the threshold , Will execute once flush operation , The whole heap dump come out . flush The operation is performed by the global singleton backEndPool Unified management of storage resources , And by the global singleton heapSorterIOPool Unified management of computing resources .
backEndPool Provides memory based memoryBackEnd And file system based fileBackEnd Two storage implementations , When there is enough memory space , priority of use memoryBackEnd, When memory space is insufficient , A new file will be created , Use this file as fileBackEnd Write ordered event messages . The format of the filename is : ${ Specify the pathname }/sort-pid-${counter}.tmp
, Such as /data/sort-10501-1.tmp
. After writing , Will flushTask Sent to the Merger Waiting for further processing .
After this step of operation , The event passes through memory heap Heap sort for , Then swipe it out to memory or file , Form a static one by one heap ( There is no use of persistence heap To describe ). stay merge Stage , Merger Will create another memory heap, Currently valid flushTask After multiway merging and sorting , Send the event message Output To downstream .
comparison Memory Sorter, Unified Sorter It solves the problem that all sorting events are cached in memory , May cause OOM The problem of , But there are still computing resources and Table The problem that quantity is linear , Another problem is introduced , namely fileBackEnd There is a linear relationship between the number of documents and the number of tables , When there are many synchronization tables , There will be too many open files
The problem of .
DB Sorter
DB Sorter stay v6.0 The version is not enabled by default , stay v6.1 Version is enabled by default , The parameter names of related configuration items are enable-db-sorter . DB Sorter The bottom layer is based on LSM Tree Of Key-Value Realization PeppleDB, And abstracted something similar Level DB The interface of , Include DB, Batch, Iterator this 3 Interface db.go, It is convenient to replace the implementation or test in the future . Several core operations include Put, Delete, Iterator, Compact etc. .
DB Sorter Adopt the new Actor frame , Execute the whole data sorting process in an event driven manner . About Actor More design of the framework can be found by reading actor doc Get to know .
DB Sorter It consists of the following core modules :
- Sorter: Realization EventSorter Interface , As the connection TablePipeline And Sorter Actor The bridge , yes Actor Entrance ; Put the event Output To downstream , It's also Actor The export of .
- Writer: analysis PolymorphicEvent, Conduct key After unified coding, send it to DBActor.
- DBActor: Put the bottom layer DB The interface is encapsulated as Actor, Execute in an event driven manner KV Read and write operations . The quantity can be specified by configuration , Default 16.
- Reader: Read the ordered event messages , Output To downstream , And send the information of these events from DB Delete in .
- CompactActor: Put the bottom layer DB Interface Compact The operation is encapsulated as Actor, And by the CompactScheduler Unified scheduling .
In the above modules , Sorter, Writer, Reader Each table corresponds to 1 individual , DBActor, CompactActor Is the fixed number specified by the configuration , Default 16 individual .
And Unified Sorter similar , DB Sorter It is also the only single case in the whole , System When it starts , Will be created by default 16 individual DB Instance and corresponding Compactor. take N Tabular CDC Event messages are mapped to M individual DB On , also DB Only reading and writing are supported Key-Value data , So you need to Key Code to do a certain design . DB Sorter Of Key The encoding format is :
Use this Key The coding method is closely related to the event sorting rules mentioned earlier , Commited / Resolved TS At the front , Start TS secondly , Finally, the event type . Besides , because DBActor Not every watch is exclusive , Therefore, you also need to divide one for each table namespace, Key Coded unique ID and table ID It only determines the current DBActor The table in corresponds to namespace.
The whole sorting process is similar to Unified Sorter Similar but slightly different , The main difference is that , DB Sorter All event messages of the same table will be routed to the same DB For instance , In this way, it is no longer necessary to Output Before the multi merge sort .
DB Sorter It solves the problem that the use of sorting resources is linear with the number of tables, resulting in large resource occupation , The problem of low resource utilization , Official performance tests have verified that 100000 meters can operate stably when synchronized to the downstream . But at the moment DB Sorter Did not like Unified Sorter Use memory cache , This causes the synchronization delay to increase by milliseconds . I believe it can be used in the future Unified Sorter A similar implementation mechanism solves this problem .
Reference material
TiDB 6.0 Book Rush Article idea guide
TiCDC Series sharing -02- Analyze the synchronization model and basic architecture
边栏推荐
- [buuctf.reverse] 152-154
- Request + BS4 crawl Netease cloud music popular comments
- Leetcode array question brushing notes
- Comparison of several distributed databases
- 国富氢能冲刺科创板:拟募资20亿 应收账款3.6亿超营收
- Why do I support bat to dismantle "AI research institute"
- 2022 driller (drilling) examination question bank and simulation examination
- Etcd database source code analysis -- rawnode simple package
- Matlab learning 2022.7.4
- matlab学习2022.7.4
猜你喜欢
Set up a website with a sense of ceremony, and post it to the public 2/2 through the intranet
Comparison of several distributed databases
如何将 DevSecOps 引入企业?
如何深入理解“有限状态机”的设计思想?
Recommendation number | what are interesting people looking at?
TiFlash 面向编译器的自动向量化加速
-Web direction attack and defense world
PHP basic syntax
Convolutional Neural Networks简述
让秒杀狂欢更从容:大促背后的数据库(下篇)
随机推荐
What is the future development trend of neural network Internet of things
Primary code audit [no dolls (modification)] assessment
2022 machine fitter (Advanced) test question simulation test question bank simulation test platform operation
Require, require in PHP_ once、include、include_ Detailed explanation of the efficiency of repeated introduction of once class library
Kunlun Taike rushes to the scientific innovation board: the annual revenue is 130million, and it plans to raise 500million. CETC Taiji holds 40% of the shares
js 从一个数组对象中取key 和value组成一个新的对象
Rk3566 add LED
matlab学习2022.7.4
Financial one account Hong Kong listed: market value of 6.3 billion HK $Ye wangchun said to be Keeping true and true, long - term work
物联网应用技术专业是属于什么类
Detailed explanation of IP address and preparation of DOS basic commands and batch processing
Attack and defense world crypto WP
Embedded software architecture design - message interaction
鸿蒙第四次培训
【云资源】云资源安全管理用什么软件好?为什么?
PHP generate Poster
最简单不用证书也可以多开功能的方式
Anchor navigation demo
汇编语言 assembly language
LeetCode_2(两数相加)