ChunJun supports DDL conversion and automatic execution for heterogeneous data sources: DTMO 02 review (with session replay and slides)
2022-07-27 10:52:00 【DTStack DTinsight】

Introduction:
On the evening of April 26, Dujie, a core member of the ChunJun project and a big data engine development expert on Kangaroo Cloud's DTStack team, gave a talk titled "ChunJun Supports DDL Conversion and Automatic Execution for Heterogeneous Data Sources". This article summarizes the key points of the live session so that you can review the content and deepen your understanding of the technical details.
In this article:
- Introduction to data restore
- DDL automatic conversion architecture design
- Parsing DDL with Calcite in practice
Get the slides:
Follow the WeChat official account "DTStack Study Club" (数栈研习社) and send the message "ChunJun01" to receive the slides.
Session replay:
https://www.bilibili.com/video/BV1eR4y1P7AH?spm_id_from=333.999.0.0
Speaker / Dujie
Editor / Hua Xia

Introduction to data restore
ChunJun real-time synchronization supports MySQL, Oracle, PostgreSQL, SQL Server, and other data sources, but the synchronized data is emitted in the form of change logs. Data restore builds on this: changes made to the source table, both DML and DDL, are applied to the target table so that the schema and data of the source and target tables stay consistent.

At present, ChunJun data restore supports restoring from MySQL to RDB-type data sources, and only DML is restored. Automatic DDL execution is planned for the next version.
Real-time restore adds two main modules:
- Mapping between the source table and the target table (mapping of database, table, and column information)
- Interaction with external systems to update DDL status and re-deliver DML data
To keep the logic decoupled, we added two flatMap operators to perform these operations:
- NameMappingFlatMap: replaces the names in each record according to the mapping
- RestorationFlatMap: processes the data, blocking and releasing records while monitoring DDL status
Introduction to the flatMap operators
The two operators are described below.
01
NameMappingFlatMap
By default, real-time restore assumes that the schema, table, and column names on the source side are identical to those on the sink side. In most cases, however, the two sides do not match exactly, so the NameMappingFlatMap operator is needed to replace the source's schema, table, and column names. NameMapping supports mappings at the schema, table, and column levels. An example mapping:
Table source1 under the source schema ChunJun_source maps to table sink1 under ChunJun_sink on the target side. At the field level, the source table's C1 field maps to the target's id field, and C2 maps to the target's name field.
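
The mapping described above can be sketched as a small lookup applied to each change record. This is an illustrative Python sketch, not ChunJun's implementation (which runs as a Flink operator); the record layout, the three mapping dictionaries, and the function name `apply_name_mapping` are all assumptions made for the example.

```python
# Hypothetical mapping tables that mirror the example in the text.
SCHEMA_MAPPING = {"ChunJun_source": "ChunJun_sink"}
TABLE_MAPPING = {("ChunJun_source", "source1"): "sink1"}
COLUMN_MAPPING = {("ChunJun_source", "source1"): {"C1": "id", "C2": "name"}}


def apply_name_mapping(record):
    """Return a copy of a change record that uses sink-side names.

    Names without a mapping entry are passed through unchanged.
    """
    key = (record["schema"], record["table"])
    columns = COLUMN_MAPPING.get(key, {})
    return {
        "schema": SCHEMA_MAPPING.get(record["schema"], record["schema"]),
        "table": TABLE_MAPPING.get(key, record["table"]),
        "data": {columns.get(c, c): v for c, v in record["data"].items()},
    }


# Assumed record shape: schema, table, and a column -> value dict.
source_record = {
    "schema": "ChunJun_source",
    "table": "source1",
    "data": {"C1": 1, "C2": "alice"},
}
mapped = apply_name_mapping(source_record)
print(mapped)
```

Passing unmapped names through unchanged matches the default behaviour described in the text, where source and sink names are assumed identical unless a mapping says otherwise.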

When the Flink synchronization job is created, ChunJun checks whether the script contains a nameMapping configuration. If it does not, the NameMappingFlatMap operator is not added to the job.
02
RestorationFlatMap
Data restore inevitably involves DDL, but the sink currently only executes DML. As a result, DML data that follows a DDL statement on the source table cannot be sent to the sink right away. It has to wait until the corresponding DDL has been executed on the sink side, and only then can it be delivered.
RestorationFlatMap is therefore designed mainly to solve the question of when to deliver data. The answer is: after the downstream sink has executed the DDL. That DDL is not executed by ChunJun, however, so ChunJun has no direct way to know when it completes. RestorationFlatMap therefore interacts with an external system to obtain the DDL execution status and uses it to decide when to deliver the DML data.
Structure design
RestorationFlatMap maintains one collection (queue) per table, and both DML and DDL records are stored in it. Each collection switches between a non-blocking and a blocking state. There are two internal components, WorkerManager and Monitor:
- WorkerManager: watches the non-blocking collections. A DML record is sent downstream; a DDL record puts the queue into the blocking state.
- Monitor: stores DDL in an external data source and watches the execution status of the DDL of blocked queues, switching them from blocking back to non-blocking. It has two parts:
  - store: watches the first DDL record in each blocked queue and writes it to the external table
  - fetcher: watches the status of the DDL records in the external table; once a DDL has been executed, it changes the corresponding table's collection from blocking to non-blocking
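
The blocking behaviour of a single per-table collection can be sketched as follows. This is a simplified, single-threaded Python illustration under stated assumptions: the class name `TableQueue`, its methods, and the record layout are hypothetical, and the real operator runs WorkerManager and Monitor as separate components.

```python
from collections import deque


class TableQueue:
    """Per-table buffer that blocks once a DDL reaches its head (hypothetical)."""

    def __init__(self):
        self.records = deque()
        self.blocked = False

    def add(self, record):
        self.records.append(record)

    def drain(self):
        """WorkerManager role: emit DML until a DDL is at the head."""
        emitted = []
        while self.records and not self.blocked:
            if self.records[0]["type"] == "DDL":
                # Keep the DDL at the head and stop emitting.
                self.blocked = True
            else:
                emitted.append(self.records.popleft())
        return emitted

    def ddl_finished(self):
        """Monitor role: the external status says the head DDL is done."""
        if self.blocked:
            self.records.popleft()  # discard the completed DDL
            self.blocked = False


queue = TableQueue()
for rec in [
    {"type": "DML", "sql": "INSERT 1"},
    {"type": "DDL", "sql": "ALTER TABLE t ADD COLUMN c INT"},
    {"type": "DML", "sql": "INSERT 2"},
]:
    queue.add(rec)

first = queue.drain()    # only the DML that precedes the DDL
stalled = queue.drain()  # nothing is emitted while the queue is blocked
queue.ddl_finished()
second = queue.drain()   # the DML that was held back behind the DDL
```

Keeping the DDL at the head of the queue while blocked preserves ordering: no DML written after the schema change can overtake it.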
External table design
ChunJun currently supports storing DDL data in an external MySQL data source. ChunJun writes the DDL data to this external data source; once a third party changes the status of a DDL record to 2, ChunJun considers the downstream DDL to have been executed.
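
The status handshake through the external table can be sketched like this. The example uses Python's built-in `sqlite3` as a stand-in for the external MySQL data source so that it is self-contained; the table name `ddl_change`, its columns, and the initial status value 0 are assumptions. Only the status value 2 (executed) comes from the description above.

```python
import sqlite3

DDL_DONE = 2  # status value that marks a DDL as executed downstream

# In-memory SQLite stands in for the external MySQL data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ddl_change ("
    "table_name TEXT, ddl_sql TEXT, status INTEGER)"
)


def store_ddl(table_name, ddl_sql):
    """Store role: record a pending DDL (initial status 0 is assumed)."""
    conn.execute(
        "INSERT INTO ddl_change VALUES (?, ?, 0)", (table_name, ddl_sql)
    )


def finished_tables():
    """Fetcher role: tables whose DDL has been marked as executed."""
    rows = conn.execute(
        "SELECT table_name FROM ddl_change WHERE status = ?", (DDL_DONE,)
    )
    return [row[0] for row in rows]


store_ddl("sink1", "ALTER TABLE sink1 ADD COLUMN age INT")
before = finished_tables()

# A third party runs the DDL downstream, then marks it as done.
conn.execute(
    "UPDATE ddl_change SET status = ? WHERE table_name = ?",
    (DDL_DONE, "sink1"),
)
after = finished_tables()
```

In a running job the fetcher would poll this query periodically and unblock the queue of every table it returns.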

Sample data restore script
The sample script has two main parts, nameMapping and restoration, which hold the parameters for the NameMappingFlatMap and RestorationFlatMap operators respectively.
Note that the reader's split parameter must be set to true.

Known issues with data restore
- The binlog reader supports reading DDL, but LogMiner does not; all data sources need to support DDL reading
- RestorationFlatMap keeps its data in memory; external persistence will be added later
- Checkpoint support is insufficient: data held in the RestorationFlatMap module is lost when the job is rerun
- When the source generates DDL, too much interaction with external systems is currently required. Automatic DDL execution will be added so that both DML and DDL are handled entirely by ChunJun, transparently to users

DDL automatic conversion architecture design
In the current data restore, DDL execution depends on external operations, and ChunJun only delivers DML data according to the DDL status. To close the loop within ChunJun across the whole pipeline and to reduce external dependencies and operations cost, ChunJun converts DDL automatically, translating the source's DDL syntax into the sink's DDL syntax. This is the motivation for the DDL automatic conversion module.
DDL automatic conversion addresses the following problems:
- DDL data is currently not executed automatically downstream of ChunJun
- The status of DDL data stored in the external table has to be modified manually by the user
Main structure design:
The DDL automatic conversion logic is placed in NameMappingFlatMap, which performs the data conversion.

DDL technical solution
The automatic conversion feature of data restore consists of three parts:
1、Parsing DDL statements
The source table's DDL SQL is converted into an intermediate object, and the intermediate object is converted into a DDL statement for the target side.
2、Exception data management
If automatic conversion fails, a conventException is thrown and handled by the corresponding exception manager.
3、Automatic update of DDL data status
After the DDL SQL has been executed downstream, the status of the DDL stored in the intermediate table is changed to completed through an event notification.
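
The three parts above can be tied together in a short control-flow sketch. This is a toy Python illustration, not ChunJun code: the regex-based converter only recognises one statement shape, the quoted output format is illustrative rather than any particular dialect, and the names `ConversionError`, `convert_ddl`, and `handle_ddl` are hypothetical.

```python
import re


class ConversionError(Exception):
    """Stand-in for the conversion exception mentioned in the text."""


ADD_COLUMN = re.compile(
    r"ALTER TABLE (\w+) ADD COLUMN (\w+) (\w+)$", re.IGNORECASE
)


def convert_ddl(source_sql):
    """Part 1: parse the source DDL and emit a target-side statement."""
    match = ADD_COLUMN.match(source_sql.strip())
    if match is None:
        raise ConversionError(f"unsupported DDL: {source_sql}")
    table, column, col_type = match.groups()
    return f'ALTER TABLE "{table}" ADD "{column}" {col_type.upper()}'


def handle_ddl(source_sql, execute, failed, status):
    try:
        target_sql = convert_ddl(source_sql)
    except ConversionError as err:
        # Part 2: hand the failure to an exception handler (here, a list).
        failed.append((source_sql, str(err)))
        return
    execute(target_sql)
    # Part 3: mark the DDL as completed once it has run downstream.
    status[source_sql] = "completed"


executed, failed, status = [], [], {}
handle_ddl("ALTER TABLE t1 ADD COLUMN age int", executed.append, failed, status)
handle_ddl("DROP INDEX idx ON t1", executed.append, failed, status)
```

A statement that cannot be converted never reaches the sink and never gets a completed status, so the DML queued behind it stays blocked until the failure is resolved.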
DDL architecture design
There is no single standard that all DDL follows, and each data source has its own DDL syntax. Each data source's DDL must therefore be parsed according to its own grammar into intermediate data, which is then converted into a DDL statement for the target data source type.
The top-level DDLConvent interface therefore abstracts three basic methods:
1、Convert RowData to intermediate data
2、Convert intermediate data to DdlSql
3、Get the data source type
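
A Python analogue of the three methods listed above might look like the sketch below. It is illustrative only: the class and method names, the `AddColumn` intermediate object, and the dictionary used in place of RowData are assumptions, and type mapping between dialects is omitted.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AddColumn:
    """Hypothetical intermediate object for one kind of DDL."""
    table: str
    column: str
    column_type: str


class DdlConverter(ABC):
    """Analogue of the three top-level methods."""

    @abstractmethod
    def row_to_intermediate(self, row): ...

    @abstractmethod
    def intermediate_to_sql(self, op): ...

    @abstractmethod
    def data_source_type(self): ...


class BaseConverter(DdlConverter):
    def row_to_intermediate(self, row):
        # A dict stands in for RowData that has already been parsed.
        return AddColumn(row["table"], row["column"], row["type"])


class MySqlStyleConverter(BaseConverter):
    def intermediate_to_sql(self, op):
        return f"ALTER TABLE {op.table} ADD COLUMN {op.column} {op.column_type}"

    def data_source_type(self):
        return "mysql"


class OracleStyleConverter(BaseConverter):
    def intermediate_to_sql(self, op):
        return f"ALTER TABLE {op.table} ADD ({op.column} {op.column_type})"

    def data_source_type(self):
        return "oracle"


# Registry keyed by data source type, so source and sink can differ.
converters = {
    c.data_source_type(): c
    for c in (MySqlStyleConverter(), OracleStyleConverter())
}

row = {"table": "t1", "column": "age", "type": "INT"}
intermediate = converters["mysql"].row_to_intermediate(row)
sql = converters["oracle"].intermediate_to_sql(intermediate)
print(sql)
```

Routing every statement through a shared intermediate object means each data source needs only one parser and one generator, instead of a dedicated translator for every source and sink pair.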

Parsing DDL with Calcite in practice
This part of the talk was a code-level demonstration of parsing DDL with Calcite, so it is not reproduced in this write-up. Community members can watch the session replay on Bilibili.
Bilibili replay address:
https://www.bilibili.com/video/BV1eR4y1P7AH?spm_id_from=333.999.0.0

Conclusion
The above is the design for automatic DDL execution that we are adding to data restore. We plan to complete these features in the first half of the year. If you have good ideas, you are welcome to open an issue or submit a PR.
Issue guidelines
An issue must include the corresponding script, the submission mode, data (optional), and the full log (important).
PR guidelines
1、Note in the PR which issue it fixes
2、PR commit template: [hotfix/feat/docs-#issueID][#fix-module] #fix-commit (use English where possible and describe the change clearly)
3、Keep the changes consistent with the issue as far as possible; if there are unrelated changes, note them in the PR
Community Document Center
To help community members better understand and use ChunJun, we have launched the community Document Center. You are welcome to use it.