Real-Time Development Platform Construction in Practice: Unlocking the Value of Real-Time Data (Live Broadcast Review, Session 04)
2022-07-27 11:05:00 [DTStack dtinsight]
Original article: Real-Time Development Platform Construction in Practice, Unlocking the Value of Real-Time Data
I. Background of Real-Time Data Warehouse Construction
As digital transformation deepens across industries and technical capabilities keep improving, the conventional T+1 (next-day) offline big data model increasingly fails to meet the needs of emerging businesses. Running real-time big data workloads has become essential for enterprises that want to fully tap the value of their data.
Digital transformation generates data rapidly and brings demand for fine-grained, iterative ("small steps, fast runs") operations and for real-time, automated decision-making. Under these pressures, the ability to process data in real time is becoming a key factor in enterprise competitiveness.

When building real-time data applications, enterprises often face several difficulties:
- A high technical barrier: real-time development is hard to learn, depends on multiple engines, and involves complex processing links
- Inefficient data development and cumbersome code debugging
- High construction and usage costs
- Inconsistent data modeling and development standards, which make monitoring and management difficult

To address these problems, the construction approach and objectives must be defined up front, so that enterprises can build their real-time data warehouses more effectively.

II. Methodology for Real-Time Data Warehouse Construction
We help enterprises build real-time data warehouses in four steps:

1. Clarify requirements
The first step in building a real-time data warehouse is to clarify requirements, covering both business requirements and technical requirements.
1) Business requirements:
- Identify each real-time computing application scenario in detail
- Specify the requirements of each real-time indicator in detail
2) Technical requirements:
- Identify the data source of each real-time indicator in detail

2. Technology selection
The second step is technology selection, which covers four aspects:
a. The overall technical route
b. Collection tools
c. Message middleware and computing engine
d. Storage databases for dimension tables and result tables
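To make the role of the computing engine concrete: many real-time indicators are aggregates over fixed time windows. The minimal Python sketch below computes a per-key sum over tumbling (fixed, non-overlapping) windows from a batch of `(timestamp_ms, key, value)` events. The event format and the name `tumbling_window_sum` are illustrative assumptions, not part of any platform. A real engine such as Flink keeps this state incrementally as messages arrive from the middleware and uses watermarks to handle late or out-of-order events; this sketch ignores both.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (event_time_ms, key, value).
Event = Tuple[int, str, float]


def tumbling_window_sum(events: Iterable[Event], window_ms: int) -> Dict[Tuple[int, str], float]:
    """Sum values per (window_start, key) over fixed, non-overlapping windows."""
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for ts, key, value in events:
        window_start = ts - ts % window_ms  # align the timestamp to its window start
        totals[(window_start, key)] += value
    return dict(totals)


if __name__ == "__main__":
    events = [(1_000, "A", 1.0), (59_999, "A", 2.0), (60_000, "A", 5.0), (61_000, "B", 1.5)]
    print(tumbling_window_sum(events, window_ms=60_000))
    # {(0, 'A'): 3.0, (60000, 'A'): 5.0, (60000, 'B'): 1.5}
```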

3. Data warehouse design and development
The third step is data warehouse design and development, which covers three aspects:
a. Layered design of the real-time data warehouse
b. Development standards
c. Code development and debugging

4. Management and monitoring
The fourth step is management and monitoring, which covers three aspects: task publishing, operation monitoring and alerting, and real-time data governance.

III. Building a Real-Time Data Warehouse on the DTStack Real-Time Development Platform
Having covered the methodology, we now walk through the actual construction process of a real-time data warehouse.

Step 1: Real-time collection
Database CDC collection is built on ChunJun (formerly FlinkX), which turns collection into a configurable tool. Two real-time collection methods are available: CDC (reading database logs) and JDBC (interval polling).
1. CDC read
Reads the database logs, so it puts no load on the source database.
2. JDBC read
For databases where logging is not enabled, data is read by high-frequency JDBC polling; this method requires an auto-incrementing field.
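The JDBC polling mode depends on the auto-incrementing field: each poll fetches only rows whose key is greater than the last key already read. Below is a minimal sketch of that idea in Python, using the standard-library `sqlite3` module as a stand-in for a JDBC source. The function name `poll_increment` and the `orders` table are hypothetical, and ChunJun's actual implementation may differ.

```python
import sqlite3


def poll_increment(conn, table, key_col, last_key, batch_size=1000):
    """Fetch rows whose auto-increment key is greater than last_key.

    Returns (rows, new_last_key). Assumes key_col is a strictly increasing
    auto-increment column, as the JDBC polling mode requires.
    """
    # Identifiers cannot be bound as SQL parameters, so they are interpolated;
    # they must come from trusted task configuration, never from user input.
    sql = (f"SELECT * FROM {table} WHERE {key_col} > ? "
           f"ORDER BY {key_col} LIMIT ?")
    cur = conn.execute(sql, (last_key, batch_size))
    rows = cur.fetchall()
    # Locate the key column in the result set to advance the offset.
    key_idx = [d[0] for d in cur.description].index(key_col)
    new_last_key = rows[-1][key_idx] if rows else last_key
    return rows, new_last_key


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, amount REAL)")
    conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (20.0,)])
    offset = 0
    rows, offset = poll_increment(conn, "orders", "id", offset)
    print(rows, offset)  # [(1, 10.0), (2, 20.0)] 2
    # A new row arrives; the next poll picks up only that row.
    conn.execute("INSERT INTO orders (amount) VALUES (?)", (30.0,))
    rows, offset = poll_increment(conn, "orders", "id", offset)
    print(rows, offset)  # [(3, 30.0)] 3
```

Because only rows with a larger key are fetched, updates and deletes to rows that were already read are not captured, and rows committed out of key order by concurrent transactions can be skipped. This is one reason log-based CDC collection is preferable when database logs are available.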

Step 2: Data development
1. Basic data development features
The basic features include a web SQL IDE, visual table creation, dimension table cache strategies, and built-in and user-defined functions. Underlying components are encapsulated and exposed through a graphical interface, which lowers the development barrier and lets developers focus on business logic.
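The post does not say which cache strategies the platform offers. One common strategy for dimension-table joins is an LRU cache with a time-to-live (TTL): hot keys are served from memory, expired entries are reloaded from the dimension table, and the least recently used key is evicted when the cache is full. A minimal Python sketch, assuming a `loader` callable that queries the dimension table; the class name `DimCache` and its parameters are hypothetical.

```python
import time
from collections import OrderedDict


class DimCache:
    """LRU cache with a per-entry TTL for dimension-table lookups (illustrative)."""

    def __init__(self, loader, max_size=10_000, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader        # fallback query against the dimension table
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock          # injectable clock, useful for testing
        self._data = OrderedDict()   # key -> (value, expires_at), oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[0]
        # Miss or expired entry: reload from the dimension table.
        self.misses += 1
        value = self._loader(key)
        self._data[key] = (value, now + self._ttl)
        self._data.move_to_end(key)
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict the least recently used key
        return value


if __name__ == "__main__":
    city = {"u1": "Beijing", "u2": "Shanghai"}
    cache = DimCache(city.get, max_size=1000, ttl_seconds=60)
    print(cache.get("u1"), cache.get("u1"), cache.hits, cache.misses)  # Beijing Beijing 1 1
```

The TTL bounds how stale a joined dimension value can be: a larger TTL reduces load on the dimension database at the cost of freshness.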

2. Advanced data development features
Beyond the basic features, there are advanced configurations for specific industries or scenarios, including automatic retries, automatic start/stop, and dirty data management.
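The post does not describe how these features are implemented. As a rough illustration only, the Python sketch below shows two of the ideas at small scale: records that fail parsing are routed to a dirty-data list together with the error reason instead of failing the job, and a flaky operation (for example, a write to a sink) is retried with exponential backoff. The function names and the record schema (`id`, `amount`) are hypothetical; on a real platform, automatic retries more likely apply to the whole task (restarting it after a failure) than to a single call.

```python
import json
import time


def with_retries(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on an exception, retry with exponential backoff (0.5 s, 1 s, ...)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


def process(raw_records):
    """Parse JSON records; route records that fail parsing to a dirty-data list."""
    clean, dirty = [], []
    for raw in raw_records:
        try:
            rec = json.loads(raw)
            clean.append({"id": int(rec["id"]), "amount": float(rec["amount"])})
        except (ValueError, KeyError, TypeError) as exc:
            # Keep the raw payload and the reason so the record can be inspected later.
            dirty.append({"raw": raw, "error": f"{type(exc).__name__}: {exc}"})
    return clean, dirty


if __name__ == "__main__":
    clean, dirty = process(['{"id": 1, "amount": "9.5"}', "not json"])
    print(clean)       # [{'id': 1, 'amount': 9.5}]
    print(len(dirty))  # 1
```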

Step 3: Release and go-live
Once development is complete, the task is released to production. This stage includes task debugging and task import/export.

Step 4: Task O&M
Task operation and maintenance (O&M) means controlling task execution globally and handling abnormal or emergency situations.
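Alerting is a core part of task O&M, and the platform advertises multi-mode, multi-channel task alerts. A minimal Python sketch of one common rule type, assuming a hypothetical `consumer_lag` metric that is sampled periodically: the alert fires only when the last N samples all exceed a threshold, which avoids alerts on a single spike, and the message is then sent to every configured channel. `AlertRule`, `evaluate`, `dispatch`, and the task name `orders_rt` are illustrative, not platform APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class AlertRule:
    metric: str        # hypothetical metric name, e.g. "consumer_lag"
    threshold: float
    consecutive: int   # number of most recent samples that must exceed the threshold

    def __post_init__(self):
        if self.consecutive < 1:
            raise ValueError("consecutive must be >= 1")


def evaluate(rule: AlertRule, samples: Sequence[float]) -> bool:
    """True if the most recent `consecutive` samples all exceed the threshold."""
    if len(samples) < rule.consecutive:
        return False
    return all(v > rule.threshold for v in samples[-rule.consecutive:])


def dispatch(rule: AlertRule, task: str, samples: Sequence[float],
             channels: List[Callable[[str], None]]) -> Optional[str]:
    """Send one alert message to every channel (e.g. email, SMS, chat webhook)."""
    if not evaluate(rule, samples):
        return None
    msg = (f"[ALERT] task={task}: {rule.metric} > {rule.threshold} "
           f"for {rule.consecutive} consecutive samples")
    for send in channels:
        send(msg)
    return msg


if __name__ == "__main__":
    rule = AlertRule("consumer_lag", threshold=10_000, consecutive=3)
    outbox = []
    # Fires: the last three samples all exceed 10,000.
    dispatch(rule, "orders_rt", [500, 12_000, 15_000, 18_000], [outbox.append, print])
```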

The construction process above is, in fact, how Kangaroo Cloud's DTStack real-time development product, StreamWorks, is put into practice.
StreamWorks is a cloud-native, one-stop real-time big data development platform for building real-time data warehouses. It covers the full link of real-time data collection, real-time data processing, and task monitoring and O&M; supports multiple Flink engine versions and Kubernetes resource scheduling; and provides rich O&M monitoring curves to help enterprises move to real time.
The product has the following features:
- Integrated collection, computation, and O&M
It contains a full-link toolset for real-time development that integrates collection, computation, and O&M, reducing costs for customers and lowering the barrier to real-time computing.
- Unified metadata management
Supports the self-developed Hadoop cluster and can also connect to CDH, HDP, and TDH, as well as heterogeneous engines such as Oracle and TiDB. Node resources scale quickly and elastically with compute and storage demand, so business needs are met reliably.
- Batch-stream integration
Supports integrated batch-stream collection and development on Flink, and integrates Iceberg to enable a lakehouse construction model.
- Rich functionality
The platform provides cross-environment task publishing, code debugging, SQL checking, submission review, automatic start/stop, batch onboarding of existing tasks, and more.
- Cloud-native support
In addition to YARN + HDFS, it supports Kubernetes resource scheduling and storage such as MinIO and OSS.
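The core idea behind the batch-stream integration feature listed above is that one piece of processing logic serves both bounded (historical) and unbounded (real-time) data; Flink, for example, can execute the same program over either kind of input. The Python sketch below illustrates only that idea, using generators: the same `enrich` function is applied to a finite list and to an endless stand-in source. The function names and record fields are hypothetical, and the sketch says nothing about how StreamWorks or Iceberg implement batch-stream integration.

```python
import itertools
from typing import Iterable, Iterator


def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """One transformation shared by the batch path and the streaming path."""
    for r in records:
        if r.get("amount", 0) > 0:  # drop invalid rows
            yield {**r, "amount_cents": round(r["amount"] * 100)}


def stream_source() -> Iterator[dict]:
    """Stand-in for an unbounded source such as a message queue (hypothetical)."""
    n = 0
    while True:
        n += 1
        yield {"id": n, "amount": n * 1.5}


if __name__ == "__main__":
    # Batch: a bounded input is fully consumed.
    history = [{"id": 1, "amount": 2.5}, {"id": 2, "amount": -1.0}]
    print(list(enrich(history)))  # [{'id': 1, 'amount': 2.5, 'amount_cents': 250}]
    # Stream: the same logic runs lazily over an endless input; take 3 rows here.
    for row in itertools.islice(enrich(stream_source()), 3):
        print(row)
```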
The product delivers three core benefits:
- Lower development barrier
It is compatible with multiple engine versions and adapted to multiple data sources, all encapsulated into a visual interface. Through the web IDE, users configure table information visually and develop in SQL, which lowers the overall barrier to entry.
- Comprehensive O&M assurance
It provides visual O&M across the full task life cycle. Full-link topology, rich metric curves, and multi-mode, multi-channel task alerts help users build a comprehensive O&M system.
- Standardized data
It helps enterprises build real-time data warehouses and establish real-time data standards and specifications, through a real-time data platform that integrates real-time task scheduling, task operation monitoring, and reliable recovery for real-time tasks. This ensures data quality and provides a unified, standardized data output.
IV. Real-Time Data Warehouse Construction Cases
Next, we share two customer cases to show how the real-time development platform helped customers solve their problems.
- A state-owned professional economic information service institution

- A securities client

V. Analysis of DTStack's Batch-Stream Integrated Architecture
Finally, as extended material, we analyze DTStack's batch-stream integrated architecture.
- Overall batch-stream integrated architecture

- Core value of batch-stream integration

- Batch-stream integrated data construction link

- Batch-stream integrated collection architecture

Source: WeChat official account "DTStack Study Club"
Kangaroo Cloud open-source framework DingTalk technical exchange group (30537511): anyone interested in big data open-source projects is welcome to join us and exchange the latest technical information. Open-source project repository: https://github.com/DTStack