Real-Time Business Intelligence (BI), Part II: Designing a Sound ETL Architecture for Quasi-Real-Time BI
2022-06-10 18:32:00 [Parker Data BI Visualization]
Today, let's look at how ETL architecture design and tuning can deliver quasi-real-time processing and presentation of selected business intelligence (BI) indicators. Such indicators might be refreshed every hour, every few minutes, every minute, or even every few seconds.
The BI data warehouse ETL architecture
A typical BI data warehouse ETL architecture is organized into four or five ETL packages. Each package is the collection of ETL processes for one layer of the data warehouse.

[Figure: BI data warehouse (Parker Data BI visual analysis platform)]
The first package covers the ODS (staging) layer: all ETL processes that extract source tables from the business systems into the ODS layer.
The second package processes all dimension tables.
The third package processes the standard fact tables.
The fourth package processes the data mart layer.
The fifth package processes OLAP cubes and similar structures.
These packages run serially under strict dependencies. Until the first package finishes, the second cannot run; until the second finishes, the third does not start, and so on.
This is a very standard layered ETL architecture for a BI data warehouse. The five packages are usually registered as a scheduled job, for example a Windows scheduled task, and run periodically, typically once a night.
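A minimal Python sketch of the strict serial dependency between the layer packages follows. It assumes each package is represented by a callable that returns True on success. The layer names and the `run_serial` helper are hypothetical and are not taken from any particular ETL tool.

```python
from typing import Callable, Dict, List

# Hypothetical layer names mirroring the five packages of the standard architecture.
LAYERS = ["ods", "dimensions", "facts", "data_mart", "olap_cube"]


def run_serial(packages: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run the layer packages strictly in order and stop at the first failure.

    Each package is a callable returning True on success.
    Returns the names of the packages that completed.
    """
    completed: List[str] = []
    for layer in LAYERS:
        if not packages[layer]():
            # A downstream layer never runs on top of an unfinished upstream layer.
            break
        completed.append(layer)
    return completed


if __name__ == "__main__":
    packages = {layer: (lambda: True) for layer in LAYERS}
    packages["facts"] = lambda: False  # simulate a failed fact load
    print(run_serial(packages))  # ['ods', 'dimensions']
```

The whole chain is only as fresh as its slowest layer. A nightly schedule is therefore the natural fit for this design.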
Problems with the BI data warehouse ETL architecture
The problem is that this architecture cannot make individual indicators quasi-real-time. To achieve that, those indicators must be separated out. The ODS, dimension, and fact processing that each indicator depends on, upstream and downstream, is packaged separately and given its own scheduled job: one indicator gets one job, and ten indicators get ten jobs. These indicators then no longer depend on the overall ETL architecture and can run on their own. That is the first point.
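The per-indicator split can be sketched as follows. `IndicatorJob`, the step functions, and the indicator names are hypothetical. In practice each job would be registered with whatever scheduler runs the ETL, such as a Windows scheduled task or cron.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class IndicatorJob:
    """One quasi-real-time indicator bundled with only the upstream steps it needs."""

    name: str
    # Ordered steps: this indicator's own ODS extract, dimension and fact loads,
    # and finally the indicator calculation itself.
    steps: List[Callable[[], None]] = field(default_factory=list)

    def run(self) -> None:
        # Steps inside one job stay serial; different jobs do not wait for each other.
        for step in self.steps:
            step()


# One indicator, one job: each entry would get its own scheduled trigger.
audit: List[str] = []
JOBS: Dict[str, IndicatorJob] = {
    "hourly_sales": IndicatorJob(
        "hourly_sales",
        [lambda: audit.append("sales_ods"),
         lambda: audit.append("sales_fact"),
         lambda: audit.append("sales_indicator")],
    ),
    "stock_level": IndicatorJob(
        "stock_level",
        [lambda: audit.append("stock_ods"), lambda: audit.append("stock_indicator")],
    ),
}

if __name__ == "__main__":
    JOBS["stock_level"].run()  # runs without waiting for hourly_sales
    print(audit)  # ['stock_ods', 'stock_indicator']
```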

[Figure: Data visualization (Parker Data BI visual analysis platform)]
The second point is that each job's scheduling interval must be longer than its maximum execution time. For example, if a job usually runs for about one minute, an interval of two minutes or more is appropriate. Otherwise the next scheduled run can start before the current run has finished computing the indicator. The previous run has just written its data, the new run clears it, and the result is a mess.
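The following is a small check for this interval rule, assuming the durations of past runs are recorded somewhere. The 2x `margin` is an assumption that mirrors the one-minute/two-minute example above. The function names are hypothetical.

```python
from typing import Sequence


def min_safe_interval(durations_seconds: Sequence[float], margin: float = 2.0) -> float:
    """Suggest a scheduling interval from observed run durations.

    margin is an assumed safety factor applied to the longest observed run;
    2.0 mirrors the "one-minute job, two-minute interval" rule of thumb.
    """
    if not durations_seconds:
        raise ValueError("need at least one observed run duration")
    if margin <= 1.0:
        raise ValueError("margin must be > 1 so the interval exceeds the longest run")
    return max(durations_seconds) * margin


def interval_is_safe(interval_seconds: float, durations_seconds: Sequence[float]) -> bool:
    """True only if the interval is strictly longer than the longest observed run."""
    if not durations_seconds:
        raise ValueError("need at least one observed run duration")
    return interval_seconds > max(durations_seconds)


if __name__ == "__main__":
    runs = [52.0, 61.5, 58.0]  # example durations in seconds
    print(min_safe_interval(runs))     # 123.0
    print(interval_is_safe(60, runs))  # False
```

Basing the margin on the longest run rather than the typical one is the more conservative choice, because run times tend to grow with data volume.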
To solve this, the ETL logging framework of the BI data warehouse needs some additional development. Every ETL run checks the log when it starts. If the previous run has not finished, the current run does not start; the job runs again only after the previous run has completed. This way, runs never conflict.
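A minimal sketch of this log check follows, using SQLite as a stand-in for the ETL log. The `etl_log` table, its columns, and the `try_start`/`finish` helpers are hypothetical; the article does not describe its actual log schema.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def init_log(conn: sqlite3.Connection) -> None:
    # Hypothetical ETL run log: one row per execution of a job.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_log ("
        " run_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " job_name TEXT NOT NULL,"
        " status TEXT NOT NULL,"  # 'RUNNING' or 'DONE'
        " started_at TEXT NOT NULL)"
    )


def try_start(conn: sqlite3.Connection, job_name: str) -> Optional[int]:
    """Register a new run unless the previous run of this job is still RUNNING.

    Returns the new run_id, or None if this trigger should be skipped.
    """
    with conn:  # commits on success, rolls back on error
        # Take the write lock before checking, so two triggers cannot both pass the check.
        conn.execute("BEGIN IMMEDIATE")
        busy = conn.execute(
            "SELECT 1 FROM etl_log WHERE job_name = ? AND status = 'RUNNING'",
            (job_name,),
        ).fetchone()
        if busy:
            return None
        cur = conn.execute(
            "INSERT INTO etl_log (job_name, status, started_at) VALUES (?, 'RUNNING', ?)",
            (job_name, datetime.now(timezone.utc).isoformat()),
        )
        return cur.lastrowid


def finish(conn: sqlite3.Connection, run_id: int) -> None:
    # Mark the run complete so the next scheduled trigger may start.
    with conn:
        conn.execute("UPDATE etl_log SET status = 'DONE' WHERE run_id = ?", (run_id,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_log(conn)
    run = try_start(conn, "hourly_sales")
    print(try_start(conn, "hourly_sales"))  # None: previous run still RUNNING
    finish(conn, run)
```

A real deployment would also need a timeout for runs that crashed and were left in the RUNNING state. Without one, the job would never start again.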
Reworking the BI data warehouse ETL architecture
On some large projects with hundreds of packages running in parallel, we achieved quasi-real-time indicators by reworking the BI data warehouse ETL framework in this way. How close to real time a given indicator can get still depends on its own calculation cycle and processing logic.
To keep each run short, we therefore rely heavily on incremental extraction, together with indexing of the BI data tables and query performance tuning.
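Incremental extraction is commonly implemented with a high-watermark column. The sketch below assumes a source table `orders` with an indexed `updated_at` timestamp and a target ODS table `ods_orders`. All of these names are hypothetical, and SQLite stands in for the real source and warehouse databases.

```python
import sqlite3


def create_tables(src: sqlite3.Connection, dst: sqlite3.Connection) -> None:
    # Hypothetical source and ODS tables; the index lets the watermark filter avoid a full scan.
    src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
    src.execute("CREATE INDEX ix_orders_updated_at ON orders (updated_at)")
    dst.execute("CREATE TABLE ods_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")


def incremental_extract(src: sqlite3.Connection, dst: sqlite3.Connection, watermark: str) -> str:
    """Copy only rows changed after `watermark` into the ODS table; return the new watermark."""
    rows = src.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    with dst:
        # Upsert by primary key so a row updated again at the source replaces its old copy.
        dst.executemany(
            "INSERT OR REPLACE INTO ods_orders (id, amount, updated_at) VALUES (?, ?, ?)",
            rows,
        )
    # The latest timestamp seen becomes the starting point of the next run.
    return rows[-1][2] if rows else watermark


if __name__ == "__main__":
    src, dst = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    create_tables(src, dst)
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "2022-06-10T10:00"), (2, 20.0, "2022-06-10T10:05")],
    )
    print(incremental_extract(src, dst, ""))  # 2022-06-10T10:05
```

A strict `>` comparison can miss rows committed later with the same timestamp as the watermark. Production jobs often re-read a small overlap window and rely on the upsert to remove duplicates.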

[Figure: Data visualization dashboard (Parker Data BI visual analysis platform)]
Previously, the packages ran serially from the bottom layer up, and each package was scheduled only after the previous one had finished. Now, the indicators that need real-time or quasi-real-time BI are pulled out of the original packages and maintained separately as new, independent serial chains. This ETL architecture differs considerably from the traditional data warehouse ETL architecture.
In our own BI product, ETL scheduling is now implemented linearly: each indicator can be extracted and scheduled independently, and all of it is configurable. This approach to ETL scheduling also lays the foundation for a real-time data warehouse and real-time BI.
What does fully real-time BI analysis look like, as opposed to quasi-real-time analysis of individual indicators? That will be the topic of the next article.
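Configurable per-indicator scheduling can be sketched as a list of entries, each with its own interval. A scheduler then picks the indicators that are due. `ScheduleEntry` and `due_indicators` are hypothetical names; this is an illustration of the idea, not the API of the authors' product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ScheduleEntry:
    """One configurable row: which indicator, how often, and when it last started."""

    indicator: str
    interval: timedelta
    last_started: Optional[datetime] = None


def due_indicators(config: List[ScheduleEntry], now: datetime) -> List[str]:
    """Return the indicators whose own interval has elapsed (never-run ones are due)."""
    return [
        entry.indicator
        for entry in config
        if entry.last_started is None or now - entry.last_started >= entry.interval
    ]


if __name__ == "__main__":
    now = datetime(2022, 6, 10, 12, 0)
    config = [
        ScheduleEntry("hourly_sales", timedelta(hours=1), datetime(2022, 6, 10, 11, 30)),
        ScheduleEntry("stock_level", timedelta(minutes=5), datetime(2022, 6, 10, 11, 54)),
        ScheduleEntry("new_customers", timedelta(minutes=10)),
    ]
    print(due_indicators(config, now))  # ['stock_level', 'new_customers']
```

With this layout, changing an indicator's refresh rate means editing its interval in the configuration rather than restructuring the ETL packages.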