Real time data warehouse
2022-07-04 14:22:00 【This program ape is so beautiful】
This article is a summary of my experience with real-time data warehouses. In terms of architecture and data flow, a real-time data warehouse is similar to an offline one, but real-time processing has its own particularities.
Why build a real-time data warehouse?
We already have an offline data warehouse. The purpose of a data warehouse is reuse, but offline results are T+1. Faced with a large volume of real-time requirements, the existing offline computations cannot be reused, so a lot of new, repetitive real-time code gets written, and the cost of development and computing resources keeps growing.
Real-time data warehouse layering
ODS: raw data, including logs and business data
DWD
DIM
DWM
DWS
ADS
DWD
Each table gets its own Kafka topic. Business data such as the order stream is written back to Kafka. Log data is split using side output streams (in SQL, this is simply multiple INSERT statements, each with its own filter). The main log types are startup and exit logs, page logs (page views only, i.e. PV logs), behavior logs, and so on. These have completely different data structures, so they need to be split.
At the same time, filter out illegal values, for example by checking timestamps and uids (mainly regular-expression matching; in our case, a 13-digit number). In addition, ODS contains dimension data as well as fact data; dimension data must be written to DIM instead of DWD.
The core of the DWD layer is data splitting and state recognition.
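The splitting and filtering described above can be sketched in plain Python. This is only an illustration, not the author's Flink job: the field names (start, page, action, uid, ts), the topic names, and the choice to apply the 13-digit rule to uid are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumption: a uid is valid when it is exactly 13 digits.
UID_PATTERN = re.compile(r"^\d{13}$")


def is_valid(record):
    """Illegal-value filtering: uid format and a positive integer timestamp."""
    uid = str(record.get("uid", ""))
    ts = record.get("ts")
    return bool(UID_PATTERN.match(uid)) and isinstance(ts, int) and ts > 0


def route(record):
    """Pick a (hypothetical) output topic based on which field the log carries."""
    if "start" in record:
        return "dwd_start_log"
    if "page" in record:
        return "dwd_page_log"
    if "action" in record:
        return "dwd_action_log"
    return "dwd_other_log"


def split_logs(records):
    """Drop invalid records, then group the rest by output topic."""
    outputs = defaultdict(list)
    for record in records:
        if is_valid(record):
            outputs[route(record)].append(record)
    return dict(outputs)


sample = [
    {"uid": "1234567890123", "ts": 1656915720000, "page": {"id": "home"}},
    {"uid": "1234567890123", "ts": 1656915721000, "start": {"entry": "icon"}},
    {"uid": "bad-uid", "ts": 1656915722000, "page": {"id": "cart"}},
]
result = split_logs(sample)
for topic, rows in result.items():
    print(topic, len(rows))
```

In a real Flink job the same routing would typically be done with side outputs, or in SQL with one INSERT plus filter per target topic, as the text describes.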
DIM
As mentioned above, some ODS data is dimension data. After Flink reads it, it is usually written directly to HBase, which makes stream-dimension joins convenient.
DWM
The DWM layer exists mainly because real-time computing is expensive to develop, operate, and maintain, yet the DWD -> DWS step still contains a lot of repeated computation. This layer extracts that shared part.
Take the order wide table as an example: it requires joining the order table with the order detail table and the dimension tables. We do this once and store the result as a wide table, so that DWS jobs for various behaviors or orders can use the joined data directly from DWM.
This layer therefore tends to involve many stream-stream joins and stream-dimension joins.
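A minimal sketch of the order wide table idea, written as an in-memory join in Python. It is not a streaming implementation: the table layouts, field names, and the USER_DIM dictionary (standing in for the HBase dimension lookup) are hypothetical.

```python
# Hypothetical dimension table; in the article this lookup would go to HBase.
USER_DIM = {
    "u1": {"channel": "app_store", "is_new_user": True},
}


def build_order_wide(orders, details, user_dim):
    """Join order details to their order, then enrich with user dimensions."""
    orders_by_id = {o["order_id"]: o for o in orders}
    wide_rows = []
    for detail in details:
        order = orders_by_id.get(detail["order_id"])
        if order is None:
            # No matching order; a streaming job would wait or expire state here.
            continue
        dim = user_dim.get(order["user_id"], {})
        wide_rows.append({**order, **detail, **dim})
    return wide_rows


orders = [{"order_id": "o1", "user_id": "u1", "total": 30.0}]
details = [
    {"order_id": "o1", "sku": "sku-a", "amount": 10.0},
    {"order_id": "o1", "sku": "sku-b", "amount": 20.0},
    {"order_id": "o9", "sku": "sku-c", "amount": 5.0},
]
wide = build_order_wide(orders, details, USER_DIM)
for row in wide:
    print(row)
```

The point of the design is that the join is paid for once in DWM; downstream DWS jobs read the wide rows instead of repeating the same joins.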
DWS
Light aggregation, to serve all kinds of real-time queries and relieve query pressure.
Combine more real-time data by subject for easier management, which also reduces the number of dimension lookups.
How to design a DWS table depends mainly on dimensions + measures (fact data).
Measures include UV, PV, bounce count, page entry count (session_count), continuous access duration, and so on.
Dimensions are mainly channel, region, version, domestic or overseas, new or returning user, and operating system (iOS, Android, PC).
The job takes in detail data, merges the streams into one common data format, aggregates over windows, and writes the result to a database (ClickHouse in our case).
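The window aggregation step can be sketched as follows. This is a simplified Python illustration of a tumbling window grouped by dimensions, assuming a 10-second window and hypothetical field names (ts, uid, channel, os); it does not show the ClickHouse write, and UV here is only deduplicated within each window.

```python
from collections import defaultdict

WINDOW_MS = 10_000  # assumed tumbling window size: 10 seconds


def aggregate(events, window_ms=WINDOW_MS):
    """Compute PV and per-window UV for each (window, channel, os) group."""
    buckets = defaultdict(lambda: {"pv": 0, "uids": set()})
    for event in events:
        # Align the event time to the start of its tumbling window.
        window_start = event["ts"] - event["ts"] % window_ms
        key = (window_start, event["channel"], event["os"])
        buckets[key]["pv"] += 1
        buckets[key]["uids"].add(event["uid"])

    rows = []
    for key in sorted(buckets):
        window_start, channel, os_name = key
        agg = buckets[key]
        rows.append({
            "window_start": window_start,
            "channel": channel,
            "os": os_name,
            "pv": agg["pv"],
            "uv": len(agg["uids"]),
        })
    return rows


events = [
    {"ts": 1000, "uid": "u1", "channel": "a", "os": "ios"},
    {"ts": 2000, "uid": "u1", "channel": "a", "os": "ios"},
    {"ts": 3000, "uid": "u2", "channel": "a", "os": "ios"},
    {"ts": 12000, "uid": "u1", "channel": "a", "os": "ios"},
]
rows = aggregate(events)
for row in rows:
    print(row)
```

Each output row corresponds to one record that would be inserted into the DWS table, with the dimension columns as the grouping key and the measures as values.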
Applications of the real-time data warehouse
Real-time data mart
Alarm monitoring
Real-time recommendation