Real-time data warehouse
2022-07-04 14:22:00 【This program ape is so beautiful】
This article summarizes my experience with real-time data warehouses. In architecture and data flow they are similar to offline data warehouses, but real-time processing has its own particularities.
Why have a real-time data warehouse?
We already have an offline data warehouse, and the purpose of a data warehouse is reuse. But offline is T+1. Faced with a large volume of real-time requirements, the earlier offline computation cannot be reused, so a lot of new, repetitive real-time code gets written, and the cost of development and compute resources keeps rising.
Real-time data warehouse layering

ODS: raw data, including logs and business data
DWD
DIM
DWM
DWS
ADS

DWD
Each table gets its own topic. Order streams and other business data are written back to Kafka. Log data is split through side output streams (in SQL, this becomes multiple INSERT statements, each with its own filter). The main log types are startup and exit logs, page logs (pages only, i.e. PV logs), and behavior logs. Because these types have completely different data structures, they have to be split.
At the same time, illegal values are filtered out, for example by checking timestamps and uid (mostly regular-expression matching; ours is a 13-digit number). Besides fact data, ODS also contains dimension data, which must be written to DIM rather than DWD.
The core of the DWD layer is stream splitting and state identification.
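The split-and-filter step can be sketched in plain Python. This is a minimal illustration, not Flink code. The topic names and the field names (`uid`, `ts`, `start`, `page`, `actions`) are hypothetical, and the sketch assumes the 13-digit check applies to a millisecond timestamp while `uid` only needs to be a non-empty digit string.

```python
import re
from collections import defaultdict
from typing import Optional

# Assumption: ts is a 13-digit epoch-millisecond value, uid is a non-empty digit string.
TS_PATTERN = re.compile(r"^\d{13}$")
UID_PATTERN = re.compile(r"^\d+$")

# Hypothetical topic names, one per log type.
TOPIC_START = "dwd_start_log"
TOPIC_PAGE = "dwd_page_log"
TOPIC_ACTION = "dwd_action_log"


def is_valid(record: dict) -> bool:
    """Reject records whose uid or timestamp fails the regex check."""
    return bool(
        TS_PATTERN.match(str(record.get("ts", "")))
        and UID_PATTERN.match(str(record.get("uid", "")))
    )


def route(record: dict) -> Optional[str]:
    """Pick the output topic from the record's shape; None means the record is dropped."""
    if not is_valid(record):
        return None
    if "start" in record:
        return TOPIC_START
    if "actions" in record:
        return TOPIC_ACTION
    if "page" in record:
        return TOPIC_PAGE
    return None


def split_stream(records):
    """Group valid records by target topic (a stand-in for side outputs)."""
    outputs = defaultdict(list)
    for record in records:
        topic = route(record)
        if topic is not None:
            outputs[topic].append(record)
    return dict(outputs)
```

In a real job each branch would be a side output or a filtered INSERT writing to its own Kafka topic. The dictionary of lists here only shows the routing logic.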
DIM
As mentioned above, Flink usually writes the dimension data it reads from ODS directly to HBase, which makes stream-to-dimension joins convenient.
DWM
The DWM layer exists mainly because real-time computing is expensive to develop, operate, and maintain, yet the DWD -> DWS computation still contains a lot of repeated work. DWM extracts that shared part so it is computed once.
Take the order wide table as an example: it requires joining the order table with the order detail table and the dimension tables. We build it once as a wide table, and any behavior or order topic in DWS then simply joins against the DWM data.

This layer typically involves many stream-stream joins and stream-dimension joins.
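The order wide table idea can be sketched in plain Python as follows. This is an in-memory illustration under stated assumptions, not the actual job: the class `OrderWideJoiner`, the field names (`order_id`, `user_id`), and the `dim_store` dictionary standing in for the HBase dimension table are all hypothetical. A production job would also need state expiry and watermark handling, which are omitted here.

```python
from collections import defaultdict


class OrderWideJoiner:
    """Joins orders with their detail rows and enriches them with dimension data."""

    def __init__(self, dim_store):
        self.dim_store = dim_store                 # stand-in for the dimension table
        self.orders = {}                           # order_id -> order record
        self.pending_details = defaultdict(list)   # details that arrived before their order

    def _widen(self, order, detail):
        # Stream-dimension join: look up the user dimension by key.
        user_dim = self.dim_store.get(order["user_id"], {})
        # Later dicts win on key clashes, so order fields take precedence.
        return {**user_dim, **detail, **order}

    def on_order(self, order):
        """Store the order and emit wide rows for any details already waiting."""
        self.orders[order["order_id"]] = order
        waiting = self.pending_details.pop(order["order_id"], [])
        return [self._widen(order, d) for d in waiting]

    def on_detail(self, detail):
        """Emit a wide row if the order is known, otherwise buffer the detail."""
        order = self.orders.get(detail["order_id"])
        if order is None:
            self.pending_details[detail["order_id"]].append(detail)
            return []
        return [self._widen(order, detail)]
```

Buffering both sides by `order_id` is what makes this a stream-stream join: whichever record arrives first waits for its counterpart, so arrival order does not matter.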
DWS
Light aggregation, to serve all kinds of real-time queries and reduce query pressure.
Combining more real-time data by subject makes it easier to manage and also reduces the number of dimension lookups.
How a DWS table is designed depends mainly on dimensions + measures (fact data).
Measures include UV, PV, bounce count, page entry count (session_count), continuous access duration, and so on.
The main dimensions are channel, region, version, domestic vs. overseas, new vs. returning users, and system (iOS, Android, desktop).
The job receives detail data, merges the streams into one common data format, aggregates over windows, and writes the result to a database (ClickHouse in our case).
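The window aggregation step can be sketched in plain Python. This is a simplified illustration, not the production job: the 10-second tumbling window, the event field names, and counting UV as distinct `uid` values per window are all assumptions made for the example.

```python
from collections import defaultdict

WINDOW_MS = 10_000  # assumed 10-second tumbling window


def window_start(ts_ms: int, size_ms: int = WINDOW_MS) -> int:
    """Align a millisecond timestamp to the start of its tumbling window."""
    return ts_ms - ts_ms % size_ms


def aggregate(events, size_ms: int = WINDOW_MS):
    """Produce one row per (window, channel, region, version, is_new) with pv and uv."""
    pv = defaultdict(int)
    users = defaultdict(set)
    for e in events:
        key = (
            window_start(e["ts"], size_ms),
            e["channel"],
            e["region"],
            e["version"],
            e["is_new"],
        )
        pv[key] += 1             # every event counts as one page view
        users[key].add(e["uid"])  # distinct users within the window
    return [
        {
            "window_start": k[0],
            "channel": k[1],
            "region": k[2],
            "version": k[3],
            "is_new": k[4],
            "pv": pv[k],
            "uv": len(users[k]),
        }
        for k in sorted(pv)
    ]
```

Each output row has the shape of one record in the DWS table: the dimension columns form the grouping key, and the measures are pre-aggregated so that queries only sum over a small number of rows.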
Applications of the real-time data warehouse
Real-time data mart
Alarm monitoring
Real-time recommendation