Models used in data warehouse modeling and layered introduction
2022-07-06 17:25:00 【Stray_ Lambs】
Star model
The star model (star schema) is a commonly used dimensional modeling method. It is centered on the fact table, and every dimension table is linked directly to the fact table, like the points of a star. A star-schema dimensional model consists of one fact table and a set of dimension tables, and has the following characteristics:
- Dimension tables are associated only with the fact table; there are no relationships between dimension tables.
- The primary key of each dimension table is a single column, and that key is stored in the fact table as the foreign key connecting the two.
- The fact table is the core, and the dimension tables surround it in a star shape.
The star schema is a denormalized structure: each dimension of the cube is connected directly to the fact table, so dimension data is stored redundantly.
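To make the structure concrete, here is a minimal star-schema sketch. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a warehouse engine such as Hive. All table names (`fact_sales`, `dim_date`, `dim_product`, `dim_store`), columns, and values are hypothetical. Each dimension table has a single-column primary key, and that key appears in the fact table as a foreign key, so every dimension is exactly one join away.

```python
import sqlite3

# Star schema sketch: one fact table whose foreign keys point directly at
# each dimension table. Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT);

-- Fact table: single-column foreign keys to each dimension plus numeric measures.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO dim_date VALUES (20220701, '2022-07-01', '2022-07'), (20220702, '2022-07-02', '2022-07');
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');
INSERT INTO dim_store VALUES (10, 'Store A', 'Beijing'), (11, 'Store B', 'Shanghai');
INSERT INTO fact_sales VALUES
    (20220701, 1, 10, 2, 9000.0),
    (20220701, 3, 11, 1, 800.0),
    (20220702, 2, 10, 3, 6000.0);
""")

# Each dimension is joined directly to the fact table; no dimension-to-dimension joins.
rows = conn.execute("""
SELECT p.category_name, s.city, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_store   s ON f.store_key   = s.store_key
GROUP BY p.category_name, s.city
ORDER BY p.category_name, s.city
""").fetchall()
print(rows)
```

`category_name` is repeated on every product row of `dim_product`. This is the redundancy the star schema accepts in exchange for simple, one-hop joins.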
Snowflake model
The snowflake schema is an extension of the star schema. In a snowflake schema, a dimension table can itself be linked to other dimension tables. This model is more normalized than the star schema, but it is harder to understand and more expensive to maintain. Queries must also join across multiple levels of dimension tables, so query performance is lower than with the star schema. For these reasons the snowflake schema is not commonly used.
The snowflake schema normalizes dimensions into smaller tables that are joined together. This minimizes the amount of data stored and can improve performance for some queries. The snowflake structure eliminates data redundancy.
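For comparison, here is a minimal snowflake sketch. It again uses Python's `sqlite3` in memory, with hypothetical table names and data, and moves the category attributes out of the product dimension into their own table. The category name is now stored once, but the same category-level query needs one extra join.

```python
import sqlite3

# Snowflake schema sketch: the product dimension is normalized so that the
# category attributes live in their own table. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product  (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)  -- dimension -> dimension link
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);

INSERT INTO dim_category VALUES (100, 'Electronics'), (200, 'Furniture');
INSERT INTO dim_product  VALUES (1, 'Laptop', 100), (2, 'Phone', 100), (3, 'Desk', 200);
INSERT INTO fact_sales   VALUES (1, 9000.0), (3, 800.0), (2, 6000.0);
""")

# Reaching the category now takes two joins (fact -> product -> category)
# instead of the single join a star schema would need.
rows = conn.execute("""
SELECT c.category_name, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product  p ON f.product_key  = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category_name
ORDER BY c.category_name
""").fetchall()
print(rows)
```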
Star model vs. snowflake model
1. Query performance
On the OLTP-to-DW path, the snowflake schema performs worse than the star schema because it has to join more tables. OLTP (online transaction processing) is the main application of traditional relational databases: basic, day-to-day business transactions. On the DW-to-OLAP path, the snowflake schema makes it easier to aggregate measures, so it performs better than the star schema. OLAP (online analytical processing) is the main application of data warehouse systems: it supports complex analytical operations, focuses on decision support, and presents query results in an intuitive, easy-to-understand form.
2. Model complexity
The star schema is simpler and easier to work with.
3. Hierarchy
The snowflake schema is closer to the structure of an OLTP system. It matches the business logic more closely and has clearly defined hierarchy levels.
4. Storage
The snowflake schema has all the advantages of the relational data model and produces no redundant data. The star schema, by contrast, does produce data redundancy.
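The storage difference can be illustrated without a database at all. The sketch below uses plain Python lists of dicts and hypothetical product data. It counts how many times a category name is physically stored in a denormalized (star) product dimension versus a normalized (snowflake) one.

```python
# Illustrative product data: (product_name, category_name). Hypothetical values.
products = [
    ("Laptop", "Electronics"),
    ("Phone", "Electronics"),
    ("Tablet", "Electronics"),
    ("Desk", "Furniture"),
    ("Chair", "Furniture"),
]

# Star schema: the denormalized product dimension repeats the category name on every row.
star_dim_product = [{"product_key": i, "product_name": name, "category_name": cat}
                    for i, (name, cat) in enumerate(products, start=1)]

# Snowflake schema: each category name is stored once; products hold a category key instead.
category_keys = {}
for _, cat in products:
    category_keys.setdefault(cat, len(category_keys) + 1)
snow_dim_category = [{"category_key": k, "category_name": c} for c, k in category_keys.items()]
snow_dim_product = [{"product_key": i, "product_name": name, "category_key": category_keys[cat]}
                    for i, (name, cat) in enumerate(products, start=1)]

# Number of stored copies of a category name in each design.
star_copies = sum(1 for row in star_dim_product if "category_name" in row)
snow_copies = len(snow_dim_category)
print(star_copies, snow_copies)
```

With five products in two categories, the star dimension stores a category name five times and the snowflake design stores it twice. The gap grows with the number of rows per category.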
Constellation model
The constellation model (fact constellation) is an extension of the star model. The star model is built around a single fact table, whereas the constellation model is built on multiple fact tables that share dimension information. The two methods above both pair multiple dimension tables with a single fact table. In many cases, however, a dimensional space contains more than one fact table, and a single dimension table may be used by several fact tables. In the later stages of business development, most dimensional models use the constellation pattern.
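A minimal constellation sketch follows, using Python's `sqlite3` in memory with hypothetical tables. Two fact tables, `fact_sales` and `fact_inventory`, share the same `dim_product` and `dim_date` dimensions. To compare the two facts, each fact table is aggregated separately and the results are joined on the shared dimension keys. This pattern is often called a drill-across query; it is shown here as one reasonable approach, not the only one.

```python
import sqlite3

# Fact constellation sketch: two fact tables (sales and inventory) share the
# same product and date dimensions. Table names and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT);

CREATE TABLE fact_sales     (date_key INTEGER, product_key INTEGER, quantity_sold INTEGER);
CREATE TABLE fact_inventory (date_key INTEGER, product_key INTEGER, quantity_on_hand INTEGER);

INSERT INTO dim_product    VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_date       VALUES (20220701, '2022-07-01');
INSERT INTO fact_sales     VALUES (20220701, 1, 2), (20220701, 1, 1), (20220701, 2, 5);
INSERT INTO fact_inventory VALUES (20220701, 1, 40), (20220701, 2, 15);
""")

# Aggregate each fact table on its own, then combine the results on the shared
# dimension keys, so rows of one fact are not multiplied by rows of the other.
rows = conn.execute("""
WITH s AS (
    SELECT date_key, product_key, SUM(quantity_sold) AS sold
    FROM fact_sales GROUP BY date_key, product_key
),
i AS (
    SELECT date_key, product_key, SUM(quantity_on_hand) AS on_hand
    FROM fact_inventory GROUP BY date_key, product_key
)
SELECT d.full_date, p.product_name, s.sold, i.on_hand
FROM s
JOIN i             ON s.date_key = i.date_key AND s.product_key = i.product_key
JOIN dim_product p ON s.product_key = p.product_key
JOIN dim_date    d ON s.date_key    = d.date_key
ORDER BY p.product_name
""").fetchall()
print(rows)
```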
Tip: In most cases, a data warehouse is better served by building the underlying Hive tables with the star model. It improves query efficiency through heavy redundancy, and it is better supported by OLAP analysis engines, as can be seen in Kylin. The snowflake model is very common in relational databases such as MySQL and Oracle, especially in e-commerce database tables. The snowflake model has few use cases in data warehouses, but they do exist. In a concrete design, consider whether the strengths of both models can be combined to optimize the design.
Common data layering design
Generally, we divide the data model into three layers: the data operation layer (ODS), the data warehouse layer (DW), and the data application layer (APP). Put simply, the ODS layer ingests the raw data. The DW layer is the middle layer of the warehouse and the main focus of design. The APP layer holds data customized for specific business applications.
ODS layer
The data operation layer, also called the ODS layer, is "subject-oriented" and is the layer closest to the data in the source systems. Source data is extracted, cleansed, and transferred (the well-known ETL process) and then loaded into this layer. Data in this layer is generally organized according to the source business systems it comes from.
In general, data may need to be traced back later, so heavy data cleansing is not recommended at this layer. Simply ingest the raw data as-is. Processing such as denoising, deduplication, and outlier handling can be done later in the DWD layer.
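The "ingest as-is" rule can be sketched in a few lines of Python. The records, the `load_to_ods` function, and the lineage columns `src_system` and `load_date` are all hypothetical. The point is that duplicates and bad values survive the load. Only metadata that helps trace the data back to its source is added.

```python
from datetime import date

# Hypothetical raw order records pulled from a source business system.
# Note the duplicate row and the obviously bad amount: the ODS keeps them.
raw_orders = [
    {"order_id": "A001", "user_id": "u1", "amount": "99.50"},
    {"order_id": "A001", "user_id": "u1", "amount": "99.50"},
    {"order_id": "A002", "user_id": "u2", "amount": "-1"},
]

def load_to_ods(records, source_system, load_date):
    """Copy source records into the ODS layer unchanged.

    Only lineage columns are added so data can be traced later; no
    deduplication, type conversion or outlier handling happens here.
    """
    return [
        {**record, "src_system": source_system, "load_date": load_date.isoformat()}
        for record in records
    ]

ods_orders = load_to_ods(raw_orders, source_system="order_db", load_date=date(2022, 7, 6))
print(len(ods_orders), ods_orders[0])
```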
DW layer
The data warehouse layer is the core layer to design when building a data warehouse. Here, the data obtained from the ODS layer is organized into various data models by subject. The DW layer is subdivided into the DWD (Data Warehouse Detail) layer, the DWM (Data Warehouse Middle) layer, and the DWS (Data Warehouse Service) layer.
1. Data detail layer: DWD (Data Warehouse Detail)
This layer generally keeps the same data granularity as the ODS layer and provides a certain level of data quality assurance. To make the detail layer easier to use, it also applies dimension degeneration: some dimension attributes are folded into the fact table, which reduces joins between fact and dimension tables.
In addition, some data aggregation is done at this layer: data on the same subject is consolidated into one table to improve usability. I'll give an example later.
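Here is a minimal DWD-style sketch in plain Python. The input rows, `dim_product`, and `build_dwd_orders` are hypothetical, and "negative amount" stands in for whatever outlier rule a real pipeline would use. The output keeps the ODS grain of one row per order. It also performs the cleansing deferred from the ODS layer and applies dimension degeneration by copying the product name into each fact row.

```python
# Hypothetical ODS rows (strings, duplicates and a bad amount are kept as loaded).
ods_orders = [
    {"order_id": "A001", "user_id": "u1", "product_id": "p1", "amount": "99.50"},
    {"order_id": "A001", "user_id": "u1", "product_id": "p1", "amount": "99.50"},
    {"order_id": "A002", "user_id": "u2", "product_id": "p2", "amount": "-1"},
    {"order_id": "A003", "user_id": "u1", "product_id": "p2", "amount": "20.00"},
]
# Hypothetical product dimension used for dimension degeneration.
dim_product = {"p1": {"product_name": "Laptop"}, "p2": {"product_name": "Phone"}}

def build_dwd_orders(rows, products):
    """Clean ODS rows while keeping the same grain (one row per order)."""
    seen = set()
    result = []
    for row in rows:
        if row["order_id"] in seen:          # deduplicate on the business key
            continue
        seen.add(row["order_id"])
        amount = float(row["amount"])        # normalize types
        if amount < 0:                       # drop outliers (here: negative amounts)
            continue
        result.append({
            "order_id": row["order_id"],
            "user_id": row["user_id"],
            "product_id": row["product_id"],
            # dimension degeneration: copy the product name into the fact row
            # so common queries need no join to the product dimension
            "product_name": products[row["product_id"]]["product_name"],
            "amount": amount,
        })
    return result

dwd_orders = build_dwd_orders(ods_orders, dim_product)
print(dwd_orders)
```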
2. Data middle layer: DWM (Data Warehouse Middle)
Building on the DWD layer, this layer performs light aggregation to produce a series of intermediate tables. This improves the reusability of common metrics and reduces repeated processing.
Put simply, it aggregates by the common core dimensions and computes the corresponding statistical metrics.
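As an illustration of "light aggregation by core dimensions", the sketch below rolls hypothetical DWD order rows up to one row per (user, date). Order count and order amount are the reusable metrics. The function and field names are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical DWD order detail rows (one row per order, already cleaned).
dwd_orders = [
    {"order_id": "A001", "user_id": "u1", "order_date": "2022-07-01", "amount": 99.5},
    {"order_id": "A003", "user_id": "u1", "order_date": "2022-07-01", "amount": 20.0},
    {"order_id": "A004", "user_id": "u2", "order_date": "2022-07-01", "amount": 15.0},
    {"order_id": "A005", "user_id": "u1", "order_date": "2022-07-02", "amount": 5.0},
]

def build_dwm_user_daily(rows):
    """Lightly aggregate detail rows by the core dimensions (user, date)."""
    agg = defaultdict(lambda: {"order_count": 0, "order_amount": 0.0})
    for row in rows:
        key = (row["user_id"], row["order_date"])
        agg[key]["order_count"] += 1
        agg[key]["order_amount"] += row["amount"]
    return dict(agg)

dwm_user_daily = build_dwm_user_daily(dwd_orders)
print(dwm_user_daily[("u1", "2022-07-01")])
```

Downstream tables that need "orders per user per day" can read this intermediate table instead of rescanning the detail layer.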
3. Data service layer: DWS (Data Warehouse Service)
Also known as the data mart or wide-table layer. It is organized by business area, such as traffic, orders, and users. It produces wide tables with many fields for downstream business queries, OLAP analysis, data distribution, and so on.
Generally, this layer has relatively few tables, and each table covers a lot of business content. Because these tables have many fields, they are usually called wide tables.
In practice, computing wide-table metrics directly from the DWD or ODS layer involves too much computation and too few dimensions. The usual approach is to first compute several small intermediate tables in the DWM layer and then stitch them together into one DWS wide table. The boundary between wide and narrow tables is hard to define, so the DWM layer can also be dropped, keeping only the DWS layer and putting all of this data there.
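The "stitch small intermediate tables into one wide table" step can be sketched as a full outer merge on a shared key. The three DWM tables, the default values, and `build_dws_user_wide` are hypothetical. Users missing from one intermediate table still get a row, with defaults filled in for the missing metrics.

```python
# Hypothetical DWM intermediate tables, each keyed by user_id.
dwm_order_stats = {"u1": {"order_count": 3, "order_amount": 124.5},
                   "u2": {"order_count": 1, "order_amount": 15.0}}
dwm_visit_stats = {"u1": {"visit_count": 12}, "u3": {"visit_count": 4}}
dwm_user_profile = {"u1": {"city": "Beijing"}, "u2": {"city": "Shanghai"}, "u3": {"city": "Shenzhen"}}

# Default values for users missing from a given intermediate table.
DEFAULTS = {"order_count": 0, "order_amount": 0.0, "visit_count": 0, "city": None}

def build_dws_user_wide(*tables):
    """Stitch several narrow intermediate tables into one wide row per user."""
    user_ids = sorted(set().union(*tables))  # every user that appears in any table
    wide = []
    for uid in user_ids:
        row = {"user_id": uid, **DEFAULTS}
        for table in tables:
            row.update(table.get(uid, {}))
        wide.append(row)
    return wide

dws_user_wide = build_dws_user_wide(dwm_user_profile, dwm_order_stats, dwm_visit_stats)
for row in dws_user_wide:
    print(row)
```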
Data application layer: APP (Application)
This layer mainly provides data for data products and data analysis. The data is usually stored in systems such as ES (Elasticsearch), PostgreSQL, and Redis for use by online systems. It may also be stored in Hive or Druid for data analysis and data mining. The report data we often talk about, for example, usually lives here.
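As a small illustration, an APP-layer table is typically a pre-aggregated, query-ready slice of the DWS data. The sketch below turns hypothetical DWS wide-table rows into a city sales report and serializes it as JSON. Writing the payload to a serving store such as Redis or Elasticsearch is deliberately left out, since the client code would depend on the deployment.

```python
import json

# Hypothetical DWS wide-table rows (one per user).
dws_user_wide = [
    {"user_id": "u1", "city": "Beijing", "order_amount": 124.5},
    {"user_id": "u2", "city": "Shanghai", "order_amount": 15.0},
    {"user_id": "u3", "city": "Beijing", "order_amount": 30.0},
]

def build_city_sales_report(rows):
    """Shape DWS data into a small, query-ready report table for the APP layer."""
    totals = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["order_amount"]
    # Rows sorted by amount, highest first, ready for a dashboard or online service.
    return [{"city": city, "order_amount": amount}
            for city, amount in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)]

app_city_sales = build_city_sales_report(dws_user_wide)
# Serialize for a downstream serving store (the actual write is not shown).
payload = json.dumps(app_city_sales)
print(payload)
```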
Dimension layer (Dimension)
Finally, a note on the dimension layer, which mainly contains two kinds of data:
- High-cardinality dimension data: usually tables such as user profile tables and product tables. The data volume may range from tens of millions to hundreds of millions of rows.
- Low-cardinality dimension data: usually configuration tables, such as tables mapping enumeration values to their Chinese meanings, or date dimension tables. The data volume may range from single digits to tens of thousands of rows.
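A date dimension is the classic low-cardinality example: one row per calendar day, so a full year is only a few hundred rows. The sketch below generates one in Python. The column set (`date_key`, `quarter`, `is_weekend`, and so on) is a common choice but is an assumption here, not a fixed standard.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Generate one row per calendar day in [start, end], a typical low-cardinality dimension."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),   # e.g. 20220701
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_week": day.isoweekday(),         # 1 = Monday ... 7 = Sunday
            "is_weekend": day.isoweekday() >= 6,
        })
        day += timedelta(days=1)
    return rows

dim_date = build_date_dimension(date(2022, 1, 1), date(2022, 12, 31))
print(len(dim_date), dim_date[0])
```

Using an integer `YYYYMMDD` key keeps fact-table foreign keys compact and readable, which is why it is a popular (though not mandatory) convention.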