Models used in data warehouse modeling and an introduction to layering

2022-07-06 17:25:00 【Stray_ Lambs】

Star model

The star model is a commonly used dimensional modeling approach. It is centered on a fact table, and every dimension table links directly to that fact table, so the layout resembles a star. A star-schema dimensional model consists of one fact table and a set of dimension tables, and has the following characteristics:

Dimension tables are associated only with the fact table; there are no relationships between dimension tables.
The primary key of each dimension table is a single column, and that key is also stored in the fact table as the foreign key joining the two.
The fact table is the core, and the dimension tables are arranged around it in a star shape.

The star schema is a denormalized structure: each dimension of the cube connects directly to the fact table, so the data carries some redundancy.
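As a rough illustration of the star layout described above, the following sketch builds a tiny in-memory star schema with Python's standard sqlite3 module. The table names (fact_sales, dim_date, dim_product), columns, and sample rows are hypothetical and chosen only for the example.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        -- The fact table holds the measure plus one foreign key per dimension.
        CREATE TABLE fact_sales (
            date_id INTEGER REFERENCES dim_date(date_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            amount REAL
        );
        INSERT INTO dim_date VALUES (20220701, 2022, 7), (20220801, 2022, 8);
        -- 'category' is stored redundantly on every product row (denormalized).
        INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'mug', 'kitchen');
        INSERT INTO fact_sales VALUES (20220701, 1, 3.0), (20220701, 2, 8.0), (20220801, 1, 4.5);
    """)
    return conn


def star_sales_by_category(conn):
    """Total sales per category; the dimension is a single join away from the fact table."""
    rows = conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)


print(star_sales_by_category(build_star_schema()))
```

Note that dim_product and dim_date never reference each other; every query goes through the fact table.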

Snowflake model

The snowflake schema is an extension of the star schema. In a snowflake schema, a dimension table can itself have further dimension tables. Although this model is more normalized than the star model, it is harder to understand and costlier to maintain, and queries have to join multiple levels of dimension tables, so performance is also lower than with the star model. As a result, it is not commonly used.

The intent of the snowflake model is to improve query performance by minimizing data storage and joining smaller dimension tables. The snowflake structure eliminates data redundancy.
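To make the multi-level dimension idea concrete, here is a minimal sketch, again using sqlite3, in which a hypothetical dim_product table references a separate dim_category table. All table names, columns, and rows are assumptions made for illustration.

```python
import sqlite3


def build_snowflake_schema():
    """Create a tiny snowflake schema: the product dimension has its own category dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
        -- The product dimension stores only a key; the category name lives in one place.
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name TEXT,
            category_id INTEGER REFERENCES dim_category(category_id)
        );
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            amount REAL
        );
        INSERT INTO dim_category VALUES (10, 'stationery'), (20, 'kitchen');
        INSERT INTO dim_product VALUES (1, 'pen', 10), (2, 'pencil', 10), (3, 'mug', 20);
        INSERT INTO fact_sales VALUES (1, 3.0), (2, 1.5), (3, 8.0), (1, 3.0);
    """)
    return conn


def snowflake_sales_by_category(conn):
    """Total sales per category; reaching the category name takes two joins."""
    rows = conn.execute("""
        SELECT c.category_name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_category AS c ON c.category_id = p.category_id
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()
    return dict(rows)


print(snowflake_sales_by_category(build_snowflake_schema()))
```

Each category name is stored once, which removes redundancy, but the query needs one more join than it would against a denormalized product dimension.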

Star model and snowflake model

1. Query performance

In the OLTP-DW stage (OLTP is the main application of traditional relational databases: basic, routine business transactions), the snowflake schema requires multi-table joins, so its performance is lower than that of the star schema. In the DW-OLAP stage (OLAP is the main application of a data warehouse system: it supports complex analytical operations, focuses on decision support, and provides intuitive, easy-to-understand query results), the snowflake schema is more conducive to aggregating measures, so its performance is better than that of the star schema.

2. Model complexity

The star schema is simpler and easier to work with.

3. Hierarchy

The snowflake schema is closer to the structure of the OLTP system, fits the business logic better, and has clear levels.

4. Storage

The snowflake schema has all the advantages of the relational data model and produces no redundant data; the star schema, by contrast, does produce redundant data.

Constellation model

The constellation schema is an extension of the star schema. A star schema is built on a single fact table, whereas a constellation schema is built on multiple fact tables that share dimension information. Both modeling approaches described above map multiple dimension tables to a single fact table, but in many cases the dimension space contains more than one fact table, and one dimension table may be used by several fact tables. In the later stages of business development, the vast majority of dimensional models use the constellation schema.
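A minimal sketch of a shared dimension, assuming two hypothetical fact tables (fact_orders and fact_returns) that both reference the same dim_product table. The names and rows are illustrative only.

```python
import sqlite3

# Hypothetical fact tables that share the dim_product dimension.
FACT_TABLES = ("fact_orders", "fact_returns")


def build_constellation():
    """Create two fact tables that share a single product dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_orders (
            product_id INTEGER REFERENCES dim_product(product_id),
            quantity INTEGER
        );
        CREATE TABLE fact_returns (
            product_id INTEGER REFERENCES dim_product(product_id),
            quantity INTEGER
        );
        INSERT INTO dim_product VALUES (1, 'pen'), (2, 'mug');
        INSERT INTO fact_orders VALUES (1, 10), (1, 5), (2, 4);
        INSERT INTO fact_returns VALUES (1, 2), (2, 1);
    """)
    return conn


def quantity_by_product(conn, fact_table):
    """Aggregate one fact table by the shared product dimension."""
    if fact_table not in FACT_TABLES:
        raise ValueError(f"unknown fact table: {fact_table}")
    rows = conn.execute(f"""
        SELECT p.name, SUM(f.quantity)
        FROM {fact_table} AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.name
    """).fetchall()
    return dict(rows)


conn = build_constellation()
ordered = quantity_by_product(conn, "fact_orders")
returned = quantity_by_product(conn, "fact_returns")
# Because both facts use the same dimension keys, their results line up by product.
net = {name: ordered[name] - returned.get(name, 0) for name in ordered}
print(net)
```

Each fact table is aggregated separately and the results are combined afterwards, which avoids double counting that a direct join between the two fact tables could cause.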

Tips: In most cases a data warehouse is better served by building the underlying Hive tables with the star model, trading a good deal of redundancy for query efficiency. The star model is also friendlier to OLAP analysis engines, as can be seen in Kylin. The snowflake model is very common in relational databases such as MySQL and Oracle, especially in e-commerce database tables. It has few application scenarios in a data warehouse, but they do exist, so in a concrete design, consider whether the strengths of both can be combined to arrive at a better design.

General data layering design

In general, we divide the data model into three layers: the data operation layer (ODS), the data warehouse layer (DW), and the data application layer (APP). Put simply, the ODS layer takes in the raw data, the DW layer is the data warehouse middle layer that deserves most of the design effort, and the APP layer holds application data customized for the business.

ODS layer

The data operation layer, also called the ODS layer, is "subject-oriented" and is the layer closest to the data in the data sources. Data from the sources is loaded into this layer after extraction, cleaning, and transfer, that is, after the so-called ETL. Most of the data in this layer is classified the same way the source business systems classify it.

Because data may need to be traced back later, heavy data cleaning is generally not recommended in this layer: simply take in the raw data intact. Denoising, deduplication, outlier handling, and similar processing can be done in the DWD layer that follows.
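The split between keeping ODS data raw and cleaning it on the way into DWD could look roughly like the sketch below. The field names (order_id, amount) and the max_amount threshold are hypothetical; real cleaning rules depend on the business.

```python
def clean_for_dwd(ods_rows, max_amount=10_000.0):
    """Return cleaned copies of raw ODS rows; the ODS rows themselves are left untouched.

    Assumed rules for this sketch: drop rows with missing fields (denoising),
    drop repeated order_id values (deduplication), and drop amounts outside
    the range [0, max_amount] (outlier handling).
    """
    seen_ids = set()
    cleaned = []
    for row in ods_rows:
        order_id = row.get("order_id")
        amount = row.get("amount")
        if order_id is None or amount is None:
            continue  # incomplete record
        if order_id in seen_ids:
            continue  # duplicate of a row already kept
        if not 0 <= amount <= max_amount:
            continue  # outlier
        seen_ids.add(order_id)
        cleaned.append(dict(row))  # copy, so ODS stays traceable as loaded
    return cleaned


ods_orders = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 1, "amount": 20.0},     # duplicate
    {"order_id": 2, "amount": None},     # missing value
    {"order_id": 3, "amount": -5.0},     # negative amount
    {"order_id": 4, "amount": 99_999.0}, # above the assumed threshold
    {"order_id": 5, "amount": 7.5},
]
dwd_orders = clean_for_dwd(ods_orders)
print([row["order_id"] for row in dwd_orders])
```

Keeping the cleaning in a separate step means the original ODS rows remain available if a rule later turns out to be wrong.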

DW layer

The data warehouse layer is the core layer to design when building a data warehouse. Here, data obtained from the ODS layer is organized into data models by subject. The DW layer is further subdivided into the DWD (Data Warehouse Detail) layer, the DWM (Data WareHouse Middle) layer, and the DWS (Data WareHouse Service) layer.

1. Data detail layer: DWD (Data Warehouse Detail)

This layer generally keeps the same data granularity as the ODS layer and provides a certain level of data quality assurance. To make the detail layer easier to use, it also applies dimension degeneration: dimension attributes are folded into the fact table, which reduces the joins between fact tables and dimension tables.
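Dimension degeneration as described here can be sketched as copying selected dimension attributes onto each detail row. The function name, field names, and sample data below are hypothetical.

```python
def degenerate_dimension(fact_rows, dim_rows, key, attributes):
    """Copy the chosen dimension attributes onto each fact row.

    After this step, queries on those attributes no longer need to join
    the dimension table. The granularity of the fact rows is unchanged.
    """
    lookup = {dim[key]: dim for dim in dim_rows}
    widened = []
    for fact in fact_rows:
        dim = lookup.get(fact[key], {})
        row = dict(fact)
        for attr in attributes:
            row[attr] = dim.get(attr)  # None when the dimension row is missing
        widened.append(row)
    return widened


order_facts = [
    {"order_id": 1, "product_id": 1, "amount": 3.0},
    {"order_id": 2, "product_id": 2, "amount": 8.0},
    {"order_id": 3, "product_id": 9, "amount": 1.0},  # no matching dimension row
]
product_dim = [
    {"product_id": 1, "name": "pen", "category": "stationery"},
    {"product_id": 2, "name": "mug", "category": "kitchen"},
]
dwd_order_detail = degenerate_dimension(order_facts, product_dim, "product_id", ["category"])
print(dwd_order_detail)
```

The output has one row per input fact row, so the DWD table keeps the same granularity while carrying the extra attribute.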

In addition, some data aggregation is done in this layer: data on the same subject is gathered into a single table to improve usability. An example is given later.

2. Data middle layer: DWM (Data WareHouse Middle)

Building on the DWD layer's data, this layer performs light aggregation and generates a series of intermediate tables, which improves the reusability of common metrics and reduces repeated processing.

Put plainly, it aggregates along the commonly used core dimensions and computes the corresponding statistical metrics.
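A light aggregation along core dimensions might be sketched as follows. The dimension names (dt, channel), the measure name (amount), and the output column names are assumptions for illustration.

```python
from collections import defaultdict


def aggregate_dwm(dwd_rows, dimensions, measure):
    """Group detail rows by the given dimensions and compute a row count and a total."""
    stats = defaultdict(lambda: {"row_count": 0, "total": 0.0})
    for row in dwd_rows:
        key = tuple(row[dim] for dim in dimensions)
        stats[key]["row_count"] += 1
        stats[key]["total"] += row[measure]
    # One output row per distinct combination of dimension values.
    return [dict(zip(dimensions, key), **values) for key, values in sorted(stats.items())]


dwd_orders_detail = [
    {"dt": "2022-07-01", "channel": "app", "amount": 10.0},
    {"dt": "2022-07-01", "channel": "app", "amount": 5.0},
    {"dt": "2022-07-01", "channel": "web", "amount": 2.0},
    {"dt": "2022-07-02", "channel": "app", "amount": 4.0},
]
dwm_orders_daily = aggregate_dwm(dwd_orders_detail, ["dt", "channel"], "amount")
for row in dwm_orders_daily:
    print(row)
```

An intermediate table like this can be reused by several downstream reports instead of each one re-scanning the detail rows.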

3. Data service layer: DWS (Data WareHouse Service)

Also known as the data mart or wide-table layer. Organized by business area, such as traffic, orders, and users, it produces wide tables with many fields that serve subsequent business queries, OLAP analysis, data distribution, and so on.

This layer generally has relatively few tables, and each table covers a lot of business content. Because such tables have many fields, they are usually called wide tables.

In practice, computing a wide table's statistical metrics directly from DWD or ODS involves too much computation and too few dimensions. The usual approach is to first compute several small intermediate tables in the DWM layer and then stitch them together into one DWS wide table. Because the boundary between wide and narrow is hard to define, the DWM layer can also be dropped, keeping only the DWS layer and putting all the data there.
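Stitching several small intermediate tables into one wide table can be sketched as a merge on a shared key. The key user_id and the metric columns below are hypothetical.

```python
def build_wide_table(key, *intermediate_tables):
    """Merge intermediate tables on a shared key into one wide table.

    Rows that are absent from some intermediate table get None in that
    table's columns, similar to a full outer join.
    """
    columns = [key]
    merged = {}
    for table in intermediate_tables:
        for row in table:
            for column in row:
                if column not in columns:
                    columns.append(column)
            merged.setdefault(row[key], {}).update(row)
    return [
        {column: merged[k].get(column) for column in columns}
        for k in sorted(merged)
    ]


dwm_user_orders = [
    {"user_id": 1, "order_count": 3},
    {"user_id": 2, "order_count": 1},
]
dwm_user_traffic = [
    {"user_id": 1, "page_views": 40},
]
dws_user_wide = build_wide_table("user_id", dwm_user_orders, dwm_user_traffic)
for row in dws_user_wide:
    print(row)
```

Each intermediate table stays small and cheap to recompute, and the wide table only has to combine their results.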

Data application layer: APP (Application)

This layer mainly provides data for data products and data analysis. The data is usually stored in systems such as ES, PostgreSQL, and Redis for use by online systems, and may also be stored in Hive or Druid for data analysis and data mining. The report data we often talk about, for example, usually lives here.

Dimension table layer (Dimension)

Finally, a word on the dimension table layer, which mainly contains two kinds of data:

High-cardinality dimension data: typically tables such as user profiles or product tables. The data volume may be tens of millions or hundreds of millions of rows.
Low-cardinality dimension data: typically configuration tables, such as the mapping from enumeration values to their Chinese meanings, or a date dimension table. The data volume may range from single digits to tens of thousands of rows.
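As an example of a low-cardinality dimension, a date dimension table can be generated rather than loaded from a source system. The column names in this sketch are assumptions; real date dimensions often carry more attributes, such as holidays or fiscal periods.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Generate one dimension row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        weekday = current.isoweekday()  # Monday is 1, Sunday is 7
        rows.append({
            "date_id": int(current.strftime("%Y%m%d")),
            "year": current.year,
            "month": current.month,
            "day": current.day,
            "weekday": weekday,
            "is_weekend": weekday >= 6,
        })
        current += timedelta(days=1)
    return rows


dim_date_rows = build_date_dimension(date(2022, 7, 1), date(2022, 7, 7))
print(len(dim_date_rows), dim_date_rows[0])
```

Even ten years of dates is only a few thousand rows, which is why such tables fall on the low-cardinality side.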