Models used in data warehouse modeling and layered introduction
2022-07-06 17:25:00 【Stray_ Lambs】
Star model
The star model is a commonly used dimensional modeling method. It is centered on the fact table, and every dimension table is linked directly to that fact table, so the layout resembles a star. A star schema consists of one fact table and a set of dimension tables, and has the following characteristics:
- Dimension tables are associated only with the fact table; there are no relationships between dimension tables;
- The primary key of each dimension table is a single column, and that key is placed in the fact table as the foreign key connecting the two sides;
- The fact table is the core, and the dimension tables surround it in a star shape.
The star schema is a denormalized structure: every dimension of the cube is connected directly to the fact table, so the data contains some redundancy.
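The structure described above can be sketched with an in-memory SQLite database. This is only an illustration under assumed names: the tables `dim_date`, `dim_product`, and `fact_sales`, their columns, and the sample rows are hypothetical and do not come from the original article.

```python
import sqlite3

# Hypothetical star schema: one fact table, two dimension tables.
star = sqlite3.connect(":memory:")
star.executescript("""
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,      -- single-column primary key
        calendar_date TEXT
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,   -- single-column primary key
        product_name TEXT,
        category_name TEXT                 -- repeated per product (redundant)
    );
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        amount REAL
    );
    INSERT INTO dim_date VALUES (20220701, '2022-07-01'), (20220702, '2022-07-02');
    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Electronics'),
        (2, 'Phone', 'Electronics'),
        (3, 'Desk', 'Furniture');
    INSERT INTO fact_sales VALUES
        (20220701, 1, 1000.0),
        (20220701, 3, 200.0),
        (20220702, 2, 500.0);
""")

# A single join from the fact table to one dimension answers the question.
star_result = star.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()
print(star_result)
```

Note that `category_name` is stored on every product row. That repetition is the redundancy mentioned above, and it is what lets the query reach the category with a single join.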
Snowflake model
The snowflake schema (Snowflake Schema) is an extension of the star schema. In a snowflake schema, a dimension table can itself be linked to other dimension tables. Although this model is more normalized than the star model, it is harder to understand and more expensive to maintain, and because queries have to join several levels of dimension tables, it also performs worse than the star model. For these reasons it is not commonly used.
The snowflake schema tries to improve query performance by minimizing data storage and joining smaller dimension tables. Its normalized structure eliminates data redundancy.
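A minimal sketch of the same idea in SQLite, with hypothetical tables and data (`dim_category`, `dim_product`, `fact_sales` are illustrative names, not from the original article). The category is split out of the product dimension into its own table, so each category name is stored once, but the query needs one more join.

```python
import sqlite3

# Hypothetical snowflake schema: the product dimension points at a
# separate category dimension instead of repeating the category name.
snow = sqlite3.connect(":memory:")
snow.executescript("""
    CREATE TABLE dim_category (
        category_key INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        amount REAL
    );
    INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
    INSERT INTO dim_product VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Desk', 2);
    INSERT INTO fact_sales VALUES (1, 1000.0), (3, 200.0), (2, 500.0);
""")

# Two joins are needed to reach the category name: fact -> product -> category.
snowflake_result = snow.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

# Each category name is stored exactly once.
category_rows = snow.execute("SELECT COUNT(*) FROM dim_category").fetchone()[0]
print(snowflake_result, category_rows)
```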
Star model and snowflake model
1. Query performance
In the OLTP-to-DW stage (OLTP is the main application of traditional relational databases: basic, routine business transactions), the snowflake schema has to join multiple tables, so it performs worse than the star schema. In the DW-to-OLAP stage (OLAP is the main application of a data warehouse system: it supports complex analytical operations, focuses on decision support, and returns intuitive, easy-to-understand query results), the snowflake schema is more conducive to aggregating measures, so it performs better than the star schema.
2. Model complexity
The star schema is easier to handle.
3. Hierarchy
The snowflake schema is closer to the structure of an OLTP system and more in line with business logic, with clearly separated levels.
4. Storage
The snowflake schema has all the advantages of the relational data model and produces no redundant data. The star schema, by contrast, produces data redundancy.
Constellation model
The constellation schema is an extension of the star schema. A star schema is built on a single fact table, whereas a constellation schema is built on multiple fact tables that share dimension information. Both modeling methods described above map multiple dimension tables to a single fact table, but in many cases the dimension space contains more than one fact table, and a single dimension table may be used by several fact tables. In the later stages of business development, the vast majority of dimensional models use the constellation schema.
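As a rough sketch of a shared dimension, the SQLite example below uses two hypothetical fact tables, `fact_orders` and `fact_page_views`, that both reference one `dim_date` table. All names and rows are assumptions made for illustration. Each fact table is aggregated separately before joining through the shared dimension, which avoids multiplying rows across the two facts.

```python
import sqlite3

# Hypothetical constellation: two fact tables share one date dimension.
galaxy = sqlite3.connect(":memory:")
galaxy.executescript("""
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        calendar_date TEXT
    );
    CREATE TABLE fact_orders (
        date_key INTEGER REFERENCES dim_date(date_key),
        order_amount REAL
    );
    CREATE TABLE fact_page_views (
        date_key INTEGER REFERENCES dim_date(date_key),
        views INTEGER
    );
    INSERT INTO dim_date VALUES (20220701, '2022-07-01'), (20220702, '2022-07-02');
    INSERT INTO fact_orders VALUES (20220701, 100.0), (20220701, 50.0), (20220702, 80.0);
    INSERT INTO fact_page_views VALUES (20220701, 300), (20220702, 120), (20220702, 30);
""")

# Aggregate each fact table to the grain of the shared dimension first,
# then join both aggregates to dim_date.
daily_summary = galaxy.execute("""
    SELECT d.calendar_date, o.total_amount, v.total_views
    FROM dim_date AS d
    JOIN (SELECT date_key, SUM(order_amount) AS total_amount
          FROM fact_orders GROUP BY date_key) AS o ON o.date_key = d.date_key
    JOIN (SELECT date_key, SUM(views) AS total_views
          FROM fact_page_views GROUP BY date_key) AS v ON v.date_key = d.date_key
    ORDER BY d.calendar_date
""").fetchall()
print(daily_summary)
```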
Tip: in most cases a data warehouse is better served by using the star model to build its underlying Hive tables, trading a large amount of redundancy for query efficiency. The star model is also friendlier to OLAP analysis engines, as can be seen in Kylin. The snowflake model is very common in relational databases such as MySQL and Oracle, especially in the database tables of e-commerce systems. The snowflake model has few application scenarios in a data warehouse, but they do exist, so in a concrete design it is worth considering whether the advantages of both models can be combined to reach a better design.
General data layered design
In general, the data model is divided into three layers: the data operation layer (ODS), the data warehouse layer (DW), and the data application layer (APP). Put simply, the ODS layer holds the ingested raw data, the DW layer is the middle layer of the data warehouse and the one we focus on, and the APP layer holds application data customized for the business.
ODS layer
The data operation layer, also called the ODS layer, is "subject-oriented" and is the layer closest to the data in the data sources. Data from the sources is loaded into this layer after extraction, cleaning, and transfer, that is, after the well-known ETL process. Most data in this layer is organized according to the classification of the source business systems.
To make it possible to trace data back later, heavy data cleaning is not recommended at this layer; simply load the original data intact. Denoising, deduplication, outlier handling, and similar processing can be left to the DWD layer that follows.
DW layer
The data warehouse layer is the core layer to design when building a data warehouse. Here, the data obtained from the ODS layer is used to build various data models organized by subject. The DW layer is further subdivided into the DWD (Data Warehouse Detail) layer, the DWM (Data Warehouse Middle) layer, and the DWS (Data Warehouse Service) layer.
1. Data detail layer: DWD (Data Warehouse Detail)
This layer generally keeps the same data granularity as the ODS layer and provides a certain level of data quality assurance. To make the detail layer easier to use, it also applies dimension degeneration: some dimension attributes are folded into the fact table, which reduces the number of joins between fact tables and dimension tables.
In addition, some data consolidation is done in this layer: data belonging to the same subject is gathered into a single table to improve data usability.
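The following plain-Python sketch shows one way the DWD step could look. Everything in it is an assumption for illustration: the `ods_orders` rows, the `dim_user` lookup, and the quality rules (drop repeated `order_id` values and rows with a missing amount) are hypothetical. The output keeps one row per order, the same granularity as the ODS input, and folds the user's city into each row so later queries do not need to join the user dimension.

```python
# Hypothetical raw ODS rows, loaded as-is from the source system.
ods_orders = [
    {"order_id": 1, "user_id": 10, "amount": 100.0},
    {"order_id": 1, "user_id": 10, "amount": 100.0},  # duplicate record
    {"order_id": 2, "user_id": 11, "amount": None},   # missing amount
    {"order_id": 3, "user_id": 11, "amount": 40.0},
]

# Hypothetical user dimension keyed by user_id.
dim_user = {10: {"city": "Beijing"}, 11: {"city": "Shanghai"}}


def build_dwd_orders(ods_rows, user_dim):
    """Clean ODS rows and fold a dimension attribute into each fact row."""
    seen_order_ids = set()
    dwd_rows = []
    for row in ods_rows:
        # Assumed quality rules: skip duplicates and rows without an amount.
        if row["order_id"] in seen_order_ids or row["amount"] is None:
            continue
        seen_order_ids.add(row["order_id"])
        # Dimension degeneration: copy the city onto the fact row.
        dwd_rows.append({**row, "user_city": user_dim[row["user_id"]]["city"]})
    return dwd_rows


dwd_orders = build_dwd_orders(ods_orders, dim_user)
print(dwd_orders)
```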
2. Data middle layer: DWM (Data Warehouse Middle)
Building on the data in the DWD layer, this layer performs light aggregation and generates a series of intermediate tables, which makes common indicators more reusable and reduces repeated processing.
Put simply, it aggregates along the commonly used core dimensions and computes the corresponding statistical indicators.
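A light aggregation of this kind might look like the sketch below. The detail rows, the choice of `order_date` and `user_city` as the core dimensions, and the two indicators are all hypothetical examples rather than anything prescribed by the article.

```python
from collections import defaultdict

# Hypothetical DWD detail rows (one row per order).
dwd_order_rows = [
    {"order_date": "2022-07-01", "user_city": "Beijing", "amount": 100.0},
    {"order_date": "2022-07-01", "user_city": "Beijing", "amount": 50.0},
    {"order_date": "2022-07-01", "user_city": "Shanghai", "amount": 40.0},
    {"order_date": "2022-07-02", "user_city": "Beijing", "amount": 70.0},
]


def aggregate_by_date_and_city(detail_rows):
    """Aggregate detail rows by (order_date, user_city) into an intermediate table."""
    totals = defaultdict(lambda: {"order_count": 0, "total_amount": 0.0})
    for row in detail_rows:
        bucket = totals[(row["order_date"], row["user_city"])]
        bucket["order_count"] += 1
        bucket["total_amount"] += row["amount"]
    return dict(totals)


dwm_city_daily = aggregate_by_date_and_city(dwd_order_rows)
for key, metrics in sorted(dwm_city_daily.items()):
    print(key, metrics)
```

Once computed, an intermediate table like `dwm_city_daily` can be reused by every downstream report that needs daily order counts or amounts per city, instead of each report rescanning the detail rows.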
3. Data service layer: DWS (Data Warehouse Service)
Also known as the data mart or wide-table layer. Wide tables with many fields are generated per business area, such as traffic, orders, and users, and are used for subsequent business queries, OLAP analysis, data distribution, and so on.
In general this layer contains relatively few tables, and each table covers more business content. Because these tables have so many fields, they are usually called wide tables.
In actual computation, deriving a wide table's statistical indicators directly from DWD or ODS involves too much computation and too few dimensions. The usual practice is therefore to compute several small intermediate tables in the DWM layer first and then stitch them together into one DWS wide table. Because the boundary between wide and narrow tables is hard to define, the DWM layer can also be dropped, keeping only the DWS layer and putting all the data there.
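Stitching small intermediate tables into a wide table can be sketched as a key-based merge. The two intermediate tables below (`dwm_user_orders`, `dwm_user_traffic`), their metrics, and the choice to fill missing metrics with 0 are assumptions made for this example.

```python
# Hypothetical DWM intermediate tables, each keyed by user_id.
dwm_user_orders = {
    10: {"order_count": 2, "order_amount": 150.0},
    11: {"order_count": 1, "order_amount": 40.0},
}
dwm_user_traffic = {
    10: {"page_views": 30},
    12: {"page_views": 5},
}


def build_dws_user_wide(*intermediate_tables):
    """Merge intermediate tables on user_id into one wide row per user."""
    # Collect every metric column so each wide row has the same fields.
    defaults = {}
    for table in intermediate_tables:
        for metrics in table.values():
            for column in metrics:
                defaults.setdefault(column, 0)  # assumed default for gaps

    wide_rows = []
    for user_id in sorted(set().union(*intermediate_tables)):
        row = {"user_id": user_id, **defaults}
        for table in intermediate_tables:
            row.update(table.get(user_id, {}))
        wide_rows.append(row)
    return wide_rows


dws_user_wide = build_dws_user_wide(dwm_user_orders, dwm_user_traffic)
for wide_row in dws_user_wide:
    print(wide_row)
```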
Data application layer: APP (Application)
This layer mainly provides data for data products and data analysis. The data is usually stored in systems such as ES, PostgreSQL, and Redis for online systems to use, and may also be stored in Hive or Druid for data analysis and data mining. The report data we often talk about, for example, usually lives here.
Dimension tables (Dimension)
Finally, there is a dimension table layer, which mainly contains two kinds of data:
- High-cardinality dimension data: typically tables such as user profile tables and product tables. The data volume may reach tens of millions or hundreds of millions of rows.
- Low-cardinality dimension data: typically configuration tables, such as the table mapping enumeration values to their Chinese meanings, or a date dimension table. The data volume may range from a handful of rows to tens of thousands.
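As an illustration of a low-cardinality dimension, the sketch below generates a small date dimension table. The column names (`date_key`, `calendar_date`, `weekday`, `is_weekend`, and so on) and the date range are hypothetical choices; a real warehouse would pick its own attributes.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one dimension row per calendar day from start to end inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20220701
            "calendar_date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "weekday": current.isoweekday(),              # Monday=1 ... Sunday=7
            "is_weekend": current.isoweekday() >= 6,
        })
        current += timedelta(days=1)
    return rows


dim_date = build_date_dimension(date(2022, 7, 1), date(2022, 7, 7))
for dim_row in dim_date:
    print(dim_row)
```

Even ten years of dates produce only a few thousand rows, which is why a table like this sits at the low-cardinality end compared with user or product dimensions.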
References
https://book.itheima.net/study/1269935677353533441/1268108183453343745/1270246187252850690