Comparison between dimensional modeling and paradigm modeling
2022-07-26 17:15:00 【51CTO】
1. Two modeling ideas
The Inmon and Kimball modeling methods could each be described at length, but the theory is dry and the terminology is obscure, and reading it all would probably not teach you much. So I use plain language to extract the core concepts, according to my own understanding.
Paradigm modeling
Paradigm modeling (that is, normal-form modeling) is advocated by Inmon, the father of the data warehouse, who also defined the term "data warehouse". The method follows third normal form (3NF), although 3NF here differs slightly from 3NF in OLTP: in a relational database, 3NF abstracts entities and their relationships for a specific business process, whereas in a data warehouse 3NF is a subject-oriented abstraction from the perspective of the whole enterprise.
Viewed as a process, Inmon's model is top-down. Top-down refers to the direction of data flow: "top" is upstream of the data and "bottom" is downstream, so data moves from distributed heterogeneous data sources -> data warehouse -> data mart. The approach is data-driven: it starts from the data and explores step by step to obtain data that meets expectations as closely as possible. Because the data sources are often heterogeneous, more emphasis is placed on data cleaning. Data is extracted into an entity-relationship model, and the concepts of fact table and dimension table are not emphasized.
3. Comparison of the two models
Characteristics of the two modeling methods
Paradigm modeling: the specific examples in the previous section show its advantages. It integrates well with the data models of business systems, which makes the data warehouse model easier to implement; each piece of data is stored in only one place, so there is no redundancy and data consistency is guaranteed; and the data is decoupled, which makes maintenance easier. It also has disadvantages: there are many tables, and queries have to join many of them, which reduces query performance.
Dimensional modeling: the model structure is simple and analysis-oriented. To improve query performance, it allows data redundancy and a denormalized design. The development cycle is short, so it can iterate quickly. The disadvantages are that the data contains a lot of redundancy, the preprocessing phase is expensive, and later maintenance is troublesome. Another problem is that consistency of data definitions (the statistical "caliber") cannot be guaranteed, for reasons explained later.
Let's look at dimensional modeling more closely. Data is extracted into a fact-dimension model. A dimension is an angle of view: looking at a problem from a different angle leads to a different conclusion. To reach that conclusion you need the measures in the fact table. A measure is a numeric field in the fact table.
Example: consider a company's sales of each product across the country. The dimensions are city and product, and the measure is the unit price of the goods sold. Sales can be calculated along different dimensions: yoghurt sales in Beijing, pure milk sales in Shanghai, and so on, which is how different dimensions are combined. With the dimensions as the filter condition, we sum the unit prices of the matching goods, that is, we sum the measure, and obtain the result we want.
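The example above can be sketched in a few lines of Python. This is only an illustration under assumed data: the rows in `fact_sales`, the field names, and the helper `sum_measure` are hypothetical and do not come from the original article. Each row stands for one item sold, so summing `unit_price` gives the sales amount.

```python
from collections import defaultdict

# Hypothetical fact rows: one row per item sold, with two dimensions
# (city, product) and one numeric measure (unit_price).
fact_sales = [
    {"city": "Beijing", "product": "yoghurt", "unit_price": 6.0},
    {"city": "Beijing", "product": "yoghurt", "unit_price": 6.0},
    {"city": "Beijing", "product": "pure milk", "unit_price": 4.0},
    {"city": "Shanghai", "product": "pure milk", "unit_price": 4.0},
    {"city": "Shanghai", "product": "pure milk", "unit_price": 4.0},
    {"city": "Shanghai", "product": "yoghurt", "unit_price": 6.0},
]


def sum_measure(rows, dimensions, measure):
    """Group rows by the chosen dimensions and sum the measure."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Combine both dimensions: sales per (city, product).
by_city_product = sum_measure(fact_sales, ["city", "product"], "unit_price")

# Use a single dimension: sales per city.
by_city = sum_measure(fact_sales, ["city"], "unit_price")

print(by_city_product[("Beijing", "yoghurt")])
print(by_city[("Shanghai",)])
```

Changing the `dimensions` argument changes the angle of view, while the measure being summed stays the same.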
Dimensional modeling means modeling by dimensions. But what happens if the dimensions are designed badly?
Suppose we treat province as one dimension and city as another, and count the population of each city. Province and city are now separate dimensions with no relationship between them, so rows like the following can appear:
Guangdong Province, Hangzhou, 1500
Zhejiang Province, Guangzhou, 1200
Here the data definitions are inconsistent (Hangzhou is in Zhejiang and Guangzhou is in Guangdong), which ultimately leads to inaccurate results. Paradigm modeling does not have this problem: it emphasizes the entity-relationship model, so a city must belong to a province, and the two cannot become inconsistent.
Therefore, paradigm modeling can guarantee consistency of data definitions, and dimensional modeling cannot!
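The difference can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the author's implementation: the sets `provinces` and `cities`, the mapping `city_to_province`, and both validation functions are hypothetical names introduced here. The two input rows are the ones shown in the example above.

```python
# Hypothetical independent dimensions: nothing ties a city to its province.
provinces = {"Guangdong Province", "Zhejiang Province"}
cities = {"Guangzhou", "Hangzhou"}


def valid_independent(province, city):
    """Accept any pair as long as each value exists in its own dimension."""
    return province in provinces and city in cities


# Hypothetical normalized relation: each city references exactly one province.
city_to_province = {
    "Guangzhou": "Guangdong Province",
    "Hangzhou": "Zhejiang Province",
}


def valid_normalized(province, city):
    """Accept a pair only if the city actually belongs to the province."""
    return city_to_province.get(city) == province


# The two mismatched rows from the text: (province, city, population).
rows = [
    ("Guangdong Province", "Hangzhou", 1500),
    ("Zhejiang Province", "Guangzhou", 1200),
]

accepted_independent = [r for r in rows if valid_independent(r[0], r[1])]
accepted_normalized = [r for r in rows if valid_normalized(r[0], r[1])]

print(len(accepted_independent), len(accepted_normalized))
```

With independent dimensions both mismatched rows pass the check. With the city-to-province relationship in place, both are rejected.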

4. Scenarios that mix the two modeling methods
The sections above covered the ideas behind paradigm modeling and dimensional modeling, their similarities and differences, and their advantages and disadvantages. Can the two methods be mixed so that each plays to its strengths? Let's take a look.
Paradigm modeling must conform to the (quasi-)third normal form design rules. If hybrid modeling is used, the source tables must also satisfy the constraints of normal-form modeling, which means the source data must come from operational or transactional systems. ETL (extract, transform, load) moves the data into the ODS layer of the data warehouse. ODS data is consistent with the source data, so it also conforms to the normal-form design rules. From the ODS data, paradigm modeling is used to build the enterprise data warehouse (EDW) of atomic data, and dimensional modeling is then used on top of the EDW to build data marts.
Combining the rules of both methods, hybrid modeling follows the basic principles of "loose coupling and layering" when implementing the architecture. The hybrid data warehouse architecture mainly involves the following key steps, among others:
Build out business requirements step by step
Store data in layers
Integrate atomic data to a common standard
Maintain conformed (consistent) dimensions
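The layered flow described in this section (source -> ODS -> EDW -> data mart) can be sketched as three small Python functions. This is a minimal illustration under stated assumptions: the sample rows in `source_orders`, the field names, and the functions `load_ods`, `build_edw`, and `build_mart` are all hypothetical, and a real pipeline would use an ETL tool and a database rather than in-memory lists.

```python
# Hypothetical rows from an operational (transactional) source system.
source_orders = [
    {"order_id": 1, "city": "Guangzhou", "province": "Guangdong Province",
     "product": "yoghurt", "amount": 12.0},
    {"order_id": 2, "city": "Guangzhou", "province": "Guangdong Province",
     "product": "pure milk", "amount": 8.0},
    {"order_id": 3, "city": "Hangzhou", "province": "Zhejiang Province",
     "product": "yoghurt", "amount": 6.0},
    {"order_id": 4, "city": "Guangzhou", "province": "Guangdong Province",
     "product": "yoghurt", "amount": 6.0},
]


def load_ods(source_rows):
    """ODS layer: a copy that stays consistent with the source data."""
    return [dict(row) for row in source_rows]


def build_edw(ods_rows):
    """EDW layer: normalized atomic data, each fact stored in one place."""
    city_table = {}   # city -> province, stored once per city
    order_table = []  # orders reference the city only
    for row in ods_rows:
        city_table[row["city"]] = {"province": row["province"]}
        order_table.append({
            "order_id": row["order_id"],
            "city": row["city"],
            "product": row["product"],
            "amount": row["amount"],
        })
    return {"city": city_table, "order": order_table}


def build_mart(edw):
    """Data mart: denormalized summary by (province, product) for analysis."""
    totals = {}
    for order in edw["order"]:
        province = edw["city"][order["city"]]["province"]
        key = (province, order["product"])
        totals[key] = totals.get(key, 0.0) + order["amount"]
    return totals


mart = build_mart(build_edw(load_ods(source_orders)))
print(mart)
```

Because the mart derives the province from the EDW's city table, the province and city values it reports stay consistent with each other, while the mart itself remains a simple, query-friendly summary.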