Comparison between dimensional modeling and paradigm modeling
2022-07-26 17:15:00 【51CTO】
1. Two modeling approaches
The modeling methods of Inmon and Kimball could each be described at length, but the theory is dry and the terminology obscure, and reading all of it rarely leaves much behind. So here I extract the core concepts in plain language, according to my own understanding.
Paradigm modeling
Paradigm modeling, that is, normal-form (normalized) modeling, is advocated by Inmon, the father of the data warehouse; the term "data warehouse" itself was defined by him. This method follows third normal form (3NF), but 3NF here differs slightly from 3NF in OLTP: in a relational database, 3NF abstracts entities and relationships for a specific business process, whereas in the data warehouse, 3NF is a subject-oriented abstraction made from the perspective of the whole enterprise.
From a process point of view, Inmon's model is top-down. "Top-down" refers to the flow of data: "top" is the upstream of the data and "down" is the downstream, so data moves from distributed heterogeneous data sources -> data warehouse -> data marts. The approach is data-driven: it starts from the data and explores step by step to obtain data that meets expectations as far as possible. Because the data sources are often heterogeneous, more emphasis is placed on data cleaning. Data is extracted into an entity-relationship model, and the concepts of fact table and dimension table are not emphasized.
3. Comparison of the two models
The characteristics of the two modeling methods
Paradigm modeling: the specific examples in the previous section show the advantages of paradigm modeling. It can draw on the data models of the business systems, which makes the data warehouse model easier to build; each piece of data is stored in only one place, so there is no redundancy and consistency is guaranteed; and the data is decoupled, which makes maintenance easier. The disadvantages are that there are many tables, and queries have to join many of them, which reduces query performance.
Dimensional modeling: the model structure is simple and analysis-oriented. To improve query performance it accepts extra data redundancy through a denormalized design. The development cycle is short and it supports fast iteration. The disadvantages are heavy data redundancy, an expensive preprocessing phase, and troublesome maintenance later on. Another problem is that consistency of data caliber (the definitions by which figures are computed) cannot be guaranteed; the reason is explained later.
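The trade-off described above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module; the table names (city, sale, sale_wide), the rows, and the renamed city are all made up for illustration and are not from any real system. The normalized tables need a join to answer the query, while the denormalized table does not, but a rename touches one row in the normalized model and every repeated copy in the denormalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Normalized (3NF-style): each city name is stored exactly once.
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT NOT NULL);
CREATE TABLE sale (
    sale_id INTEGER PRIMARY KEY,
    city_id INTEGER NOT NULL REFERENCES city(city_id),
    amount  REAL NOT NULL
);
-- Denormalized: the city name is repeated on every sale row.
CREATE TABLE sale_wide (
    sale_id   INTEGER PRIMARY KEY,
    city_name TEXT NOT NULL,
    amount    REAL NOT NULL
);
""")

# Hypothetical sample data, loaded into both designs.
cur.executemany("INSERT INTO city VALUES (?, ?)", [(1, "Beijing"), (2, "Shanghai")])
cur.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 12.0), (3, 2, 8.0), (4, 1, 5.0)],
)
cur.executemany(
    "INSERT INTO sale_wide VALUES (?, ?, ?)",
    [(1, "Beijing", 10.0), (2, "Beijing", 12.0), (3, "Shanghai", 8.0), (4, "Beijing", 5.0)],
)

# The normalized query needs a join; the denormalized one does not.
normalized_total = cur.execute(
    "SELECT SUM(s.amount) FROM sale s "
    "JOIN city c ON c.city_id = s.city_id WHERE c.city_name = 'Beijing'"
).fetchone()[0]
wide_total = cur.execute(
    "SELECT SUM(amount) FROM sale_wide WHERE city_name = 'Beijing'"
).fetchone()[0]

# Renaming a city: one row changes in the normalized model,
# every repeated copy changes in the denormalized one.
rows_normalized = cur.execute(
    "UPDATE city SET city_name = 'Peking' WHERE city_name = 'Beijing'"
).rowcount
rows_wide = cur.execute(
    "UPDATE sale_wide SET city_name = 'Peking' WHERE city_name = 'Beijing'"
).rowcount

print(normalized_total, wide_total, rows_normalized, rows_wide)
```

Both designs return the same total; what differs is the number of tables a query has to touch and the number of rows a change has to touch.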
Now let's look at dimensional modeling. Data is extracted into a fact-dimension model. A dimension is the angle from which you look at the data; looking at the same question from different angles gives different conclusions. To reach such a conclusion you need the measures in the fact table. What is a measure? It is a numeric field in the fact table.
Example: a company sells various goods across the country. The dimensions are city and product, and the measure is the unit price of the goods. Sales are calculated from different combinations of dimensions, for example the sales of yoghurt in Beijing or the sales of pure milk in Shanghai. With the dimension values as the filter condition, we sum the unit prices of the goods, that is, we sum the measure, to get the result we want.
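The sales example can be sketched as a tiny star schema, again with Python's built-in sqlite3 module. The table names (dim_city, dim_product, fact_sales), the helper function sales, and all prices are hypothetical. The sketch also assumes that each fact row represents one unit sold, so that summing the unit price measure gives the sales amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two dimension tables: the angles from which sales are viewed.
CREATE TABLE dim_city    (city_key    INTEGER PRIMARY KEY, city_name    TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
-- Fact table: one row per unit sold (an assumption of this sketch);
-- unit_price is the numeric measure.
CREATE TABLE fact_sales (
    city_key    INTEGER REFERENCES dim_city(city_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    unit_price  REAL
);
INSERT INTO dim_city    VALUES (1, 'Beijing'), (2, 'Shanghai');
INSERT INTO dim_product VALUES (1, 'yoghurt'), (2, 'pure milk');
INSERT INTO fact_sales  VALUES
    (1, 1, 5.0), (1, 1, 5.0), (1, 2, 3.0),
    (2, 2, 3.0), (2, 2, 3.0), (2, 1, 5.0);
""")


def sales(city, product):
    """Sum the measure for one combination of dimension values."""
    row = conn.execute(
        """
        SELECT SUM(f.unit_price)
        FROM fact_sales f
        JOIN dim_city c    ON c.city_key = f.city_key
        JOIN dim_product p ON p.product_key = f.product_key
        WHERE c.city_name = ? AND p.product_name = ?
        """,
        (city, product),
    ).fetchone()
    # SUM over zero rows is NULL, which sqlite3 returns as None.
    return row[0] or 0.0


print(sales("Beijing", "yoghurt"))     # yoghurt sold in Beijing
print(sales("Shanghai", "pure milk"))  # pure milk sold in Shanghai
```

Changing the arguments of sales is all it takes to look at the same facts from a different combination of dimensions.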
Dimensional modeling means modeling by dimensions. But what happens if the dimensions are designed badly?
Suppose we treat province as one dimension and city as another, and count the population of each city. Province and city are then separate dimensions with no relationship between them, so rows like the following can appear:
Guangdong Province, Hangzhou, 1500
Zhejiang Province, Guangzhou, 1200
Here the data caliber is inconsistent (Hangzhou is in Zhejiang and Guangzhou is in Guangdong), which ultimately makes the results inaccurate. Paradigm modeling does not have this problem: because it emphasizes the entity-relationship model, the relationship between a city and its province is part of the model, so province and city can never disagree.
Therefore, paradigm modeling can guarantee consistency of caliber, while dimensional modeling cannot!
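A minimal sketch of the province and city problem, using Python's built-in sqlite3 module. The table names (population_flat, province, city, population) are hypothetical; the population figures are the ones from the example above. In the flat table the two columns are independent, so the mismatched rows are accepted. In the normalized tables the province is stored once on the city row and is always derived through a join, so a wrong pairing has nowhere to be written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Independent dimensions: nothing ties a city to its province.
CREATE TABLE population_flat (province TEXT, city TEXT, population INTEGER);
INSERT INTO population_flat VALUES
    ('Guangdong', 'Hangzhou', 1500),
    ('Zhejiang', 'Guangzhou', 1200);

-- Normalized hierarchy: each city row points at its one province.
CREATE TABLE province (province_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE city (
    city_id     INTEGER PRIMARY KEY,
    name        TEXT UNIQUE,
    province_id INTEGER NOT NULL REFERENCES province(province_id)
);
CREATE TABLE population (
    city_id    INTEGER PRIMARY KEY REFERENCES city(city_id),
    population INTEGER
);
INSERT INTO province VALUES (1, 'Guangdong'), (2, 'Zhejiang');
INSERT INTO city VALUES (1, 'Guangzhou', 1), (2, 'Hangzhou', 2);
INSERT INTO population VALUES (2, 1500), (1, 1200);
""")

# Flat rows whose province disagrees with the modeled hierarchy.
mismatched = conn.execute("""
    SELECT f.province, f.city
    FROM population_flat f
    JOIN city c     ON c.name = f.city
    JOIN province p ON p.province_id = c.province_id
    WHERE p.name <> f.province
    ORDER BY f.city
""").fetchall()

# In the normalized model the province is derived, never typed in.
derived = conn.execute("""
    SELECT p.name, c.name, pop.population
    FROM population pop
    JOIN city c     ON c.city_id = pop.city_id
    JOIN province p ON p.province_id = c.province_id
    ORDER BY c.name
""").fetchall()

print(mismatched)
print(derived)
```

The population table has no province column at all, which is what rules out the mismatch in this sketch.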

4. Mixing the two modeling approaches
The sections above covered the ideas behind paradigm modeling and dimensional modeling, their similarities and differences, and their advantages and disadvantages. So can the two methods be mixed so that each plays to its strengths? Let's take a look.
Paradigm modeling must follow (near-)3NF design rules. With hybrid modeling, the source tables must therefore also satisfy the constraints of paradigm modeling, which means the source data must come from operational or transactional systems. ETL (extract, transform, load) brings the data into the ODS layer of the data warehouse. ODS data is consistent with the source data, so it too conforms to the normal-form design rules. From the ODS data, paradigm modeling is used to build the enterprise data warehouse (EDW) of atomic data, and dimensional modeling is then used on top of the EDW to build the data marts.
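The layered flow just described (source -> ODS -> EDW -> data mart) can be sketched in plain Python. This is only an illustration under simplifying assumptions: the function names load_ods, build_edw, and build_mart, the order rows, and the in-memory dictionaries standing in for tables are all hypothetical, and a real ETL pipeline would also handle cleaning, history, and incremental loads.

```python
from collections import defaultdict

# Hypothetical rows from an operational (OLTP) source system.
source_orders = [
    {"order_id": 1, "city": "Beijing", "product": "yoghurt", "unit_price": 5.0},
    {"order_id": 2, "city": "Beijing", "product": "yoghurt", "unit_price": 5.0},
    {"order_id": 3, "city": "Shanghai", "product": "pure milk", "unit_price": 3.0},
]


def load_ods(rows):
    """ODS layer: keep the data consistent with the source (plain copy)."""
    return [dict(r) for r in rows]


def build_edw(ods_rows):
    """EDW layer: split atomic data into normalized entities with ids."""
    cities, products, orders = {}, {}, []
    for r in ods_rows:
        # Each city and product name is stored once and gets one id.
        city_id = cities.setdefault(r["city"], len(cities) + 1)
        product_id = products.setdefault(r["product"], len(products) + 1)
        orders.append({
            "order_id": r["order_id"],
            "city_id": city_id,
            "product_id": product_id,
            "unit_price": r["unit_price"],
        })
    return {"city": cities, "product": products, "order": orders}


def build_mart(edw):
    """Data mart: sum the measure by the (city, product) dimensions."""
    city_name = {cid: name for name, cid in edw["city"].items()}
    product_name = {pid: name for name, pid in edw["product"].items()}
    mart = defaultdict(float)
    for o in edw["order"]:
        key = (city_name[o["city_id"]], product_name[o["product_id"]])
        mart[key] += o["unit_price"]
    return dict(mart)


ods = load_ods(source_orders)
edw = build_edw(ods)
mart = build_mart(edw)
print(mart)
```

Each layer reads only from the layer before it, which is one way to read the "loose coupling, layering" principle: the mart can be rebuilt from the EDW without going back to the source systems.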
Combining the rules of the two methods, hybrid modeling follows the basic architectural principles of "loose coupling and layering". The hybrid data warehouse architecture mainly involves the following key steps:
Building up business requirements step by step,
Storing data in layers,
Integrating atomic data under common standards,
Maintaining conformed (consistent) dimensions, and so on.