The Three Normal Forms of Databases

2022-06-09 04:47:00 【Picchu moving forward】

1. Why database design is needed
2. Normal forms (Normal Form)
3. Denormalization
4. BCNF (Boyce-Codd normal form)

These notes were written while following the Shangsi valley MySQL course.

1. Why database design is needed

When designing data tables, there are many questions to consider:

What data do users need, and what data should be stored in the tables?
How do we ensure the data in the tables is correct?
How do we reduce redundancy in the tables?
How can developers use the database more conveniently?

If the database is poorly designed, it may cause the following problems:

Data redundancy, duplicated information, and wasted storage space
Update, insertion, and deletion anomalies
Information that cannot be represented correctly
Loss of valid information
Poor program performance

A well-designed database is therefore very important. It has the following advantages:

It saves storage space
It guarantees the integrity of the data
It makes database applications easier to develop

When designing a database, we must pay attention to the design of the data tables. To build a database with little redundancy and a sound structure, the design must follow certain rules.

2. Normal forms (Normal Form)

2.1 Overview of normal forms

In a relational database, the basic principles and rules for designing data tables are called normal forms. A normal form is a rule and guideline that we need to follow when designing the database structure.
However, to improve the performance of certain queries, we sometimes need to break the rules of the normal forms. This is denormalization.

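2.2 Keys and related attribute concepts

The definitions of the normal forms refer to primary keys and candidate keys, so we first review the related concepts. A key in a database consists of one or more attributes. The keys and attributes most commonly used in data tables are defined as follows:

Super key: a set of attributes that uniquely identifies a tuple (row)
Candidate key: a super key with no redundant attributes
Primary key: the candidate key chosen by the designer
Foreign key: a set of attributes in table R1 that is not the primary key of R1 but is the primary key of another table R2
Prime attribute: an attribute contained in any candidate key
Non-prime attribute: an attribute not contained in any candidate key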

2.3 First normal form (1NF)

The first normal form requires that the value of every field in a table be atomic, that is, the smallest unit of data that cannot be split any further.
Atomicity is subjective and should be decided by the needs of the actual project. Take an address: if the project does not require it to be broken down into province, city, county, and town, we generally do not need to split it.
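
As a minimal sketch of the atomicity requirement, the Python snippet below uses the standard-library sqlite3 module. It assumes a project that does need to filter by city, so a single address field is no longer atomic for that project. The table names (customer_raw, customer_1nf), the column names, and the sample address are hypothetical and do not come from the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Not atomic for this (assumed) project: three facts packed into one field.
conn.execute("CREATE TABLE customer_raw (id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("INSERT INTO customer_raw VALUES (1, 'Guangdong,Shenzhen,Nanshan')")

# 1NF version: every field holds one indivisible value.
conn.execute(
    "CREATE TABLE customer_1nf ("
    "id INTEGER PRIMARY KEY, province TEXT, city TEXT, district TEXT)"
)
for cid, address in conn.execute("SELECT id, address FROM customer_raw").fetchall():
    province, city, district = address.split(",")
    conn.execute(
        "INSERT INTO customer_1nf VALUES (?, ?, ?, ?)",
        (cid, province, city, district),
    )

# Filtering by city is now a plain equality test instead of string matching.
rows_in_city = conn.execute(
    "SELECT id FROM customer_1nf WHERE city = ?", ("Shenzhen",)
).fetchall()
print(rows_in_city)
```

If the project never queries by province or city, keeping the single address field is equally valid under the first normal form.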

2.4 Second normal form (2NF)

The second normal form requires that, on top of satisfying the first normal form, every record in the table is uniquely identifiable and every non-primary-key field is fully dependent on the primary key, not on just part of it.
In other words, if we know the values of all attributes of the primary key, we can retrieve any attribute value of any tuple (row). (The primary key in this requirement can be generalized to a candidate key.)

For example, in the grade relation (student number, course number, grade), (student number, course number) determines the grade. Because one student can take many courses and one course can be taken by many students, neither the student number nor the course number alone determines the grade.
Therefore (student number, course number) -> grade is a full dependency.
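
The grade example can be checked mechanically. The sketch below is only an illustration: the helper `determines` and the sample rows are hypothetical, and passing the check on sample data only shows that the data does not contradict a dependency. A real functional dependency is a business rule, not something data alone can prove.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def determines(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if equal values on `lhs` always come with equal values on `rhs`."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for the grade relation.
grades = [
    {"student_no": 1, "course_no": "C1", "grade": 90},
    {"student_no": 1, "course_no": "C2", "grade": 75},
    {"student_no": 2, "course_no": "C1", "grade": 60},
]

full = determines(grades, ["student_no", "course_no"], ["grade"])
by_student = determines(grades, ["student_no"], ["grade"])
by_course = determines(grades, ["course_no"], ["grade"])
print(full, by_student, by_course)
```

Only the complete key passes the check, while each half of the key fails on its own, which is what full dependency means here.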

Consider a game table that contains the player number, name, age, game number, game time, venue, and so on. Its candidate key and primary key are both (player number, game number), and this key determines the following relationship:
(player number, game number) -> (name, age, game time, venue, score)
But this table does not satisfy the second normal form, because the following dependencies also hold among its fields:

(player number) -> (name, age)
(game number) -> (game time, venue)
Some non-prime attributes depend on only part of the candidate key, which causes the following problems.

Data redundancy: if a player takes part in m games, the player's name and age are repeated m-1 times; if a game has n players, its time and venue are repeated n-1 times.
Insertion anomaly: if we want to add a new game but its players are not yet known, the game cannot be inserted, because the player number is part of the primary key.
Deletion anomaly: if we delete a player's records and game information is not stored in a separate table, the game information is deleted along with them.
Update anomaly: if we change the time of a game, every row for that game must be updated; otherwise the same game ends up with different times.

To avoid these problems, we can split the player-game table into the following three tables.

Table name	Attributes (fields)
Player table (player)	Player number, name, age, and other attributes
Game table (game)	Game number, game time, venue, and other attributes
Player-game relationship table (player_game)	Player number, game number, score, and other attributes

With this design, each table conforms to the second normal form and the anomalies above are avoided.
The second normal form requires that the attributes of an entity depend completely on the primary key. If an attribute depends on only part of it, that attribute and that part of the primary key should be split off into a new entity, which has a one-to-many relationship with the original entity.
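
The three-table design can be sketched with Python's standard-library sqlite3 module. The column names (player_id, game_time, and so on) are assumed English renderings of the fields listed above, and the sample values are invented for illustration. SQLite stands in for MySQL here only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE player (
        player_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        age       INTEGER
    );
    CREATE TABLE game (
        game_id   INTEGER PRIMARY KEY,
        game_time TEXT,
        venue     TEXT
    );
    CREATE TABLE player_game (
        player_id INTEGER NOT NULL REFERENCES player(player_id),
        game_id   INTEGER NOT NULL REFERENCES game(game_id),
        score     INTEGER,
        PRIMARY KEY (player_id, game_id)
    );
    """
)

conn.executemany(
    "INSERT INTO player VALUES (?, ?, ?)",
    [(1, "Player A", 25), (2, "Player B", 30)],
)
conn.execute("INSERT INTO game VALUES (100, '2022-06-01 19:00', 'Arena 1')")
conn.executemany(
    "INSERT INTO player_game VALUES (?, ?, ?)",
    [(1, 100, 12), (2, 100, 8)],
)

# Insertion: a game can be recorded before any player is assigned to it.
conn.execute("INSERT INTO game VALUES (101, '2022-06-08 19:00', 'Arena 2')")

# Update: the game time lives in exactly one row, however many players took part.
cur = conn.execute(
    "UPDATE game SET game_time = '2022-06-01 20:00' WHERE game_id = 100"
)
rows_updated = cur.rowcount

# Deletion: removing a player's participation leaves the game row in place.
conn.execute("DELETE FROM player_game WHERE player_id = 2")
games_left = conn.execute("SELECT COUNT(*) FROM game").fetchone()[0]
print(rows_updated, games_left)
```

Each of the three anomalies described earlier maps to one statement in the sketch: the insert of game 101, the single-row update, and the delete that leaves both games intact.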

2.5 Third normal form (3NF)

The third normal form builds on the second normal form.
Every non-primary-key field in the table must be directly related to the primary key.
That is, no non-primary-key field may depend on another non-primary-key field.
This rule means that non-prime attributes have no dependencies among themselves; they are independent of each other.
The primary key here can again be generalized to a candidate key.
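
A hypothetical example may make the rule concrete. In the sketch below, which is not taken from the original notes, a flat player table stores team_name next to team_id. Since team_name depends on team_id, a non-key field, rather than directly on player_id, the table violates the third normal form, and the team attributes are moved to their own table. All table names, column names, and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Violates 3NF: team_name depends on team_id, not directly on player_id.
    CREATE TABLE player_flat (
        player_id INTEGER PRIMARY KEY,
        name      TEXT,
        team_id   INTEGER,
        team_name TEXT
    );
    INSERT INTO player_flat VALUES
        (1, 'Player A', 10, 'Team X'),
        (2, 'Player B', 10, 'Team X'),
        (3, 'Player C', 20, 'Team Y');

    -- 3NF: the team's own attributes move to a table keyed by team_id.
    CREATE TABLE team (
        team_id   INTEGER PRIMARY KEY,
        team_name TEXT
    );
    CREATE TABLE player (
        player_id INTEGER PRIMARY KEY,
        name      TEXT,
        team_id   INTEGER REFERENCES team(team_id)
    );
    INSERT INTO team   SELECT DISTINCT team_id, team_name FROM player_flat;
    INSERT INTO player SELECT player_id, name, team_id FROM player_flat;
    """
)

team_rows = conn.execute("SELECT COUNT(*) FROM team").fetchone()[0]

# The original rows can still be rebuilt with a join, so no information is lost.
rebuilt = conn.execute(
    "SELECT p.player_id, p.name, p.team_id, t.team_name "
    "FROM player p JOIN team t ON t.team_id = p.team_id "
    "ORDER BY p.player_id"
).fetchall()
original = conn.execute("SELECT * FROM player_flat ORDER BY player_id").fetchall()
print(team_rows, rebuilt == original)
```

After the split, each team name is stored once, and renaming a team touches a single row.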


2.6 Advantages and disadvantages of normal forms

Advantages:
- Normalizing data helps eliminate data redundancy in the database.
- The third normal form is generally considered the best balance between performance, scalability, and data integrity.
Disadvantages:
- Normalization can reduce query efficiency: the higher the normal form, the more tables the design contains, so a query may need to join several tables. This is costly and may also prevent some indexes from being used.
- Normal forms are only a design standard. In practice, we may break them for the sake of performance and read efficiency, adding a small amount of redundant or duplicate data to reduce joins and improve read performance, trading space for time.

3. Denormalization

3.1 Overview

Follow the principle of business first.
Meet the business requirements first, then reduce redundancy as far as possible.
When we want to optimize query efficiency, denormalization is one option: adding redundant fields to a table can improve the read performance of the database.
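
As a minimal sketch of a redundant field, assume a hypothetical pair of tables, app_user and user_comment, where the comment list always needs the author's nickname. Copying nickname into user_comment lets the read skip the join. The names and rows are invented for illustration, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_user (user_id INTEGER PRIMARY KEY, nickname TEXT);
    -- nickname is copied into user_comment on purpose (the redundant field).
    CREATE TABLE user_comment (
        comment_id INTEGER PRIMARY KEY,
        user_id    INTEGER REFERENCES app_user(user_id),
        nickname   TEXT,
        body       TEXT
    );
    INSERT INTO app_user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO user_comment VALUES
        (1, 1, 'alice', 'first'),
        (2, 2, 'bob',   'second');
    """
)

# Normalized read: needs a join to show who wrote each comment.
with_join = conn.execute(
    "SELECT c.comment_id, u.nickname, c.body "
    "FROM user_comment c JOIN app_user u ON u.user_id = c.user_id "
    "ORDER BY c.comment_id"
).fetchall()

# Denormalized read: the same result from a single table.
without_join = conn.execute(
    "SELECT comment_id, nickname, body FROM user_comment ORDER BY comment_id"
).fetchall()
print(with_join == without_join)
```

Whether the saved join is worth the extra column depends on table sizes and query frequency, which this toy data cannot show.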

3.2 New problems introduced by denormalization

Although denormalization trades space for time and improves query efficiency, it also brings some new problems:

More storage space is required.
When a field in one table is modified, the redundant copy in the other table must be modified in sync, or the data becomes inconsistent.
If stored procedures are used to keep the data in sync on updates, deletes, and other operations, frequent operations will consume considerable system resources.
With a small amount of data, denormalization shows no performance advantage and may only complicate the database design.
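
The consistency problem can be illustrated with a small sketch. It assumes hypothetical tables app_user and user_comment, where user_comment.nickname is a redundant copy of app_user.nickname. The helpers `stale_copies` and `rename_user` are invented for this example, and the sync is done in application code inside one transaction rather than with a stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_user (user_id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE user_comment (
        comment_id INTEGER PRIMARY KEY,
        user_id    INTEGER,
        nickname   TEXT,   -- redundant copy of app_user.nickname
        body       TEXT
    );
    INSERT INTO app_user VALUES (1, 'alice');
    INSERT INTO user_comment VALUES (1, 1, 'alice', 'first');
    """
)


def stale_copies(conn):
    """Count comments whose copied nickname no longer matches the user table."""
    return conn.execute(
        "SELECT COUNT(*) FROM user_comment c "
        "JOIN app_user u ON u.user_id = c.user_id "
        "WHERE c.nickname <> u.nickname"
    ).fetchone()[0]


# Updating only the source table leaves the redundant copy out of date.
conn.execute("UPDATE app_user SET nickname = 'alice2' WHERE user_id = 1")
stale_before = stale_copies(conn)


def rename_user(conn, user_id, nickname):
    """Change the nickname and its redundant copies in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE app_user SET nickname = ? WHERE user_id = ?",
            (nickname, user_id),
        )
        conn.execute(
            "UPDATE user_comment SET nickname = ? WHERE user_id = ?",
            (nickname, user_id),
        )


rename_user(conn, 1, "alice3")
stale_after = stale_copies(conn)
print(stale_before, stale_after)
```

Every write path that changes the nickname has to go through something like rename_user, which is the maintenance cost that comes with the redundant field.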

3.3 When to apply denormalization

We denormalize when redundant information can greatly improve query efficiency.

Suggestions for adding redundant fields
Add a redundant field only when both of the following conditions are met:
① The redundant field does not need to be modified frequently.
② The redundant field is indispensable for queries.