
[Advanced MySQL] Differences among the 10 data types and how to optimize table structure (3)

2022-06-11 19:38:00 [wu_55555]

0. Introduction

A good developer keeps pursuing better performance and lower resource usage, polishing the work again and again. When creating table structures in MySQL, we need to understand how the data types differ and choose the appropriate type for each business scenario.

Today, let's talk about how to optimize table structure.

1. Data type optimization

Fields are the foundation of a table structure, so optimizing data types is the first step of table structure optimization. Let's first look at how to select and optimize data types.

1.1 MySQL data types

First we need to know which data types MySQL supports and how they differ; then we can choose different types for different scenarios.

MySQL supports 10 kinds of data types in total.

1.1.1 Integer types

There are 5 integer types:

data type	size	range (signed)
tinyint	1 byte	-128~127
smallint	2 bytes	-32768~32767
mediumint	3 bytes	-8388608~8388607
int	4 bytes	-2147483648~2147483647
bigint	8 bytes	-9.22×10^18~9.22×10^18

As the table shows, tinyint is the smallest integer type. We often see declarations such as int(n), where n is the display width of the type: for example, 10 has a width of 2 and 100 has a width of 3. n has nothing to do with storage size; int(1) and int(11) both occupy 4 bytes.

1.1.2 Floating point type

MySQL has the following 3 types for non-integer numbers (float and double are approximate floating-point types; decimal is an exact fixed-point type):

data type	size
float	4 bytes
double	8 bytes
decimal(m,n)	depends on m and n

The size of decimal depends on m and n: m is the total number of digits in the value, and n is the number of digits after the decimal point. The integer part (m-n digits) and the fractional part (n digits) are stored separately. In each part, every 9 digits take 4 bytes, and the remaining digits (fewer than 9) take the space shown below. The decimal point itself is not stored.

Remaining digits	size
1,2	1 byte
3,4	2 bytes
5,6	3 bytes
7,8	4 bytes

For example, decimal(19,4) has 15 integer digits and 4 fractional digits. For the integer part, 15/9=1…6, so 1*4=4 bytes plus 3 bytes for the remaining 6 digits, 7 bytes in total. The 4 fractional digits take 2 bytes. So it occupies 7+2=9 bytes.

Note: MySQL requires m<=65 and n<=30, and n cannot be greater than m.
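The storage calculation can be sketched in a few lines of Python. This is only an illustration of the rule documented in the MySQL manual (integer and fractional digits are packed separately, 4 bytes per 9 digits, plus a small amount for the leftover digits); the function names are hypothetical.

```python
# Bytes needed for the 0-8 digits left over after packing full groups of 9 digits.
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]


def packed_bytes(digits: int) -> int:
    """Bytes for one side of the decimal point: 4 bytes per 9 digits plus leftovers."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + LEFTOVER_BYTES[leftover]


def decimal_storage_bytes(m: int, n: int) -> int:
    """Storage size of DECIMAL(m, n); the two parts are packed separately."""
    if not (1 <= m <= 65 and 0 <= n <= 30 and n <= m):
        raise ValueError("expected 1 <= m <= 65 and 0 <= n <= min(30, m)")
    return packed_bytes(m - n) + packed_bytes(n)


# 15 integer digits need 7 bytes, 4 fractional digits need 2 bytes.
print(decimal_storage_bytes(19, 4))
```

A quick call such as decimal_storage_bytes(18, 9) shows why exactly 9 digits on each side is a sweet spot: each side fills one 4-byte group with nothing left over.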

1.1.3 Bit type

The bit data type stores bit-field values and supports lengths from 1 to 64. We can use bit(1) to represent Boolean data.

Although bit takes less space, in actual development we usually use tinyint(1) and rarely use bit. The main reason is that support for bit only matured gradually from MySQL 5.x onward, so development habits and inherited code lead us to prefer tinyint.

1.1.4 Date types *

There are 5 main date types:

data type	size	range	purpose
date	3 bytes	1000-01-01~9999-12-31	stores dates
time	3 bytes	'-838:59:59'~'838:59:59'	stores time values (without year, month, and day)
year	1 byte	1901~2155	stores year values
datetime	8 bytes (5 bytes since MySQL 5.6.4, plus optional fractional seconds)	1000-01-01 00:00:00~9999-12-31 23:59:59	stores date and time values (including year, month, and day)
timestamp	4 bytes	1970-01-01 00:00:01~2038-01-19 03:14:07 (UTC)	stores timestamps; note that it only reaches the year 2038
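The 2038 limit follows from the 4-byte storage. A minimal Python sketch, assuming the upper bound is 2^31 - 1 seconds after 1970-01-01 00:00:00 UTC (the helper name fits_in_timestamp is hypothetical):

```python
from datetime import datetime, timezone

# Largest number of seconds since the Unix epoch that a timestamp column can hold.
MAX_TIMESTAMP_SECONDS = 2**31 - 1


def fits_in_timestamp(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime falls inside the timestamp range."""
    return 1 <= moment.timestamp() <= MAX_TIMESTAMP_SECONDS


# Convert the upper bound back to a calendar date in UTC.
upper_bound = datetime.fromtimestamp(MAX_TIMESTAMP_SECONDS, tz=timezone.utc)
print(upper_bound.isoformat())  # 2038-01-19T03:14:07+00:00
```

The same instant reads as 11:14:07 in a UTC+8 session, which is why the limit is sometimes quoted with that time of day.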

1.1.5 Character types *

Character types include the following:

data type	size	purpose
char	0~255 characters	stores fixed-length strings
varchar	0~65535 bytes	stores variable-length strings
text	0~65535 bytes	stores long text
tinytext	0~255 bytes	stores short text
mediumtext	0~16M	stores medium-length text
longtext	0~4G	stores very large text

Note that these 4 text types are now rarely used in MySQL. Long text of this kind is usually stored as files in OSS (object storage), or in middleware such as Elasticsearch.

1.1.6 Binary string types

Corresponding to the text types above, MySQL provides 4 binary types. They are seldom used in actual development and are listed here for reference.

data type	size	purpose
blob	0~65535 bytes	stores long data in binary form
tinyblob	0~255 bytes	stores short data in binary form
mediumblob	0~16M	stores medium-sized data in binary form
longblob	0~4G	stores very large data in binary form

The differences between the blob types and the text types are:
1、blob stores data as binary, while text stores data as text
2、Data stored in a blob can only be read as a whole
3、Because blob is binary, it does not need a character set, while text does

1.1.7 Enumeration type *

enum. Like Java, MySQL has an enumeration type. We can use it to represent statuses, types, and other enumerated values.

1.1.8 Set type

set. Not commonly used; set data is generally stored in a child table instead.

1.1.9 JSON type

json. Used to store JSON objects and arrays, for example the JSON data of a BPMN workflow template. In practice, however, MySQL is seldom used to store JSON data, which follows from its nature as a relational database. MongoDB, which supports JSON natively, is used more often, and Elasticsearch can also be used when the data volume is large.

1.1.10 Spatial data types

geometry, point, and so on. Spatial coordinate data is rarely stored in MySQL; MongoDB or Elasticsearch is used more often.

1.2 Data type optimization principles

Now that we know the data types MySQL supports, how much space they occupy, and what they are for, we can move on to today's topic: how to optimize table fields. We will explain it through 3 principles.

1.2.1 The smaller, the better

We should try to use the smallest data type, where smallest means occupying the least space. If smallint is enough, don't use int; if tinyint is enough, don't use smallint.

1、 Integers
For a flag field such as 'is deleted', we can use tinyint(1) rather than int(1). As explained above, the 1 is not the amount of space occupied, so don't assume that tinyint(1) and int(1) are the same size.

Integer types ordered by size, from small to large: tinyint<smallint<mediumint<int<bigint. Use the smaller one whenever it is enough.
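As a rough aid for applying this rule, the following Python sketch picks the smallest signed integer type for a known value range. The function name is hypothetical and the sketch ignores unsigned columns, which double the positive range.

```python
# Signed MySQL integer types with their storage size in bytes, smallest first.
SIGNED_INT_TYPES = [
    ("tinyint", 1),
    ("smallint", 2),
    ("mediumint", 3),
    ("int", 4),
    ("bigint", 8),
]


def smallest_int_type(min_value: int, max_value: int) -> str:
    """Return the smallest signed integer type that can hold [min_value, max_value]."""
    for name, size_bytes in SIGNED_INT_TYPES:
        bits = size_bytes * 8
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a signed bigint")


print(smallest_int_type(0, 1))          # a delete flag
print(smallest_int_type(0, 1_000_000))  # a counter that may reach one million
```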

2、 Strings
The string types we use most often are varchar and char. For fixed-length values such as telephone numbers and postal codes, we should try to use char, with a length that just meets the business requirement: as small as possible. For example, if a postal code has 6 digits, don't define it as char(10) and waste the extra 4 characters.

For strings of uncertain length we use varchar, but don't blindly define it as varchar(255). Define the maximum length according to the actual business situation. If you don't know the maximum length, ask the product manager or the business side and agree on one.

Here are a few additional points:

（1）varchar(n) needs extra bytes to record the length: 1 byte when the column's values need at most 255 bytes, and 2 bytes when they may need more.
（2）To improve query efficiency, even when the length is less than 255, MySQL reserves space for 255 for a varchar; that is, varchar(1) also reserves 255. Note that the reserved space is not the space actually occupied. Reserving here means setting aside contiguous space on disk, so that the data is contiguous and can be read efficiently. Think of your own family: your brother lives at the head of the village and you live at the end of it. When the village head comes to call your family to a meeting, is it faster to notify you if you live together or if you live apart?
（3）Before MySQL 5.6, changing a varchar length from 255 or below to above 255 locked the table, so the general advice is to keep the length at 255 or below.
（4）char is more efficient than varchar, so use char whenever it fits. Using the village example above, think about why.
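The length-prefix rule in point (1) can be sketched in Python. This is an illustration under stated assumptions: the prefix size depends on the maximum number of bytes the column may need (declared length times the character set's maximum bytes per character), only latin1 and utf8mb4 are modelled, and the function names are hypothetical.

```python
# Maximum bytes per character, and the Python codec used to measure real values.
CHARSETS = {
    "latin1": (1, "latin-1"),
    "utf8mb4": (4, "utf-8"),
}


def varchar_length_prefix(n: int, charset: str) -> int:
    """Bytes used to record the length of a varchar(n) value: 1 or 2."""
    max_bytes_per_char, _ = CHARSETS[charset]
    return 1 if n * max_bytes_per_char <= 255 else 2


def varchar_value_bytes(value: str, n: int, charset: str) -> int:
    """Bytes taken by one stored value: the encoded data plus the length prefix."""
    if len(value) > n:
        raise ValueError("value is longer than the declared length")
    _, codec = CHARSETS[charset]
    return len(value.encode(codec)) + varchar_length_prefix(n, charset)


print(varchar_value_bytes("abc", 20, "utf8mb4"))   # 3 data bytes + 1 length byte
print(varchar_value_bytes("abc", 100, "utf8mb4"))  # 3 data bytes + 2 length bytes
```

With a single-byte character set such as latin1, the threshold falls at a declared length of 255; with utf8mb4 it is already crossed at varchar(64).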

3、 Long text data
As mentioned above, this kind of data is usually an article or another document. We usually store it as a file on an OSS server and store the address of the file on OSS in the database as a varchar.

4、 Time types
In terms of space: date<timestamp<datetime. By the principle above, we should use the smallest type that meets the business requirement. Time types are special, though, and we need to look ahead at what the business may need:

For example, do we need to store the time down to the second? The business may not require it today, but future data statistics might.

Or will the project still be running after 2038? timestamp can only represent times up to 2038, so in that case use datetime.

5、 Enumeration types
Use the enum type or a numeric type in place of enumerated strings.

6、 IP addresses
When storing an IP address we usually think of a string first. In fact, the inet_aton function can convert an IP address string into a number for storage in the database, which greatly reduces the space occupied. When querying, the inet_ntoa function converts it back.

select inet_aton('1.1.1.1');  -- returns 16843009
select inet_ntoa(16843009);   -- returns '1.1.1.1'
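The same conversion can be reproduced on the application side. A small Python sketch using the standard ipaddress module (the wrapper names ip_to_int and int_to_ip are hypothetical):

```python
import ipaddress


def ip_to_int(address: str) -> int:
    """Convert a dotted IPv4 string to its 32-bit integer value, like inet_aton."""
    return int(ipaddress.IPv4Address(address))


def int_to_ip(number: int) -> str:
    """Convert a 32-bit integer back to a dotted IPv4 string, like inet_ntoa."""
    return str(ipaddress.IPv4Address(number))


print(ip_to_int("1.1.1.1"))   # 16843009
print(int_to_ip(16843009))    # 1.1.1.1
```

The largest result, for 255.255.255.255, is 4294967295, so the column needs to be an unsigned int (4 bytes), compared with up to 15 characters for the string form.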

1.2.2 The simpler, the better

If a simpler type will do, use the simpler type, because simple data types consume fewer CPU resources. So which types are simpler?

1、Integers are simpler than strings, for example converting IP addresses to integers for storage
2、Date types are simpler than strings, so avoid storing dates as strings
3、char is simpler than varchar

1.2.3 Avoid NULL

We know that null values cannot be tested with =null or !=null; we have to use is null and is not null. Null values also make indexes and index statistics more complicated. So when creating table fields, give them a default value whenever possible, especially fields that are indexed.
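The comparison behaviour is easy to see in a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; the table and column names are hypothetical). Both databases follow the standard SQL rule that comparing anything with NULL is never true.

```python
import sqlite3

# In-memory database with one named row and two rows whose nickname is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_user (id INTEGER PRIMARY KEY, nickname TEXT)")
conn.executemany(
    "INSERT INTO t_user (id, nickname) VALUES (?, ?)",
    [(1, "alice"), (2, None), (3, None)],
)


def count_where(condition: str) -> int:
    """Count rows matching a fixed, trusted WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM t_user WHERE {condition}").fetchone()[0]


equals_null = count_where("nickname = NULL")       # comparison with NULL is never true
not_equals_null = count_where("nickname != NULL")  # also never true
is_null = count_where("nickname IS NULL")
is_not_null = count_where("nickname IS NOT NULL")

print(equals_null, not_equals_null, is_null, is_not_null)  # 0 0 2 1
```

Declaring the column as NOT NULL with a default such as an empty string removes this special case from every query that filters on it.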

2. Table structure optimization principles

2.1 Appropriate data redundancy

Applicable scenario: a small number of fields that are queried frequently and can only be obtained by joining two or more tables

When designing a table structure, if some fields require a join to another table and are queried frequently, we should consider storing a redundant copy of those fields in the main table. All the data can then be queried from one table, which improves query efficiency.

Case 1:

For example, product data such as the product name and product ID is maintained in the product table, and the order table holds the product ID. Orders must display the product name. Fetching the name from the product table every time is certainly less efficient than keeping a redundant product name in the order table.

Of course, such redundancy has to take business needs into account. Some businesses require data to be displayed in real time: information such as a name may change, and once it changes, the name shown in existing business records must also be the latest one. In that case we need to query by join, and redundant data no longer solves the problem.

Some readers may say: what if I update the redundant fields at the same time as the original? That is a good idea, but it depends on the situation. If the business table holds a very large amount of data, you will have to update a lot of redundant data, and the update may cost more than it gains.

Case 2:

For example, we have a waybill table, each waybill records information about several items of transported goods, and there is also a goods table. To count the total weight transported in a given period, we need to join the goods table with the waybill table and sum the weight field of the goods table.

In fact, we can keep a redundant total weight field in the waybill table, and perhaps a total amount field as well. When goods are inserted, the sum is updated and recorded on the waybill, so the statistics can be read directly from the waybill table.
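Case 2 can be sketched end to end. The example below again uses Python's built-in sqlite3 as a stand-in for MySQL, and the table names waybill and cargo, the column total_weight, and the helper add_cargo are all hypothetical. The point it illustrates is that the redundant column is updated in the same transaction as the insert, so it always agrees with the joined sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE waybill (
    id INTEGER PRIMARY KEY,
    total_weight REAL NOT NULL DEFAULT 0  -- redundant copy of SUM(cargo.weight)
);
CREATE TABLE cargo (
    id INTEGER PRIMARY KEY,
    waybill_id INTEGER NOT NULL,
    weight REAL NOT NULL
);
""")


def add_cargo(waybill_id: int, weight: float) -> None:
    """Insert one cargo row and update the redundant total in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO cargo (waybill_id, weight) VALUES (?, ?)",
            (waybill_id, weight),
        )
        conn.execute(
            "UPDATE waybill SET total_weight = total_weight + ? WHERE id = ?",
            (weight, waybill_id),
        )


with conn:
    conn.execute("INSERT INTO waybill (id) VALUES (1)")
add_cargo(1, 1.5)
add_cargo(1, 2.5)

# Statistics via a join versus reading the redundant column directly.
joined_total = conn.execute(
    "SELECT SUM(c.weight) FROM waybill w JOIN cargo c ON c.waybill_id = w.id"
).fetchone()[0]
redundant_total = conn.execute("SELECT SUM(total_weight) FROM waybill").fetchone()[0]
print(joined_total, redundant_total)
```

The cost of this design is that every code path that changes cargo rows (updates and deletes as well as inserts) must also maintain the total.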

2.2 Appropriate split

Applicable scenario: a table has fields that occupy a lot of space but are rarely queried, alongside other fields that are queried frequently

When one or several fields in a table occupy a lot of space, such as text fields or varchar(200+) fields, and we frequently query other fields of the table without needing these large ones, we can consider splitting the large fields into another table linked by a foreign key.

The advantage is that the frequently queried data is stored in adjacent data blocks, which improves query efficiency (the same idea as the village example above) and reduces the number of I/O operations.

2.3 Select the appropriate character set

Many developers may not care about the choice of character set. Many simply set the database to utf8mb4 without thinking and leave it at that.

In fact, choosing an appropriate character set for each business scenario can greatly improve query efficiency.

1、 If the business data contains only English letters and digits, it can be set to latin1, the Latin character set, which can save a lot of storage space. If you are sure you don't need to store data in multiple languages, there is no need to use utf8 or utf8mb4.

2、 Choose different character sets for different business tables according to their data, to reduce storage space and thereby improve query efficiency.

3、 If there is Chinese, use utf8mb4, not utf8. MySQL's utf8 supports at most 3 bytes per character, but some rare Chinese characters and emoji need 4 bytes, which causes storage errors. MySQL officially released utf8mb4 in 2010 to fix this bug.
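The 3-byte versus 4-byte distinction can be checked with a few lines of Python, since UTF-8 byte lengths are the same everywhere. The helper names are hypothetical; fits_in_mysql_utf8 simply tests whether every character needs at most 3 bytes.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_mysql_utf8(text: str) -> bool:
    """True if every character fits the 3-byte limit of MySQL's legacy utf8."""
    return all(utf8_byte_length(ch) <= 3 for ch in text)


for sample in ["a", "中", "𠮷", "😀"]:
    print(sample, utf8_byte_length(sample))  # 1, 3, 4 and 4 bytes respectively

print(fits_in_mysql_utf8("中文abc"))  # True
print(fits_in_mysql_utf8("ok😀"))     # False
```

A check like this can be run over existing data before deciding whether a column really needs utf8mb4.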

2.4 Primary key selection

There are generally two ways to choose a primary key. The first is to choose a unique natural key from the business data, such as the ID card number in a user table or the order number in an order table. The second is to generate a business-independent primary key with a key generation algorithm, such as UUID.

We prefer the second option, generating a random primary key with a key generation algorithm. It has two benefits:

1 It is independent of the business and easier to maintain
Imagine a multi-tenant system. To meet the management needs of the customer company, we often allow users to enter a custom number. Suppose a user of tenant A enters the number 111 and we use this unique number as the primary key. Because the system is multi-tenant and data is isolated, users of tenant B cannot see the record numbered 111. If one of them also happens to create number 111, the database reports an error because duplicate primary keys are not allowed, and the user will wonder: I never saw 111 entered, and nothing shows on the page. In this situation a business-independent primary key becomes necessary.
2 Primary key generation algorithms are general purpose, and a common approach saves time for overall development
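The multi-tenant scenario can be reproduced in a short experiment. The sketch uses Python's built-in sqlite3 as a stand-in for MySQL and uuid.uuid4 as the key generator; the table names, the helper functions, and the extra unique constraint on (tenant_id, number) are assumptions added for illustration, not part of the original design.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant 1: the user-entered number is the primary key.
CREATE TABLE staff_natural_key (
    number TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL
);
-- Variant 2: a generated, business-independent primary key.
CREATE TABLE staff_surrogate_key (
    id TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    number TEXT NOT NULL,
    UNIQUE (tenant_id, number)  -- numbers only need to be unique per tenant
);
""")


def insert_with_natural_key(tenant_id: str, number: str) -> bool:
    """Return False when the number collides with one from any tenant."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO staff_natural_key (number, tenant_id) VALUES (?, ?)",
                (number, tenant_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def insert_with_surrogate_key(tenant_id: str, number: str) -> str:
    """Insert a row under a freshly generated UUID and return that id."""
    new_id = str(uuid.uuid4())
    with conn:
        conn.execute(
            "INSERT INTO staff_surrogate_key (id, tenant_id, number) VALUES (?, ?, ?)",
            (new_id, tenant_id, number),
        )
    return new_id


tenant_a_ok = insert_with_natural_key("A", "111")
tenant_b_ok = insert_with_natural_key("B", "111")  # rejected: duplicate primary key
id_a = insert_with_surrogate_key("A", "111")
id_b = insert_with_surrogate_key("B", "111")       # accepted: different tenant
print(tenant_a_ok, tenant_b_ok, id_a != id_b)
```

With the generated key, tenant B's number 111 is stored without any visible conflict, while the per-tenant unique constraint still prevents duplicates inside one tenant.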

Summary

That wraps up the table structure optimization principles for this issue. The details of implementation should be decided according to the specific business requirements. If you found this article helpful, feel free to follow and give it a like!

See you next time ～
