[Advanced MySQL] Differences among the 10 data types and how to optimize the table structure (3)
2022-06-11 19:38:00 【wu_55555】
0. Introduction
A good developer keeps pursuing better performance and lower resource usage, and keeps polishing the work. When we create a table structure in MySQL, we need to understand the differences between the data types and pick the appropriate one for each business scenario.
Today, let's talk about how to optimize the table structure.
1. Data type optimization
Fields are the foundation of the table structure, so optimizing their data types is the first step of table structure optimization. Let's first look at how to select and optimize data types.
1.1 MySQL data types
First we need to know which data types MySQL supports and how they differ. Then we can choose different types for different scenarios.
MySQL supports 10 kinds of data types in total.
1.1.1 Integer types
There are 5 integer types:
| Data type | Size | Range (signed) |
|---|---|---|
| tinyint | 1 byte | -128~127 |
| smallint | 2 bytes | -32768~32767 |
| mediumint | 3 bytes | -8388608~8388607 |
| int | 4 bytes | -2147483648~2147483647 |
| bigint | 8 bytes | -9.22*10^18~9.22*10^18 |
As the table shows, tinyint is the smallest integer type. We often see declarations such as int(n), where n is the display width of the value, not a storage size: the value 10 has a width of 2 and the value 100 has a width of 3. n has no effect on how much space the value occupies, so int(1) and int(11) both take 4 bytes.
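The ranges in the table follow directly from the byte sizes. The sketch below recomputes them and picks the smallest signed type that can hold a given value range. The function names `signed_range` and `smallest_type_for` are made up for this illustration and are not MySQL features.

```python
# Byte sizes of the MySQL integer types, as listed in the table above.
INTEGER_TYPES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def signed_range(num_bytes):
    """Return (min, max) for a signed two's-complement integer of this size."""
    bits = num_bytes * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type_for(low, high):
    """Return the smallest signed integer type whose range covers [low, high]."""
    for name, size in INTEGER_TYPES.items():  # dicts keep insertion order
        type_min, type_max = signed_range(size)
        if type_min <= low and high <= type_max:
            return name
    raise ValueError("range does not fit in any MySQL integer type")


for name, size in INTEGER_TYPES.items():
    print(name, signed_range(size))
print(smallest_type_for(0, 1))        # a 0/1 flag field
print(smallest_type_for(0, 40000))    # a value just above the smallint range
```

The display width plays no part in this calculation, which is the point of the paragraph above: only the type name decides the storage size.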
1.1.2 Floating-point types
MySQL has the following 3 types for numbers with a fractional part (float and double are floating-point, decimal is exact fixed-point):
| Data type | Size |
|---|---|
| float | 4 bytes |
| double | 8 bytes |
| decimal(m,n) | Depends on m and n |
The size of decimal(m,n) depends on m and n: m is the total number of digits in the value and n is the number of digits after the decimal point. The integer part (m-n digits) and the fractional part (n digits) are stored separately. Within each part, every full group of 9 digits takes 4 bytes, and the decimal point itself takes no space. Leftover digits that do not fill a group of 9 take the following space:
| Leftover digits | Size |
|---|---|
| 1,2 | 1 byte |
| 3,4 | 2 bytes |
| 5,6 | 3 bytes |
| 7,8 | 4 bytes |
For instance, decimal(19,4) has 15 integer digits and 4 fractional digits. The integer part is 15=9+6 digits, which takes 4+3=7 bytes, and the 4 fractional digits take 2 bytes, so the column takes 7+2=9 bytes.
It should be noted that MySQL requires m<=65 and n<=30, with n<=m.
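The decimal storage rule can be checked with a few lines of arithmetic. This is a minimal sketch that follows the rule documented in the MySQL reference manual (integer and fractional digits are packed separately, 4 bytes per group of 9 digits, and no byte for the decimal point). The helper names are invented for the example.

```python
# Bytes needed for 0..8 leftover digits (a full group of 9 digits takes 4 bytes).
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]


def part_bytes(digits):
    """Bytes needed to store one side of the decimal point."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + LEFTOVER_BYTES[leftover]


def decimal_storage_bytes(m, n):
    """Storage size of decimal(m, n): m total digits, n of them fractional."""
    if not (1 <= m <= 65 and 0 <= n <= 30 and n <= m):
        raise ValueError("MySQL requires 1 <= m <= 65, 0 <= n <= 30 and n <= m")
    return part_bytes(m - n) + part_bytes(n)


print(decimal_storage_bytes(19, 4))   # 15 integer digits + 4 fractional digits
print(decimal_storage_bytes(18, 9))   # 9 digits on each side
print(decimal_storage_bytes(20, 6))   # 14 integer digits + 6 fractional digits
```

For decimal(19,4) this gives 7 bytes for the integer part plus 2 bytes for the fractional part.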
1.1.3 Bit type
The bit data type stores bit-field values and supports lengths of 1 to 64 bits. We can use bit(1) to represent Boolean data.
On paper bit looks smaller, although bit(m) takes roughly (m+7)/8 bytes, so bit(1) still occupies 1 byte, the same as tinyint. In actual development we usually use tinyint(1) instead and rarely use bit. The main reason is that support for bit was only gradually improved during the MySQL 5.x versions, so habit and inherited schemas make us prefer tinyint.
1.1.4 Date types *
There are 5 main date and time types:
| Data type | Size | Range | Purpose |
|---|---|---|---|
| date | 3 bytes | 1000-01-01~9999-12-31 | Stores a date |
| time | 3 bytes | '-838:59:59'~'838:59:59' | Stores a time value (without year, month, and day) |
| year | 1 byte | 1901~2155 | Stores a year |
| datetime | 8 bytes (5 bytes plus fractional-seconds storage since MySQL 5.6.4) | 1000-01-01 00:00:00~9999-12-31 23:59:59 | Stores a date and time (including year, month, and day) |
| timestamp | 4 bytes | 1970-01-01 00:00:01~2038-01-19 03:14:07 UTC (11:14:07 in UTC+8) | Stores a timestamp; note that it only reaches the year 2038 |
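The 2038 limit of timestamp comes from the 4-byte size: the value is a count of seconds since 1970-01-01 00:00:00 UTC, and the largest signed 32-bit number runs out in January 2038. The sketch below derives the exact moment. The UTC+8 conversion is included only to show why the limit is often quoted as 11:14:07 in China.

```python
from datetime import datetime, timedelta, timezone

# Largest value of a signed 32-bit integer, interpreted as seconds since the epoch.
MAX_INT32 = 2 ** 31 - 1

limit_utc = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)
limit_utc8 = limit_utc.astimezone(timezone(timedelta(hours=8)))

print(limit_utc.isoformat())    # the last representable second, in UTC
print(limit_utc8.isoformat())   # the same instant, shown in UTC+8
```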
1.1.5 Character types *
Character types include the following:
| Data type | Size | Purpose |
|---|---|---|
| char | 0-255 characters | Stores fixed-length strings |
| varchar | 0-65535 bytes | Stores variable-length strings |
| text | 0-65535 bytes | Stores long text |
| tinytext | 0-255 bytes | Stores short text |
| mediumtext | 0-16M | Stores medium-length text |
| longtext | 0-4G | Stores very large text |
It should be noted that the 4 text types are rarely used in MySQL nowadays. Long text of this kind is generally stored as files in OSS (object storage), or in middleware such as ES (Elasticsearch).
1.1.6 Binary string types
Corresponding to the text types above, MySQL provides 4 binary types. They are used less in actual development and are listed here for reference.
| Data type | Size | Purpose |
|---|---|---|
| blob | 0-65535 bytes | Stores long data in binary form |
| tinyblob | 0-255 bytes | Stores short data in binary form |
| mediumblob | 0-16M | Stores medium-length data in binary form |
| longblob | 0-4G | Stores very large data in binary form |
The differences between the blob types and the text types are:
1、 blob stores data as binary, while text stores data as text
2、 Data stored in a blob can only be read as a whole
3、 Because blob is binary, it does not need a character set, while text does
1.1.7 Enumeration type *
enum: like Java, MySQL also has an enumeration type. We can use it to represent states, types, and other enumerated values.
1.1.8 Set type
set: not commonly used. Set-like data is generally stored in a child table instead.
1.1.9 JSON type
json: used to store JSON objects and arrays, such as the JSON data of a BPMN workflow template. In practice, however, MySQL is seldom used to store JSON data, which follows from its nature as a relational database. More often we use MongoDB, which supports JSON natively, and for large data volumes ES can be used as well.
1.1.10 Spatial data types
geometry, point, etc.: spatial coordinate data is rarely stored in MySQL. MongoDB and ES are used more often for it.
1.2 Data type optimization principles
Now that we know the data types MySQL supports, how much space they take, and what they are for, we can move on to today's topic: how to optimize table fields. We will explain it through 3 principles.
1.2.1 The smaller, the better
We should try to use the smallest data type, where smallest means taking the least space. If smallint is enough, don't use int; if tinyint is enough, don't use smallint.
1、 Integers
For a flag field such as 'is deleted', use tinyint(1) rather than int(1). As explained above, the 1 is not the size of the space the field occupies, so don't assume that tinyint(1) and int(1) take the same space.
Integer sizes from small to large: tinyint<smallint<mediumint<int<bigint. Use the smallest one that fits.
2、 Strings
The string types we use most are varchar and char. For fixed-length strings such as telephone numbers and postal codes, try to use char, and make the length just large enough for the business requirement. For example, if a postal code has 6 digits, don't define it as char(10) and waste the extra 4 characters.
For strings of uncertain length we use varchar, but don't blindly define it as varchar(255). Define the maximum length according to the actual business situation. If you don't know the maximum length, ask the product manager or the business side and settle on one.
A few additional points:
(1) varchar(n) needs extra bytes to record the length: 1 byte when the column's values need at most 255 bytes, and 2 bytes when they may need more. The threshold is counted in bytes, not characters, so it also depends on the character set.
(2) The declared length does not change how much disk space a given value occupies: on disk, a varchar stores the actual bytes plus the length prefix. It does matter in memory, because MySQL may size in-memory temporary tables and sort buffers by the declared maximum, so an oversized varchar wastes memory. Compact rows also keep related data close together, which makes reads faster. Think of your family: your brother lives at the head of the village and you live at the far end. When the village head has to call your whole family to a meeting, is it quicker if you all live together or if you live apart?
(3) Before MySQL 5.6, changing a varchar length from 255 or below to above 255 locks the table. So we generally recommend keeping the length at 255 or below where possible.
(4) char is more efficient to process than varchar, so use char wherever it fits. Why? Think back to the village example above.
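The length-prefix rule from point (1) can be sketched as follows. This assumes the rule as documented in the MySQL reference manual, where the prefix size depends on the maximum number of bytes the column may need, so the character set matters. The function names and the small character set table are written for this example only.

```python
# Maximum bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def length_prefix_bytes(n, charset="utf8mb4"):
    """1 prefix byte if varchar(n) can never exceed 255 bytes, otherwise 2."""
    return 1 if n * MAX_BYTES_PER_CHAR[charset] <= 255 else 2


def varchar_storage_bytes(value, n, charset="utf8mb4"):
    """Bytes used by one stored value: encoded data plus the length prefix."""
    if len(value) > n:
        raise ValueError("value is longer than the declared length")
    # Python's utf-8 codec stands in for both utf8 and utf8mb4 here, so for
    # charset="utf8" the value is assumed to contain no 4-byte characters.
    encoding = "latin-1" if charset == "latin1" else "utf-8"
    return len(value.encode(encoding)) + length_prefix_bytes(n, charset)


print(length_prefix_bytes(63, "utf8mb4"))    # 63 * 4 = 252 bytes at most
print(length_prefix_bytes(64, "utf8mb4"))    # 64 * 4 = 256 bytes at most
print(length_prefix_bytes(255, "latin1"))    # 255 bytes at most
print(varchar_storage_bytes("abc", 20))      # 3 data bytes + prefix
```

Note that the stored size depends on the actual value, not on the declared n. The declared length only decides the prefix size and the upper bound.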
3、 Long text data
As mentioned above, this kind of data is usually articles or other documents. We usually store it as a file on an OSS server, and store only the address of that file in the database as a varchar.
4、 Time types
In terms of space: date<timestamp<datetime. By the principle above we should use the smallest type that meets the business requirement, but time types are special and we need to look ahead:
For example, do we need to store the time down to the second? The business may not require it today, but future data statistics might.
Or will the project still be running after 2038? Because timestamp can only represent times up to 2038, use datetime in that case.
5、 Enumeration types
Use the enum type or a numeric type instead of strings for enumerated values.
6、 IP addresses
When storing an IP address we usually think of a string first. In fact, the inet_aton function converts an IPv4 address string into a number that can be stored in an unsigned int column, which takes much less space. When querying, inet_ntoa converts it back:
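```sql
-- Convert a dotted IPv4 string into its integer form
select inet_aton('1.1.1.1');
-- Convert the integer back into the dotted string
select inet_ntoa(16843009);
```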
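If the conversion has to happen in application code rather than in SQL, the same mapping is available in most languages. As an illustration, here is the equivalent using Python's standard `ipaddress` module. The wrapper names `ip_to_int` and `int_to_ip` are chosen for this example.

```python
import ipaddress


def ip_to_int(ip):
    """Same result as MySQL's inet_aton for a dotted IPv4 string."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(number):
    """Same result as MySQL's inet_ntoa for a 32-bit integer."""
    return str(ipaddress.IPv4Address(number))


# 1.1.1.1 = 1*256^3 + 1*256^2 + 1*256 + 1
print(ip_to_int("1.1.1.1"))
print(int_to_ip(16843009))
# The largest address needs the full unsigned 32-bit range.
print(ip_to_int("255.255.255.255"))
```

The last value is larger than the maximum of a signed int, which is why the column should be declared unsigned.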
1.2.2 The simpler, the better
Use simpler types whenever you can, because operations on simple data types cost fewer CPU resources. So which types are simpler?
1、 Integers are simpler than strings, for example converting an IP address to an integer for storage
2、 Date types are simpler than strings, so avoid storing dates as strings
3、 char is simpler than varchar
1.2.3 Avoid NULL
We know that NULL cannot be tested with =null or !=null; we have to use is null and is not null. NULL values also make indexes and index statistics harder for MySQL to handle. So when creating table fields, give them a default value wherever possible, especially indexed fields.
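The comparison behaviour of NULL is easy to see in a small experiment. The sketch below uses Python's built-in `sqlite3` module only so that it runs without a MySQL server. The three-valued comparison logic it shows is standard SQL and behaves the same way in MySQL. The table `t` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, remark text)")
conn.executemany(
    "insert into t (id, remark) values (?, ?)",
    [(1, None), (2, ""), (3, "ok")],  # one NULL, one empty string, one value
)


def count_where(condition):
    """Count the rows of t matching a hard-coded condition."""
    return conn.execute("select count(*) from t where " + condition).fetchone()[0]


eq_null = count_where("remark = null")        # comparison yields NULL, never true
ne_null = count_where("remark != null")       # also NULL, never true
is_null = count_where("remark is null")       # the correct test
is_not_null = count_where("remark is not null")

print(eq_null, ne_null, is_null, is_not_null)
```

With a not null column and a default such as the empty string, the first two queries would simply never be written, and every row could be matched with an ordinary equality test.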
2. Table structure optimization principles
2.1 Appropriate data redundancy
Applicable scenario: a small number of fields that are queried frequently but can only be obtained by joining two or more tables.
When designing a table structure, if some fields can only be read by joining another table and they are queried frequently, consider keeping a redundant copy of them in the main table. All the data can then be read from one table, which improves query efficiency.
Case study 1:
For example, the product name, product ID, and other product data are maintained in the product table, and the order table holds the product ID. Orders must display the product name. Looking the name up in the product table every time is certainly less efficient than keeping a redundant copy of the product name in the order table.
Of course, such redundancy must take business needs into account. Some businesses require data to be shown in real time: the name may change, and records created earlier must then show the latest name. In that case we have to query through the join, and redundant data no longer solves the problem.
Some readers may say: why not update the redundant fields whenever the name is updated? That is a good idea, but it depends on the situation. If the business table holds a very large amount of data, you would have to update a great many redundant copies, and the update may cost more than it gains.
Case study 2:
For example, we have a waybill table, where each waybill covers several transported goods, and a goods table that lists them. To compute the total weight transported in a given period, we would need to join the goods table to the waybill table and sum the weight field of the goods table.
In fact, we can keep a redundant total weight field in the waybill table, and possibly a total amount field as well. When goods are inserted, their values are added to the totals on the waybill, so the statistics can be read directly from the waybill table.
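Case study 2 can be sketched end to end. The example uses Python's built-in `sqlite3` module so it runs anywhere. The table names `waybill` and `goods`, the column names, and the `add_goods` helper are assumptions made for this illustration, and a MySQL version would use the same SQL statements inside a transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table waybill (
        id integer primary key,
        total_weight real not null default 0   -- redundant, maintained on insert
    );
    create table goods (
        id integer primary key,
        waybill_id integer not null,
        weight real not null
    );
""")


def add_goods(conn, waybill_id, weight):
    """Insert one goods row and update the redundant total in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "insert into goods (waybill_id, weight) values (?, ?)",
            (waybill_id, weight),
        )
        conn.execute(
            "update waybill set total_weight = total_weight + ? where id = ?",
            (weight, waybill_id),
        )


with conn:
    conn.execute("insert into waybill (id) values (1)")
    conn.execute("insert into waybill (id) values (2)")
for waybill_id, weight in [(1, 10.0), (1, 2.5), (2, 7.0)]:
    add_goods(conn, waybill_id, weight)

# Statistics without a join, using the redundant field.
total_from_waybill = conn.execute(
    "select sum(total_weight) from waybill").fetchone()[0]
# The same statistics computed the expensive way, for comparison.
total_from_join = conn.execute(
    "select sum(g.weight) from waybill w join goods g on g.waybill_id = w.id"
).fetchone()[0]
print(total_from_waybill, total_from_join)
```

Updating both tables in the same transaction is what keeps the redundant total trustworthy. If goods can be edited or deleted, those paths have to adjust the total as well.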
2.2 Appropriate splitting
Applicable scenario: the table has fields that take a lot of space but are rarely queried, alongside other fields that are queried frequently.
When one or several fields in a table take a lot of space, such as text or varchar(200+) fields, and we frequently query other fields of the table without needing these large ones, we can consider splitting the large fields into another table linked by a foreign key.
The advantage is that the frequently queried data is stored in adjacent data blocks, which improves query efficiency (the same idea as the village example above) and reduces the number of IO operations.
2.3 Select the appropriate character set
Many developers don't pay attention to the choice of character set. They simply set the database to utf8mb4 without thinking and consider it done.
In fact, choosing an appropriate character set for each business scenario can improve query efficiency.
1、 If the business data contains only English letters and digits, it can be set to latin1, the Latin character set, which can save storage space. If you are sure you will not need to store data in multiple languages, there is no need to use utf8 or utf8mb4
2、 Choose different character sets for different business tables according to their data, to reduce storage space and so improve query efficiency
3、 If there is Chinese, use utf8mb4 rather than utf8. MySQL's utf8 supports at most 3 bytes per character, but some rare Chinese characters and emoji need 4 bytes, which causes storage errors. MySQL released utf8mb4 in 2010 to fix this problem.
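The 3-byte limit is easy to verify by looking at how many bytes individual characters take in UTF-8. The helper `fits_in_3_byte_utf8` below is a name invented for this sketch. It only approximates the check a MySQL utf8 column performs, by testing the encoded length of each character.

```python
def utf8_len(ch):
    """Number of bytes one character takes in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_3_byte_utf8(text):
    """True if every character fits in MySQL's 3-byte utf8 character set."""
    return all(utf8_len(ch) <= 3 for ch in text)


for ch in ["a", "中", "𠮷", "😀"]:
    print(repr(ch), utf8_len(ch))

print(fits_in_3_byte_utf8("中文 text"))   # common Chinese characters are fine
print(fits_in_3_byte_utf8("ok 😀"))       # an emoji needs utf8mb4
```

An ASCII letter takes 1 byte, a common Chinese character 3 bytes, and both the rare character 𠮷 and the emoji take 4 bytes, which is exactly the case utf8mb4 was added for.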
2.4 Primary key selection
There are generally two schemes for choosing a primary key. The first is to pick a unique natural key from the business data, such as the ID number in the user table or the order number in the order table. The second is to generate a business-independent primary key with a key generation algorithm, such as UUID.
We prefer the second option, a random primary key produced by a key generation algorithm. It has two benefits:
1、 It has nothing to do with the business, so it is easier to maintain
Imagine a multi-tenant system. To meet the management needs of customer companies, we often allow users to enter a custom number. Suppose a user of tenant A enters the number 111 and we use this number as the primary key. Because the system is multi-tenant and data is isolated, users of tenant B cannot see the record numbered 111. If one of them also happens to create number 111, the database reports an error, because duplicate primary keys are not allowed. The user will be confused: nothing on the page shows that 111 has already been entered. In this situation a business-independent primary key becomes necessary.
2、 Key generation algorithms are generic, and a generic approach saves time across the whole development effort
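The multi-tenant situation can be sketched in a few lines. The example keeps rows in a plain dictionary that stands in for a table. `rows` and `insert_record` are hypothetical names, and UUID version 4 is just one possible key generation algorithm.

```python
import uuid

# Stand-in for a table: primary key -> row.
rows = {}


def insert_record(tenant, number):
    """Store a record under a generated, business-independent primary key."""
    primary_key = str(uuid.uuid4())  # random 36-character UUID string
    rows[primary_key] = {"tenant": tenant, "number": number}
    return primary_key


# Both tenants use the custom number 111 without any primary key conflict.
key_a = insert_record("tenant_a", "111")
key_b = insert_record("tenant_b", "111")

print(key_a)
print(key_b)
print(len(rows))
```

If the custom number still has to be unique within one tenant, that can be enforced with a unique index on (tenant, number) instead of making the number the primary key.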
Summary
Okay, that concludes the principles of table structure optimization for this issue. How far to take each of them depends on the specific business requirements. If you found this article helpful, feel free to follow and give it a like!
See you next time ~