Why on earth is it not recommended to use select *?
2022-07-28 20:48:00 【Hollis Chuang】

Source: Cicadas bathe in the wind (ID: chanmufeng1994)
"Do not use SELECT *" has almost become a golden rule of using MySQL. Even the Alibaba Java Development Manual states explicitly that * must not be used as the field list of a query, which gives the rule the blessing of authority.

Yet I still use SELECT * quite often during development, for two reasons:
First, it is simple and makes development fast, and if fields are added or changed frequently later, the SQL statement does not need to change.
Second, I consider premature optimization a bad habit. Unless you know from the start exactly which fields you will end up needing and can build suitable indexes for them, I prefer to optimize the SQL only when it actually causes trouble, provided of course that the trouble is not fatal.
Still, we should understand why using SELECT * directly is discouraged. This article gives reasons from four angles.
1. Unnecessary disk I/O
MySQL ultimately stores user records on disk, so a query is essentially a disk I/O operation (assuming the records being queried are not already cached in memory).
The more fields a query asks for, the more content has to be read, which increases disk I/O overhead. The effect is especially noticeable when some of the fields are of type TEXT, MEDIUMTEXT, BLOB and the like.
Does using SELECT * also make MySQL use more memory?
In theory, no. The Server layer does not hold the complete result set in memory and then send it to the client in one go. Instead, each row fetched from the storage engine is written into a memory area called net_buffer, whose size is controlled by the system variable net_buffer_length and defaults to 16KB. When net_buffer is full, its contents are written to the socket send buffer of the local network stack and sent to the client. Once the send succeeds (the client has finished reading), net_buffer is emptied, and the server goes on to read and write the next row.
In other words, by default the result set occupies at most about net_buffer_length of memory, and a few extra fields do not make it take up additional memory.
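The buffering behaviour described above can be sketched with a small simulation. This is only an illustrative model, not MySQL's actual implementation: the function name `stream_rows`, the row sizes and the row counts are all assumptions, and every row is assumed to fit into the buffer. It shows that the peak amount of buffered data stays bounded by the buffer size, while the number of flushes to the socket grows with row width.

```python
NET_BUFFER_LENGTH = 16 * 1024  # default value of net_buffer_length, in bytes


def stream_rows(rows, buffer_size=NET_BUFFER_LENGTH):
    """Simulate streaming encoded rows through one fixed-size buffer.

    Returns (flushes, peak_buffered_bytes, total_bytes_sent).
    Simplifying assumption: each row fits into the buffer on its own.
    """
    buffered = flushes = peak = total = 0
    for row in rows:
        size = len(row)
        if size > buffer_size:
            raise ValueError("this sketch assumes every row fits in the buffer")
        if buffered + size > buffer_size:
            # Buffer is full: hand its contents to the socket and start over.
            total += buffered
            buffered = 0
            flushes += 1
        buffered += size
        peak = max(peak, buffered)
    if buffered:
        # Flush whatever is left after the last row.
        total += buffered
        flushes += 1
    return flushes, peak, total


# Hypothetical result sets: 10,000 rows each, narrow vs. wide rows.
narrow_stats = stream_rows([b"x" * 64] * 10_000)    # a few small columns
wide_stats = stream_rows([b"x" * 4096] * 10_000)    # e.g. a TEXT column included

print(narrow_stats)
print(wide_stats)
```

In this model both result sets peak at the same buffer size, but the wide rows need far more flushes and far more bytes on the wire.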
2. Increased network latency
Continuing from the previous point: although the data in the socket send buffer is sent to the client piece by piece and each piece looks small, that does not help if someone really uses * and pulls TEXT, MEDIUMTEXT or BLOB fields along as well. The total amount of data becomes large, which directly leads to more network transmissions.
If MySQL and the application are not on the same machine, this overhead is very noticeable. Even when the MySQL server and the client are on the same machine, the protocol used is typically still TCP, so communication still takes extra time.
3. Cannot use a covering index
To illustrate this point, we need to create a table:
CREATE TABLE `user_innodb` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`gender` tinyint(1) DEFAULT NULL,
`phone` varchar(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IDX_NAME_PHONE` (`name`,`phone`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
We created a table user_innodb that uses the InnoDB storage engine, set id as the primary key, created a composite index on name and phone, and finally populated the table with more than 5 million rows of random data.
InnoDB automatically builds a B+ tree for the primary key id, called the primary key index (also known as the clustered index). The most important property of this B+ tree is that its leaf nodes contain the complete user records. It looks roughly like this:

Suppose we execute this statement:
SELECT * FROM user_innodb WHERE name = 'Cicadas bathe in the wind';
Using EXPLAIN to view the execution plan of the statement:

We find that this SQL statement uses the IDX_NAME_PHONE index, which is a secondary index. The leaf nodes of the secondary index look like this:

The InnoDB storage engine uses the search condition to find, in the leaf nodes of the secondary index, the records whose name is Cicadas bathe in the wind. But the secondary index only stores the name and phone fields plus the primary key id (and we insisted on SELECT *), so InnoDB has to take the primary key id to the primary key index to find the complete record. This process is called a table lookup (going back to the table).
Think about it: if the leaf nodes of the secondary index already contain all the data we want, is there any need for the table lookup? No, and this is exactly what a covering index is.
For example, suppose we happen to want only name, phone and the primary key field.
SELECT id, name, phone FROM user_innodb WHERE name = 'Cicadas bathe in the wind';
Using EXPLAIN to view the execution plan of the statement:

You can see that the Extra column shows Using index, which means the query list and the search condition contain only columns that belong to one index. In other words, a covering index is used, the table lookup can be skipped entirely, and query efficiency improves greatly.
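The difference between the two query lists can be reproduced without a MySQL server. The sketch below is an approximation under a clearly different setup: it uses SQLite through Python's built-in sqlite3 module as a stand-in for InnoDB, with a reduced version of the table, 1,000 made-up rows, and the hypothetical helper `plan`. SQLite's EXPLAIN QUERY PLAN reports a covering index in its own wording, which plays a role similar to Using index in MySQL's Extra column.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_innodb (
        id     INTEGER PRIMARY KEY,
        name   TEXT,
        gender INTEGER,
        phone  TEXT
    );
    CREATE INDEX idx_name_phone ON user_innodb (name, phone);
""")
conn.executemany(
    "INSERT INTO user_innodb (name, gender, phone) VALUES (?, ?, ?)",
    [(f"user{i}", i % 2, f"138{i:08d}") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable part of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user42",)).fetchall()
    return " | ".join(row[3] for row in rows)


# SELECT * needs the gender column, which is not stored in the index.
star_plan = plan("SELECT * FROM user_innodb WHERE name = ?")
# This query list is fully contained in the index (plus the primary key).
covered_plan = plan("SELECT id, name, phone FROM user_innodb WHERE name = ?")

print(star_plan)
print(covered_plan)
```

The exact wording of the plan differs between SQLite versions, but only the second query should be reported as using a covering index.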
4. May slow down JOIN queries
To illustrate the next problem we create two tables, t1 and t2, and join them. We insert 100 rows into t1 and 1000 rows into t2.
CREATE TABLE `t1` (
`id` int NOT NULL,
`m` int DEFAULT NULL,
`n` int DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `t2` (
`id` int NOT NULL,
`m` int DEFAULT NULL,
`n` int DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Suppose we execute the following statement:
SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.m = t2.m;
Here STRAIGHT_JOIN is used to force t1 to be the driving table and t2 to be the driven table.
In a join query, the driving table is accessed only once, while the driven table is accessed many times; the exact number of accesses depends on how many records in the driving table satisfy the query conditions. Now that the driving table and the driven table have been fixed, let's look at what joining two tables essentially involves:
First, t1 is the driving table, so the query is executed on t1 using the filter conditions that apply to it. Since there are no filter conditions here, this fetches all rows of t1.
Then, for each record in the result set obtained in the previous step, MySQL goes to the driven table and looks for matching records according to the join condition and filter conditions.
Expressed in pseudocode, the whole process looks like this:
// t1Res is the result set of the driving table t1 after filtering
for (t1Row : t1Res) {
    // t2 is the complete driven table
    for (t2Row : t2) {
        if (join condition is satisfied && t2's filter conditions are satisfied) {
            send to client
        }
    }
}
This is the simplest approach, but also the one with the worst performance. It is called a nested-loop join (Nested-Loop Join, NLJ). How can the join be sped up?
One way is to create an index, preferably on the driven table's (t2) columns that appear in the join condition. After all, the driven table is queried many times, and each access to the driven table is essentially a single-table query (because the t1 result set is fixed, the condition used to query t2 for each join step is fixed as well).
Now that an index is in play, to avoid repeating the mistake of not being able to use a covering index, we should again avoid SELECT * and instead list the columns that are actually needed, and build a suitable index for them.
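To make the effect of an index on the driven table concrete, here is a minimal Python sketch that joins two in-memory tables on column m, once with a plain nested loop and once by probing an index. It is a model of the idea rather than of MySQL's executor: the row values, the function names and the use of a dictionary as a stand-in for a secondary index on t2.m are all assumptions.

```python
from collections import defaultdict

# Hypothetical data: 100 rows in t1 (driving), 1000 rows in t2 (driven).
# Each row is a tuple (id, m, n).
t1 = [(i, i % 50, i) for i in range(100)]
t2 = [(i, i % 200, i) for i in range(1000)]


def nested_loop_join(driving, driven):
    """Plain nested-loop join on m; returns (result, driven rows examined)."""
    result, examined = [], 0
    for r1 in driving:
        for r2 in driven:  # full scan of the driven table per driving row
            examined += 1
            if r1[1] == r2[1]:
                result.append(r1 + r2)
    return result, examined


def index_nested_loop_join(driving, driven):
    """Same join, but the driven table is probed through an index on m."""
    index = defaultdict(list)  # stands in for a secondary index on t2.m
    for r2 in driven:
        index[r2[1]].append(r2)
    result, examined = [], 0
    for r1 in driving:
        for r2 in index.get(r1[1], []):  # only matching rows are touched
            examined += 1
            result.append(r1 + r2)
    return result, examined


plain_result, plain_examined = nested_loop_join(t1, t2)
indexed_result, indexed_examined = index_nested_loop_join(t1, t2)

print(plain_examined, indexed_examined, len(plain_result))
```

Both functions return the same rows; the only difference is how many driven-table rows have to be examined to find them.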
But if no index is used, does MySQL really execute the join as a plain nested loop? Of course not. A plain nested-loop join is simply too slow!
Before MySQL 8.0, MySQL provided the Block Nested-Loop Join (BNL) method, and MySQL 8.0 (starting with 8.0.18) introduced the hash join method. Both were designed to solve the same problem: minimizing the number of times the driven table is accessed.
Both methods use a fixed-size memory area called the join buffer, which holds a number of records from the driving table's result set (the two methods differ in how the records are stored). This way, when the driven table's records are loaded into memory, each one can be matched against many driving-table records in the join buffer in one go. Because the matching is done in memory, this greatly reduces the I/O cost of repeatedly loading the driven table from disk. The process of using the join buffer is shown in the figure below:

Looking at the execution plan of the join query above, we find that hash join is used (provided no index has been created on the join column of t2; otherwise the index would be used and the join buffer would not).

The best case is that the join buffer is large enough to hold all records in the driving table's result set, so that the driven table needs to be accessed only once to complete the join. The size can be configured through the system variable join_buffer_size, which defaults to 256KB. If the result set does not fit, it is loaded into the join buffer in batches: after one batch has been compared in memory, the join buffer is emptied and the next batch is loaded, until the join is complete.
Here comes the key point! Not all columns of the driving table's records are placed in the join buffer. Only the columns in the query list and the columns in the filter conditions are. So once again we are reminded that it is better not to use * as the query list. Put only the columns we care about in the query list, so that more records fit in the join buffer, the number of batches drops, and the driven table is naturally accessed fewer times.
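The relationship between row width and the number of batches can be sketched as follows. This is a rough model only: the per-row sizes of 600 and 12 bytes, the table sizes and the function names are assumptions, and MySQL's real bookkeeping overhead per buffered row is ignored. Each batch fills a dictionary acting as the join buffer and then scans the driven table once.

```python
from collections import defaultdict

JOIN_BUFFER_SIZE = 256 * 1024  # default value of join_buffer_size, in bytes


def rows_per_batch(bytes_per_row, buffer_size=JOIN_BUFFER_SIZE):
    """How many driving-table rows fit into the join buffer at once."""
    return max(1, buffer_size // bytes_per_row)


def batched_hash_join(driving, driven, batch_rows):
    """Join on the first tuple element, one buffer-sized batch at a time.

    Returns (number_of_matches, number_of_driven_table_scans).
    """
    matches = scans = 0
    for start in range(0, len(driving), batch_rows):
        buffer = defaultdict(list)  # the "join buffer" for this batch
        for row in driving[start:start + batch_rows]:
            buffer[row[0]].append(row)
        scans += 1  # one full pass over the driven table per batch
        for row in driven:
            matches += len(buffer.get(row[0], ()))
    return matches, scans


# Hypothetical tables: 100,000 driving rows, 1,000 driven rows.
driving = [(i % 1000,) for i in range(100_000)]
driven = [(i,) for i in range(1000)]

# Assumed sizes: about 600 bytes per buffered row with SELECT *,
# about 12 bytes when only the needed columns are selected.
wide = batched_hash_join(driving, driven, rows_per_batch(600))
narrow = batched_hash_join(driving, driven, rows_per_batch(12))

print(wide, narrow)
```

With these assumed sizes both runs find the same matches, but the narrow rows need only a handful of passes over the driven table instead of hundreds.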