The interviewer's soul-searching question: why do coding standards require that SQL statements not have too many joins?

2022-07-05 13:26:00 【Hollis Chuang】

A giveaway question

Interviewer: Have you worked with Linux?

Me: Yes.

Interviewer: If I want to check memory usage, what command should I use?

Me: free or top.

Interviewer: Then what information can you see in the output of the free command?

Me: Well, it shows memory and cache usage:

total: total memory
used: used memory
free: free memory
buff/cache: memory used by buffers and cache
available: available memory
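
To make the columns concrete, here is a minimal Python sketch that parses the Mem: row of free output into a dictionary. The sample text and its numbers are invented for illustration, and parse_free_mem is a hypothetical helper, not part of any existing tool.

```python
# Hypothetical sample of `free -m` output; the numbers are illustrative only.
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:           1987         587         164           9        1236        1196
Swap:             0           0           0
"""


def parse_free_mem(text):
    """Return the Mem: row of `free` output as a dict keyed by column name."""
    lines = text.strip().splitlines()
    columns = lines[0].split()          # header row: total, used, free, ...
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(v) for v in line.split()[1:]]
            return dict(zip(columns, values))
    raise ValueError("no Mem: row found")


mem = parse_free_mem(SAMPLE)
for name, value in mem.items():
    print(f"{name}: {value} MiB")
```

On a real machine the same function could be fed the captured output of free -m, assuming the column layout matches the sample above.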

Interviewer: Do you know how to clear the used cache (buff/cache)?

Me: Um... I don't know!

Interviewer: sync; echo 3 > /proc/sys/vm/drop_caches will clear buff/cache. Can you tell me whether I can run this command in production?

Me: (A giveaway question, secretly delighted) The benefit is huge. Clearing the cache gives us more available memory, just like the little rocket in xx Guard on a PC: one click frees up a lot of memory.

Interviewer: Um..., go home and wait to hear from us.

Let's talk about SQL joins

Interviewer: Let's change the subject. Tell me what you understand about join.

Me: OK. (If I get this one wrong too, I'm finished. Seize the opportunity.)

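Review

A join in SQL combines the specified tables according to given conditions and returns the data to the client.

The join types are:

inner join: returns only the rows that match in both tables

left join: returns all rows from the left table, plus matching rows from the right table

right join: returns all rows from the right table, plus matching rows from the left table

full join: returns all rows from both tables, matched where possible

Image source: cnblogs.com/reaptomorrow-flydream/p/8145610.html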
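
The four join types can be mimicked on plain Python lists. This is only a sketch of the semantics, not how a database executes them; the two tiny tables and all function names are invented for illustration.

```python
# Two tiny tables as lists of (id, value) tuples; the data is invented.
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (4, "z")]


def inner_join(l_rows, r_rows):
    """Pairs of rows whose ids match in both tables."""
    return [(lrow, rrow) for lrow in l_rows for rrow in r_rows if lrow[0] == rrow[0]]


def left_join(l_rows, r_rows):
    """Every left row, paired with its matches, or with None when there is none."""
    result = []
    for lrow in l_rows:
        matches = [rrow for rrow in r_rows if rrow[0] == lrow[0]]
        if matches:
            result.extend((lrow, rrow) for rrow in matches)
        else:
            result.append((lrow, None))
    return result


def right_join(l_rows, r_rows):
    """Mirror image of left_join: every right row is kept."""
    return [(lrow, rrow) for rrow, lrow in left_join(r_rows, l_rows)]


def full_join(l_rows, r_rows):
    """Left join plus the right rows that matched nothing."""
    left_ids = {lrow[0] for lrow in l_rows}
    unmatched_right = [(None, rrow) for rrow in r_rows if rrow[0] not in left_ids]
    return left_join(l_rows, r_rows) + unmatched_right


print(inner_join(left, right))
print(left_join(left, right))
print(right_join(left, right))
print(full_join(left, right))
```

Rows with ids 2 and 3 appear in every result; id 1 survives only in the left and full joins, and id 4 only in the right and full joins.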

Interviewer: If a project needs to use join statements, how do you optimize them and improve performance?

Me: There are two cases: small data size and large data size.

Interviewer: And then?

Me: For

Small data size: just do it all in memory.
Large data size:

Add indexes to speed up join execution.
Store redundant information to reduce the number of joins.
Minimize the number of table joins; a single SQL statement should not join tables more than 5 times.

Interviewer: So we can summarize it as: join statements are relatively expensive in terms of performance. Am I right?

Me: Yes.

Interviewer: Why?

Buffer

Me: Executing a join necessarily involves a comparison process.

Interviewer: Yes.

Me: Comparing two tables row by row is slow, so the data from the two tables can be read in turn into a block of memory. Taking MySQL's InnoDB engine as an example, the following statement lists the relevant memory areas: show variables like '%buffer%'

In the output, the size of join_buffer_size affects the performance of our join statements.

Interviewer: What else?

A major premise

Me: Any project eventually goes live, data inevitably accumulates, and the data volume will not stay small.

Interviewer: That's true.

Me: Most of the data in a database is ultimately saved to the hard disk and stored as files.

Taking MySQL's InnoDB engine as an example:

InnoDB uses the page as its basic I/O unit; each page is 16 KB by default.
InnoDB creates an .ibd file for each table.

Verification

Me: This means we need to read as many files as there are tables being joined. Indexes can help, but the disk head still has to move frequently.

Interviewer: So frequent head movement hurts performance, right?

Me: Yes. Don't open source frameworks such as HBase and Kafka all like to say that they greatly improve performance through sequential reads and writes?

Interviewer: That's right. So do you think Linux optimizes for this? Hint: run the free command again and take a look.

Me: Strange, why is the cache taking up more than 1.2G?

Image source: https://www.linuxatemyram.com/

Interviewer: Have you ever thought about these questions?

What is actually stored in buff/cache?
Why does buff/cache take up so much memory while available memory is still 1.1G?
Why can the memory held by buff/cache be cleared with two commands, while used memory can only be released by ending processes?

Think it over carefully

After thinking for a few minutes

Me: If the memory used by buff/cache can be released so casually, it must not matter; clearing it will not affect the operation of the system.

Interviewer: Not exactly.

Me: Really? I remember a sentence in CSAPP (Computer Systems: A Programmer's Perspective):

The essence of the memory hierarchy is that the storage device at each level serves as a cache for the devices at the level below.

In plain terms, Linux treats memory as a cache for the hard disk.

Related information ：http://tldp.org/LDP/sag/html/buffer-cache.html
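
The idea that memory acts as a cache for the disk, and what dropping that cache costs, can be sketched with a toy model. PageCache below is a hypothetical LRU cache written for illustration; it is not a kernel API and says nothing about how Linux actually manages buff/cache.

```python
from collections import OrderedDict


class PageCache:
    """Toy LRU cache standing in for the OS page cache (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page number -> page contents
        self.disk_reads = 0          # how often we had to go to the "disk"

    def read(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)   # mark as most recently used
            return self.pages[page_no]
        self.disk_reads += 1                  # cache miss: simulate a slow disk read
        data = f"page-{page_no}"
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page
        return data

    def drop_caches(self):
        """Forget every cached page, loosely analogous to clearing buff/cache."""
        self.pages.clear()


cache = PageCache(capacity=4)
for _ in range(3):
    for page_no in range(4):
        cache.read(page_no)
reads_warm = cache.disk_reads        # only the first pass touched the disk

cache.drop_caches()
for page_no in range(4):
    cache.read(page_no)
reads_after_drop = cache.disk_reads  # every page had to be read from disk again

print(reads_warm, reads_after_drop)  # prints: 4 8
```

In this model, dropping the cache does not break anything, but every page that was cached has to be fetched from the slow device again, which is why clearing buff/cache on a busy server is not free.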

Interviewer: Now do you know how to answer that question?

Me: I....

Join Algorithm

Interviewer: I'll give you another chance. If you had to implement the join algorithm yourself, what would you do?

Me: Without an index, a nested loop does the job. With an index, the index can be used to improve performance.

Interviewer: Back to join_buffer. What do you think is stored in join_buffer?

Me: During the scan, the database selects one table and puts the data it needs to return and compare against the other tables into join_buffer.

Interviewer: How is it handled when there is an index?

Me: That case is a bit simpler: just read the index trees of the two tables and compare them. Let me describe how it is handled without an index.

Nested Loop Join

Nested loop: only one row of a table is read at a time. That is, if outerTable has 100,000 rows and innerTable has 100 rows, 10,000,000 reads are required (assuming the files of these two tables are not cached in memory by the operating system; we call these cold tables).

Of course, no database engine uses this naive algorithm as-is any more (too slow).
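
A minimal Python sketch of the simple nested loop, counting how many inner-table rows are read. The tables are scaled-down, invented stand-ins for outerTable and innerTable, and nested_loop_join is a hypothetical helper rather than any engine's implementation.

```python
def nested_loop_join(outer_rows, inner_rows, key=lambda row: row[0]):
    """Simple nested loop join that also counts row reads of the inner table."""
    matches = []
    inner_reads = 0
    for outer in outer_rows:
        for inner in inner_rows:      # the inner table is rescanned for every outer row
            inner_reads += 1
            if key(outer) == key(inner):
                matches.append((outer, inner))
    return matches, inner_reads


# Scaled-down stand-ins for outerTable and innerTable (invented data).
outer_table = [(i % 100, f"outer-{i}") for i in range(1_000)]
inner_table = [(i, f"inner-{i}") for i in range(100)]

matches, inner_reads = nested_loop_join(outer_table, inner_table)
print(len(matches), inner_reads)

# The read count is rows(outer) * rows(inner), so the figures above give:
print(100_000 * 100)
```

With 1,000 outer rows and 100 inner rows the loop body runs 100,000 times; scaling the outer table to 100,000 rows gives the 10,000,000 reads mentioned above.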

Block nested loop

Block means that a block of data is brought into memory each time, to reduce I/O cost.

When no index is available, this is how MySQL InnoDB handles it.

Consider the following two tables, t_a and t_b.

When a join cannot be executed using an index, InnoDB automatically uses the Block nested loop algorithm.
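
The effect of the join buffer can be sketched in Python as follows. This is a simplified model under stated assumptions: the buffer is sized in rows (buffer_rows, a hypothetical parameter) rather than in bytes as join_buffer_size is, and the contents of t_a and t_b are invented here, not taken from the article's tables.

```python
def block_nested_loop_join(outer_rows, inner_rows, buffer_rows, key=lambda row: row[0]):
    """Block nested loop join; returns matches and the number of inner-table scans."""
    matches = []
    inner_scans = 0
    for start in range(0, len(outer_rows), buffer_rows):
        join_buffer = outer_rows[start:start + buffer_rows]   # fill the buffer with a chunk
        inner_scans += 1
        for inner in inner_rows:             # one pass over the inner table per chunk
            for outer in join_buffer:
                if key(outer) == key(inner):
                    matches.append((outer, inner))
    return matches, inner_scans


# Invented contents for two tables named after t_a and t_b.
t_a = [(i % 100, f"a-{i}") for i in range(1_000)]
t_b = [(i, f"b-{i}") for i in range(100)]

small, scans_small = block_nested_loop_join(t_a, t_b, buffer_rows=50)
large, scans_large = block_nested_loop_join(t_a, t_b, buffer_rows=500)
print(scans_small, scans_large)  # prints: 20 2
```

The number of row comparisons is the same in both runs; what shrinks with a larger buffer is the number of passes over the inner table, which is the part that costs disk I/O when the table is cold.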

Summary

At school, database teachers loved normal forms above all else; only after I started working did I learn to put performance first. Store redundant data where you can, and avoid joins where you can. If a join really does hurt performance, try increasing your join_buffer_size, or switch to SSDs.