The Many Stages of a SQL Query Inside a Database

When we use a database, the process seems simple: the client sends a SQL statement, the database executes it, and the matching data is returned to the client. In reality, behind the scenes the SQL statement is transformed and optimized, going through a long series of "trials" before the result comes back.

As the figure above shows, a query passes from the query processor through the parser and the optimizer before it reaches the execution engine.

Today, let's first look at the query manager. A later article will focus on how the query optimizer performs cost-based optimization.

Query Manager

This part is where the database's power really shows. Here, a poorly written query is transformed into code that can be executed quickly; that code is then executed, and the results are returned to the client. The process involves several steps:

Parse the query to check whether it is valid
Rewrite the query to remove useless operators and apply some pre-optimizations
Optimize the query for performance and turn it into an execution and data access plan
Compile the query plan
Execute the plan

We won't go into the last two steps here, since they are comparatively less critical.

Query parser

Every SQL statement passes through the parser, which checks whether its syntax is correct. If there is a mistake, the parser rejects the query. For example, if you mistype SELECT as SLECT, processing stops right here.

In addition, the parser checks that the keywords appear in the correct order.
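To see the syntax check in action, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for "the database" (an assumption: other engines word their errors differently). The `person` table and the `parse_error` helper are hypothetical. SQLite rejects both the misspelled keyword and the wrongly ordered clauses before touching any data.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_key INTEGER PRIMARY KEY, name TEXT)")

def parse_error(sql: str) -> Optional[str]:
    """Return SQLite's error message if the statement is rejected, else None."""
    try:
        conn.execute(sql)  # the statement is parsed before anything is executed
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)

print(parse_error("SELECT name FROM person"))  # None: the query is valid
print(parse_error("SLECT name FROM person"))   # rejected: misspelled keyword
print(parse_error("FROM person SELECT name"))  # rejected: clauses in the wrong order
```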

Next, the table and column names used in the query are analyzed. The parser uses the database's metadata to check:

Whether the tables exist
Whether the queried columns exist in those tables
Whether the operations can be applied to the column types (for example, you cannot compare a number with a string, and you cannot apply substring() to an integer)
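The same kind of sketch shows the metadata checks. Again SQLite is only a stand-in, and the table and helper names are hypothetical. SQLite is dynamically typed and does not reject comparing a number with a string, so only the first two checks are demonstrated here; stricter engines such as PostgreSQL typically report type errors at this stage.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_key INTEGER PRIMARY KEY, last_name TEXT)")

def metadata_error(sql: str) -> Optional[str]:
    """Return SQLite's error message if a table or column is unknown, else None."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)

print(metadata_error("SELECT last_name FROM person"))   # None: table and column exist
print(metadata_error("SELECT last_name FROM people"))   # unknown table
print(metadata_error("SELECT first_name FROM person"))  # unknown column
```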

It then checks whether you are authorized to read or write the tables in the query, since these access rights are assigned by the DBA.

During parsing, the SQL query is converted into the database's internal representation (usually a tree). If everything is OK, this representation is sent to the query rewriter.
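As a rough illustration of the "tree" idea, the sketch below models one query as nested Python dataclasses. The node classes (`Column`, `Literal`, `BinaryOp`, `Select`) and the `CATALOG` dictionary are hypothetical and do not mirror any real engine's internals. Walking the tree is one way a parser can collect the column names it must validate against the metadata.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Column:
    name: str

@dataclass(frozen=True)
class Literal:
    value: int

@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Column, Literal, BinaryOp]

@dataclass(frozen=True)
class Select:
    columns: tuple[str, ...]
    table: str
    where: Expr

def referenced_columns(expr: Expr) -> set[str]:
    """Walk an expression tree and collect every column name it mentions."""
    if isinstance(expr, Column):
        return {expr.name}
    if isinstance(expr, BinaryOp):
        return referenced_columns(expr.left) | referenced_columns(expr.right)
    return set()

# Tree for: SELECT name FROM person WHERE age > 10 + 2
query = Select(
    columns=("name",),
    table="person",
    where=BinaryOp(">", Column("age"), BinaryOp("+", Literal(10), Literal(2))),
)

CATALOG = {"person": {"person_key", "name", "age"}}  # hypothetical metadata
used = referenced_columns(query.where) | set(query.columns)
print(sorted(used))                         # ['age', 'name']
print(sorted(used - CATALOG[query.table]))  # []: every column exists
```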

Query Rewriter

At this point we have an internal representation of the query. The rewriter's goals are to:

Pre-optimize the query
Avoid unnecessary operations
Help the optimizer find the best possible plan

The rewriter applies a list of known rules to the query. If the query matches the pattern of a rule, the rule is applied and the query is rewritten. Here are some (optional) rules:

View merging: if the query uses a view, the view is replaced by the SQL code that defines it.

Subquery flattening: queries containing subqueries are hard to optimize, so the rewriter tries to transform the query, and may remove the subquery entirely.

For example:

SELECT PERSON.* FROM PERSON WHERE PERSON.person_key IN (SELECT MAILS.person_key FROM MAILS WHERE MAILS.mail LIKE 'christophe%'); 

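The rewriter can turn this query into the following join. The two forms are only equivalent if no person has more than one matching mail; otherwise the join returns that person once per matching mail, so in general the subquery must become a semi-join (or the duplicates must be removed):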

SELECT PERSON.* FROM PERSON, MAILS WHERE PERSON.person_key = MAILS.person_key and MAILS.mail LIKE 'christophe%'; 
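The sketch below, using Python's sqlite3 module with hypothetical sample rows, runs both forms of the query. They agree while each person has at most one matching mail, but the join repeats a person who has two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PERSON (person_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE MAILS (person_key INTEGER, mail TEXT);
INSERT INTO PERSON VALUES (1, 'Christophe'), (2, 'Alice');
INSERT INTO MAILS VALUES (1, 'christophe@example.com'), (2, 'alice@example.com');
""")

SUBQUERY = """SELECT PERSON.* FROM PERSON WHERE PERSON.person_key IN
  (SELECT MAILS.person_key FROM MAILS WHERE MAILS.mail LIKE 'christophe%')"""
JOIN = """SELECT PERSON.* FROM PERSON, MAILS
  WHERE PERSON.person_key = MAILS.person_key AND MAILS.mail LIKE 'christophe%'"""

print(conn.execute(SUBQUERY).fetchall())  # [(1, 'Christophe')]
print(conn.execute(JOIN).fetchall())      # [(1, 'Christophe')]

# A second matching mail for the same person breaks the naive rewrite.
conn.execute("INSERT INTO MAILS VALUES (1, 'christophe.work@example.com')")
print(len(conn.execute(SUBQUERY).fetchall()))  # 1: IN keeps one row per person
print(len(conn.execute(JOIN).fetchall()))      # 2: the join repeats the person
```

Adding DISTINCT to the join (or, inside the engine, using a semi-join) restores the semantics of IN.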

Removing unnecessary operators: if you use DISTINCT but a UNIQUE constraint already guarantees that the selected data is unique, the DISTINCT keyword is removed.
Eliminating redundant joins: if the same join condition appears twice (for example, because one of them is hidden inside a view), or a join is made useless by transitivity, it is removed.
Constant arithmetic evaluation: if part of the query can be computed in advance, it is computed once during rewriting. For example, WHERE AGE > 10 + 2 becomes WHERE AGE > 12, and TODATE("some date") is converted to a date in datetime format.
(Advanced) Partition pruning: if you use a partitioned table, the rewriter can determine which partitions are needed.
(Advanced) Materialized view rewrite: if you have a materialized view that matches a subset of the query, the rewriter checks whether the view is up to date and, if so, modifies the query to use the materialized view instead of the base tables.
(Advanced) Custom rules: if you have created custom rules to rewrite queries, the rewriter executes them.
(Advanced) OLAP transformations: analytic/window functions, star joins, rollups, and so on are also transformed (whether this is done by the rewriter or by the optimizer depends on the database, since the two steps are closely related).
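The rule mechanism described above can be sketched as a loop that keeps applying rules until none of them matches any more. This is a toy model: the query classes, the `UNIQUE_NOT_NULL` catalog, and the two rules (constant folding and DISTINCT removal) are hypothetical, and real rewriters work on much richer trees. One detail the sketch makes explicit: dropping DISTINCT is only safe when the selected unique column is also NOT NULL, because a UNIQUE column can usually hold several NULLs, which DISTINCT would merge.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Union

# Hypothetical, minimal query representation (not any real engine's internals).
@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Op:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Col, Const, Op]

@dataclass(frozen=True)
class Query:
    table: str
    columns: tuple[str, ...]
    where: Expr
    distinct: bool = False

# Hypothetical catalog: columns that are UNIQUE *and* NOT NULL (e.g. primary keys).
UNIQUE_NOT_NULL = {"person": {"person_key"}}

ARITHMETIC = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}

def fold(e: Expr) -> Expr:
    """Evaluate arithmetic sub-expressions whose operands are all constants."""
    if isinstance(e, Op):
        left, right = fold(e.left), fold(e.right)
        if e.op in ARITHMETIC and isinstance(left, Const) and isinstance(right, Const):
            return Const(ARITHMETIC[e.op](left.value, right.value))
        return Op(e.op, left, right)
    return e

def fold_constants(q: Query) -> Optional[Query]:
    """Rule: WHERE age > 10 + 2 -> WHERE age > 12. Returns None if nothing changes."""
    new_where = fold(q.where)
    return None if new_where == q.where else replace(q, where=new_where)

def drop_redundant_distinct(q: Query) -> Optional[Query]:
    """Rule: drop DISTINCT from a single-table query selecting a unique, non-null column."""
    if q.distinct and UNIQUE_NOT_NULL.get(q.table, set()) & set(q.columns):
        return replace(q, distinct=False)
    return None

RULES: list[Callable[[Query], Optional[Query]]] = [fold_constants, drop_redundant_distinct]

def rewrite(q: Query) -> Query:
    """Apply every matching rule, repeating until no rule changes the query."""
    changed = True
    while changed:
        changed = False
        for rule in RULES:
            rewritten = rule(q)
            if rewritten is not None:
                q, changed = rewritten, True
    return q

# SELECT DISTINCT person_key, age FROM person WHERE age > 10 + 2
q = Query("person", ("person_key", "age"),
          Op(">", Col("age"), Op("+", Const(10), Const(2))), distinct=True)
print(rewrite(q))
```

Running it prints a query whose WHERE clause is `age > 12` and whose `distinct` flag is False. Keeping each rule as an independent function that either returns a rewritten query or `None` makes it easy to add rules without touching the loop.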

The rewritten query is then sent to the query optimizer, which is where things get interesting.

Statistics

Before looking at how the database optimizes queries, we need to talk about statistics, because without them the database is stupid. If you don't tell the database to analyze its data, it won't do it on its own, and it will make wrong assumptions.

What information does the database need?

First, a quick word on how databases and operating systems store data. The smallest unit they work with is called a page or block (4 or 8 KB by default). This means that if you need only 1 KB, it will still occupy a whole page: with 8 KB pages, 7 KB is wasted.
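A quick worked example of the page arithmetic, under simplifying assumptions: 8 KB pages, fixed-size rows that never span two pages, and no page headers or per-row overhead (real storage engines have both). The `pages_needed` function is hypothetical.

```python
def pages_needed(row_count: int, row_size: int, page_size: int = 8192) -> int:
    """Pages needed when rows never span pages (headers and per-row overhead ignored)."""
    rows_per_page = page_size // row_size
    return (row_count + rows_per_page - 1) // rows_per_page  # ceiling division

print(pages_needed(1, 1024))         # 1: a single 1 KB row still occupies a whole page
print(8192 - 1024)                   # 7168 bytes (7 KB) of that page stay unused
print(pages_needed(1_000_000, 200))  # 25000: 40 rows of 200 bytes fit in one 8 KB page
```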

Back to statistics: when you ask the database to gather statistics, it computes values such as:

The number of rows and pages in a table
For each column of a table:
- the distinct data values
- the length of the data (minimum, maximum, average)
- the data range (minimum, maximum, average)
Information about the table's indexes
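To make the list concrete, here is a sketch that computes the same kinds of per-column statistics over a small in-memory table. The sample rows and function names are hypothetical, and a real database gathers these values from its pages (often by sampling) rather than from a Python list.

```python
from statistics import mean

COLUMNS = ("last_name", "first_name", "age")
ROWS = [  # hypothetical sample data
    ("Martin", "Anne", 34),
    ("Dupont", "Anne", 28),
    ("Lefebvre", "Paul", 45),
    ("Martin", "Marc", 51),
]

def column_stats(values: list) -> dict:
    """Distinct count and range for any column; length stats for text, average for numbers."""
    stats = {"distinct": len(set(values)), "min": min(values), "max": max(values)}
    if all(isinstance(v, str) for v in values):
        lengths = [len(v) for v in values]
        stats.update(len_min=min(lengths), len_max=max(lengths), len_avg=mean(lengths))
    else:
        stats["avg"] = mean(values)
    return stats

def table_stats(columns: tuple, rows: list) -> dict:
    return {
        "row_count": len(rows),
        "columns": {c: column_stats([r[i] for r in rows]) for i, c in enumerate(columns)},
    }

stats = table_stats(COLUMNS, ROWS)
print(stats["row_count"])             # 4
print(stats["columns"]["last_name"])  # 3 distinct values, lengths 6 to 8, average 6.5
print(stats["columns"]["age"])        # range 28 to 51, average 39.5
```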

These statistics help the optimizer estimate the disk I/O, CPU, and memory usage of the query.

Per-column statistics matter. Suppose a PERSON table must be joined on two columns, LAST_NAME and FIRST_NAME. From the statistics, the database knows how many distinct values FIRST_NAME has and how many LAST_NAME has. If LAST_NAME has many more distinct values, the database will compare the columns in the order LAST_NAME, FIRST_NAME rather than FIRST_NAME, LAST_NAME: two last names are rarely equal, so most non-matching rows are rejected after the first comparison, which means far fewer comparisons overall. Most of the time, comparing just the first two or three characters of LAST_NAME is enough.
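The reasoning can be checked with a small count. Using hypothetical sample rows, the sketch orders the join columns by their number of distinct values and counts how many row pairs are tied on the first compared column, and therefore need a second comparison.

```python
from itertools import combinations

COLUMNS = ("first_name", "last_name")
ROWS = [  # hypothetical sample data: few first names, many last names
    ("Anne", "Martin"),
    ("Anne", "Dupont"),
    ("Paul", "Lefebvre"),
    ("Anne", "Moreau"),
    ("Paul", "Bernard"),
]

def comparison_order(columns, rows):
    """Sort key columns by number of distinct values, most selective first."""
    distinct = {c: len({r[i] for r in rows}) for i, c in enumerate(columns)}
    return sorted(columns, key=lambda c: distinct[c], reverse=True)

def ties_on(rows, column_index):
    """Count row pairs equal on one column, i.e. pairs that need a second comparison."""
    return sum(1 for a, b in combinations(rows, 2) if a[column_index] == b[column_index])

print(comparison_order(COLUMNS, ROWS))     # ['last_name', 'first_name']
print(ties_on(ROWS, 0), ties_on(ROWS, 1))  # 4 0
```

With FIRST_NAME compared first, 4 of the 10 pairs need a second comparison; with LAST_NAME first, none do.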

These are the basic statistics. You can also ask the database to compute more advanced statistics called histograms, which capture the distribution of values in a column: the most frequent values, quantiles, and so on. This extra information helps the database find better query plans, especially for equality predicates and range predicates, because it already knows roughly how many records will match.
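A sketch of how a histogram sharpens estimates, assuming an equi-width histogram and values spread uniformly inside each bucket (real databases use various histogram types; this is one simple choice). The ages and helper names are hypothetical.

```python
AGES = [10, 20, 30, 35, 40, 45, 60, 90]  # hypothetical column values

def build_histogram(values, lo, hi, buckets):
    """Equi-width histogram: counts of values in `buckets` equal slices of [lo, hi]."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        counts[min(int((v - lo) / width), buckets - 1)] += 1  # hi goes in the last bucket
    return lo, width, counts

def estimate_range(hist, a, b):
    """Estimated number of values with a <= v < b, assuming uniformity inside each bucket."""
    lo, width, counts = hist
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo, bucket_hi = lo + i * width, lo + (i + 1) * width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total

def estimate_without_histogram(n_rows, lo, hi, a, b):
    """Same estimate, assuming values are spread uniformly over the whole [lo, hi] range."""
    return n_rows * (b - a) / (hi - lo)

hist = build_histogram(AGES, 0, 100, 4)
print(hist[2])                                        # [2, 4, 1, 1]
print(estimate_range(hist, 25, 50))                   # 4.0 (actual: 4 rows)
print(estimate_without_histogram(8, 0, 100, 25, 50))  # 2.0
```

For `25 <= age < 50` the histogram estimates 4 rows, which matches the data, while assuming a uniform spread over the whole 0 to 100 range gives 8 × 25 / 100 = 2.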

These statistics are stored in the database's metadata, and keeping them up to date takes time. That is why databases generally do not recompute them on every data change.
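SQLite illustrates this well: the statistics gathered by ANALYZE are stored in the internal table sqlite_stat1 and are only refreshed when ANALYZE runs again (directly, or through PRAGMA optimize). In this sketch the table, index name, and generated data are hypothetical; the first number in the `stat` column is the row count recorded for the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_key INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute("CREATE INDEX idx_person_last_name ON person(last_name)")

def insert_people(n: int) -> None:
    conn.executemany("INSERT INTO person (last_name) VALUES (?)",
                     [(f"name{i % 100}",) for i in range(n)])
    conn.commit()

def recorded_row_count() -> int:
    """Row count that ANALYZE stored for the index (first integer of sqlite_stat1.stat)."""
    (stat,) = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_person_last_name'"
    ).fetchone()
    return int(stat.split()[0])

insert_people(1000)
conn.execute("ANALYZE")
print(recorded_row_count())  # 1000

insert_people(500)           # the data changes...
print(recorded_row_count())  # 1000: ...but the stored statistics are now stale

conn.execute("ANALYZE")      # statistics are refreshed only when asked
print(recorded_row_count())  # 1500
```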

A later article will describe the query optimizer in more detail.

Further reading for this part:

The initial research paper (1979) on cost-based optimization: Access Path Selection in a Relational Database Management System. This article is only 12 pages and understandable with an average level in computer science.
A very good and in-depth presentation on how DB2 9.X optimizes queries.
A very good presentation on how PostgreSQL optimizes queries. It’s the most accessible document since it’s more a presentation on “let’s see what query plans PostgreSQL gives in these situations” than a “let’s see the algorithms used by PostgreSQL”.
The official SQLite documentation about optimization. It’s “easy” to read because SQLite uses simple rules. Moreover, it’s the only official documentation that really explains how it works.
A good presentation on how SQL Server 2005 optimizes queries.
A white paper about optimization in Oracle 12c.
2 theoretical courses on query optimization from the authors of the book “DATABASE SYSTEM CONCEPTS”. A good read that focuses on disk I/O cost, but a good level in CS is required.
Another theoretical course that I find more accessible but that only focuses on join operators and disk I/O.
