当前位置:网站首页>Several rounds of SQL queries in a database
Several rounds of SQL queries in a database
2022-07-27 12:06:00 【Humble Xiao Zhang】
We use the database , Intuitively, the client sends a SQL, The database puts this SQL Execute it. , The found data is returned to the client . But in fact SQL Be transformed behind , Optimize , Go through many 「 suffering 」 Just get the result back .
Pictured above , We see that it is from the query processor through the parser , Optimizer , Just entered the execution engine .
Today, let's first look at the query manager , Later, we will focus on how the query optimizer is cost-effective .
Query Manager
This part is the embodiment of database function . In this part , Will turn poorly written queries into code that can be executed quickly , And then execute it , And return the result to the client . This process involves several steps :
- First, analyze whether the query is legal
- Then the query will be rewritten , Remove useless operators , And do some pre optimization
- Optimize queries to improve performance , Transform queries into execution and data access plans
- Compile query plan
- perform
In this part , We won't say much about the last two points , Relatively speaking, they are not so critical .
Query parser
Every SQL Statements will pass through the parser to verify whether the syntax is correct . If you make a mistake , The parser will reject the query . For example, your hand is wrong , hold SELECT It has been written. SLECT, That will stop right here .
Besides , It will also check whether the keyword order is correct .
then , Inquire about SQL Table names and column names in are also analyzed , The parser will pass through the database metadata To check the following :
- Does the table exist
- Whether the corresponding query field in the table exists
- Whether the corresponding operator can act on the specified column ( For example, you can't compare a number with a string , You can't give one integer use substring)
Then it will check whether you have permission to read or write the corresponding table in the query , After all, these access rights are DBA The distribution of .
In the process of parsing , Inquire about SQL Will be converted into the internal representation of the database ( It's usually a tree ). If everything OK, The converted content will be sent to the query 「 Rewriter 」
Inquire about Rewriter
In this step , We got an internal representation of the query , The goal of rewriter is to :
- Pre optimize queries
- Avoid useless operations
- Help the optimizer find the best solution
Rewriter will execute a series of known rules on the query . If the query conforms to a certain rule pattern , This rule will be applied to rewrite the query . Here are ( Optional ) The rules of :
View merging : If the view is used in the query , That view will follow the SQL Code conversion .
Subquery leveling : Queries with subqueries are difficult to optimize , So the rewriter will try to modify the query , Even delete subqueries .
for example
SELECT PERSON.* FROM PERSON WHERE PERSON.person_key IN (SELECT MAILS.person_key FROM MAILS WHERE MAILS.mail LIKE 'christophe%');
Copy code Will be this SQL Replace
SELECT PERSON.* FROM PERSON, MAILS WHERE PERSON.person_key = MAILS.person_key and MAILS.mail LIKE 'christophe%';
Copy code - Remove useless operators : If you use DISTINCT, But you already have one UNIQUE Constraints to ensure data uniqueness , that DISTINCT Keywords will be deleted .
- Eliminate redundant connections : If you have the same connection condition twice , Because a connection condition is hidden in the view , Or useless connections due to Transitivity , Delete it .
- Continuous arithmetic evaluation : If the query is the content to be calculated , Then it will be calculated once in the rewriting process . such as , hold WHERE AGE> 10 + 2 Convert to WHERE AGE> 12, And then TODATE(“ date ”) Convert to datetime Format date
- ( senior ) Partition correction : If you use a partition table , Rewriter can find the partition to use .
- ( senior ) Instantiate view overrides : If you already have an instantiated view that matches the query subset , Rewriter will check whether the view is the latest view , And modify the query to use the instantiated view instead of the original table .
- ( senior ) Custom rule : If you create a custom rule to rewrite the query , The rewriter will execute these rules ( senior )Olap transformation : analysis / Window function , Star connection , Summary … Will also be converted ( But whether it is done by rewriter or optimizer depends on the database , Because these two processes are adjacent ).
The rewritten query will be sent to the query optimizer , It's interesting .
Statistics
Before entering the database how to optimize queries , We need to talk about statistics first , Because there are no statistics , The database will be stupid . If you don't tell the database to analyze your data , It won't do that , And will make wrong assumptions .
What information does the database need ?
Let's talk about how databases and operating systems store data . The smallest unit they use is called page or block ( The default is 4 or 8 KB). in other words , If you just need 1 KB, It will also occupy a page . If the page occupies 8 KB, That will waste 7 KB.
Back to statistics , When you ask the database for statistical information , It will calculate these contents :
- The number of rows or pages in a table
- Every column in a table
- Separate data content
- Length of data ( Minimum , Maximum , Average )
- Data range information ( Minimum 、 Maximum 、 Average )
- Table index information
These statistics will help the optimizer better estimate the disks in the query I/O,CPU And the use of memory .
The statistics of each column are important . For example, a PERSON surface , Need to be in LAST_NAME, FIRST_NAME Connect two columns , Through the statistics , The database can know FIRST_NAME How many different values are there in this column ,LAST_NAME How many different values . So the database will use LAST_NAME,FIRST_NAME To connect , instead of FIRST_NAME,LAST_NAME, because LAST_NAME Unlikely to be the same , Less data will be generated . Most of the time , The first two or three characters of the database are compared LAST_NAME That's enough .
Of course, these are basic statistical information , You can also let the database calculate histograms This higher-order statistics . The most commonly used value , Quality and so on , Through these additional information , It can help the database find more efficient query plans , Especially like equivalent query , And range query . Because the database already knows how many records there are in this case .
These statistics are recorded in the metadata of the database . Therefore, it also needs to take time to constantly update . This is also why it does not update automatically in most databases .
Later articles , Will describe some details of the query optimizer .
After reading this part , Extended reading :
- The initial research paper (1979) on cost based optimization: Access Path Selection in a Relational Database Management System. This article is only 12 pages and understandable with an average level in computer science.
- A very good and in-depth presentation on how DB2 9.X optimizes queries here
- A very good presentation on how PostgreSQL optimizes queries here. It’s the most accessible document since it’s more a presentation on “let’s see what query plans PostgreSQL gives in these situations“ than a “let’s see the algorithms used by PostgreSQL”.
- The official SQLite documentation about optimization. It’s “easy” to read because SQLite uses simple rules. Moreover, it’s the only official documentation that really explains how it works.
- A good presentation on how SQL Server 2005 optimizes queries here
- A white paper about optimization in Oracle 12c here
- 2 theoretical courses on query optimization from the authors of the book “DATABASE SYSTEM CONCEPTS”here and there. A good read that focuses on disk I/O cost but a good level in CS is required.
- Another theoretical course that I find more accessible but that only focuses on join operators and disk I/O.
边栏推荐
- Difference between verification and calibration
- JS-寄生组合式继承
- [machine learning whiteboard derivation series] learning notes - support vector machine and principal component analysis
- go 用本地代码replace
- MATLAB画带延时系统的伯德图
- Introduction to box diagram
- The chess robot "broke" the chess boy's finger...
- Weibo comment crawler + visualization
- Guangdong's finance has taken many measures to help stabilize the "ballast stone" of food security
- While loop instance in shell
猜你喜欢

MySQL数据库主从复制集群原理概念以及搭建流程

【产品】关于微信产品分析

哈希表 详细讲解

Why is ack=seq+1 when TCP shakes hands three times

阿里云云数据库RDS版Exception during pool initialization

解决方案:Can not issue executeUpdate() or executeLargeUpdate() for SELECTs

Iptables firewall

MATLAB画带延时系统的伯德图

Analysis of the use of JUC framework from runnable to callable to futuretask

我在英国TikTok做直播电商
随机推荐
【机器学习-白板推导系列】学习笔记---概率图模型和指数族分布
[machine learning whiteboard derivation series] learning notes - support vector machine and principal component analysis
shell中的while循环实例
Analysis of the use of JUC framework from runnable to callable to futuretask
Check the number of file descriptors opened by each process under the system
deeplab系列详解(简单实用年度总结)
Difference between verification and calibration
微机和单片机的区别
Conversion between multiple bases
CH340模块无法识别/烧写不进的一种可能性
Top 10 in the 5.3 billion Bi Market: fansoft, Microsoft, Yonghong, sap, Baidu, IBM, SAS, smart, salesforce, Inspur soft
Shell script text three swordsmen sed
Regular expression of shell programming (grep of shell script text three swordsmen)
希腊字母读法
银行人脸识别系统被攻破:一储户被偷走 43 万元
Proteus8专业版破解后用数码管闪退的解决
Sync.map of go language
Leetcode 03: t58. Length of the last word (simple); Sword finger offer 05. replace spaces (simple); Sword finger offer 58 - ii Rotate string left (simple)
关于离线缓存Application Cache /使用 manifest文件缓存
解决方案:idea project没有显示树状图