[MySQL from Beginner to Proficient] [Advanced] (VIII) Clustered Index, Non-Clustered Index, and Joint Index

2022-07-26 19:09:00 【Man Nong Feige】

Hello! I'm Man Nong Feige. Thank you for reading this article. Likes, favorites, and follows are all welcome.

1. Introduction

The previous article, [MySQL from Beginner to Proficient] [Advanced] (VII) Designing an Index & the Index Scheme in InnoDB, covered how to design an index and how InnoDB builds its indexes.
This article continues with InnoDB indexes. By physical implementation, indexes fall into 2 kinds: clustered indexes and non-clustered indexes. Non-clustered indexes are also called secondary indexes or auxiliary indexes.

2. Environment

Environment	Version
Red Hat	4.8.5-39
MySQL	5.7

3. Clustered index

A clustered index is not a separate index type but a way of storing data: all user records are stored in the leaf nodes. In other words, the index is the data and the data is the index.
Terminology: "clustered" means that data rows are stored together with adjacent key values.

Characteristics:

Records and pages are sorted by the primary key value of the records. This has three meanings:
- Records within a page are arranged into a singly linked list in primary key order.
- The pages that store user records are arranged into a doubly linked list, ordered by the primary keys of the user records in each page.
- The pages that store directory entry records are divided into levels, and the pages within one level are also arranged into a doubly linked list, ordered by the primary keys of the directory entry records in each page.
The leaf nodes of the B+ tree store complete user records.
A complete user record is one that stores the values of all columns, including hidden columns.
We call such a B+ tree a clustered index, and all complete user records are stored in its leaf nodes. We do not need to create a clustered index explicitly with an INDEX statement in MySQL; the InnoDB storage engine creates it for us automatically.
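
As a rough illustration of "the index is the data", the leaf level of a clustered index can be pictured as complete rows kept in primary key order. The sketch below is only a toy model in Python, not InnoDB's actual page format; the rows, column names, and the helper functions `find_by_primary_key` and `range_by_primary_key` are all invented for this example.

```python
from bisect import bisect_left

# Toy model of clustered-index leaf records: complete rows kept in
# primary-key order. The rows and column names are made up for illustration.
rows = [
    {"id": 1, "name": "Ann", "age": 30},
    {"id": 4, "name": "Bob", "age": 28},
    {"id": 7, "name": "Cid", "age": 20},
    {"id": 9, "name": "Dee", "age": 28},
]
primary_keys = [row["id"] for row in rows]  # already sorted by id


def find_by_primary_key(pk):
    """Binary-search the sorted keys and return the complete row, or None."""
    pos = bisect_left(primary_keys, pk)
    if pos < len(rows) and primary_keys[pos] == pk:
        return rows[pos]
    return None


def range_by_primary_key(low, high):
    """Return all complete rows with low <= id <= high, in key order."""
    start = bisect_left(primary_keys, low)
    result = []
    for row in rows[start:]:
        if row["id"] > high:
            break
        result.append(row)
    return result
```

Because the complete row sits right next to its key, a primary key lookup or a primary key range scan needs no second step to fetch the remaining columns.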

Advantages:

Data access is faster. Because a clustered index keeps the index and the data in the same B+ tree, fetching data through the clustered index is faster than through a non-clustered index.
Sorted lookups and range lookups on the primary key are very fast.
When a query returns a range of data in clustered-index order, the rows are stored close together, so the database does not have to read them from many separate data blocks, which saves a lot of I/O.

Disadvantages:

Insert speed depends heavily on insertion order. Inserting in primary key order is fastest; otherwise page splits occur and seriously hurt performance. For this reason, InnoDB tables usually define an auto-increment ID column as the primary key.
Updating the primary key is expensive because the updated row has to move. For this reason, InnoDB tables usually treat the primary key as non-updatable.
Access through a secondary index takes two index lookups: the first finds the primary key value, and the second finds the row data by that primary key value.
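
To get a feel for why insertion order matters, here is a deliberately simplified simulation. It assumes tiny pages of 4 records (`PAGE_CAPACITY` is a made-up value), splits a full page in half, and counts how many records have to move; the function `insert_all` is hypothetical, and real InnoDB page management is considerably more involved.

```python
from bisect import bisect_right, insort
import random

PAGE_CAPACITY = 4  # hypothetical tiny page size, chosen for illustration


def insert_all(keys):
    """Insert keys one by one into sorted pages; return (pages, moved).

    'moved' counts records that had to be relocated to another page
    because a full page was split in the middle.
    """
    pages = [[]]
    moved = 0
    for key in keys:
        # Locate the page whose key range covers the new key.
        first_keys = [page[0] for page in pages if page]
        idx = max(bisect_right(first_keys, key) - 1, 0)
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Appending past the end: open a fresh page, nothing moves.
            pages.append([key])
        else:
            # Full page inside the key range: split it in half.
            half = PAGE_CAPACITY // 2
            right = page[half:]
            del page[half:]
            moved += len(right)
            pages.insert(idx + 1, right)
            insort(right if key > right[0] else page, key)
    return pages, moved


# Keys arriving in primary key order, like an auto-increment ID.
sequential_pages, sequential_moved = insert_all(range(1, 21))

# The same keys arriving in an arbitrary order, like UUID-style keys.
shuffled_keys = list(range(1, 21))
random.Random(0).shuffle(shuffled_keys)
shuffled_pages, shuffled_moved = insert_all(shuffled_keys)
```

In this model, in-order inserts only ever append to the last page, so `sequential_moved` stays at 0, while out-of-order inserts can land in a full page and force a split that relocates records.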

Limitations:

In MySQL, currently only the InnoDB storage engine supports clustered indexes; MyISAM does not.
Data can be physically stored in only one order, so each MySQL table can have only one clustered index. In general it is the table's primary key.
If no primary key is defined, InnoDB uses a non-null unique index instead. If there is no such index either, InnoDB implicitly defines a primary key to serve as the clustered index.
To make full use of clustering, the primary key of an InnoDB table should be an ordered, sequential id. Unordered ids are not recommended: UUID, MD5, HASH, or string columns used as the primary key cannot guarantee that the data grows sequentially.
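
The fallback order described above can be written down as a small decision function. This is only a sketch of the rule as stated in this article; `choose_clustered_key` and its argument shapes are invented for illustration and are not a MySQL API.

```python
def choose_clustered_key(primary_key, unique_indexes):
    """Pick what the table is clustered on, following the rule above.

    primary_key: list of column names, or None if no primary key is defined.
    unique_indexes: list of (columns, all_not_null) tuples in definition order.
    """
    if primary_key:
        return ("primary key", primary_key)
    for columns, all_not_null in unique_indexes:
        if all_not_null:
            return ("non-null unique index", columns)
    # No usable key: InnoDB falls back to an implicit, hidden key.
    return ("implicit hidden key", ["<hidden row id>"])
```

For example, `choose_clustered_key(None, [(["email"], False), (["code"], True)])` skips the nullable unique index on `email` and settles on `code` (both column names are hypothetical).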

4. Non-clustered index (secondary index, auxiliary index)

The clustered index above only helps when the search condition is the primary key value, because the data in that B+ tree is sorted by primary key. What if we want to search by another column? Surely we should not have to walk the linked list of records from beginning to end.
The answer: we can build more B+ trees, each sorting the data by a different rule. For example, we can use the size of the age column as the sort rule for the records in the data pages and build another B+ tree, as shown below:
[Figure: B+ tree sorted by the age column]

Records and pages are sorted by the age column of the records. This has three meanings:
- Records within a page are arranged into a singly linked list in order of the age column.
- The pages that store user records are arranged into a doubly linked list, ordered by the age column of the records in each page.
- The pages that store directory entry records are divided into levels, and the pages within one level are also arranged into a doubly linked list, ordered by the age column of the directory entry records in each page.
- A directory entry record is no longer a primary key + page number pair; it is an age column + page number pair.
So if we now want to find records by the value of the age column, we can use this B+ tree. Taking age = 28 as an example, the search proceeds as follows:
1. Determine the directory entry record page.
  Starting from the root page, page 33, we can quickly locate the directory entry record page, page 42.
2. Use the directory entry record page to determine the page where the user record actually resides.
  In page 42 we can quickly determine that the user record is in page 46 (because 20 < 28 < 30).
3. Locate the specific record within the page that stores the user records.
4. The records in the leaf nodes of this B+ tree store only two columns, age and id (the primary key), so we must look up the complete user record in the clustered index again by primary key value.
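
The descent in steps 1 to 3 can be sketched in a few lines of Python. Page numbers 33, 42, and 46 follow the walk-through above; every other page number and every record value is invented, and the function `find_ids_by_age` is hypothetical. The sketch also ignores the case where matching records spill over into the next leaf page.

```python
from bisect import bisect_right

# Hypothetical pages. Page numbers 33, 42 and 46 follow the walk-through above;
# every other number and value is invented for illustration.
directory_pages = {
    33: [(1, 42), (50, 43)],            # root: (smallest age, child page)
    42: [(1, 45), (20, 46), (30, 47)],
    43: [(50, 48)],
}
leaf_pages = {
    45: [(1, 8), (9, 3)],               # leaf: (age, primary key id)
    46: [(20, 7), (28, 4), (28, 9)],
    47: [(30, 1), (41, 6)],
    48: [(50, 2), (63, 5)],
}
ROOT_PAGE = 33


def find_ids_by_age(age):
    """Descend from the root to a leaf page, then collect matching ids."""
    page_no = ROOT_PAGE
    visited = [page_no]
    while page_no in directory_pages:
        entries = directory_pages[page_no]
        smallest_ages = [entry_age for entry_age, _ in entries]
        # Follow the last entry whose smallest age is <= the target age.
        slot = max(bisect_right(smallest_ages, age) - 1, 0)
        page_no = entries[slot][1]
        visited.append(page_no)
    ids = [pk for leaf_age, pk in leaf_pages[page_no] if leaf_age == age]
    return visited, ids
```

Calling `find_ids_by_age(28)` visits pages 33, 42, and 46 in that order and yields only primary key values, which is exactly why step 4 still has to consult the clustered index.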

4.1. Going back to the table

With this B+ tree sorted by the age column we can only determine the primary key value of the record we are looking for. So to get the complete user record from an age value, we still have to search the clustered index again. This process is called going back to the table. In other words, querying a complete user record by the value of the age column uses 2 B+ trees.
Question: why is a back-to-table operation needed? Couldn't the complete user record be stored directly in the leaf nodes?
Answer: storing the complete user record in the leaf nodes would indeed avoid going back to the table, but a table can have more than one non-clustered index. If every such B+ tree held its own copy of all user records, too much storage space would be wasted.
Because a B+ tree built on a non-primary-key column needs a back-to-table operation to locate the complete user record, this kind of B+ tree is also called a secondary index or auxiliary index. Since we use the size of the age column as the B+ tree's sort rule, we also say that this B+ tree is the index created on the age column.
The existence of non-clustered indexes does not affect how data is organized in the clustered index, so a table can have multiple non-clustered indexes.
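
The two lookups can be modeled with two small structures: a sorted list of (age, id) pairs standing in for the secondary index, and a dictionary from primary key to complete row standing in for the clustered index. This is a minimal sketch with made-up data, and `find_rows_by_age` is a hypothetical helper, not how InnoDB is implemented.

```python
from bisect import bisect_left

# Clustered index stand-in: primary key -> complete row (made-up data).
clustered = {
    1: {"id": 1, "name": "Ann", "age": 30},
    4: {"id": 4, "name": "Bob", "age": 28},
    7: {"id": 7, "name": "Cid", "age": 20},
    9: {"id": 9, "name": "Dee", "age": 28},
}
# Secondary index stand-in: only (age, id) pairs, sorted by age then id.
age_index = sorted((row["age"], pk) for pk, row in clustered.items())


def find_rows_by_age(age):
    """Lookup 1: age index -> primary keys. Lookup 2: clustered index -> rows."""
    pos = bisect_left(age_index, (age,))
    rows = []
    while pos < len(age_index) and age_index[pos][0] == age:
        primary_key = age_index[pos][1]
        rows.append(clustered[primary_key])  # the back-to-table step
        pos += 1
    return rows
```

Note that `age_index` holds only two values per record no matter how wide the table is, which reflects the storage argument made in the answer above.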

5. Joint index

We can also use several columns together as the sort rule, that is, build one index on multiple columns. Suppose we want the B+ tree sorted by the age and name columns. This has two meanings:

First, records and pages are sorted by the age column.
When records have the same age, they are sorted by the name column.
The index created on the age and name columns is illustrated as follows:

[Figure: B+ tree of the index on the age and name columns]

As the figure shows, note the following points:

Each directory entry record consists of three parts: age, name, and page number. Records are sorted by age first; when age is the same, they are sorted by name.
The user records in the leaf nodes of the B+ tree consist of the age and name columns and the primary key id column.
Note that a B+ tree whose sort rule is the age and name columns is called a joint index; in essence it is also a secondary index. It is not the same as creating separate indexes on age and on name. The differences are as follows:
Creating a joint index builds only one B+ tree like the one above.
Creating separate indexes on age and name builds two B+ trees, one sorted by the age column and one sorted by the name column.
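
The difference between one joint index and two separate indexes can be seen by sorting tuples. The sketch below uses made-up rows; `joint_index`, `age_index`, `name_index`, and `find_ids_by_age_and_name` are illustrative names only, and sorted Python lists merely stand in for B+ trees.

```python
from bisect import bisect_left

# Made-up rows: (id, name, age).
people = [(1, "Ann", 30), (4, "Bob", 28), (7, "Cid", 20), (9, "Abe", 28)]

# Joint index on (age, name): ONE sorted structure of (age, name, id).
joint_index = sorted((age, name, pk) for pk, name, age in people)

# Separate indexes on age and on name: TWO sorted structures.
age_index = sorted((age, pk) for pk, _, age in people)
name_index = sorted((name, pk) for pk, name, _ in people)


def find_ids_by_age_and_name(age, name):
    """Binary-search the joint index for an exact (age, name) match."""
    pos = bisect_left(joint_index, (age, name))
    ids = []
    while pos < len(joint_index) and joint_index[pos][:2] == (age, name):
        ids.append(joint_index[pos][2])
        pos += 1
    return ids
```

In `joint_index` the two records with age 28 sit next to each other and are ordered by name ("Abe" before "Bob"), whereas `age_index` and `name_index` each order the records by a single column.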

Summary

This article covered clustered indexes, non-clustered indexes, and joint indexes in detail. By default, MySQL creates a clustered index on the primary key column to store the complete user records, while non-clustered and joint indexes store only the indexed columns and the primary key. Querying a complete user record through a non-clustered index requires going back to the table.

Copyright notice
This article was written by [Man Nong Feige]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/207/202207261743435630.html