MySQL index

2022-06-12 23:52:00 【InfoQ】

Indexes

An index is an (ordered) data structure that helps MySQL access data efficiently. In addition to the data itself, the database system maintains data structures that satisfy specific search algorithms. These structures reference (point to) the data in some way, so that advanced search algorithms can be run on them. Such a data structure is an index.

Advantages and disadvantages:

Advantages:

Faster data retrieval, which lowers the database's I/O cost

Data can be sorted through the index columns, which lowers the cost of sorting and reduces CPU consumption

Disadvantages:

Indexes also take up space

Indexes greatly improve query efficiency, but they slow down table updates such as INSERT, UPDATE, and DELETE

Index structure

B-Tree

The weaknesses of a plain binary tree can be addressed with a red-black tree.

However, a red-black tree is still a binary tree: with a large amount of data it becomes deep, and retrieval slows down.

To solve this problem, a B-Tree (multiway balanced search tree) can be used. Take a B-Tree with a maximum degree (max-degree, the maximum number of child nodes per node) of 5 (order 5) as an example: each node stores up to 4 keys and 5 pointers.

Animation of the B-Tree insertion process: https://www.bilibili.com/video/BV1Kr4y1i7ru?p=68

Demo address: https://www.cs.usfca.edu/~galles/visualization/BTree.html

B+Tree

Demo address: https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

Differences from B-Tree:

All the data appears in the leaf nodes

The leaf nodes form a singly linked list

MySQL's index data structure is an optimized version of the classic B+Tree. On top of the original B+Tree, it adds pointers between adjacent leaf nodes, forming a B+Tree with sequential pointers and improving the performance of range access.
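
Why linked leaves help range access can be illustrated with a minimal Python sketch. It is not InnoDB's implementation: the `Leaf` class, the `build_leaves` and `range_scan` functions, and the sample keys are all hypothetical, and the descent through the non-leaf nodes is replaced by simply starting at the leftmost leaf.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    """A B+Tree leaf: sorted keys plus a pointer to the next leaf."""
    keys: List[int]
    next: Optional["Leaf"] = None


def build_leaves(sorted_keys: List[int], per_leaf: int) -> Leaf:
    """Pack sorted keys into leaves and chain them left to right."""
    leaves = [Leaf(sorted_keys[i:i + per_leaf])
              for i in range(0, len(sorted_keys), per_leaf)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(start: Leaf, low: int, high: int) -> List[int]:
    """Collect keys in [low, high] by following the leaf chain only."""
    result = []
    leaf = start
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are sorted, so the scan can stop here
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result


first_leaf = build_leaves([5, 8, 12, 16, 23, 38, 47, 58, 72], per_leaf=4)
print(range_scan(first_leaf, 12, 47))
```

Once the scan reaches the first matching leaf, it never has to go back up to a parent node; it just follows the `next` pointers until it passes the upper bound.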

Hash

A hash index uses a hash algorithm to convert each key value into a hash value, maps it to the corresponding slot, and stores it in a hash table. If two (or more) key values map to the same slot, a hash conflict (also called a hash collision) occurs, which can be resolved with a linked list.

Characteristics:

Hash indexes can only be used for equality comparisons (=, in); range queries (between, >, <, ...) are not supported

The index cannot be used to perform sort operations

Query efficiency is high: usually a single lookup is enough, so it is usually faster than a B+Tree index
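
The characteristics above can be seen in a toy Python sketch of a hash table with chaining. This is only an illustration, not the Memory engine's implementation: the `HashIndex` class, the slot count, the byte-sum hash function, and the sample keys are assumptions chosen so that a collision is easy to reproduce.

```python
from typing import List, Tuple

SLOT_COUNT = 8  # assumed, deliberately tiny so collisions can occur


class HashIndex:
    """Toy hash index: each slot holds a chain of (key, row_id) pairs."""

    def __init__(self) -> None:
        self.slots: List[List[Tuple[str, int]]] = [[] for _ in range(SLOT_COUNT)]

    def slot_of(self, key: str) -> int:
        # Simple deterministic hash (sum of bytes) so the example is reproducible.
        return sum(key.encode("utf-8")) % SLOT_COUNT

    def insert(self, key: str, row_id: int) -> None:
        # Colliding keys are appended to the same chain.
        self.slots[self.slot_of(key)].append((key, row_id))

    def lookup_equal(self, key: str) -> List[int]:
        # Equality: hash once, then inspect a single chain.
        return [rid for k, rid in self.slots[self.slot_of(key)] if k == key]

    def lookup_range(self, low: str, high: str) -> List[int]:
        # Range: slot order is unrelated to key order, so every slot is visited.
        return sorted(rid for chain in self.slots
                      for k, rid in chain if low <= k <= high)


index = HashIndex()
for row_id, key in enumerate(["ab", "ba", "cd"], start=1):
    index.insert(key, row_id)

print(index.slot_of("ab") == index.slot_of("ba"))  # the two keys collide
print(index.lookup_equal("ba"))
print(index.lookup_range("a", "bz"))
```

The equality lookup touches one slot, while the range lookup has to walk the whole table, which is why a hash index gives no help for range conditions or sorting.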

Storage engine support:

Memory

InnoDB: has an adaptive hash feature; the storage engine builds the hash index automatically from B+Tree indexes under certain conditions

Interview question

Why does the InnoDB storage engine use the B+Tree index structure?

Compared with a binary tree, it has fewer levels, so searches are more efficient

In a B-Tree, both leaf and non-leaf nodes store data, so fewer key values (and therefore fewer pointers) fit in one page. To store the same large amount of data, the tree has to grow taller, which lowers performance

Compared with a hash index, a B+Tree supports range matching and sort operations

Index classification

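In the InnoDB storage engine, indexes can be divided into two types according to how they are stored: clustered indexes and secondary indexes.

Clustered index: the row data is stored together with the index structure, and the leaf nodes hold the row data. A table must have one, and only one.

Secondary index: the data is stored separately from the index, and the leaf nodes hold the corresponding primary key. A table can have several.

Clustered index selection rules:

If there is a primary key, the primary key index is the clustered index

If there is no primary key, the first unique (UNIQUE) index is used as the clustered index

If the table has neither a primary key nor a suitable unique index, InnoDB automatically generates a rowid to use as a hidden clustered index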

Thinking questions

1. Which of the following SQL statements is more efficient, and why?

select * from user where id = 10;
select * from user where name = 'Arm';
-- Note: id is the primary key; the name field has an index

Answer: The first statement. The second needs a back-to-table lookup: it first searches the secondary index on name for the primary key, then searches the clustered index for the row, which amounts to two steps.

2. How tall is an InnoDB primary key index B+Tree?

Answer: Assume one row of data is 1 KB, so a 16 KB page holds 16 such rows. An InnoDB pointer takes 6 bytes, and the primary key is assumed to be a bigint, which takes 8 bytes. This gives the formula:

n * 8 + (n + 1) * 6 = 16 * 1024

where 8 is the number of bytes a bigint takes, n is the number of keys stored in the current node, and (n + 1) is the number of pointers (one more than the number of keys). Solving gives n of about 1170.

If the tree has height 2, the number of rows it can store is about:

1171 * 16 = 18736

If the tree has height 3, the number of rows it can store is about:

1171 * 1171 * 16 = 21939856

If the data volume grows well beyond this, splitting the table should be considered; that topic belongs to operations and maintenance.
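
The arithmetic above can be checked with a short Python sketch. The constants are the assumptions stated in the answer (16 KB page, 1 KB row, 8-byte bigint key, 6-byte pointer); the function names `keys_per_node` and `capacity` are made up for this example.

```python
PAGE_SIZE = 16 * 1024   # bytes per page, as in the formula above
KEY_SIZE = 8            # bytes for a bigint primary key
POINTER_SIZE = 6        # bytes for an InnoDB pointer
ROWS_PER_LEAF = 16      # a 16 KB page divided by 1 KB rows


def keys_per_node() -> float:
    """Solve n * KEY_SIZE + (n + 1) * POINTER_SIZE = PAGE_SIZE for n."""
    return (PAGE_SIZE - POINTER_SIZE) / (KEY_SIZE + POINTER_SIZE)


def capacity(height: int, pointers_per_node: int) -> int:
    """Rows reachable in a tree with (height - 1) non-leaf levels above the leaves."""
    return pointers_per_node ** (height - 1) * ROWS_PER_LEAF


n = round(keys_per_node())      # about 1170 keys per non-leaf node
pointers = n + 1                # one more pointer than keys

print(n)
print(capacity(2, pointers))
print(capacity(3, pointers))
```

Each extra level multiplies the capacity by roughly 1171, which is why a primary key B+Tree rarely needs more than three or four levels under these assumptions.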

Syntax

Create index ：

CREATE [ UNIQUE | FULLTEXT ] INDEX index_name ON table_name (index_col_name, ...);

If no index type is specified after CREATE, a regular index is created

View indexes:

SHOW INDEX FROM table_name;

Delete index ：

DROP INDEX index_name ON table_name;

Examples:

-- The name field may contain duplicate values; create an index on it
create index idx_user_name on tb_user(name);
-- The phone field is non-null and unique; create a unique index on it
create unique index idx_user_phone on tb_user (phone);
-- Create a composite index on profession, age, status
create index idx_user_pro_age_stat on tb_user(profession, age, status);
-- Create a suitable index on email to improve query efficiency
create index idx_user_email on tb_user(email);

-- Drop the index
drop index idx_user_email on tb_user;

Usage rules

The leftmost prefix rule

If an index covers multiple columns (a composite index), the leftmost prefix rule applies: the query must start from the leftmost column of the index and must not skip columns in the index. If a column is skipped, the index is only partially used (the columns after the skipped one are not used).

In a composite index, when a range condition (<, >) appears, the index columns to the right of the range column are not used. Where the business logic allows, use >= or <= instead to avoid this.
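
The rule can be modelled with a small Python sketch. It encodes only the rule as this article states it, not the real MySQL optimizer; the function `usable_columns` and its input format (a mapping from column name to comparison operator) are hypothetical.

```python
from typing import Dict, List


def usable_columns(index_columns: List[str], conditions: Dict[str, str]) -> List[str]:
    """Return the index columns a query can use under the leftmost prefix rule.

    conditions maps a column name to the comparison operator used on it.
    """
    used = []
    for column in index_columns:
        operator = conditions.get(column)
        if operator is None:
            break              # a skipped column stops everything to its right
        used.append(column)
        if operator in (">", "<"):
            break              # strict range: columns to the right are not used
    return used


idx_user_pro_age_stat = ["profession", "age", "status"]

print(usable_columns(idx_user_pro_age_stat,
                     {"profession": "=", "age": "=", "status": "="}))
print(usable_columns(idx_user_pro_age_stat,
                     {"profession": "=", "status": "="}))   # age skipped
print(usable_columns(idx_user_pro_age_stat,
                     {"age": "=", "status": "="}))          # leftmost column missing
print(usable_columns(idx_user_pro_age_stat,
                     {"profession": "=", "age": ">", "status": "="}))
```

In practice, the key_len column of explain output is the usual way to confirm how many columns of a composite index a statement actually used.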

Index failure

Performing an operation on an indexed column makes the index unusable. For example:
explain select * from tb_user where substring(phone, 10, 2) = '15';

When a string-typed column is compared with an unquoted value, the index is not used. For example:
explain select * from tb_user where phone = 17799990015;
Here the value compared with phone is not quoted.

Fuzzy queries: a trailing wildcard alone does not invalidate the index, but a leading wildcard does. For example:
explain select * from tb_user where profession like '%engineering';
A pattern with % on both sides also invalidates the index.

For conditions separated by or, if the column in any one of the conditions has no index, none of the indexes involved will be used.

If MySQL estimates that using the index would be slower than a full table scan, the index is not used.
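
The reason a leading wildcard defeats an index while a trailing one does not comes from sort order, which a Python sketch can illustrate. A sorted list stands in for the ordered index entries here; the function names and sample values are hypothetical, and this is not how MySQL evaluates like internally.

```python
import bisect
from typing import List


def prefix_search(sorted_values: List[str], prefix: str) -> List[str]:
    """Trailing wildcard (like 'prefix%'): binary search finds the first match."""
    start = bisect.bisect_left(sorted_values, prefix)
    matches = []
    for value in sorted_values[start:]:
        if not value.startswith(prefix):
            break              # matches are contiguous in sorted order
        matches.append(value)
    return matches


def suffix_search(sorted_values: List[str], suffix: str) -> List[str]:
    """Leading wildcard (like '%suffix'): sort order is no help, so scan everything."""
    return [value for value in sorted_values if value.endswith(suffix)]


professions = sorted([
    "software engineering",
    "civil engineering",
    "software testing",
    "chemical engineering",
    "software design",
])

print(prefix_search(professions, "software"))
print(suffix_search(professions, "engineering"))
```

Values sharing a prefix sit next to each other in an ordered structure, so the search can jump to them; values sharing only a suffix are scattered, so every entry has to be examined.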

SQL hints

SQL hints are an important means of optimizing the database. Put simply, you add hints to the SQL statement by hand to steer how it is optimized.

For example, to suggest an index:

explain select * from tb_user use index(idx_user_pro) where profession="Software Engineering";

To exclude an index:

explain select * from tb_user ignore index(idx_user_pro) where profession="Software Engineering";

To force an index:

explain select * from tb_user force index(idx_user_pro) where profession="Software Engineering";

use index is only a suggestion: MySQL still weighs the expected cost and may choose a different index. force index makes MySQL use the specified index regardless.

Covering index & back-to-table lookup

Try to use a covering index (the query uses an index, and all the columns it returns can be found in that index), and reduce the use of select *.

Meaning of the extra field in explain:

using index condition: the search used an index, but a back-to-table lookup is needed to fetch the data

using where; using index: the search used an index, and all the data needed can be found in the index columns, so no back-to-table lookup is needed

If the matching row can be found directly in the clustered index, the row data is returned directly with a single lookup, even for select *. If the search goes through a secondary index, such as

select id, name from xxx where name='xxx';

only the secondary index (name) has to be searched to find the corresponding id. name and the id stored with it in the index are returned, so a single lookup is enough. If other fields are needed via the secondary index, a back-to-table lookup is required, such as

select id, name, gender from xxx where name='xxx';

So try not to use

select *

because it easily triggers a back-to-table lookup and lowers efficiency, unless a composite index contains all the fields.
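
The difference between a covered query and a back-to-table lookup can be sketched in Python with two dictionaries standing in for the two B+Trees. Everything here is a simplified assumption: the sample rows, the `clustered` and `name_index` dictionaries, and the `select_by_name` function are hypothetical, and the sketch assumes name values are unique.

```python
from typing import Dict, List, Tuple

# Clustered index: primary key -> full row (hypothetical sample data)
clustered: Dict[int, Dict[str, object]] = {
    1: {"id": 1, "name": "Arm", "gender": "M"},
    2: {"id": 2, "name": "Lee", "gender": "F"},
}

# Secondary index on name: name -> primary key
name_index: Dict[str, int] = {"Arm": 1, "Lee": 2}


def select_by_name(name: str, columns: List[str]) -> Tuple[Dict[str, object], int]:
    """Return (result_row, number_of_index_lookups) for `where name = ...`."""
    lookups = 1                          # search the secondary index
    primary_key = name_index[name]
    row = {"id": primary_key, "name": name}   # all the secondary index can supply
    if not set(columns) <= set(row):
        lookups += 1                     # back to the table: search the clustered index
        row = clustered[primary_key]
    return {column: row[column] for column in columns}, lookups


print(select_by_name("Arm", ["id", "name"]))             # covered by the index
print(select_by_name("Arm", ["id", "name", "gender"]))   # needs the clustered index
```

The second call pays for an extra index search only because gender is not stored in the secondary index, which is the cost a covering index avoids.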

Interview question: A table has four fields (id, username, password, status). Because of the data volume, the following SQL statement needs to be optimized. What is the best approach?

select id, username, password from tb_user where username='itcast';

Answer: Create a composite index on the username and password fields. The query is then covered by the index and needs no back-to-table lookup.

Prefix index

When the field is a string type (varchar, text, etc.), you sometimes need to index very long strings. This makes the index large and wastes a lot of disk I/O at query time, which hurts query efficiency. In that case you can index only a prefix of the string, which greatly reduces index size and improves index efficiency.

Syntax:

create index idx_xxxx on table_name(column(n));

Prefix length: choose it according to the selectivity of the index. Selectivity is the ratio of distinct index values (cardinality) to the total number of rows in the table. The higher the selectivity, the more efficient the query. A unique index has a selectivity of 1, which is the best possible selectivity and gives the best performance. Queries for computing selectivity:

select count(distinct email) / count(*) from tb_user;
select count(distinct substring(email, 1, 5)) / count(*) from tb_user;

The sub_part column in the output of show index shows the prefix length
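
Choosing a prefix length from selectivity can be sketched in Python. The functions `selectivity` and `shortest_prefix` and the sample addresses are hypothetical; the slice `value[:n]` plays the role of substring(email, 1, n) in the queries above.

```python
from typing import List, Optional


def selectivity(values: List[str], prefix_length: Optional[int] = None) -> float:
    """Distinct values (or distinct prefixes) divided by the total row count."""
    keys = values if prefix_length is None else [v[:prefix_length] for v in values]
    return len(set(keys)) / len(values)


def shortest_prefix(values: List[str], target: float) -> Optional[int]:
    """Smallest prefix length whose selectivity reaches the target, if any."""
    for length in range(1, max(len(v) for v in values) + 1):
        if selectivity(values, length) >= target:
            return length
    return None


emails = [
    "alice@example.com",
    "alina@example.com",
    "bob@example.com",
    "bobby@example.com",
]

print(selectivity(emails))        # full column
print(selectivity(emails, 3))     # 3-character prefix
print(shortest_prefix(emails, selectivity(emails)))
```

On real data the target is usually a trade-off rather than the full-column value: a slightly lower selectivity is often accepted in exchange for a much shorter prefix.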

Single-column index & composite index

Single-column index: an index that contains only one column

Composite index: an index that contains multiple columns

In business scenarios where a query has multiple conditions, it is recommended to build a composite index on the queried fields rather than several single-column indexes.

Single-column index example:

explain select id, phone, name from tb_user where phone = '17799990010' and name = ' Han xin ';

This statement uses only the index on the phone field

Notes

When a query combines multiple conditions, the MySQL optimizer evaluates which index is more efficient and chooses that index to execute the query

Design principles

Create indexes on tables that hold a large amount of data and are queried frequently

Create indexes on fields that are often used as query conditions (where), for sorting (order by), or for grouping (group by)

Prefer highly selective columns as index columns and build unique indexes where possible: the more distinct the values, the more efficient the index

For string fields with long values, consider building a prefix index based on the characteristics of the field

Prefer composite indexes over single-column indexes: at query time a composite index can often act as a covering index, which saves storage space, avoids back-to-table lookups, and improves query efficiency

Keep the number of indexes under control. More is not better: the more indexes there are, the more it costs to maintain the index structures, which slows down inserts, deletes, and updates

If an indexed column cannot store NULL values, declare it NOT NULL when creating the table. When the optimizer knows whether each column can contain NULL, it can better determine which index is the most effective for a query

Original site

Copyright notice
This article was created by [InfoQ]. Please include the original link when reprinting. Thank you.
https://yzsam.com/2022/163/202206122348177815.html