
MySQL index and its classification

2022-06-27 13:27:00 【User 3147702】

1. Overview

MySQL indexes can easily improve query performance by several orders of magnitude, and an "optimal" index sometimes outperforms a merely "good" one by another two orders of magnitude.

In MySQL, an index can contain the values of one or more columns. Because MySQL can only make efficient use of a leftmost prefix of an index, the order of the columns in a multi-column index matters a great deal. Creating one index on two columns is very different from creating two indexes on one column each.
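The difference between one composite index and two single-column indexes can be sketched as follows. The table `orders_demo` and its column and index names are hypothetical, chosen only for illustration, and the access path MySQL actually picks depends on the version and the data, so treat the comments as expectations to check with EXPLAIN rather than guaranteed output.

```sql
-- Hypothetical table for illustration; names are not from the article.
CREATE TABLE orders_demo (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED NOT NULL,
    created_at  DATE NOT NULL,
    PRIMARY KEY (id),
    -- One composite index: entries are sorted by customer_id, then created_at.
    KEY idx_customer_created (customer_id, created_at)
);

-- Alternative design: two independent single-column indexes.
-- ALTER TABLE orders_demo
--     ADD KEY idx_customer (customer_id),
--     ADD KEY idx_created (created_at);

-- Both conditions line up with the composite index, left to right.
EXPLAIN SELECT id FROM orders_demo
WHERE customer_id = 42 AND created_at >= '2022-01-01';

-- created_at alone is not a leftmost prefix of (customer_id, created_at),
-- so the composite index cannot be used to seek directly to matching rows.
EXPLAIN SELECT id FROM orders_demo
WHERE created_at >= '2022-01-01';
```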

2. Types of indexes

MySQL has many index types, each offering better performance in different scenarios. Indexes are implemented at the storage engine level, so they work differently from engine to engine: not every engine supports every index type, and even for the same index type the underlying implementation may differ between engines.

3. B-Tree Indexes

Most MySQL storage engines support B-Tree indexes, which makes them the most commonly used index type. Unless stated otherwise, "index" generally means a B-Tree index. Note, however, that although the keyword used when creating a table is BTREE, the underlying implementation may differ between storage engines: the NDB Cluster storage engine actually uses a T-Tree structure, while InnoDB uses a B+Tree.
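A minimal sketch of requesting a B-Tree index explicitly and checking what MySQL reports for it. The table `btree_demo` and the index name `idx_name` are hypothetical.

```sql
-- Hypothetical table; USING BTREE states the index type explicitly.
CREATE TABLE btree_demo (
    id   INT UNSIGNED NOT NULL,
    name VARCHAR(50) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_name (name) USING BTREE
) ENGINE=InnoDB;

-- The Index_type column of the result reports BTREE for both indexes,
-- regardless of how the engine implements the structure internally.
SHOW INDEX FROM btree_demo;
```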

3.1. Disk I/O and read-ahead

Reading from disk depends on mechanical motion. Each read costs seek time, rotational delay and transfer time, and the total is long. If a database holding tens or even hundreds of millions of rows spent a few milliseconds on every row it touched, the result would be disastrous. The operating system therefore applies an optimization: each read fetches not just the data requested but the adjacent data as well, loading it into a memory buffer. Data is read one page at a time (4KB or 8KB), so reading any data within a single page costs only one disk I/O operation.

3.2. Characteristics of B-Trees

[Figure: structure of a B-Tree]

Because a B-Tree branches many ways at each node, the height of the tree is greatly reduced. If each node stores one page of data, reaching data on the third level takes only three disk I/O operations, which saves a great deal of time. A B+Tree differs from a B-Tree in that only the leaf nodes store the actual data; the non-leaf nodes hold only keys that guide the direction of the search. With such an index the storage engine no longer needs a full table scan and can instead follow the guidance of each node to find the required data quickly.
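The effect of the branching factor on tree height can be made concrete with a rough estimate. The fan-out of 1000 keys per page and the row count of one billion are assumptions picked for round numbers, not figures from the article; real fan-out depends on key size and page size.

```latex
% Approximate height h of a tree with fan-out m holding N keys
h \approx \lceil \log_m N \rceil

% Assumed values: m = 1000 keys per page, N = 10^9 rows
h \approx \lceil \log_{1000} 10^{9} \rceil = \lceil 9/3 \rceil = 3

% For comparison, a binary tree (m = 2) over the same rows
h \approx \lceil \log_{2} 10^{9} \rceil = 30
```

Under these assumptions a lookup touches about three pages instead of about thirty nodes, which is where the saving in disk I/O comes from.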

In addition, because of the structure of a B-Tree, the values are stored in sorted order, so the same index can also satisfy the sorting required by an ORDER BY clause.
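A small sketch of how this shows up in practice. The table `scores_demo` and its columns are hypothetical, and the comments describe what EXPLAIN typically reports rather than guaranteed output.

```sql
-- Hypothetical table with an index on the sort column.
CREATE TABLE scores_demo (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    player VARCHAR(50) NOT NULL,
    score  INT NOT NULL,
    PRIMARY KEY (id),
    KEY idx_score (score)
);

-- Rows can be read in idx_score order, so no separate sort pass is needed.
EXPLAIN SELECT score FROM scores_demo ORDER BY score LIMIT 10;

-- player has no index, so MySQL has to sort the rows itself;
-- the Extra column typically shows "Using filesort" in this case.
EXPLAIN SELECT player FROM scores_demo ORDER BY player LIMIT 10;
```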

3.3. Matching rules for multi-column indexes

CREATE TABLE People (
    a    varchar(50)    not null,
    b    varchar(50)    not null,
    c    date        not null,
    d    date        not null,
    e    enum('m', 'f')    not null,
    key(a, b, c, d)
);

The table above defines one index over four columns. Such an index follows the rules below.

Leftmost prefix matching: This is a very important principle. MySQL keeps matching index columns from left to right until it encounters a range condition (>, <, BETWEEN, LIKE). For example, in the query a = 1 AND b = 2 AND c >= 3 AND d = 4, the index cannot be used for d, because the range condition on c stops the matching; an index built on (a, b, d, c) could be used for all four columns. The order of the conditions in the WHERE clause does not matter: a, b, c and d can be written in any order, and MySQL always matches them in the order in which the index was built.
Maximum selectivity: Prefer highly selective columns for indexing, or place them at the left end of the index. The more selective a column is, the fewer rows it leaves in the result, and the fewer rows the query actually has to examine.
Index columns must not take part in calculations: A query such as from_unixtime(a) = '2014-05-29' cannot use the index on a; it should be rewritten as a = unix_timestamp('2014-05-29'). Similarly, a + 1 > 5 can use the index only after it is rewritten as a > 4.
The query must start with the leftmost column: A query on b = 5 AND c < 2014 does not use the index. This follows directly from leftmost prefix matching.
Columns in the index cannot be skipped: For the query a = 5 AND c > 2015, column b is skipped, so the index is not used for c.
Note: The limitations above apply to MySQL 5.5 and earlier; later versions may lift some of them. Either way, choosing which columns to index when creating a table, and in what order, is very important.
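The rules above can be checked against the People table with EXPLAIN. The literal values below are made up for illustration, and the comments describe what the rules predict; the key and key_len columns of the EXPLAIN output show how much of the index is really used on a given version.

```sql
-- Equality on a and b, then a range on c: the index is used up to c,
-- and the condition on d is applied as a filter afterwards.
EXPLAIN SELECT * FROM People
WHERE a = 'Allen' AND b = 'Cuba' AND c >= '1960-01-01' AND d = '2000-01-01';

-- No condition on the leftmost column a: the index cannot be used for a lookup.
EXPLAIN SELECT * FROM People
WHERE b = 'Cuba' AND c < '2014-01-01';

-- Column b is skipped: only the a part of the index is used.
EXPLAIN SELECT * FROM People
WHERE a = 'Allen' AND c > '2015-01-01';

-- A function wrapped around the indexed column prevents an index lookup...
EXPLAIN SELECT * FROM People WHERE LEFT(a, 3) = 'All';

-- ...while the equivalent prefix LIKE on the bare column can use it.
EXPLAIN SELECT * FROM People WHERE a LIKE 'All%';
```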

4. Hash index

4.1. Introduction

CREATE TABLE testhash (
    a varchar(50) not null,
    b varchar(50) not null,
    KEY USING HASH(a)
) ENGINE=MEMORY;

The CREATE TABLE statement above creates a hash index.

As the name suggests, a hash index is built on a hash table, and it is only useful for queries that exactly match all columns of the index. The index stores a small hash code for each row, so a hash index is compact and fast. However, it supports only equality lookups, not range queries. Because a hash table does not store values in sorted order, the index cannot be used for ORDER BY, and it cannot be searched using only some of its columns. Even so, in the specific situations that suit it, a hash index brings a significant performance gain. A classic example is a "star" schema, where many lookup tables have to be joined: hash indexes fit the needs of lookup tables very well.
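These restrictions can be observed on the testhash table defined above. The literal 'foo' is an arbitrary example value, and the comments state what the restrictions predict rather than captured output.

```sql
-- Equality lookup: the hash index on a can be used.
EXPLAIN SELECT b FROM testhash WHERE a = 'foo';

-- Range condition: a hash index has no ordering, so it cannot help here.
EXPLAIN SELECT b FROM testhash WHERE a > 'foo';

-- Sorting: the hash index cannot return rows in order of a.
EXPLAIN SELECT b FROM testhash ORDER BY a;
```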

4.2. Hash index and storage engine

The hash index is the default index type of the MEMORY storage engine, which also supports B-Tree indexes. Among the commonly used MySQL engines, MEMORY is currently the only one that explicitly supports hash indexes.

The InnoDB engine has a special feature called the adaptive hash index. For index values that are accessed frequently, InnoDB automatically builds a hash index in memory. The user can only enable or disable the feature through configuration; once it is enabled, the process is fully automatic and invisible to the user. The adaptive hash index is not a true hash index: it is built on top of the existing B-Tree index, hashing the values being looked up so that frequent lookups can find their entries without traversing the whole B-Tree.
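The configuration switch mentioned above is the `innodb_adaptive_hash_index` system variable. A brief sketch of inspecting and toggling it follows; changing a global variable requires the appropriate privileges, and whether disabling it helps or hurts depends on the workload.

```sql
-- Check whether the adaptive hash index is currently enabled.
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';

-- Disable it at runtime, then enable it again.
SET GLOBAL innodb_adaptive_hash_index = OFF;
SET GLOBAL innodb_adaptive_hash_index = ON;

-- The "INSERT BUFFER AND ADAPTIVE HASH INDEX" section of this report
-- shows hash search activity.
SHOW ENGINE INNODB STATUS;
```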

4.3. Custom hash index

For storage engines that do not support hash indexes, users can build their own in a way similar to what InnoDB does. A typical example is storing the CRC32 of a URL, which saves disk space and speeds up queries.

For example, consider the following query:

SELECT id FROM url WHERE url = 'http://www.techlog.cn/article/list/10182793';

Without an index such a query is slow, and if an index is created on url, that index will be very large.

It can be optimized as follows:

SELECT id FROM url WHERE crc32_url = CRC32('http://www.techlog.cn/article/list/10182793');

If we then create an index on the crc32_url column, both the size of the index and the efficiency of the query improve significantly.

However, this approach requires maintaining an extra column (crc32_url in the query above, url_crc in the table below). Triggers can fill in the column automatically:

CREATE TABLE pseudohash (
    id    int unsigned NOT NULL auto_increment,
    url    varchar(255) NOT NULL,
    url_crc    int unsigned NOT NULL DEFAULT 0,
    PRIMARY KEY(id),
    KEY(url_crc)
);

DELIMITER //

CREATE TRIGGER pseudohash_crc_ins BEFORE INSERT ON pseudohash FOR EACH ROW BEGIN
SET NEW.url_crc = crc32(NEW.url);
END;
//

CREATE TRIGGER pseudohash_crc_upd BEFORE UPDATE ON pseudohash FOR EACH ROW BEGIN
SET NEW.url_crc = crc32(NEW.url);
END;
//

DELIMITER ;

With these triggers in place, whenever a row is inserted or its url column is modified, the url_crc column is updated automatically.
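A quick way to check the triggers, assuming the pseudohash table and both triggers have been created successfully. The URL values are arbitrary examples, and the WHERE id = 1 clause assumes the row is the first one inserted into an empty table.

```sql
-- Insert without supplying url_crc; the BEFORE INSERT trigger fills it in.
INSERT INTO pseudohash (url) VALUES ('http://www.mysql.com');

-- crc_matches should be 1 if the trigger computed the hash.
SELECT id, url, url_crc, url_crc = CRC32(url) AS crc_matches
FROM pseudohash;

-- Change the URL; the BEFORE UPDATE trigger recomputes url_crc.
UPDATE pseudohash SET url = 'http://www.mysql.com/' WHERE id = 1;

SELECT id, url, url_crc, url_crc = CRC32(url) AS crc_matches
FROM pseudohash;
```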

Because hash collisions are possible, a lookup by hash value alone may return more than one record. The query can be refined to:

SELECT id FROM url WHERE crc32_url = CRC32('http://www.techlog.cn/article/list/10182793') and url = 'http://www.techlog.cn/article/list/10182793';
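The same idea, written against the pseudohash table and its url_crc column from the trigger example, together with a query that reports any hash values shared by more than one row. This is a sketch; how often collisions occur depends on the number of stored rows.

```sql
-- List hash values that occur more than once, i.e. actual collisions
-- (or duplicate URLs) among the stored rows.
SELECT url_crc, COUNT(*) AS n
FROM pseudohash
GROUP BY url_crc
HAVING n > 1;

-- Collision-safe lookup: the indexed hash narrows the search,
-- and the comparison on url discards rows that merely share the hash.
SELECT id
FROM pseudohash
WHERE url_crc = CRC32('http://www.techlog.cn/article/list/10182793')
  AND url = 'http://www.techlog.cn/article/list/10182793';
```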

5. Spatial data indexes (R-Tree)

MyISAM tables support spatial indexes, which can be used to store geographic data (InnoDB has supported them as well since MySQL 5.7). Unlike B-Tree indexes, spatial indexes do not require leftmost-prefix queries: they index the data in all dimensions, so a query can use any combination of dimensions. The data must, however, be accessed through MySQL's GIS functions, such as MBRCONTAINS(). MySQL's GIS support is not very complete, so most people do not use this feature. PostGIS for PostgreSQL offers good GIS support.
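A minimal sketch of a spatial index and an MBRContains() query. The table `places_demo` and its coordinates are hypothetical. The ST_-prefixed function names assume MySQL 5.6 or later; MySQL 5.5 spells them GeomFromText() and AsText().

```sql
-- Hypothetical table; a spatially indexed column must be NOT NULL.
CREATE TABLE places_demo (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    location GEOMETRY NOT NULL,
    PRIMARY KEY (id),
    SPATIAL KEY idx_location (location)
) ENGINE=MyISAM;

INSERT INTO places_demo (location) VALUES
    (ST_GeomFromText('POINT(1 1)')),
    (ST_GeomFromText('POINT(5 5)'));

-- Find points whose bounding rectangle lies inside the square (0,0)-(2,2);
-- of the two rows above, only POINT(1 1) falls inside it.
SELECT id, ST_AsText(location)
FROM places_demo
WHERE MBRContains(
    ST_GeomFromText('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))'),
    location
);
```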

6. Full-text index

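A full-text index looks for keywords within the text rather than comparing whole values in the index, much like a search engine. It is queried with the MATCH ... AGAINST operation. In the MySQL versions discussed here Chinese text is not supported; MySQL 5.7.6 and later add an ngram parser for Chinese, Japanese and Korean.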
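A short sketch of defining and querying a full-text index. The table `articles_demo` and the search term are hypothetical. Which rows match depends on server settings such as the minimum word length and the stopword list, so no result is shown.

```sql
-- Hypothetical table; MyISAM supports FULLTEXT in all versions discussed here.
CREATE TABLE articles_demo (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(200) NOT NULL,
    body  TEXT NOT NULL,
    PRIMARY KEY (id),
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE=MyISAM;

-- The column list in MATCH() must be the same as in the FULLTEXT index.
-- Boolean mode requires the term after "+" to be present in a matching row.
SELECT id, title
FROM articles_demo
WHERE MATCH(title, body) AGAINST('+optimizer' IN BOOLEAN MODE);
```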

7. Other indexes

Many third-party storage engines use other kinds of data structures to store their indexes, each with its own application scenarios and advantages.
