当前位置：网站首页>B+ tree | MySQL index usage principle

B+ tree | MySQL index usage principle

2022-06-29 13:20:00 【Full stack programmer webmaster】

‘’MYSQL I haven't known much , Written before sql Before preparing to submit the production environment , The old staff helped me check sql, Let me modify the storage engine , I was using Myisam, Change the back to InnoDB 了 . Why change it to this , I've never heard of a storage engine before , So I checked it online .

In fact, using different storage engines is also very different , The following ape friends can learn about .

One 、 Comparison of storage engines

notes ： As mentioned above B The tree index does not indicate that B-Tree and B+Tree Indexes , however B- Trees and B+ The definition of a tree is different .

stay MySQL in , There are four main types of indexes , Respectively ： B-Tree Indexes , Hash Indexes , Fulltext Index and R-Tree Indexes .

B-Tree The index is MySQL The most frequently used index type in a database , except Archive All storage engines other than storage engines support B-Tree Indexes .Archive Engine until MySQL 5.1 Index is supported , And only a single index is supported AUTO_INCREMENT Column .

It's not just MySQL That's true in China , In fact, in many other DBMS B-Tree Index is also the most important index type , This is mainly because B-Tree The storage structure of index has excellent performance in data retrieval of database .

Generally speaking , MySQL Medium B-Tree Most of the physical files indexed are based on Balance Tree To store , That is to say, all the data actually needed are stored in Tree Of Leaf Node( Leaf node ) , And to any one of them Leaf Node The length of the shortest path is exactly the same , So we all call it B-Tree Indexes . Of course , Maybe all kinds of databases （ or MySQL All kinds of storage engines ） I'm storing my own B-Tree When indexing, the storage structure will be slightly modified . Such as Innodb Storage engine B-Tree The storage structure that the index actually uses is actually B+Tree, That is to say B-Tree On the basis of data structure, we have made a very small transformation , In every one of them Leaf Node Besides the information about storing index keys , It also stores points to the Leaf Node The next one next to each other LeafNode Pointer information for （ Added sequential access pointer ）, This is mainly to speed up the retrieval of multiple adjacent Leaf Node Efficiency considerations .

InnoDB yes Mysql The default storage engine for (Mysql5.5.5 It used to be MyISAM）

It may be very difficult for ape friends who have not known the index to read this article like this , This kind of ape friend needs to be right first Mysql The index has a general understanding , Take a look at another article by Xiaobao pigeon ： Database query optimization ——Mysql Indexes . After reading this article, let's go back to the text above .

Let's take a look at B- Trees 、B+ The concept of tree . To figure out what , Why does the speed of query with index increase ？

Two 、B- Trees 、B+ Tree concept

B Trees

The binary search tree ：

1. All non leaf nodes have at most two sons （Left and Right）;

2. All nodes store a keyword ;

3. The left pointer of a non leaf node points to a subtree smaller than its keyword , The right pointer points to a subtree larger than its keyword ;

Such as ：

B- Trees

It's a multiway search tree （ It's not binary ）：

1. The definition of any non leaf node is at most M A son ; And M>2;

2. The number of sons at the root node is [2, M];

3. The number of sons of non leaf nodes other than root nodes is [M/2, M];

4. Each node stores at least M/2-1（ Take the whole thing ） And at most M-1 Key words ;（ At least 2 Key words ）

5. Number of keywords of non leaf nodes = The number of pointers to the son -1;

6. Keywords of non leaf nodes ：K[1], K[2], …, K[M-1]; And K[i] < K[i+1];

7. A pointer to a non leaf node ：P[1], P[2], …, P[M]; among P[1] The pointing keyword is less than K[1] The subtree of ,P[M] The pointing keyword is greater than K[M-1] The subtree of , Other P[i] Pointing to the keyword belongs to (K[i-1], K[i]) The subtree of ;

8. All leaf nodes are on the same layer ;

Such as ：（M=3）

B- Tree search , Start at root , For keywords in nodes （ Orderly ） Sequence binary search , If

Hit and end , Otherwise, enter the child node of the query keyword ; repeat , Until the corresponding son pointer is

empty , Or it's already a leaf node ;

B- Characteristics of trees ：

1. The keyword set is distributed throughout the tree ;

2. Any keyword appears in only one node ;

3. The search may end at a non leaf node ;

4. Its search performance is equivalent to a binary search in the keyword set ;

5. Automatic level control ;

Due to the limitation of non leaf nodes except root nodes , At least it contains M/2 A son , At least the utilization rate of nodes is ensured .

therefore B- The performance of trees is always equivalent to binary search （ And M Value independent ）, There is no B The problem of tree balance ;

because M/2 The limitation of , When inserting a node , If the node is full , The node needs to be split into two separate nodes M/2 The node of ; When deleting a node , Two deficiencies need to be addressed M/2 The brother nodes of ;

B+ Trees

B+ Trees are B- A variation of a tree , It's also a multi-path search tree ：

1. Its definition is basically the same as B- The same tree , except ：

2. The subtree pointer of non leaf node is the same as the number of keywords ;

3. Sub tree pointer of non leaf node P[i], Point to the keyword value belongs to [K[i], K[i+1]) The subtree of （B- Trees are open spaces ）;

5. Add a chain pointer to all leaf nodes ;

6. All keywords appear in the leaf node ;

Such as ：（M=3）

B+ Search and B- Trees are basically the same , The difference is that B+ A tree hits only when it reaches the leaf node （B- Trees can be in

Non leaf node hits ）, Its performance is also equivalent to a binary search in the keyword set ;

B+ Characteristics of ：

1. All keywords appear in the linked list of leaf nodes （ Dense index ）, And the keywords in the list are just ordered ;

2. It's impossible to hit at a non leaf node ;

3. The non leaf node is equivalent to the index of the leaf node （ Sparse index ）, Leaf node is equivalent to storage （ keyword ） Data layer of data ;

4. More suitable for file index system ;

3、 ... and 、 Several principles of indexing

1. Leftmost prefix matching principle , Very important principle ,mysql It will keep matching to the right until it encounters a range query (>、<、between、like) Just stop matching , such as a = 1 and b = 2 and c > 3 and d = 4 If set up (a,b,c,d) Index of order ,d There is no index , If set up (a,b,d,c) The index of can be used ,a,b,d The order of can be adjusted at will .

2.= and in You can order , such as a = 1 and b = 2 and c = 3 establish (a,b,c) Indexes can be in any order ,mysql The query optimizer will help you optimize it into a form that the index can recognize

3. Try to select highly differentiated columns as indexes , The formula of discrimination is count(distinct col)/count(*), Indicates the proportion of fields that do not repeat , The larger the ratio, the fewer records we scan , The only key differentiator is 1, And some states 、 Gender fields may be differentiated in front of big data 0, Then someone might ask , What is the experience value of this ratio ？ Different scenarios , It's also hard to determine , Generally need join We all need to 0.1 above , On average 1 Bar scan 10 Bar record

4. The index column cannot participate in the calculation , Keep the column “ clean ”, such as from_unixtime(create_time) = ’2014-05-29’ You can't use indexes , The reason is simple ,b+ All the data stored in the tree are the field values in the data table , But when searching , You need to apply functions to all elements to compare , Obviously it costs too much . So the sentence should be written as create_time = unix_timestamp(’2014-05-29’);

5. Expand the index as much as possible , Don't create a new index . Let's say it's already in the table a The index of , Now want to add (a,b) The index of , All you need to do is change the original index

Reference article ： http://blog.sina.com.cn/s/blog_4e0c21cc01010itp.html http://www.cnblogs.com/oldhorse/archive/2009/11/16/1604009.html http://blog.jobbole.com/86594/

Total design index , The code logic is implemented around the index . Avoid duplicate indexes , Key tables control the number of indexes , Not more than 5 individual

In the definition of index fields, the fields with high access frequency should be placed first

Publisher ： Full stack programmer stack length , Reprint please indicate the source ：https://javaforall.cn/2266.html Link to the original text ：https://javaforall.cn

原网站

版权声明
本文为[Full stack programmer webmaster]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/180/202206291221376174.html

当前位置：网站首页>B+ tree | MySQL index usage principle

B+ tree | MySQL index usage principle

边栏推荐

猜你喜欢

随机推荐