当前位置:网站首页>B+ tree | MySQL index usage principle
B+ tree | MySQL index usage principle
2022-06-29 13:20:00 【Full stack programmer webmaster】
‘’MYSQL I haven't known much , Written before sql Before preparing to submit the production environment , The old staff helped me check sql, Let me modify the storage engine , I was using Myisam, Change the back to InnoDB 了 . Why change it to this , I've never heard of a storage engine before , So I checked it online .
In fact, using different storage engines is also very different , The following ape friends can learn about .
One 、 Comparison of storage engines
notes : As mentioned above B The tree index does not indicate that B-Tree and B+Tree Indexes , however B- Trees and B+ The definition of a tree is different .
stay MySQL in , There are four main types of indexes , Respectively : B-Tree Indexes , Hash Indexes , Fulltext Index and R-Tree Indexes .
B-Tree The index is MySQL The most frequently used index type in a database , except Archive All storage engines other than storage engines support B-Tree Indexes .Archive Engine until MySQL 5.1 Index is supported , And only a single index is supported AUTO_INCREMENT Column .
It's not just MySQL That's true in China , In fact, in many other DBMS B-Tree Index is also the most important index type , This is mainly because B-Tree The storage structure of index has excellent performance in data retrieval of database .
Generally speaking , MySQL Medium B-Tree Most of the physical files indexed are based on Balance Tree To store , That is to say, all the data actually needed are stored in Tree Of Leaf Node( Leaf node ) , And to any one of them Leaf Node The length of the shortest path is exactly the same , So we all call it B-Tree Indexes . Of course , Maybe all kinds of databases ( or MySQL All kinds of storage engines ) I'm storing my own B-Tree When indexing, the storage structure will be slightly modified . Such as Innodb Storage engine B-Tree The storage structure that the index actually uses is actually B+Tree, That is to say B-Tree On the basis of data structure, we have made a very small transformation , In every one of them Leaf Node Besides the information about storing index keys , It also stores points to the Leaf Node The next one next to each other LeafNode Pointer information for ( Added sequential access pointer ), This is mainly to speed up the retrieval of multiple adjacent Leaf Node Efficiency considerations .
InnoDB yes Mysql The default storage engine for (Mysql5.5.5 It used to be MyISAM)
It may be very difficult for ape friends who have not known the index to read this article like this , This kind of ape friend needs to be right first Mysql The index has a general understanding , Take a look at another article by Xiaobao pigeon : Database query optimization ——Mysql Indexes . After reading this article, let's go back to the text above .
Let's take a look at B- Trees 、B+ The concept of tree . To figure out what , Why does the speed of query with index increase ?
Two 、B- Trees 、B+ Tree concept
B Trees
The binary search tree :
1. All non leaf nodes have at most two sons (Left and Right);
2. All nodes store a keyword ;
3. The left pointer of a non leaf node points to a subtree smaller than its keyword , The right pointer points to a subtree larger than its keyword ;
Such as :
B- Trees
It's a multiway search tree ( It's not binary ):
1. The definition of any non leaf node is at most M A son ; And M>2;
2. The number of sons at the root node is [2, M];
3. The number of sons of non leaf nodes other than root nodes is [M/2, M];
4. Each node stores at least M/2-1( Take the whole thing ) And at most M-1 Key words ;( At least 2 Key words )
5. Number of keywords of non leaf nodes = The number of pointers to the son -1;
6. Keywords of non leaf nodes :K[1], K[2], …, K[M-1]; And K[i] < K[i+1];
7. A pointer to a non leaf node :P[1], P[2], …, P[M]; among P[1] The pointing keyword is less than K[1] The subtree of ,P[M] The pointing keyword is greater than K[M-1] The subtree of , Other P[i] Pointing to the keyword belongs to (K[i-1], K[i]) The subtree of ;
8. All leaf nodes are on the same layer ;
Such as :(M=3)
B- Tree search , Start at root , For keywords in nodes ( Orderly ) Sequence binary search , If
Hit and end , Otherwise, enter the child node of the query keyword ; repeat , Until the corresponding son pointer is
empty , Or it's already a leaf node ;
B- Characteristics of trees :
1. The keyword set is distributed throughout the tree ;
2. Any keyword appears in only one node ;
3. The search may end at a non leaf node ;
4. Its search performance is equivalent to a binary search in the keyword set ;
5. Automatic level control ;
Due to the limitation of non leaf nodes except root nodes , At least it contains M/2 A son , At least the utilization rate of nodes is ensured .
therefore B- The performance of trees is always equivalent to binary search ( And M Value independent ), There is no B The problem of tree balance ;
because M/2 The limitation of , When inserting a node , If the node is full , The node needs to be split into two separate nodes M/2 The node of ; When deleting a node , Two deficiencies need to be addressed M/2 The brother nodes of ;
B+ Trees
B+ Trees are B- A variation of a tree , It's also a multi-path search tree :
1. Its definition is basically the same as B- The same tree , except :
2. The subtree pointer of non leaf node is the same as the number of keywords ;
3. Sub tree pointer of non leaf node P[i], Point to the keyword value belongs to [K[i], K[i+1]) The subtree of (B- Trees are open spaces );
5. Add a chain pointer to all leaf nodes ;
6. All keywords appear in the leaf node ;
Such as :(M=3)
B+ Search and B- Trees are basically the same , The difference is that B+ A tree hits only when it reaches the leaf node (B- Trees can be in
Non leaf node hits ), Its performance is also equivalent to a binary search in the keyword set ;
B+ Characteristics of :
1. All keywords appear in the linked list of leaf nodes ( Dense index ), And the keywords in the list are just ordered ;
2. It's impossible to hit at a non leaf node ;
3. The non leaf node is equivalent to the index of the leaf node ( Sparse index ), Leaf node is equivalent to storage ( keyword ) Data layer of data ;
4. More suitable for file index system ;
3、 ... and 、 Several principles of indexing
1. Leftmost prefix matching principle , Very important principle ,mysql It will keep matching to the right until it encounters a range query (>、<、between、like) Just stop matching , such as a = 1 and b = 2 and c > 3 and d = 4 If set up (a,b,c,d) Index of order ,d There is no index , If set up (a,b,d,c) The index of can be used ,a,b,d The order of can be adjusted at will .
2.= and in You can order , such as a = 1 and b = 2 and c = 3 establish (a,b,c) Indexes can be in any order ,mysql The query optimizer will help you optimize it into a form that the index can recognize
3. Try to select highly differentiated columns as indexes , The formula of discrimination is count(distinct col)/count(*), Indicates the proportion of fields that do not repeat , The larger the ratio, the fewer records we scan , The only key differentiator is 1, And some states 、 Gender fields may be differentiated in front of big data 0, Then someone might ask , What is the experience value of this ratio ? Different scenarios , It's also hard to determine , Generally need join We all need to 0.1 above , On average 1 Bar scan 10 Bar record
4. The index column cannot participate in the calculation , Keep the column “ clean ”, such as from_unixtime(create_time) = ’2014-05-29’ You can't use indexes , The reason is simple ,b+ All the data stored in the tree are the field values in the data table , But when searching , You need to apply functions to all elements to compare , Obviously it costs too much . So the sentence should be written as create_time = unix_timestamp(’2014-05-29’);
5. Expand the index as much as possible , Don't create a new index . Let's say it's already in the table a The index of , Now want to add (a,b) The index of , All you need to do is change the original index
Reference article : http://blog.sina.com.cn/s/blog_4e0c21cc01010itp.htmlhttp://www.cnblogs.com/oldhorse/archive/2009/11/16/1604009.htmlhttp://blog.jobbole.com/86594/
Total design index , The code logic is implemented around the index . Avoid duplicate indexes , Key tables control the number of indexes , Not more than 5 individual
In the definition of index fields, the fields with high access frequency should be placed first
Publisher : Full stack programmer stack length , Reprint please indicate the source :https://javaforall.cn/2266.html Link to the original text :https://javaforall.cn
边栏推荐
- Install the terrain ovirt plug-in to provide automated management for ovirt
- C#通过中序遍历对二叉树进行线索化
- 记一次固态更新与系统迁移debug的过程
- Mirror vulnerability scanner: trivy
- Uber前安全主管面临欺诈指控 曾隐瞒数据泄露事件
- 从零搭建Pytorch模型教程(四)编写训练过程--参数解析
- Rslo: self supervised lidar odometer (real time + high precision, icra2022)
- C#实现二叉树的层次遍历
- C#实现顺序表定义、插入、删除、查找操作
- File contained log poisoning (user agent)
猜你喜欢

SCHIEDERWERK电源维修SMPS12/50 PFC3800解析

强大、优秀的文件管理软件评测:图片管理、书籍管理、文献管理

Lm07 - detailed discussion on cross section strategy of futures

Yolo series combs (IX) first taste of newly baked yolov6

AcWing 234 放弃测试

LR、CR纽扣电池对照表

Cvpr2022 | panopticdepth: a unified framework for depth aware panoramic segmentation

Cvpr2022 𞓜 loss problem in weakly supervised multi label classification

Hystrix circuit breaker

C#二叉树结构定义、添加节点值
随机推荐
Acwing 234 abandoning testing
Hystrix circuit breaker
C # indexe l'arbre binaire en traversant l'ordre moyen
qt json
趣谈网络协议(二)传输层
编写一个shell脚本,求一个数的”逆序数“
C语言内存函数
PyGame accurately detects image collision
树状数组应用(AcWing 242,243,244)
Another "provincial capital university", coming!
Precautions for Beifu controller connecting Panasonic EtherCAT servo
C # implementation of binary tree non recursive middle order traversal program
C binary tree structure definition and node value addition
Repoptimizer: it's actually repvgg2
Application Service Vulnerability scanning and exploitation of network security skills competition in secondary vocational schools (SSH private key disclosure)
C # realizes the first order traversal, middle order traversal and second order traversal of binary tree
Matlab to find the limit
netdata邮件告警配置
基于51单片机控制的BUCK开关电源Proteus仿真
C#实现二叉树的先序遍历、中序遍历、后序遍历