Some personal understandings about MySQL indexes (partially refer to MySQL45 lectures)
2022-07-29 23:53:00 【lonelyMangoo】
Overview
This post covers the data structures behind MySQL indexes, covering indexes, the leftmost prefix rule, index pushdown, and an interview question I ran into.
What is an index
An index exists to improve query efficiency, just like the table of contents of a book.
Data structure of the index
Hash table
It is a key-value structure; keys that hash to the same slot are chained together in a linked list (separate chaining).
It is very efficient for equality queries, but because the keys are not stored in order, range queries are slow.
Ordered array
An ordered array handles both equality and range queries well: binary search finds a key in O(log N). However, maintaining an ordered array is too costly, because inserting a record in the middle means moving every record after it, so ordered arrays are only suitable for storage engines that hold static data.
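The trade-off between the two structures can be sketched in a few lines of Python. This is only an illustration: the data and the helper names `range_query` and `insert` are made up, and a Python `dict` and `list` stand in for the hash table and the ordered array.

```python
import bisect

# Hypothetical data: (key, name) pairs.
rows = [(105, "Eve"), (101, "Ann"), (104, "Dan"), (102, "Bob"), (103, "Cat")]

# Hash table: fast equality lookup, but the keys are not kept in order.
hash_index = dict(rows)

# Ordered array: sorted by key, so binary search works.
sorted_rows = sorted(rows)
sorted_keys = [key for key, _ in sorted_rows]


def range_query(low, high):
    """Return names with low <= key <= high using two binary searches."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return [name for _, name in sorted_rows[start:end]]


def insert(key, name):
    """Insert while keeping order; list.insert shifts every later element."""
    pos = bisect.bisect_left(sorted_keys, key)
    sorted_keys.insert(pos, key)
    sorted_rows.insert(pos, (key, name))


equal_hit = hash_index[103]       # equality query on the hash table
in_range = range_query(102, 104)  # range query on the ordered array
```

The shifting done inside `insert` is the maintenance cost mentioned above: the closer to the front a new key lands, the more records have to move.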
Binary (N-ary) tree
A binary tree can grow very tall, and an index does not live only in memory; it is also written to disk. Each query then has to pass through many nodes, and every node read from disk is slow and time-consuming.
So you can use an N-ary tree to reduce the number of disk reads. In InnoDB, if N is about 1200 and the tree height is 4, the tree can store 1200 × 1200 × 1200 values, which is about 1.7 billion.
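The arithmetic behind that figure, and the comparison with a binary tree, can be checked with a short sketch. The fan-out of 1200 is the approximate number quoted above; the binary tree is assumed to be perfectly balanced, which is the best case for it.

```python
import math

FANOUT = 1200       # approximate children per node, the figure quoted above
ROWS = FANOUT ** 3  # values reachable with a tree of height 4

# A perfectly balanced binary tree needs about log2(ROWS) levels
# to hold the same number of values.
binary_levels = math.ceil(math.log2(ROWS))

print(f"{ROWS:,} values with fan-out {FANOUT}")
print(f"balanced binary tree levels for the same data: {binary_levels}")
```

With one disk read per level, that is a handful of reads for the N-ary tree against roughly thirty for the binary tree.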
InnoDB: B+Tree
Each index corresponds to a B+ tree in InnoDB.
The leaf node of the primary key index stores the entire row of data, and the primary key index is also called the clustered index.
The leaf nodes of a non-primary-key index store the primary key value. Non-primary-key indexes are also known as secondary indexes.
- What is the difference between a query based on a primary key index and a normal index?
A primary key query searches the ID B+ tree directly.
A query on a non-primary-key index first searches that index to find the ID and then searches the ID index tree again. This second search is called returning to the table.
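A minimal sketch of the two query paths, using plain dictionaries in place of the B+ trees. The table contents, the column `k`, and the function names are hypothetical.

```python
# Clustered (primary key) index: id -> full row.
clustered = {
    100: {"id": 100, "k": 1, "name": "Ann"},
    200: {"id": 200, "k": 2, "name": "Bob"},
    300: {"id": 300, "k": 3, "name": "Cat"},
}

# Secondary index on k: the "leaf" holds only the primary key value.
secondary_k = {1: 100, 2: 200, 3: 300}


def find_by_id(row_id):
    """Primary key query: one search of the clustered index."""
    return clustered[row_id], 1


def find_by_k(k):
    """Secondary index query: find the id, then return to the table."""
    row_id = secondary_k[k]              # search 1: the k index
    row, searches = find_by_id(row_id)   # search 2: the clustered index
    return row, searches + 1
```

`find_by_k(2)` returns the same row as `find_by_id(200)` but reports two index searches instead of one.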
- Why use an auto-increment bigint primary key?
Every insert is then an append, which does not require moving other records and does not trigger leaf node splits.
Now suppose we used a varchar primary key instead. A snowflake ID, for example, needs 19 bytes as a string (depending on the encoding), while a bigint needs only 8 bytes. Since every secondary index leaf stores the primary key value, a shorter primary key means smaller leaf entries, fewer leaf nodes, and less space taken by the ordinary (secondary) indexes.
- Why use a B+ tree?
- The tree stays short while storing a large amount of data.
- A single non-leaf page of a B+ tree can hold more keys. The more keys read into memory at once, the fewer random disk I/O reads a single query needs. (Because more keys fit in memory and the data is packed more tightly, spatial locality is better, so accessing related data on the leaf nodes also gets a better cache hit rate.)
- A B+ tree supports single-point queries, range queries, and ordered queries well (the leaf nodes form a linked list).
- B+ tree query cost is more stable: every lookup walks from the root to a leaf, so it is always O(log N).
- Why not use a B-tree?
- The non-leaf nodes of a B-tree also store data, so each node holds fewer keys, the tree gets deeper, the number of I/Os increases, and query performance drops.
Covering Index
A query on a secondary index has to return to the table, so two index trees are searched. How can the index be optimized to avoid returning to the table?
The answer is a covering index.
A covering index already contains every column the query needs, so there is no need to return to the table. This reduces the number of tree searches and significantly improves query performance.
- How do you weigh whether to add a covering index?
If the request is high-frequency, building a joint index that covers it makes sense because it cuts the statement's execution time, but maintaining the extra index has a cost.
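The effect can be sketched by counting how often a query has to go back to the clustered index. Everything here is hypothetical: the table, the joint index on `(name, age)`, and the `select` helper, which scans the small index list instead of seeking as a real engine would.

```python
# Clustered index: id -> full row.
clustered = {
    1: {"id": 1, "name": "Ann", "age": 30, "city": "Oslo"},
    2: {"id": 2, "name": "Bob", "age": 25, "city": "Rome"},
}

# Joint index on (name, age); each entry also carries the primary key.
index_name_age = [("Ann", 30, 1), ("Bob", 25, 2)]
COLUMNS_IN_INDEX = {"name", "age", "id"}


def select(columns, name):
    """Return (rows, table_lookups) for a query with WHERE name = ?."""
    rows, lookups = [], 0
    for entry_name, age, row_id in index_name_age:
        if entry_name != name:
            continue
        if set(columns) <= COLUMNS_IN_INDEX:
            # Covered: the index entry already holds every requested column.
            source = {"name": entry_name, "age": age, "id": row_id}
        else:
            # Not covered: return to the table for the missing columns.
            source = clustered[row_id]
            lookups += 1
        rows.append({col: source[col] for col in columns})
    return rows, lookups
```

`select(["name", "age"], "Ann")` needs no table lookup, while `select(["name", "city"], "Ann")` needs one because `city` is not in the index.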
Leftmost prefix rule
The leftmost prefix can be the leftmost N fields of a joint index, or the leftmost M characters of a string index. (For example, with LIKE 'Zhang%' the index can locate the first entry starting with Zhang.)
But with an index on (a, b), if the condition on a is LIKE 'Zhang%', the index can no longer be used to narrow the search on b. (MySQL keeps matching index columns from left to right until it meets a range condition (>, <, BETWEEN, LIKE) and then stops. The range column itself can use the index, but the columns after it cannot. In other words, an index can serve at most one range column, so if the query has two range columns the index cannot be used in full. Why? Index entries are sorted lexicographically: b is ordered only among entries that share the same a, so once a covers a range of values, the b values in that range are no longer in order.)
- How do I choose an index?
- The first principle: if adjusting the column order lets you maintain one fewer index, that order is usually the one to prefer.
For example, a joint index on (a, b) serves queries on (a, b) and on (a) alone, but it does not serve queries on (b) alone.
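The reasoning about lexicographic order can be made concrete with a sorted list of tuples standing in for a joint index on (a, b). The data and the helper names `seek_a` and `seek_a_prefix` are made up for illustration.

```python
import bisect

# Entries of a joint index on (a, b), kept in lexicographic order.
index_ab = sorted([
    ("Li", 3), ("Wang", 1), ("Zhang", 2), ("Zhang", 7),
    ("Zhao", 1), ("Zhao", 5),
])


def seek_a(a):
    """Equality on a: a contiguous slice found by binary search."""
    start = bisect.bisect_left(index_ab, (a,))
    end = bisect.bisect_left(index_ab, (a, float("inf")))
    return index_ab[start:end]


def seek_a_prefix(prefix):
    """Range on a (like a LIKE 'prefix%'): still one contiguous slice."""
    start = bisect.bisect_left(index_ab, (prefix,))
    end = bisect.bisect_left(index_ab, (prefix + "\uffff",))
    return index_ab[start:end]


# Within a single a value, b is sorted, so (a, b) equality can keep seeking.
b_for_zhang = [b for _, b in seek_a("Zhang")]

# Across a range of a values, b is no longer sorted.
b_for_zh_prefix = [b for _, b in seek_a_prefix("Zh")]

# Ignoring a entirely, b is not sorted either, so a query on b alone
# cannot binary-search this index.
b_all = [b for _, b in index_ab]
```

Here `b_for_zhang` comes back in order, while `b_for_zh_prefix` and `b_all` do not, which is why the columns after a range column, or a query that skips the leftmost column, cannot seek on this index.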
Index Pushdown
An example:
Suppose there is a joint index on (a, b), but the query needs the columns a, b, and c.
The condition on a is LIKE 'Zhang%', so b obviously cannot be used to narrow the index search.
Before version 5.6, after matching the entries that start with Zhang, MySQL returned to the table for every one of them and only then checked b.
From version 5.6 on, after matching an entry that starts with Zhang, MySQL reads b from the joint index entry and compares it with the b value in the query. Entries that do not match are discarded right away, which reduces the number of returns to the table.
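The difference between the two behaviours can be sketched by counting returns to the table. This is a simplified model rather than how the server is implemented: the rows, the `query` helper, and its `pushdown` flag are all hypothetical, and the small index is scanned linearly instead of being searched.

```python
# Clustered index: id -> full row.
clustered = {
    1: {"id": 1, "a": "Zhang San", "b": 10, "c": "x"},
    2: {"id": 2, "a": "Zhang Si", "b": 20, "c": "y"},
    3: {"id": 3, "a": "Zhang Wu", "b": 10, "c": "z"},
    4: {"id": 4, "a": "Li Si", "b": 10, "c": "w"},
}

# Joint index on (a, b); each entry also carries the primary key.
index_ab = sorted((row["a"], row["b"], row["id"]) for row in clustered.values())


def query(prefix, b, pushdown):
    """Model WHERE a LIKE 'prefix%' AND b = ?; return (rows, table_lookups)."""
    rows, lookups = [], 0
    for a_val, b_val, row_id in index_ab:
        if not a_val.startswith(prefix):
            continue
        if pushdown and b_val != b:
            continue                 # rejected inside the index, no lookup
        row = clustered[row_id]      # return to the table
        lookups += 1
        if row["b"] == b:            # without pushdown, b is checked here
            rows.append(row)
    return rows, lookups


rows_old, lookups_old = query("Zhang", 10, pushdown=False)
rows_new, lookups_new = query("Zhang", 10, pushdown=True)
```

Both calls return the same rows, but with this data the first one returns to the table three times and the second only twice.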
An interview question
I ran into the following question in an earlier interview with Baidu. It went roughly like this:
A table with fields (a, b, c) has index(a), index(b) and index(a, b, c).
When the query conditions are a and b, is it faster to use two separate indexes or a joint index?
I don't know how the optimizer handles this internally, but the joint index definitely does not need to return to the table, so the latter is obviously faster.
I tested with 6 million rows, and MySQL used the joint index without question.
After I deleted the joint index and left only the single-column indexes, the query became a full table scan.
After adding the query condition, an index was used, but the query was still effectively scanning the whole table, which is extremely inefficient.
To sum up: covering indexes are the best (yyds)!