当前位置:网站首页>Simple understanding of B tree and b+ tree
Simple understanding of B tree and b+ tree
2022-06-29 23:39:00 【Fairy wants carry】
Preface
First , We know , We use it B Trees B+ The tree is to increase the efficiency of our index ( Increase query efficiency )
We all know Binary search tree The time complexity of the search is O(log N), Its search efficiency is high enough , Then why do you still have B Trees and B+ The emergence of trees ? Is the time complexity of binary search tree smaller than that of binary search tree ?
problem :
This will throw disk read / write IO Efficiency is a problem , as everyone knows ,IO The query efficiency is very low , You need to read the disk and load the data into memory , When a large amount of data is stored , We can't load the received data into memory at once , Instead, it gradually loads into Disk page , Each disk page corresponds to the node of the tree , That is, the side shows the nodes of the tree ( data ) The more , More disk pages ,IO The more ( Back to );——> Tree depth and IO Operate the vibration wave
solve :
1. Each data node stores multiple elements , That is, every time you load , More data , More data on disk pages
2. Using multi tree
B Trees
First, let's talk about his structure directly : A pointer to the current data , A pointer pointing down , Another is data
One m Hierarchical B Tree features : When a node has k A child , For example, the head node has three children , So there must be k-1 Key words , That is, two keywords can divide the keywords of the subtree into 3 A subset of

Inquire about :
Pictured above is an example : If the query value is 5:
First disk IO: Locate in memory ( And 17、35 Compare ), Than 17 Small , The left subtree ;
Second disk IO: Locate in memory ( And 8、12 Compare ), Than 8 Small , The left subtree ;
The third disk IO: Locate in memory ( And 3、5 Compare ), find 5, End .
The whole process , We can see that : The number of comparisons is no less than that of binary search trees , Especially when there are many data in a node , But disk IO The number of times is greatly reduced . The comparison is done in memory ,( Because we compare in memory , And we can have more than one data at a time , data up 了 , So it greatly reduces IO), Compared to disk IO The speed of , The time-consuming comparison is almost negligible . So when the height of the tree is low enough , Can greatly improve efficiency . by comparison , It doesn't matter if there are many elements in the node , Just a few more memory interactions , As long as it does not exceed the size of disk pages .
B+ Trees
structure : Pointer pointing down + data ( Only leaf nodes have ), A linked table is used below , Specifically linked data
1. Yes k The middle node of the subtree contains k Elements (B In the tree is k-1 Elements ), Each element does not hold data , Index only , All the data
Are saved in the leaf node .
2. All leaf nodes contain the information of all elements , And pointers to records containing these elements , The size of the node depends on the leaf itself
Link from small to large .
3. All the intermediate node elements exist in the child nodes at the same time , Is the largest of the child node elements ( Or the smallest ) Elements .
advantage : A node stores content compared to B Trees , One pointer is missing , So every time IO Load data from hard disk into memory , Can load more data than B More trees , The side shows B+ The optimization of the tree

lookup :
And B The difference between trees is , We don't have a satellite search ( That is, the current data record pointed to by the index element ), This means that the same disk page can hold more node elements ——>IO Less operation

What needs to be added is , Clustered index in database (Clustered Index) in , The leaf node directly contains satellite data . In the nonclustered index (NonClustered Index) in , The leaf node has a pointer to the satellite data ;
B+ Tree range lookup
First, find the lower limit of the range by binary search , At this point, the function of our linked list is reflected , The upper limit can be found directly through the link sequence traversal of leaf nodes , Process is simple , Efficient ;

summary
B+ Trees are compared to B The advantages of trees :
1. A single node stores more elements , Make the query IO Fewer times ;
2. All queries need to find leaf nodes , Query performance is stable ;
3. All leaf nodes form an ordered list , Easy range query .
边栏推荐
- Profit distribution and taxation of funds
- Start harvesting! Nailing: adjust the maximum number of free users of "nailing team". If the number exceeds 10, it will not work normally
- [LeetCode] 只出现一次的数字【136】
- 十大券商:“推土机行情”再现
- Create an API rapid development platform, awesome!
- 搭建企业级NTP时间服务器
- 自己收藏的一些网址
- Halcon实用:焊点检出设计思路
- Implementation principle of dynamic agent
- High performance and high availability computing architecture of "Weibo comments"
猜你喜欢

AI赋能新零售,「智」胜之道在于生态思维|数智夜话直播精选摘录

架构实战营模块5作业

Status acquisition and control system of on-site express cabinet

matplotlib matplotlib中plt.hist()参数解释

Constexpr function

海外数字身份验证服务商ADVANCE.AI入选EqualOcean《2022品牌出海服务市场研究报告》

【一起上水硕系列】Day 8

二叉搜索树 230. 二叉搜索树中第K小的元素 1038. 从二叉搜索树到更大和树

Matlab exercises -- program control process exercise

新钛云服荣膺“2022爱分析 · IT运维厂商全景报告”云管理平台CMP 代表厂商!...
随机推荐
Fund valuation, expenses and accounting
Remember the process of checking online MySQL deadlock. You should not only know curd, but also know the principle of locking
大厂试水 HPE推出Arm CPU通用服务器产品
動態代理的實現原理
Matlab exercises -- program control process exercise
二叉搜索树 230. 二叉搜索树中第K小的元素 1038. 从二叉搜索树到更大和树
疫情下我离职一年,收入增长了10倍
Leetcode(76)——最小覆盖子串
Use of jetpack's room in combination with flow
Wechat applet: (update) cloud development wechat group contacts
Solr基础操作4
[译]在软件开发行业工作 6 年后,那些年我曾改过的观念
均值、方差、标准差、协方差的概念及意义
Implementation principle of dynamic agent
Regular expressions: characters (2)
Matplotlib plt Hist() parameter explanation
搭建企业级NTP时间服务器
Leetcode(633)——平方数之和
招商证券靠谱吗?开股票账户安全吗?
Leetcode 1385. Distance value between two arrays