当前位置：网站首页>A simple understanding of B tree

A simple understanding of B tree

2022-06-10 23:54:00 【Programmer Xiangzai】

understand B The concept of tree

B A tree is a self balancing lookup tree , Ability to keep data in order . This data structure allows you to find data 、 Sequential access 、 The act of inserting and deleting data , Can be completed in logarithmic time .

It is different from the general binary search tree ,B Tree is a multiway balanced search tree , Its characteristics are ： The number of child nodes of a node can be more than two , And each node can store multiple elements .

stay B In the tree , Non leaf nodes can have a variable number of child nodes , In order to keep within the preset quantity range , Usually, non leaf nodes are merged and separated . Its advantage is that it does not need to rebalance as frequently as other self balancing search trees , The disadvantage is that some space will be wasted when the nodes are not fully filled .

characteristic

Usually , We will be in B Add a degree before the name of the tree to illustrate , Such as m rank B Trees . One m Step B Trees have the following characteristics ：

Any node can have at most m A child node
Any non leaf node except the root node has at least \(\frac{m}{2}\) Child nodes
If the root node is not a leaf node , Then at least it has 2 A child node
Yes k The non leaf nodes of child nodes have k-1 Key
All the leaf nodes are on the same layer ,B The tree is also balanced by this constraint

The following shows a 3 rank B Trees ：

3 rank B Trees

variant

B A tree can refer to a specific tree structure , It can also refer to a kind of tree structure in general .

about B Tree is a kind of tree structure , It also includes B+ Trees and B* Trees and other structures , Their simple definitions are as follows ：

about B+ Trees , Keywords are stored only in leaf nodes , Non leaf nodes store partial copies of keywords stored in leaf nodes , All leaf nodes are at the same height , The leaf node itself is linked from small to large according to the keyword size .

B* Trees are B+ A variation of a tree , stay B+ On the basis of trees , Non leaf node （ Apart from the root node ） Will increase the pointer to the brother of the same layer , And the number of non leaf node keywords is at least \(\frac{2m}{3}\), That is, the minimum utilization rate of the block is \(\frac{2}{3}\)（B+ The tree is \(\frac{1}{2}\)）.

The following is B* Tree structure ：

B* Tree structure

Origin and Application

Actually ,B A tree is a tree structure designed for disk , It is mainly to reduce the disk access of other tree structures IO frequency .

Disk read

The time to read data from the disk mainly involves “ Seek time ” and “ Rotation delay ”：

Seek time refers to the time after the disk receives the system instruction , The time it takes for the head to move from the beginning to the track where the data resides , May be 0 To 20 Milliseconds or more
Rotation delay refers to the end of seek , The time required for the disk to rotate the corresponding sector under the head , The average time is about... Of the rotation period 50% about , For one 7200 Rotating disk , use 60×1000÷7200 According to the formula, we know that a rotation period event is 8.33 Millisecond or so

This is also why sequential disk reads and writes are faster than random disk reads and writes , In sequential reading and writing , The magnetic head does not need to seek , Only a small amount of rotation time is required , Random reading and writing requires constantly moving the head to find the corresponding track .

Disk read ahead

In order to minimize IO operation , The computer system generally adopts the way of pre reading , The length of pre reading is generally page （Page） Integer multiple .

Page is the logical block of computer management memory , Hardware and operating system often divide main memory and disk storage area into continuous blocks of equal size , Each storage block is called a page （ The size of most operating system pages is 4k）, Main memory and disk exchange data in pages .

The computer system reads and stores pages , The minimum unit for each read and access is one page , Disk read ahead usually reads an integer multiple of the page .

Index structure

Indexes for file systems and database systems , It is usually stored on disk in the form of file , So finding the index will also perform disk IO operation , If the disk is in the process of discovery IO Too many accesses will affect the efficiency of the index .

Database systems are commonly used B Trees or B+ Tree as index structure , It cleverly makes use of the disk read ahead principle , Set a node to the size of a page , In this way, each node needs only once IO You can load it completely .

meanwhile , The following skills are also used in the process of use ：

Each time a node is created , Apply for one page space directly , It only takes one time to implement a node IO
Resident the root node in memory , In actual use, it can reduce 1 Time IO

Use B When a tree is used as an index structure , Because the size of a node is equal to the size of a page , Usually the order will be relatively large , So the depth of the tree is shallow （ Usually no more than 3）, Search efficiency is very high .

shortcoming

Although database systems are commonly used B Tree as index structure , But there are still the following shortcomings ：

Non leaf nodes store data directly , The number of indexes stored in the same node will be relatively small
Data may be stored in leaf nodes , It may also be stored in non leaf nodes , Query efficiency is relatively unstable
There are no adjacent pointers between nodes in the same layer , It is not suitable for data traversal

Insert and delete

Insert

B Tree insert elements

B All insertion processes of the tree start with the root node , The first is to find the node to store the new element , Then determine the number of elements inserted into the node ：

If the number of elements stored in the node is less than the maximum , So there's room for new elements , Directly insert and keep the nodes in order ;
If the number of elements stored in the node is greater than or equal to the maximum value , Divide it evenly into 2 Nodes ：
1. Select the median from the original and new elements of the node （ The number in the middle of a set of data arranged in order ）;
2. Elements smaller than the median are placed in the left node , Elements larger than the median are placed in the right node , Median as a separated value ;
3. Insert separated values into the parent node , This may also cause the parent node to split , The splitting of the parent node may also cause the splitting of its parent node , And so on ; If there is no parent node , Create a new parent node .

Delete

Delete B There are two common strategies for nodes in a tree ：

Locate and delete elements , Then adjust the tree to meet the constraints
Deal with the tree from top to bottom , Before entering a node , Adjust the tree so that once it encounters a key to be deleted , It can be deleted without any adjustment

For the previous delete policy , The deletion process is as follows ：

If you delete the element in the leaf node , Delete it directly , If the number of elements in the node is less than the minimum , Then perform the rebalance operation ;
If you delete an element in a non leaf node , Select a new delimited value （ The largest element in the left subtree or the smallest element in the right subtree ）, Remove it from the leaf node , Replace the deleted element as a new separated value , If the number of elements in the leaf node is less than the minimum value , Then perform the rebalance operation ;
The rebalance operation starts at the leaf node , To the root node , Until the tree rebalances .

In the delete node , send B Tree rebalancing mainly involves the following situations ：

If the right sibling of the missing element node exists and has redundant elements , Then rotate to the left ：
1. Move the separated value of the parent node to the largest element in the left subtree ;
2. Move the smallest element of the right sibling node to the separated value of the original parent node ;
If the left sibling of the missing element node exists and has redundant elements , Then rotate to the right ：
1. Move the separated value of the parent node to the smallest element in the right subtree ;
2. Move the largest element of the left sibling node to the separated value of the original parent node ;
If the two direct siblings of the missing element node have only the minimum number of elements , Then merge it with the left sibling node and their separated values in the parent node ：
1. Copy the separated values to the node on the left ;
2. Move all elements in this missing element node to the left node ;
3. Remove nodes that are missing elements ;
4. If the parent node is the root node and there are no elements , Release it and make the merged node the new root node ; If the number of elements of the parent node is less than the minimum , Rebalance parent nodes .

Yes B It is more complicated to delete elements from a tree , But still to keep B Tree balance is the main , And do not make it cause feature failure .

原网站

版权声明
本文为[Programmer Xiangzai]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/161/202206102242099956.html