understand B The concept of tree
B A tree is a self balancing lookup tree , Ability to keep data in order . This data structure allows you to find data 、 Sequential access 、 The act of inserting and deleting data , Can be completed in logarithmic time .
It is different from the general binary search tree ,B Tree is a multiway balanced search tree , Its characteristics are : The number of child nodes of a node can be more than two , And each node can store multiple elements .
stay B In the tree , Non leaf nodes can have a variable number of child nodes , In order to keep within the preset quantity range , Usually, non leaf nodes are merged and separated . Its advantage is that it does not need to rebalance as frequently as other self balancing search trees , The disadvantage is that some space will be wasted when the nodes are not fully filled .
characteristic
Usually , We will be in B Add a degree before the name of the tree to illustrate , Such as m rank B Trees . One m Step B Trees have the following characteristics :
- Any node can have at most m A child node
- Any non leaf node except the root node has at least \(\frac{m}{2}\) Child nodes
- If the root node is not a leaf node , Then at least it has 2 A child node
- Yes k The non leaf nodes of child nodes have k-1 Key
- All the leaf nodes are on the same layer ,B The tree is also balanced by this constraint
The following shows a 3 rank B Trees :

variant
B A tree can refer to a specific tree structure , It can also refer to a kind of tree structure in general .
about B Tree is a kind of tree structure , It also includes B+ Trees and B* Trees and other structures , Their simple definitions are as follows :
about B+ Trees , Keywords are stored only in leaf nodes , Non leaf nodes store partial copies of keywords stored in leaf nodes , All leaf nodes are at the same height , The leaf node itself is linked from small to large according to the keyword size .
B* Trees are B+ A variation of a tree , stay B+ On the basis of trees , Non leaf node ( Apart from the root node ) Will increase the pointer to the brother of the same layer , And the number of non leaf node keywords is at least \(\frac{2m}{3}\), That is, the minimum utilization rate of the block is \(\frac{2}{3}\)(B+ The tree is \(\frac{1}{2}\)).
The following is B* Tree structure :

Origin and Application
Actually ,B A tree is a tree structure designed for disk , It is mainly to reduce the disk access of other tree structures IO frequency .
Disk read
The time to read data from the disk mainly involves “ Seek time ” and “ Rotation delay ”:
- Seek time refers to the time after the disk receives the system instruction , The time it takes for the head to move from the beginning to the track where the data resides , May be 0 To 20 Milliseconds or more
- Rotation delay refers to the end of seek , The time required for the disk to rotate the corresponding sector under the head , The average time is about... Of the rotation period 50% about , For one 7200 Rotating disk , use 60×1000÷7200 According to the formula, we know that a rotation period event is 8.33 Millisecond or so
This is also why sequential disk reads and writes are faster than random disk reads and writes , In sequential reading and writing , The magnetic head does not need to seek , Only a small amount of rotation time is required , Random reading and writing requires constantly moving the head to find the corresponding track .
Disk read ahead
In order to minimize IO operation , The computer system generally adopts the way of pre reading , The length of pre reading is generally page (Page) Integer multiple .
Page is the logical block of computer management memory , Hardware and operating system often divide main memory and disk storage area into continuous blocks of equal size , Each storage block is called a page ( The size of most operating system pages is 4k), Main memory and disk exchange data in pages .
The computer system reads and stores pages , The minimum unit for each read and access is one page , Disk read ahead usually reads an integer multiple of the page .
Index structure
Indexes for file systems and database systems , It is usually stored on disk in the form of file , So finding the index will also perform disk IO operation , If the disk is in the process of discovery IO Too many accesses will affect the efficiency of the index .
Database systems are commonly used B Trees or B+ Tree as index structure , It cleverly makes use of the disk read ahead principle , Set a node to the size of a page , In this way, each node needs only once IO You can load it completely .
meanwhile , The following skills are also used in the process of use :
- Each time a node is created , Apply for one page space directly , It only takes one time to implement a node IO
- Resident the root node in memory , In actual use, it can reduce 1 Time IO
Use B When a tree is used as an index structure , Because the size of a node is equal to the size of a page , Usually the order will be relatively large , So the depth of the tree is shallow ( Usually no more than 3), Search efficiency is very high .
shortcoming
Although database systems are commonly used B Tree as index structure , But there are still the following shortcomings :
- Non leaf nodes store data directly , The number of indexes stored in the same node will be relatively small
- Data may be stored in leaf nodes , It may also be stored in non leaf nodes , Query efficiency is relatively unstable
- There are no adjacent pointers between nodes in the same layer , It is not suitable for data traversal
Insert and delete
Insert

B All insertion processes of the tree start with the root node , The first is to find the node to store the new element , Then determine the number of elements inserted into the node :
If the number of elements stored in the node is less than the maximum , So there's room for new elements , Directly insert and keep the nodes in order ;
If the number of elements stored in the node is greater than or equal to the maximum value , Divide it evenly into 2 Nodes :
- Select the median from the original and new elements of the node ( The number in the middle of a set of data arranged in order );
- Elements smaller than the median are placed in the left node , Elements larger than the median are placed in the right node , Median as a separated value ;
- Insert separated values into the parent node , This may also cause the parent node to split , The splitting of the parent node may also cause the splitting of its parent node , And so on ; If there is no parent node , Create a new parent node .
Delete
Delete B There are two common strategies for nodes in a tree :
- Locate and delete elements , Then adjust the tree to meet the constraints
- Deal with the tree from top to bottom , Before entering a node , Adjust the tree so that once it encounters a key to be deleted , It can be deleted without any adjustment
For the previous delete policy , The deletion process is as follows :
- If you delete the element in the leaf node , Delete it directly , If the number of elements in the node is less than the minimum , Then perform the rebalance operation ;
- If you delete an element in a non leaf node , Select a new delimited value ( The largest element in the left subtree or the smallest element in the right subtree ), Remove it from the leaf node , Replace the deleted element as a new separated value , If the number of elements in the leaf node is less than the minimum value , Then perform the rebalance operation ;
- The rebalance operation starts at the leaf node , To the root node , Until the tree rebalances .
In the delete node , send B Tree rebalancing mainly involves the following situations :
If the right sibling of the missing element node exists and has redundant elements , Then rotate to the left :
- Move the separated value of the parent node to the largest element in the left subtree ;
- Move the smallest element of the right sibling node to the separated value of the original parent node ;
If the left sibling of the missing element node exists and has redundant elements , Then rotate to the right :
- Move the separated value of the parent node to the smallest element in the right subtree ;
- Move the largest element of the left sibling node to the separated value of the original parent node ;
If the two direct siblings of the missing element node have only the minimum number of elements , Then merge it with the left sibling node and their separated values in the parent node :
- Copy the separated values to the node on the left ;
- Move all elements in this missing element node to the left node ;
- Remove nodes that are missing elements ;
- If the parent node is the root node and there are no elements , Release it and make the merged node the new root node ; If the number of elements of the parent node is less than the minimum , Rebalance parent nodes .
Yes B It is more complicated to delete elements from a tree , But still to keep B Tree balance is the main , And do not make it cause feature failure .

![[paper sharing] pata: fuzzing with path aware Taint Analysis](/img/f6/627344c5da588afcf70302ef29d134.png)






