当前位置:网站首页>Disk storage chain B-tree and b+ tree
Disk storage chain B-tree and b+ tree
2022-07-07 18:12:00 【cheems~】
Disk storage chain B Trees and B+ Trees
- Preface
- Disk structure analysis and data storage principle
- B The definition of a tree
- B Trees and B+ The difference between trees
- B Two kinds of splitting of tree insertion
- B The borrowing before and after tree deletion is merged with the node
- Keywords are in leaf nodes , Delete directly
- The current node is an inner node , And the left child at least includes M/2 Key words , Then borrow and delete
- The current node is an inner node , And the right child at least includes M/2 Key words , Then borrow backward and delete
- Both children are M/2-1 Key words , Then merge and delete
- Delete merge code
- B Tree insertion , Delete , Traverse , Look for code
Preface
In this paper, b Trees and b+ Trees . Also on b The insertion and deletion of trees are introduced in detail , The code is attached at the end of the text .
The knowledge points of this column are through Zero sound education Online learning , Make a summary and write an article , Yes c/c++linux Readers interested in the course , You can click on the link C/C++ Background advanced server course introduction Check the service of the course in detail .
Disk structure analysis and data storage principle
We know that the common data structures are linked lists , Trees , Pictures and so on , And trees can be divided into binary trees , Multi fork tree and so on . For a linked list , It can find the next node , The tree can not only find the next data node , The tree can find two , Find three , Find more ( Several forks ), If a node has several forks, you can find several nodes , Through a node, you can find n Nodes , This is a tree structure .
So why should there be a multi fork tree ? Academically , Usually interpreted as : The tree is for descending . Then why should we lower the floor ?
For a multitree , Within a node , There may be more than one key, Therefore, multi tree can not improve the efficiency of query ( Compared with a binary tree ). But it has one advantage ,“ Within a node , There may be more than one key”, This means that the number of nodes of a multi tree is relatively small , Since the number of nodes is small , Then the number of times to find nodes is less .
Be careful , The function of multi fork tree : Reduce floor height , Reduce the number of nodes , Then the number of times to find nodes will be reduced .
We know cpu Can directly access memory , Instead of directly accessing the disk . The computer sends an access instruction to the disk , The disk finds the corresponding disk track sector through rotation, and then reads it out and puts it into memory . Why is the disk slow , It's because the disk addressing speed is slow . Every address , The disk will rotate once . One access of memory is about 10ns, One access of disk is 100ms. For in memory , Multi fork tree has little effect , But for disks , If a node is saved 10 individual key, You can address many times less . So the function of multi tree is to reduce the layer height in order to reduce the number of addressing . This is why disk storage is suitable b Tree or b+ The reason for the tree .
Multiforked tree -> Reduce floor height -> Reduce the number of nodes -> Reduce disk IO -> Lifting performance
B The definition of a tree
One M rank B Trees T, The following conditions are met
- Each node has at most M Class subtree
- The root node has at least two subtrees
- Except for the root node , Every other branch node has at least M/2 Class subtree
- All the leaf nodes are on the same layer
- Yes k The branch nodes of Kezi trees exist k-1 Key words , Keywords are sorted in ascending order
- The number of keywords meets ceil( M/2 ) - 1 <= n <= M-1
B Trees and B+ The difference between trees
In the actual disk storage, we often choose b+ Trees
b+ Trees compare with b The advantages of trees :
- Keywords do not save data , Index only , All data is saved in the leaf node (b A tree holds data for each keyword )
- b+ The leaf nodes of the tree have pointers , And the leaf nodes themselves are connected in the order of keywords from small to large ( For range queries )
- b+ The middle node of the tree does not save data , So disk pages can hold more node elements , more “ Short and fat ”
B Two kinds of splitting of tree insertion
b The tree is in the process of inserting , Will check whether the current node can be split from top to bottom , If the keywords are full (k=M-1) Then split first , Insert again . And the insertion will be inserted into the leaf node .b There are two kinds of splits when a tree is inserted , One is root node splitting , One is non root node splitting .
Non root node splitting
Non root node splitting process :
1. Create a new node , Set to Z, The original node is set to Y
2. take Y The second half of the keyword and subtree are assigned to Z
3. take Y The key word in the middle key To the parent node
4. Parent node x Add a keyword key And subtree Z
Root node splitting
Root node splitting process :
1. Create a new node , Set to node
2. take b Treelike root Point to node
3. node The first subtree of points to the previous root node
4. Now? , Root node splitting is transformed into non root node splitting
5. Perform the non root node splitting process
Insert split code
// split
void btree_split_clild(struct btree *T, struct btree_node *x, int i) {
//y Nodes that need to be split
struct btree_node *y = x->children[i];
// New node
struct btree_node *z = btree_create_node(T->t, y->leaf);
int j = 0;
z->num = T->t - 1;
// hold y The second half of key And subtree copy To z in
for (j = 0; j < T->t - 1; j++) {
z->keys[j] = y->keys[j + T->t];
}
if (y->leaf == 0) {
for (j = 0; j < T->t; j++) {
z->children[j] = y->children[j + T->t];
}
}
//y Node modification quantity
y->num = T->t - 1;
// Put the parent node x Add one more key And subtree z
for (j = x->num; j >= i + 1; j--) {
x->children[j + 1] = x->children[j];
}
x->children[i + 1] = z;
for (j = x->num - 1; j >= i; j--) {
x->keys[j + 1] = x->keys[j];
}
x->keys[i] = y->keys[T->t - 1];
x->num++;
}
void btree_insert_nonfull(struct btree *T, struct btree_node *x, KEY_TYPE key) {
int i = x->num - 1;
if (x->leaf) {
while (i >= 0 && x->keys[i] > key) {
x->keys[i + 1] = x->keys[i];
i--;
}
x->keys[i + 1] = key;
x->num++;
}
else {
while (i >= 0 && x->keys[i] > key)i--;
if (x->children[i + 1]->num == 2 * T->t - 1) {
btree_split_clild(T, x, i + 1);
if (key > x->keys[i + 1])i++;
}
btree_insert_nonfull(T, x->children[i + 1], key);
}
}
void btree_insert(struct btree *T, KEY_TYPE key) {
struct btree_node *root = T->root;
// If the root node is full , The root node splits and then inserts
if (root->num == 2 * T->t - 1) {
btree_node *node = btree_create_node(T->t, 0);
T->root = node;
node->children[0] = root;
btree_split_clild(T, node, 0);
int i = 0;
if (key > node->keys[0]) i++;
btree_insert_nonfull(T, node->children[i], key);
}
else {
btree_insert_nonfull(T, root, key);
}
}
B The borrowing before and after tree deletion is merged with the node
Why delete keywords to borrow or merge ? Because we need to meet b The definition of a tree
Determine the number of sub tree keywords
- The two adjacent sub trees are M/2-1 , Then merge
- The left subtree is larger than M/2-1, Borrow to the left
- The right subtree is larger than M/2-1, Borrow right
Keywords are in leaf nodes , Delete directly
Keywords are in leaf nodes
- Delete directly
The current node is an inner node , And the left child at least includes M/2 Key words , Then borrow and delete
The current node is an inner node , And the left child at least includes M/2 Key words
- Excuse me first
- In the delete
The current node is an inner node , And the right child at least includes M/2 Key words , Then borrow backward and delete
The current node is an inner node , And the right child at least includes M/2 Key words
- Excuse me first
- In the delete
Both children are M/2-1 Key words , Then merge and delete
Both children are M/2-1 The key
- Merge first
- And then delete
Delete merge code
//{child[idx], key[idx], child[idx+1]}
void btree_merge(btree *T, btree_node *node, int idx) {
// Left and right subtrees
btree_node *left = node->children[idx];
btree_node *right = node->children[idx + 1];
int i = 0;
//data merge
left->keys[T->t - 1] = node->keys[idx];
for (i = 0; i < T->t - 1; i++) {
left->keys[T->t + i] = right->keys[i];
}
if (!left->leaf) {
for (i = 0; i < T->t; i++) {
left->children[T->t + i] = right->children[i];
}
}
left->num += T->t;
//destroy right
btree_destroy_node(right);
//node
for (i = idx + 1; i < node->num; i++) {
node->keys[i - 1] = node->keys[i];
node->children[i] = node->children[i + 1];
}
node->children[i + 1] = NULL;
node->num -= 1;
if (node->num == 0) {
T->root = left;
btree_destroy_node(node);
}
}
void btree_delete_key(btree *T, btree_node *node, KEY_TYPE key) {
// If it's deleted null Go straight back to
if (node == NULL) return;
int idx = 0, i;
// find key The location of
while (idx < node->num && key > node->keys[idx]) {
idx++;
}
// If key Is the inner node
if (idx < node->num && key == node->keys[idx]) {
// If the inner node is a leaf node , Direct deletion
if (node->leaf) {
for (i = idx; i < node->num - 1; i++) {
node->keys[i] = node->keys[i + 1];
}
node->keys[node->num - 1] = 0;
node->num--;
if (node->num == 0) {
// If the whole tree is deleted
free(node);
T->root = NULL;
}
return;
}
// Borrow from the previous node
else if (node->children[idx]->num >= T->t) {
btree_node *left = node->children[idx];
node->keys[idx] = left->keys[left->num - 1];
btree_delete_key(T, left, left->keys[left->num - 1]);
}
// The following node borrowing
else if (node->children[idx + 1]->num >= T->t) {
btree_node *right = node->children[idx + 1];
node->keys[idx] = right->keys[0];
btree_delete_key(T, right, right->keys[0]);
}
else {
// Merge
btree_merge(T, node, idx);// Merged on a subtree
btree_delete_key(T, node->children[idx], key);
}
}
else {
//key Not on this floor , Into the lower level
btree_node *child = node->children[idx];
if (child == NULL) {
printf("Cannot del key = %d\n", key);
return;
}
// Explain that you need to borrow
if (child->num == T->t - 1) {
//left child right Three nodes
btree_node *left = NULL;
btree_node *right = NULL;
if (idx - 1 >= 0)
left = node->children[idx - 1];
if (idx + 1 <= node->num)
right = node->children[idx + 1];
// It means that there is a borrowed place
if ((left && left->num >= T->t) || (right && right->num >= T->t)) {
int richR = 0;
if (right) {
richR = 1;
}
// If the right subtree is smaller than the left subtree key many , be richR=1
if (left && right) {
richR = (right->num > left->num) ? 1 : 0;
}
// Borrow from the next
if (right && right->num >= T->t && richR) {
child->keys[child->num] = node->keys[idx];// The key Bring it here , Take the subtree , The first subtree of the right node
child->children[child->num + 1] = right->children[0];
child->num++;
// The parent node borrows the first of the right node key
node->keys[idx] = right->keys[0];
// The right node is borrowed , Delete something
for (i = 0; i < right->num - 1; i++) {
right->keys[i] = right->keys[i + 1];
right->children[i] = right->children[i + 1];
}
right->keys[right->num - 1] = 0;
right->children[right->num - 1] = right->children[right->num];
right->children[right->num] = NULL;
right->num--;
}
// Borrow from the last
else {
for (i = child->num; i > 0; i--) {
child->keys[i] = child->keys[i - 1];
child->children[i + 1] = child->children[i];
}
child->children[1] = child->children[0];
// Bring the last subtree of the left subtree
child->children[0] = left->children[left->num];
// Take the parent node key
child->keys[0] = node->keys[idx - 1];
child->num++;
// The left child tree of the parent node key
node->keys[idx - 1] = left->keys[left->num - 1];
left->keys[left->num - 1] = 0;
left->children[left->num] = NULL;
left->num--;
}
}
// Merge
else if ((!left || (left->num == T->t - 1)) && (!right || (right->num == T->t - 1))) {
// hold node and left Merge
if (left && left->num == T->t - 1) {
btree_merge(T, node, idx - 1);
child = left;
}
// hold node and right Merge
else if (right && right->num == T->t - 1) {
btree_merge(T, node, idx);
}
}
}
// Recursion to the next level
btree_delete_key(T, child, key);
}
}
int btree_delete(btree *T, KEY_TYPE key) {
if (!T->root) return -1;
btree_delete_key(T, T->root, key);
return 0;
}
B Tree insertion , Delete , Traverse , Look for code
#include <stdio.h>
#include <stdlib.h>
//b tree
#define M 3// Even numbers are better , Easy to split . rank
typedef int KEY_TYPE;
//btree node
struct btree_node {
struct btree_node **children;// subtree
KEY_TYPE *keys;// keyword
int num;// Number of keywords
int leaf;// Is it a leaf node 1yes,0no
};
//btree
struct btree {
struct btree_node *root;
int t;
};
// Create a node
struct btree_node *btree_create_node(int t, int leaf) {
struct btree_node *node = (struct btree_node *) calloc(1, sizeof(struct btree_node));
if (node == NULL)return NULL;
node->num = 0;
node->keys = (KEY_TYPE *) calloc(1, (2 * t - 1) * sizeof(KEY_TYPE));
node->children = (struct btree_node **) calloc(1, (2 * t) * sizeof(struct btree_node *));
node->leaf = leaf;
return node;
}
// Destroy a node
struct btree_node *btree_destroy_node(struct btree_node *node) {
if (node) {
if (node->keys) {
free(node->keys);
}
if (node->children) {
free(node->children);
}
free(node);
}
}
// Create a btree
void btree_create(btree *T, int t) {
T->t = t;
struct btree_node *x = btree_create_node(t, 1);
T->root = x;
}
// split
void btree_split_clild(struct btree *T, struct btree_node *x, int i) {
//y Nodes that need to be split
struct btree_node *y = x->children[i];
// New node
struct btree_node *z = btree_create_node(T->t, y->leaf);
int j = 0;
z->num = T->t - 1;
// hold y The second half of key And subtree copy To z in
for (j = 0; j < T->t - 1; j++) {
z->keys[j] = y->keys[j + T->t];
}
if (y->leaf == 0) {
for (j = 0; j < T->t; j++) {
z->children[j] = y->children[j + T->t];
}
}
//y Node modification quantity
y->num = T->t - 1;
// Put the parent node x Add one more key And subtree z
for (j = x->num; j >= i + 1; j--) {
x->children[j + 1] = x->children[j];
}
x->children[i + 1] = z;
for (j = x->num - 1; j >= i; j--) {
x->keys[j + 1] = x->keys[j];
}
x->keys[i] = y->keys[T->t - 1];
x->num++;
}
void btree_insert_nonfull(struct btree *T, struct btree_node *x, KEY_TYPE key) {
int i = x->num - 1;
if (x->leaf) {
while (i >= 0 && x->keys[i] > key) {
x->keys[i + 1] = x->keys[i];
i--;
}
x->keys[i + 1] = key;
x->num++;
}
else {
while (i >= 0 && x->keys[i] > key)i--;
if (x->children[i + 1]->num == 2 * T->t - 1) {
btree_split_clild(T, x, i + 1);
if (key > x->keys[i + 1])i++;
}
btree_insert_nonfull(T, x->children[i + 1], key);
}
}
void btree_insert(struct btree *T, KEY_TYPE key) {
struct btree_node *root = T->root;
// If the root node is full , Root node splitting
if (root->num == 2 * T->t - 1) {
btree_node *node = btree_create_node(T->t, 0);
T->root = node;
node->children[0] = root;
btree_split_clild(T, node, 0);
int i = 0;
if (key > node->keys[0]) i++;
btree_insert_nonfull(T, node->children[i], key);
}
else {
btree_insert_nonfull(T, root, key);
}
}
//{child[idx], key[idx], child[idx+1]}
void btree_merge(btree *T, btree_node *node, int idx) {
// Left and right subtrees
btree_node *left = node->children[idx];
btree_node *right = node->children[idx + 1];
int i = 0;
//data merge
left->keys[T->t - 1] = node->keys[idx];
for (i = 0; i < T->t - 1; i++) {
left->keys[T->t + i] = right->keys[i];
}
if (!left->leaf) {
for (i = 0; i < T->t; i++) {
left->children[T->t + i] = right->children[i];
}
}
left->num += T->t;
//destroy right
btree_destroy_node(right);
//node
for (i = idx + 1; i < node->num; i++) {
node->keys[i - 1] = node->keys[i];
node->children[i] = node->children[i + 1];
}
node->children[i + 1] = NULL;
node->num -= 1;
if (node->num == 0) {
T->root = left;
btree_destroy_node(node);
}
}
void btree_delete_key(btree *T, btree_node *node, KEY_TYPE key) {
// If it's deleted null Go straight back to
if (node == NULL) return;
int idx = 0, i;
// find key The location of
while (idx < node->num && key > node->keys[idx]) {
idx++;
}
// If key Is the inner node
if (idx < node->num && key == node->keys[idx]) {
// If the inner node is a leaf node , Direct deletion
if (node->leaf) {
for (i = idx; i < node->num - 1; i++) {
node->keys[i] = node->keys[i + 1];
}
node->keys[node->num - 1] = 0;
node->num--;
if (node->num == 0) {
// If the whole tree is deleted
free(node);
T->root = NULL;
}
return;
}
// Borrow from the previous node
else if (node->children[idx]->num >= T->t) {
btree_node *left = node->children[idx];
node->keys[idx] = left->keys[left->num - 1];
btree_delete_key(T, left, left->keys[left->num - 1]);
}
// The following node borrowing
else if (node->children[idx + 1]->num >= T->t) {
btree_node *right = node->children[idx + 1];
node->keys[idx] = right->keys[0];
btree_delete_key(T, right, right->keys[0]);
}
else {
// Merge
btree_merge(T, node, idx);// Merged on a subtree
btree_delete_key(T, node->children[idx], key);
}
}
else {
//key Not on this floor , Into the lower level
btree_node *child = node->children[idx];
if (child == NULL) {
printf("Cannot del key = %d\n", key);
return;
}
// Explain that you need to borrow
if (child->num == T->t - 1) {
//left child right Three nodes
btree_node *left = NULL;
btree_node *right = NULL;
if (idx - 1 >= 0)
left = node->children[idx - 1];
if (idx + 1 <= node->num)
right = node->children[idx + 1];
// It means that there is a borrowed place
if ((left && left->num >= T->t) || (right && right->num >= T->t)) {
int richR = 0;
if (right) {
richR = 1;
}
// If the right subtree is smaller than the left subtree key many , be richR=1
if (left && right) {
richR = (right->num > left->num) ? 1 : 0;
}
// Borrow from the next
if (right && right->num >= T->t && richR) {
child->keys[child->num] = node->keys[idx];// The key Bring it here , Take the subtree , The first subtree of the right node
child->children[child->num + 1] = right->children[0];
child->num++;
// The parent node borrows the first of the right node key
node->keys[idx] = right->keys[0];
// The right node is borrowed , Delete something
for (i = 0; i < right->num - 1; i++) {
right->keys[i] = right->keys[i + 1];
right->children[i] = right->children[i + 1];
}
right->keys[right->num - 1] = 0;
right->children[right->num - 1] = right->children[right->num];
right->children[right->num] = NULL;
right->num--;
}
// Borrow from the last
else {
for (i = child->num; i > 0; i--) {
child->keys[i] = child->keys[i - 1];
child->children[i + 1] = child->children[i];
}
child->children[1] = child->children[0];
// Bring the last subtree of the left subtree
child->children[0] = left->children[left->num];
// Take the parent node key
child->keys[0] = node->keys[idx - 1];
child->num++;
// The left child tree of the parent node key
node->keys[idx - 1] = left->keys[left->num - 1];
left->keys[left->num - 1] = 0;
left->children[left->num] = NULL;
left->num--;
}
}
// Merge
else if ((!left || (left->num == T->t - 1)) && (!right || (right->num == T->t - 1))) {
// hold node and left Merge
if (left && left->num == T->t - 1) {
btree_merge(T, node, idx - 1);
child = left;
}
// hold node and right Merge
else if (right && right->num == T->t - 1) {
btree_merge(T, node, idx);
}
}
}
// Recursion to the next level
btree_delete_key(T, child, key);
}
}
int btree_delete(btree *T, KEY_TYPE key) {
if (!T->root) return -1;
btree_delete_key(T, T->root, key);
return 0;
}
void btree_traverse(btree_node *x) {
int i = 0;
for (i = 0; i < x->num; i++) {
if (x->leaf == 0)
btree_traverse(x->children[i]);
printf("%C ", x->keys[i]);
}
if (x->leaf == 0) btree_traverse(x->children[i]);
}
void btree_print(btree *T, btree_node *node, int layer) {
btree_node *p = node;
int i;
if (p) {
printf("\nlayer = %d keynum = %d is_leaf = %d\n", layer, p->num, p->leaf);
for (i = 0; i < node->num; i++)
printf("%c ", p->keys[i]);
printf("\n");
#if 0
printf("%p\n", p);
for(i = 0; i <= 2 * T->t; i++)
printf("%p ", p->childrens[i]);
printf("\n");
#endif
layer++;
for (i = 0; i <= p->num; i++)
if (p->children[i])
btree_print(T, p->children[i], layer);
}
else printf("the tree is empty\n");
}
int btree_bin_search(btree_node *node, int low, int high, KEY_TYPE key) {
int mid;
if (low > high || low < 0 || high < 0) {
return -1;
}
while (low <= high) {
mid = (low + high) / 2;
if (key > node->keys[mid]) {
low = mid + 1;
}
else {
high = mid - 1;
}
}
return low;
}
int main() {
btree T = {
0};
btree_create(&T, 3);
srand(48);
int i = 0;
char key[27] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
for (i = 0; i < 26; i++) {
//key[i] = rand() % 1000;
printf("%c ", key[i]);
btree_insert(&T, key[i]);
}
btree_print(&T, T.root, 0);
for (i = 0; i < 26; i++) {
printf("\n---------------------------------\n");
btree_delete(&T, key[25 - i]);
//btree_traverse(T.root);
btree_print(&T, T.root, 0);
}
}
边栏推荐
- Mui side navigation anchor positioning JS special effect
- Deep learning - make your own dataset
- 用存储过程、定时器、触发器来解决数据分析问题
- List selection JS effect with animation
- 2021年全国平均工资出炉,你达标了吗?
- Machine vision (1) - Overview
- What are the financial products in 2022? What are suitable for beginners?
- SD_DATA_SEND_SHIFT_REGISTER
- TaffyDB开源的JS数据库
- mui侧边导航锚点定位js特效
猜你喜欢
Face recognition attendance system based on Baidu flying plasma platform (easydl)
使用OneDNS完美解决办公网络优化问题
带动画的列表选中js特效
[PaddleSeg源码阅读] PaddleSeg Validation 中添加 Boundary IoU的计算(1)——val.py文件细节提示
Import requirements in batches during Yolo training Txt
五种网络IO模型
Tips of the week 136: unordered containers
JS pull down the curtain JS special effect display layer
swiper左右切换滑块插件
讨论| 坦白局,工业 AR 应用为什么难落地?
随机推荐
mui侧边导航锚点定位js特效
Summary of debian10 system problems
<代码随想录二刷>链表
Mobile app takeout ordering personal center page
性能测试过程和计划
ICer知识点杂烩(后附大量题目,持续更新中)
【蓝桥杯集训100题】scratch从小到大排序 蓝桥杯scratch比赛专项预测编程题 集训模拟练习题第17题
如何在软件研发阶段落地安全实践
备份阿里云实例-oss-browser
保证接口数据安全的10种方案
Unlike the relatively short-lived industrial chain of consumer Internet, the industrial chain of industrial Internet is quite long
数学分析_笔记_第11章:Fourier级数
Performance test process and plan
Face recognition attendance system based on Baidu flying plasma platform (easydl)
cf:C. Factorials and Powers of Two【dp + 排序 + 选不选板子 + 选若干个数等于已知和的最少数】
TaffyDB开源的JS数据库
科学家首次观察到“电子漩涡” 有助于设计出更高效的电子产品
In depth understanding of USB communication protocol
同消费互联网的较为短暂的产业链不同,产业互联网的产业链是相当漫长的
通过 Play Integrity API 的 nonce 字段提高应用安全性