当前位置：网站首页>Disk storage chain B-tree and b+ tree

Disk storage chain B-tree and b+ tree

2022-07-07 18:12:00 【cheems~】

Disk storage chain B Trees and B+ Trees

Preface
Disk structure analysis and data storage principle
B The definition of a tree
B Trees and B+ The difference between trees
B Two kinds of splitting of tree insertion
B The borrowing before and after tree deletion is merged with the node
B Tree insertion , Delete , Traverse , Look for code

Preface

In this paper, b Trees and b+ Trees . Also on b The insertion and deletion of trees are introduced in detail , The code is attached at the end of the text .

The knowledge points of this column are through Zero sound education Online learning , Make a summary and write an article , Yes c/c++linux Readers interested in the course , You can click on the link C/C++ Background advanced server course introduction Check the service of the course in detail .

Disk structure analysis and data storage principle

We know that the common data structures are linked lists , Trees , Pictures and so on , And trees can be divided into binary trees , Multi fork tree and so on . For a linked list , It can find the next node , The tree can not only find the next data node , The tree can find two , Find three , Find more ( Several forks ), If a node has several forks, you can find several nodes , Through a node, you can find n Nodes , This is a tree structure .

So why should there be a multi fork tree ？ Academically , Usually interpreted as ： The tree is for descending . Then why should we lower the floor ？

For a multitree , Within a node , There may be more than one key, Therefore, multi tree can not improve the efficiency of query （ Compared with a binary tree ）. But it has one advantage ,“ Within a node , There may be more than one key”, This means that the number of nodes of a multi tree is relatively small , Since the number of nodes is small , Then the number of times to find nodes is less .

Be careful , The function of multi fork tree ： Reduce floor height , Reduce the number of nodes , Then the number of times to find nodes will be reduced .

We know cpu Can directly access memory , Instead of directly accessing the disk . The computer sends an access instruction to the disk , The disk finds the corresponding disk track sector through rotation, and then reads it out and puts it into memory . Why is the disk slow , It's because the disk addressing speed is slow . Every address , The disk will rotate once . One access of memory is about 10ns, One access of disk is 100ms. For in memory , Multi fork tree has little effect , But for disks , If a node is saved 10 individual key, You can address many times less . So the function of multi tree is to reduce the layer height in order to reduce the number of addressing . This is why disk storage is suitable b Tree or b+ The reason for the tree .

Multiforked tree -> Reduce floor height -> Reduce the number of nodes -> Reduce disk IO -> Lifting performance

B The definition of a tree

One M rank B Trees T, The following conditions are met

Each node has at most M Class subtree
The root node has at least two subtrees
Except for the root node , Every other branch node has at least M/2 Class subtree
All the leaf nodes are on the same layer
Yes k The branch nodes of Kezi trees exist k-1 Key words , Keywords are sorted in ascending order
The number of keywords meets ceil( M/2 ) - 1 <= n <= M-1

B Trees and B+ The difference between trees

In the actual disk storage, we often choose b+ Trees

b+ Trees compare with b The advantages of trees ：

Keywords do not save data , Index only , All data is saved in the leaf node （b A tree holds data for each keyword ）
b+ The leaf nodes of the tree have pointers , And the leaf nodes themselves are connected in the order of keywords from small to large （ For range queries ）
b+ The middle node of the tree does not save data , So disk pages can hold more node elements , more “ Short and fat ”

B Two kinds of splitting of tree insertion

b The tree is in the process of inserting , Will check whether the current node can be split from top to bottom , If the keywords are full (k=M-1) Then split first , Insert again . And the insertion will be inserted into the leaf node .b There are two kinds of splits when a tree is inserted , One is root node splitting , One is non root node splitting .

Non root node splitting

Non root node splitting process ：
1. Create a new node , Set to Z, The original node is set to Y
2. take Y The second half of the keyword and subtree are assigned to Z
3. take Y The key word in the middle key To the parent node
4. Parent node x Add a keyword key And subtree Z

Insert picture description here

Root node splitting

Root node splitting process ：
1. Create a new node , Set to node
2. take b Treelike root Point to node
3. node The first subtree of points to the previous root node
4. Now? , Root node splitting is transformed into non root node splitting
5. Perform the non root node splitting process

Insert picture description here

Insert split code

// split 
void btree_split_clild(struct btree *T, struct btree_node *x, int i) {
    
    //y  Nodes that need to be split 
    struct btree_node *y = x->children[i];
    // New node 
    struct btree_node *z = btree_create_node(T->t, y->leaf);
    int j = 0;
    z->num = T->t - 1;
    // hold y The second half of key And subtree copy To z in 
    for (j = 0; j < T->t - 1; j++) {
    
        z->keys[j] = y->keys[j + T->t];
    }
    if (y->leaf == 0) {
    
        for (j = 0; j < T->t; j++) {
    
            z->children[j] = y->children[j + T->t];
        }
    }
    //y Node modification quantity 
    y->num = T->t - 1;
    // Put the parent node x Add one more key And subtree z
    for (j = x->num; j >= i + 1; j--) {
    
        x->children[j + 1] = x->children[j];
    }
    x->children[i + 1] = z;
    for (j = x->num - 1; j >= i; j--) {
    
        x->keys[j + 1] = x->keys[j];
    }
    x->keys[i] = y->keys[T->t - 1];
    x->num++;
}

void btree_insert_nonfull(struct btree *T, struct btree_node *x, KEY_TYPE key) {
    
    int i = x->num - 1;
    if (x->leaf) {
    
        while (i >= 0 && x->keys[i] > key) {
    
            x->keys[i + 1] = x->keys[i];
            i--;
        }
        x->keys[i + 1] = key;
        x->num++;
    }
    else {
    
        while (i >= 0 && x->keys[i] > key)i--;
        if (x->children[i + 1]->num == 2 * T->t - 1) {
    
            btree_split_clild(T, x, i + 1);
            if (key > x->keys[i + 1])i++;
        }
        btree_insert_nonfull(T, x->children[i + 1], key);
    }
}

void btree_insert(struct btree *T, KEY_TYPE key) {
    
    struct btree_node *root = T->root;
    // If the root node is full , The root node splits and then inserts 
    if (root->num == 2 * T->t - 1) {
    
        btree_node *node = btree_create_node(T->t, 0);
        T->root = node;
        node->children[0] = root;
        btree_split_clild(T, node, 0);
        int i = 0;
        if (key > node->keys[0]) i++;
        btree_insert_nonfull(T, node->children[i], key);
    }
    else {
    
        btree_insert_nonfull(T, root, key);
    }
}

B The borrowing before and after tree deletion is merged with the node

Why delete keywords to borrow or merge ？ Because we need to meet b The definition of a tree

Determine the number of sub tree keywords

The two adjacent sub trees are M/2-1 , Then merge
The left subtree is larger than M/2-1, Borrow to the left
The right subtree is larger than M/2-1, Borrow right

Keywords are in leaf nodes , Delete directly

Keywords are in leaf nodes

Delete directly

Insert picture description here

The current node is an inner node , And the left child at least includes M/2 Key words , Then borrow and delete

The current node is an inner node , And the left child at least includes M/2 Key words

Excuse me first
In the delete

Insert picture description here

The current node is an inner node , And the right child at least includes M/2 Key words , Then borrow backward and delete

The current node is an inner node , And the right child at least includes M/2 Key words

Excuse me first
In the delete

Insert picture description here

Both children are M/2-1 Key words , Then merge and delete

Both children are M/2-1 The key

Merge first
And then delete

Insert picture description here

Delete merge code

//{child[idx], key[idx], child[idx+1]}
void btree_merge(btree *T, btree_node *node, int idx) {
    
    // Left and right subtrees 
    btree_node *left = node->children[idx];
    btree_node *right = node->children[idx + 1];

    int i = 0;

    //data merge
    left->keys[T->t - 1] = node->keys[idx];
    for (i = 0; i < T->t - 1; i++) {
    
        left->keys[T->t + i] = right->keys[i];
    }
    if (!left->leaf) {
    
        for (i = 0; i < T->t; i++) {
    
            left->children[T->t + i] = right->children[i];
        }
    }
    left->num += T->t;

    //destroy right
    btree_destroy_node(right);

    //node
    for (i = idx + 1; i < node->num; i++) {
    
        node->keys[i - 1] = node->keys[i];
        node->children[i] = node->children[i + 1];
    }
    node->children[i + 1] = NULL;
    node->num -= 1;

    if (node->num == 0) {
    
        T->root = left;
        btree_destroy_node(node);
    }
}

void btree_delete_key(btree *T, btree_node *node, KEY_TYPE key) {
    
    // If it's deleted null Go straight back to 
    if (node == NULL) return;
    int idx = 0, i;
    // find key The location of 
    while (idx < node->num && key > node->keys[idx]) {
    
        idx++;
    }
    // If key Is the inner node 
    if (idx < node->num && key == node->keys[idx]) {
    
        // If the inner node is a leaf node , Direct deletion 
        if (node->leaf) {
    
            for (i = idx; i < node->num - 1; i++) {
    
                node->keys[i] = node->keys[i + 1];
            }
            node->keys[node->num - 1] = 0;
            node->num--;
            if (node->num == 0) {
     // If the whole tree is deleted 
                free(node);
                T->root = NULL;
            }
            return;
        }
            // Borrow from the previous node 
        else if (node->children[idx]->num >= T->t) {
    
            btree_node *left = node->children[idx];
            node->keys[idx] = left->keys[left->num - 1];
            btree_delete_key(T, left, left->keys[left->num - 1]);
        }
            // The following node borrowing 
        else if (node->children[idx + 1]->num >= T->t) {
    
            btree_node *right = node->children[idx + 1];
            node->keys[idx] = right->keys[0];
            btree_delete_key(T, right, right->keys[0]);
        }
        else {
    // Merge 
            btree_merge(T, node, idx);// Merged on a subtree 
            btree_delete_key(T, node->children[idx], key);
        }
    }
    else {
    
        //key Not on this floor , Into the lower level 
        btree_node *child = node->children[idx];
        if (child == NULL) {
    
            printf("Cannot del key = %d\n", key);
            return;
        }
        // Explain that you need to borrow 
        if (child->num == T->t - 1) {
    
            //left child right Three nodes 
            btree_node *left = NULL;
            btree_node *right = NULL;
            if (idx - 1 >= 0)
                left = node->children[idx - 1];
            if (idx + 1 <= node->num)
                right = node->children[idx + 1];
            // It means that there is a borrowed place 
            if ((left && left->num >= T->t) || (right && right->num >= T->t)) {
    
                int richR = 0;
                if (right) {
    
                    richR = 1;
                }
                // If the right subtree is smaller than the left subtree key many , be richR=1
                if (left && right) {
    
                    richR = (right->num > left->num) ? 1 : 0;
                }
                // Borrow from the next 
                if (right && right->num >= T->t && richR) {
    
                    child->keys[child->num] = node->keys[idx];// The key Bring it here , Take the subtree , The first subtree of the right node 
                    child->children[child->num + 1] = right->children[0];
                    child->num++;
                    // The parent node borrows the first of the right node key
                    node->keys[idx] = right->keys[0];
                    // The right node is borrowed , Delete something 
                    for (i = 0; i < right->num - 1; i++) {
    
                        right->keys[i] = right->keys[i + 1];
                        right->children[i] = right->children[i + 1];
                    }
                    right->keys[right->num - 1] = 0;
                    right->children[right->num - 1] = right->children[right->num];
                    right->children[right->num] = NULL;
                    right->num--;
                }
                    // Borrow from the last 
                else {
    
                    for (i = child->num; i > 0; i--) {
    
                        child->keys[i] = child->keys[i - 1];
                        child->children[i + 1] = child->children[i];
                    }
                    child->children[1] = child->children[0];
                    // Bring the last subtree of the left subtree 
                    child->children[0] = left->children[left->num];
                    // Take the parent node key
                    child->keys[0] = node->keys[idx - 1];
                    child->num++;
                    // The left child tree of the parent node key
                    node->keys[idx - 1] = left->keys[left->num - 1];
                    left->keys[left->num - 1] = 0;
                    left->children[left->num] = NULL;
                    left->num--;
                }
            }
                // Merge 
            else if ((!left || (left->num == T->t - 1)) && (!right || (right->num == T->t - 1))) {
    
                // hold node and left Merge 
                if (left && left->num == T->t - 1) {
    
                    btree_merge(T, node, idx - 1);
                    child = left;
                }
                    // hold node and right Merge 
                else if (right && right->num == T->t - 1) {
    
                    btree_merge(T, node, idx);
                }
            }
        }
        // Recursion to the next level 
        btree_delete_key(T, child, key);
    }
}


int btree_delete(btree *T, KEY_TYPE key) {
    
    if (!T->root) return -1;
    btree_delete_key(T, T->root, key);
    return 0;
}

B Tree insertion , Delete , Traverse , Look for code

#include <stdio.h>
#include <stdlib.h>

//b tree


#define M 3// Even numbers are better , Easy to split . rank 
typedef int KEY_TYPE;

//btree node 
struct btree_node {
    
    struct btree_node **children;// subtree 
    KEY_TYPE *keys;// keyword 
    int num;// Number of keywords 
    int leaf;// Is it a leaf node  1yes,0no
};
//btree
struct btree {
    
    struct btree_node *root;
    int t;
};

// Create a node 
struct btree_node *btree_create_node(int t, int leaf) {
    
    struct btree_node *node = (struct btree_node *) calloc(1, sizeof(struct btree_node));
    if (node == NULL)return NULL;

    node->num = 0;
    node->keys = (KEY_TYPE *) calloc(1, (2 * t - 1) * sizeof(KEY_TYPE));
    node->children = (struct btree_node **) calloc(1, (2 * t) * sizeof(struct btree_node *));
    node->leaf = leaf;

    return node;
}

// Destroy a node 
struct btree_node *btree_destroy_node(struct btree_node *node) {
    
    if (node) {
    
        if (node->keys) {
    
            free(node->keys);
        }
        if (node->children) {
    
            free(node->children);
        }
        free(node);
    }
}

// Create a btree
void btree_create(btree *T, int t) {
    
    T->t = t;
    struct btree_node *x = btree_create_node(t, 1);
    T->root = x;
}

// split 
void btree_split_clild(struct btree *T, struct btree_node *x, int i) {
    
    //y  Nodes that need to be split 
    struct btree_node *y = x->children[i];
    // New node 
    struct btree_node *z = btree_create_node(T->t, y->leaf);
    int j = 0;
    z->num = T->t - 1;
    // hold y The second half of key And subtree copy To z in 
    for (j = 0; j < T->t - 1; j++) {
    
        z->keys[j] = y->keys[j + T->t];
    }
    if (y->leaf == 0) {
    
        for (j = 0; j < T->t; j++) {
    
            z->children[j] = y->children[j + T->t];
        }
    }
    //y Node modification quantity 
    y->num = T->t - 1;
    // Put the parent node x Add one more key And subtree z
    for (j = x->num; j >= i + 1; j--) {
    
        x->children[j + 1] = x->children[j];
    }
    x->children[i + 1] = z;
    for (j = x->num - 1; j >= i; j--) {
    
        x->keys[j + 1] = x->keys[j];
    }
    x->keys[i] = y->keys[T->t - 1];
    x->num++;
}

void btree_insert_nonfull(struct btree *T, struct btree_node *x, KEY_TYPE key) {
    
    int i = x->num - 1;
    if (x->leaf) {
    
        while (i >= 0 && x->keys[i] > key) {
    
            x->keys[i + 1] = x->keys[i];
            i--;
        }
        x->keys[i + 1] = key;
        x->num++;
    }
    else {
    
        while (i >= 0 && x->keys[i] > key)i--;
        if (x->children[i + 1]->num == 2 * T->t - 1) {
    
            btree_split_clild(T, x, i + 1);
            if (key > x->keys[i + 1])i++;
        }
        btree_insert_nonfull(T, x->children[i + 1], key);
    }
}

void btree_insert(struct btree *T, KEY_TYPE key) {
    
    struct btree_node *root = T->root;
    // If the root node is full , Root node splitting 
    if (root->num == 2 * T->t - 1) {
    
        btree_node *node = btree_create_node(T->t, 0);
        T->root = node;
        node->children[0] = root;
        btree_split_clild(T, node, 0);
        int i = 0;
        if (key > node->keys[0]) i++;
        btree_insert_nonfull(T, node->children[i], key);

    }
    else {
    
        btree_insert_nonfull(T, root, key);
    }
}

//{child[idx], key[idx], child[idx+1]}
void btree_merge(btree *T, btree_node *node, int idx) {
    
    // Left and right subtrees 
    btree_node *left = node->children[idx];
    btree_node *right = node->children[idx + 1];

    int i = 0;

    //data merge
    left->keys[T->t - 1] = node->keys[idx];
    for (i = 0; i < T->t - 1; i++) {
    
        left->keys[T->t + i] = right->keys[i];
    }
    if (!left->leaf) {
    
        for (i = 0; i < T->t; i++) {
    
            left->children[T->t + i] = right->children[i];
        }
    }
    left->num += T->t;

    //destroy right
    btree_destroy_node(right);

    //node
    for (i = idx + 1; i < node->num; i++) {
    
        node->keys[i - 1] = node->keys[i];
        node->children[i] = node->children[i + 1];
    }
    node->children[i + 1] = NULL;
    node->num -= 1;

    if (node->num == 0) {
    
        T->root = left;
        btree_destroy_node(node);
    }
}

void btree_delete_key(btree *T, btree_node *node, KEY_TYPE key) {
    
    // If it's deleted null Go straight back to 
    if (node == NULL) return;
    int idx = 0, i;
    // find key The location of 
    while (idx < node->num && key > node->keys[idx]) {
    
        idx++;
    }
    // If key Is the inner node 
    if (idx < node->num && key == node->keys[idx]) {
    
        // If the inner node is a leaf node , Direct deletion 
        if (node->leaf) {
    
            for (i = idx; i < node->num - 1; i++) {
    
                node->keys[i] = node->keys[i + 1];
            }
            node->keys[node->num - 1] = 0;
            node->num--;
            if (node->num == 0) {
     // If the whole tree is deleted 
                free(node);
                T->root = NULL;
            }
            return;
        }
            // Borrow from the previous node 
        else if (node->children[idx]->num >= T->t) {
    
            btree_node *left = node->children[idx];
            node->keys[idx] = left->keys[left->num - 1];
            btree_delete_key(T, left, left->keys[left->num - 1]);
        }
            // The following node borrowing 
        else if (node->children[idx + 1]->num >= T->t) {
    
            btree_node *right = node->children[idx + 1];
            node->keys[idx] = right->keys[0];
            btree_delete_key(T, right, right->keys[0]);
        }
        else {
    // Merge 
            btree_merge(T, node, idx);// Merged on a subtree 
            btree_delete_key(T, node->children[idx], key);
        }
    }
    else {
    
        //key Not on this floor , Into the lower level 
        btree_node *child = node->children[idx];
        if (child == NULL) {
    
            printf("Cannot del key = %d\n", key);
            return;
        }
        // Explain that you need to borrow 
        if (child->num == T->t - 1) {
    
            //left child right Three nodes 
            btree_node *left = NULL;
            btree_node *right = NULL;
            if (idx - 1 >= 0)
                left = node->children[idx - 1];
            if (idx + 1 <= node->num)
                right = node->children[idx + 1];
            // It means that there is a borrowed place 
            if ((left && left->num >= T->t) || (right && right->num >= T->t)) {
    
                int richR = 0;
                if (right) {
    
                    richR = 1;
                }
                // If the right subtree is smaller than the left subtree key many , be richR=1
                if (left && right) {
    
                    richR = (right->num > left->num) ? 1 : 0;
                }
                // Borrow from the next 
                if (right && right->num >= T->t && richR) {
    
                    child->keys[child->num] = node->keys[idx];// The key Bring it here , Take the subtree , The first subtree of the right node 
                    child->children[child->num + 1] = right->children[0];
                    child->num++;
                    // The parent node borrows the first of the right node key
                    node->keys[idx] = right->keys[0];
                    // The right node is borrowed , Delete something 
                    for (i = 0; i < right->num - 1; i++) {
    
                        right->keys[i] = right->keys[i + 1];
                        right->children[i] = right->children[i + 1];
                    }
                    right->keys[right->num - 1] = 0;
                    right->children[right->num - 1] = right->children[right->num];
                    right->children[right->num] = NULL;
                    right->num--;
                }
                    // Borrow from the last 
                else {
    
                    for (i = child->num; i > 0; i--) {
    
                        child->keys[i] = child->keys[i - 1];
                        child->children[i + 1] = child->children[i];
                    }
                    child->children[1] = child->children[0];
                    // Bring the last subtree of the left subtree 
                    child->children[0] = left->children[left->num];
                    // Take the parent node key
                    child->keys[0] = node->keys[idx - 1];
                    child->num++;
                    // The left child tree of the parent node key
                    node->keys[idx - 1] = left->keys[left->num - 1];
                    left->keys[left->num - 1] = 0;
                    left->children[left->num] = NULL;
                    left->num--;
                }
            }
                // Merge 
            else if ((!left || (left->num == T->t - 1)) && (!right || (right->num == T->t - 1))) {
    
                // hold node and left Merge 
                if (left && left->num == T->t - 1) {
    
                    btree_merge(T, node, idx - 1);
                    child = left;
                }
                    // hold node and right Merge 
                else if (right && right->num == T->t - 1) {
    
                    btree_merge(T, node, idx);
                }
            }
        }
        // Recursion to the next level 
        btree_delete_key(T, child, key);
    }
}


int btree_delete(btree *T, KEY_TYPE key) {
    
    if (!T->root) return -1;
    btree_delete_key(T, T->root, key);
    return 0;
}

void btree_traverse(btree_node *x) {
    
    int i = 0;

    for (i = 0; i < x->num; i++) {
    
        if (x->leaf == 0)
            btree_traverse(x->children[i]);
        printf("%C ", x->keys[i]);
    }

    if (x->leaf == 0) btree_traverse(x->children[i]);
}

void btree_print(btree *T, btree_node *node, int layer) {
    
    btree_node *p = node;
    int i;
    if (p) {
    
        printf("\nlayer = %d keynum = %d is_leaf = %d\n", layer, p->num, p->leaf);
        for (i = 0; i < node->num; i++)
            printf("%c ", p->keys[i]);
        printf("\n");
#if 0
        printf("%p\n", p);
        for(i = 0; i <= 2 * T->t; i++)
            printf("%p ", p->childrens[i]);
        printf("\n");
#endif
        layer++;
        for (i = 0; i <= p->num; i++)
            if (p->children[i])
                btree_print(T, p->children[i], layer);
    }
    else printf("the tree is empty\n");
}


int btree_bin_search(btree_node *node, int low, int high, KEY_TYPE key) {
    
    int mid;
    if (low > high || low < 0 || high < 0) {
    
        return -1;
    }
    while (low <= high) {
    
        mid = (low + high) / 2;
        if (key > node->keys[mid]) {
    
            low = mid + 1;
        }
        else {
    
            high = mid - 1;
        }
    }

    return low;
}

int main() {
    
    btree T = {
    0};

    btree_create(&T, 3);
    srand(48);

    int i = 0;
    char key[27] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    for (i = 0; i < 26; i++) {
    
        //key[i] = rand() % 1000;
        printf("%c ", key[i]);
        btree_insert(&T, key[i]);
    }
    btree_print(&T, T.root, 0);
    for (i = 0; i < 26; i++) {
    
        printf("\n---------------------------------\n");
        btree_delete(&T, key[25 - i]);
        //btree_traverse(T.root);
        btree_print(&T, T.root, 0);
    }
}