当前位置:网站首页>Skip table: principle introduction, advantages and disadvantages of skiplist
Skip table: principle introduction, advantages and disadvantages of skiplist
2022-07-03 05:15:00 【A wild man about to succeed】
skiplist Introduce
There is no strict correspondence between the number of nodes in the two adjacent linked lists , It's a random number of layers for each node (level). such as , It's a random number of nodes 3, Then chain it into the first 1 Layer to tier 3 In the list of three layers . To express clearly , The following figure shows how to form a step-by-step insertion operation skiplist The process of :

If we look for 23

skiplist Performance analysis of the algorithm
skiplist Each insertion is independent , According to the following algorithm ( because random() Random generation of )
The process of calculating random numbers while performing insert operations , It's a critical process , It's right skiplist The statistical characteristics of have a very important impact . This is not an ordinary random number with uniform distribution , Its calculation process is as follows :
First , Every node must have a node 1 Layer pointer ( Every node is in the 1 In the chain of layers ).
If a node has a node i layer (i>=1) The pointer ( That is, the node is already in the 1 Layer to tier i In the layer list ), So it has a number (i+1) The probability of layer pointer is p.
The maximum number of layers of a node is not allowed to exceed a maximum value , Write it down as MaxLevel.
The pseudo code for calculating the number of random layers is shown below :
randomLevel() level := 1 // random() Return to one [0...1) The random number while random() < p and level < MaxLevel do level := level + 1 return level
randomLevel() There are two parameters in the pseudo code of , One is p, One is MaxLevel. stay Redis Of skiplist In the implementation , The values of these two parameters are :
p = 1/4 MaxLevel = 32
Spatial complexity
In this part , Let's briefly analyze skiplist Time complexity and space complexity , So that for skiplist There is an intuitive understanding of the performance of . If you're not particularly paranoid about algorithm performance analysis , So you can skip this section for the time being .
Let's first calculate the average number of pointers per node ( Probability expectation ). The number of pointers a node contains , Equivalent to the extra space cost of this algorithm (overhead), Can be used to measure spatial complexity .
According to the front randomLevel() Pseudo code of , It's easy for us to see , The higher the number of nodes generated , The lower the probability . The quantitative analysis is as follows :
The number of node layers shall be at least 1. But more than 1 The number of node layers of , Satisfy a probability distribution .
The number of node layers is exactly equal to 1 The probability of is 1-p.
The number of node layers is greater than or equal to 2 The probability of is p, And the number of nodes is exactly equal to 2 The probability of is p(1-p).
The number of node layers is greater than or equal to 3 The probability of is p^2, And the number of nodes is exactly equal to 3 The probability of is p^2(1-p).
The number of node layers is greater than or equal to 4 The probability of is p^3, And the number of nodes is exactly equal to 4 The probability of is p^3(1-p).
......
therefore , Average number of layers of a node ( That is, the average number of pointers contained ), The calculation is as follows :

Now it's easy to figure out :
When p=1/2 when , The average number of pointers per node is 2;
When p=1/4 when , The average number of pointers per node is 1.33. This is also Redis Inside skiplist Implementation overhead in space .
Time complexity
Now let's say that from one level to i The node of x set out , You need to climb up to the left k layer . Now we have two possibilities :
If node x There is No (i+1) Layer pointer , So we need to move up . In this case, the probability is p.
If node x No first (i+1) Layer pointer , So we need to go left . In this case, the probability is (1-p).
These two situations are shown in the figure below :

use C(k) It means to climb up k The average search path length of each level ( Probability expectation ), that :
C(0)=0 C(k)=(1-p)×( In the picture above b The search length of ) + p×( In the picture above c The search length of )
Plug in , Get one The difference equation And simplify :
C(k)=(1-p)(C(k)+1) + p(C(k-1)+1) C(k)=1/p+C(k-1) C(k)=k/p
The result means that , Every time we climb 1 A hierarchy , Need to go on the search path 1/p Step . And the total number of layers we need to climb is equal to the whole number of layers skiplist The total number of floors -1.
So next we need to analyze when skiplist There is n When there are nodes , Its total number of floors Probability mean How much is the . This problem is intuitively easy to understand . According to the number of layers of nodes, the random algorithm , Easy to come by :
The first 1 The layer list is fixed with n Nodes ;
The first 2 On average, there are n*p Nodes ;
The first 3 On average, there are n*p^2 Nodes ;
...
therefore , From 1 Floor to top , The average number of nodes in each layer of linked list is an exponential decreasing sequence of equal proportion . Easy to calculate , The average of the total number of layers is log1/pn, The average number of nodes at the highest level is 1/p.
Sum up , In a rough calculation , The average search length is approximately equal to :
C(log1/pn-1)=(log1/pn-1)/p
namely , The average time complexity is zero O(log n).
skiplist Balanced tree 、 Hash table comparison
skiplist And various balance trees ( Such as AVL、 A red-black tree, etc ) The elements of are arranged in order , And the hash table is not ordered . therefore , You can only do a single... On a hash table key Lookup , It's not suitable to do range search . So called range search , This refers to finding all nodes whose size is between two specified values .
When doing range finding , Balance the tree ratio skiplist The operation should be complicated . On the balance tree , After we find the small value of the specified range , We also need to continue to search for other nodes with no more than a large value in the order of traversal . If the balance tree is not modified to some extent , The middle order traversal here is not easy to achieve . And in the skiplist It's very easy to do range lookup on , Just after finding the small value , Right. 1 Layer list can be traversed in several steps .
The insertion and deletion of the balance tree may Adjust the subtree , Complex logic , and skiplist The insertion and deletion of only need to modify the fingers of adjacent nodes The needle , Simple and fast operation .
In terms of memory usage ,skiplist More flexible than a balanced tree . Generally speaking , Each node of the balance tree contains 2 A pointer to the ( Point to the left and right subtrees respectively ), and skiplist Each node contains an average of 1/(1-p), It depends on the parameters p Size . If image Redis The realization is the same in , take p=1/4, So the average node contains 1.33 A pointer to the , Better than a balanced tree .
Find a single key,skiplist And the time complexity of the balance tree is O(log n), It's about the same ; and The hash table keeps a low hash value collision probability Under the premise of , Find time complexity near O(1), Higher performance some . So we usually use all kinds of Map or dictionary structure , Most of them are based on hash table .
Compare the difficulty of algorithm implementation ,skiplist It's much simpler than a balanced tree .
Redis Why skiplist Instead of a balance tree ?
In front of us, we are concerned about skiplist And the balance tree 、 Hash table comparison , In fact, it is not difficult to see Redis Use in skiplist The reason why we don't need a balanced tree . Now let's see , For this question ,Redis The author of @antirez How do you say it :
There are a few reasons:
\1) They are not very memory intensive. It's up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees.
\2) A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.
\3) They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.
边栏推荐
- Go practice -- use JWT (JSON web token) in golang
- Notes | numpy-10 Iterative array
- 1110 complete binary tree (25 points)
- Problems encountered in fuzzy query of SQL statements
- RT thread flow notes I startup, schedule, thread
- 【批处理DOS-CMD命令-汇总和小结】-CMD窗口的设置与操作命令-关闭cmd窗口、退出cmd环境(exit、exit /b、goto :eof)
- Overview of basic knowledge of C language
- Go practice -- factory mode of design patterns in golang (simple factory, factory method, abstract factory)
- Yolov5 input (II) | CSDN creative punch in
- Web APIs exclusivity
猜你喜欢

JS scope

Handler understands the record

微服务常见面试题

Go practice -- design patterns in golang's singleton

Coordinatorlayout appbarrayout recyclerview item exposure buried point misalignment analysis

Compile and decompile GCC common instructions
![[clock 223] [binary tree] [leetcode high frequency]: 102 Sequence traversal of binary tree](/img/0f/bc8c44aee7a2c9dccac050b1060017.jpg)
[clock 223] [binary tree] [leetcode high frequency]: 102 Sequence traversal of binary tree
![[research materials] annual report of China's pension market in 2021 - Download attached](/img/24/622aeeb38de16ac84128b362ceeddb.jpg)
[research materials] annual report of China's pension market in 2021 - Download attached

Ueditor, FCKeditor, kindeditor editor vulnerability

JS dynamic table creation
随机推荐
XML配置文件
Webapidom get page elements
Webrtc M96 release notes (SDP abolishes Plan B and supports opus red redundant coding)
[research materials] 2021 annual report on mergers and acquisitions in the property management industry - Download attached
[backtrader source code analysis 4] use Python to rewrite the first function of backtrader: time2num, which improves the efficiency by 2.2 times
1110 complete binary tree (25 points)
Dynamic programming - related concepts, (tower problem)
Detailed explanation of yolov5 training own data set
谷歌 | 蛋白序列的深度嵌入和比对
Go practice -- closures in golang (anonymous functions, closures)
Audio Focus Series: write a demo to understand audio focus and audiomananger
Go practice -- design patterns in golang's singleton
Pan details of deep learning
Redis Introduction et explication des types de données
Force GCC to compile 32-bit programs on 64 bit platform
Handler understands the record
编译GCC遇到的“pthread.h” not found问题
大学校园IP网络广播-厂家基于校园局域网的大学校园IP广播方案设计指南
How to connect the network: Chapter 1 CSDN creation punch in
JS scope