Bloom filter
2022-07-03 00:17:00 【JunesFour】
1. Use cases
- When editing a document in Word, how does Word tell whether a word is spelled correctly?
- How does a web crawler avoid crawling the same URL page more than once? Errors are allowed.
- How would you design a filtering algorithm for spam (SMS)? Errors are allowed.
- When the police handle a case, how do they check whether a suspect is on the online fugitive list? The false positive rate must be controlled.
- How do you solve the cache penetration problem? Errors are allowed.
2. The basic idea
Several hash functions map an element to several points in a bitmap, and those points are marked as true. At query time, checking whether those points are all true tells you whether the element may be in the Bloom filter. A Bloom filter is therefore more efficient than traditional query structures (for example hash, set, map and so on) and takes up less space.
Note:
- A Bloom filter cannot delete elements, because it does not store the elements themselves. It only stores the points each element maps to, and any one point may be shared by the mappings of several elements.
- A Bloom filter can only tell you that an element definitely does not exist or that it may exist. When all the points the hash functions map to are true, the element may exist; as soon as one of them is not true, the element definitely does not exist.
3. Implementation of the Bloom filter
Composition
Bitmap (bit array) + k hash functions.
Principle
When an element is added to the bitmap, k hash functions map it to k points in the bitmap, and those points are set to 1. On lookup, the same k hash functions are applied again to check whether all k points are 1. If any point is not 1, the element is considered absent; if all of them are 1, the element may exist (with some error).
Each slot in the bitmap has only two states (0 or 1). Once a slot is set to 1, there is no record of how many times it has been set, nor of which element (for example str1) or which hash function set it. This is why the delete operation is not supported.
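The add and lookup steps described above can be sketched in C as follows. This is a minimal illustration, not the author's implementation: the `bloom_t` type, the function names, and the use of a seeded FNV-1a hash to stand in for k independent hash functions are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal Bloom filter: a bitmap of m bits plus k hash functions. */
typedef struct {
    uint8_t *bits;  /* bitmap, m bits packed into bytes */
    uint64_t m;     /* number of bits in the bitmap */
    uint32_t k;     /* number of hash functions */
} bloom_t;

/* Seeded FNV-1a, used here only as a stand-in: seed i plays the role of hash function i. */
static uint64_t fnv1a_seeded(const void *key, size_t len, uint64_t seed)
{
    const uint8_t *p = (const uint8_t *)key;
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Allocate a zeroed bitmap of m bits; returns 0 on success, -1 on allocation failure. */
static int bloom_init(bloom_t *bf, uint64_t m, uint32_t k)
{
    assert(m > 0 && k > 0);
    bf->bits = calloc((size_t)((m + 7) / 8), 1);
    bf->m = m;
    bf->k = k;
    return bf->bits ? 0 : -1;
}

static void bloom_free(bloom_t *bf)
{
    free(bf->bits);
    bf->bits = NULL;
}

/* Add: set the k bits the element maps to. */
static void bloom_add(bloom_t *bf, const void *key, size_t len)
{
    for (uint32_t i = 0; i < bf->k; i++) {
        uint64_t pos = fnv1a_seeded(key, len, i) % bf->m;
        bf->bits[pos / 8] |= (uint8_t)(1u << (pos % 8));
    }
}

/* Lookup: 0 means definitely absent, 1 means possibly present. */
static int bloom_might_contain(const bloom_t *bf, const void *key, size_t len)
{
    for (uint32_t i = 0; i < bf->k; i++) {
        uint64_t pos = fnv1a_seeded(key, len, i) % bf->m;
        if (!(bf->bits[pos / 8] & (1u << (pos % 8))))
            return 0;  /* one unset bit is enough to rule the element out */
    }
    return 1;
}
```

Note that there is no `bloom_remove`: clearing a bit could also erase evidence of other elements that share it, which is the reason deletion is not supported.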
4. Designing a Bloom filter
How is a Bloom filter used in a real application? How many hash functions should be chosen, how much space should the bitmap get, and how many elements will be stored? And how is the false positive rate controlled? (A Bloom filter can say that an element definitely does not exist, but not that it definitely exists, so some answers will be wrong; the false positive rate is the probability of such a wrong answer.)
We usually use the following four parameters to answer these questions:
- n – the number of elements in the Bloom filter. In the figure above there are only two elements, str1 and str2, so n = 2.
- p – the false positive rate, a decimal value between 0 and 1.
- m – the space occupied by the bitmap, in bits.
- k – the number of hash functions.
The formulas are as follows (log is the natural logarithm):
n = ceil(m / (-k / log(1 - exp(log(p) / k))))
p = pow(1 - exp(-k / (m / n)), k)
m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
k = round((m / n) * log(2))
The last two parameters, m and k, are calculated from the first two, n and p. You can work them out yourself, or enter the first two values on a Bloom filter calculator website and let it compute the last two.
k hash functions
In the calculation result above, k = 23. In practice we would not really pick 23 separate hash functions; instead, double hashing is used to simulate 23 hash functions:
// Use a single hash function and pass it different seed values.
// MIX_UINT64 folds a 64-bit value into 32 bits:
// #define MIX_UINT64(v) ((uint32_t)(((v) >> 32) ^ (v)))
uint64_t hash1 = MurmurHash2_x64(key, len, Seed);
uint64_t hash2 = MurmurHash2_x64(key, len, MIX_UINT64(hash1));
for (i = 0; i < k; i++) // k is the number of hash functions
{
    Pos[i] = (hash1 + i * hash2) % m; // m is the size of the bitmap
}
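The fragment above relies on a `MurmurHash2_x64` function and on variables defined elsewhere. A self-contained version of the same double-hashing scheme might look like the sketch below. To keep it compilable on its own, it swaps in a seeded FNV-1a hash for MurmurHash2; that substitution, and the names `seeded_hash` and `bloom_positions`, are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit value into 32 bits, as in the fragment above. */
#define MIX_UINT64(v) ((uint32_t)(((v) >> 32) ^ (v)))

/* Stand-in seeded hash (FNV-1a). This is NOT MurmurHash2; any seeded 64-bit hash fits here. */
static uint64_t seeded_hash(const void *key, size_t len, uint64_t seed)
{
    const uint8_t *p = (const uint8_t *)key;
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Fill pos[0..k-1] with (hash1 + i * hash2) % m, simulating k hash functions with two hashes. */
static void bloom_positions(const void *key, size_t len, uint64_t seed,
                            uint32_t k, uint64_t m, uint64_t *pos)
{
    assert(k > 0 && m > 0);
    uint64_t hash1 = seeded_hash(key, len, seed);
    uint64_t hash2 = seeded_hash(key, len, MIX_UINT64(hash1));
    for (uint32_t i = 0; i < k; i++)
        pos[i] = (hash1 + (uint64_t)i * hash2) % m;  /* unsigned overflow wraps, which is fine here */
}
```

Only two hash evaluations are needed per key regardless of k, which is the point of the technique. One caveat: if hash2 happens to be 0, all k positions collapse to the same bit, so a production implementation may want to guard against that case.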