Bloom filter
2022-07-03 00:17:00 【JunesFour】
The Bloom filter
1. Use scenarios
- When you edit a document in Word, how does Word tell whether a word is spelled correctly?
- How does a web crawler avoid crawling the same URL page twice? Some error is allowed.
- How should a spam (SMS) filtering algorithm be designed? Some error is allowed.
- When the police handle a case, how do they determine whether a suspect is on the online fugitive list? The false positive rate must be controlled.
- How can the cache penetration problem be solved? Some error is allowed.
2. The basic idea
A Bloom filter uses several hash functions to map an element to several points in a bitmap and marks those points as true. At query time, checking whether those points are all true tells you whether the element may be in the Bloom filter. Compared with traditional lookup structures (for example hash tables, set, map and so on), a Bloom filter is therefore more efficient and takes up less space.
Note:
- A Bloom filter cannot delete elements, because it does not store the elements themselves, only the points each element maps to, and any one point may be shared by the mappings of several elements.
- A Bloom filter can only tell you that an element definitely does not exist or that it may exist. When all the points the hash functions map it to are true, the element may exist; as long as one of them is not true, it definitely does not exist.
3. Implementation of the bloom filter
Composition
Bitmap (bit array) + k hash functions.

Principle
When an element is added to the bitmap, k hash functions map it to k points in the bitmap, and those points are set to 1. On lookup, the same k hash functions are applied again and the k points are checked: if any point is not 1, the element is considered absent; if all of them are 1, the element may be present (with some error).
Each slot in the bitmap has only two states (0 or 1). Once a slot is set to 1, there is no record of how many times it was set, that is, how many elements (such as str1) hash to it or which hash functions mapped them there. That is why deletion is not supported.
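The add and lookup steps described above can be sketched in C roughly as follows. This is a minimal illustration, not the article's implementation: the names `bloom_t`, `bloom_init`, `bloom_add` and `bloom_might_contain` are hypothetical, and a seeded FNV-1a hash is assumed as a stand-in for the k hash functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical Bloom filter type used only for this sketch. */
typedef struct {
    uint8_t *bits;  /* bitmap: m bits packed 8 per byte */
    uint64_t m;     /* number of bits in the bitmap */
    uint32_t k;     /* number of hash functions */
} bloom_t;

/* Seeded 64-bit FNV-1a; an assumed stand-in, any good hash would do. */
static uint64_t fnv1a64(const void *key, size_t len, uint64_t seed)
{
    const uint8_t *p = key;
    uint64_t h = 14695981039346656037ULL;
    h ^= seed;                 /* mix the seed in before the key bytes */
    h *= 1099511628211ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Allocate a zeroed bitmap of m bits; returns false if allocation fails. */
bool bloom_init(bloom_t *bf, uint64_t m, uint32_t k)
{
    assert(m > 0 && k > 0);
    bf->bits = calloc((size_t)((m + 7) / 8), 1);
    bf->m = m;
    bf->k = k;
    return bf->bits != NULL;
}

void bloom_free(bloom_t *bf)
{
    free(bf->bits);
    bf->bits = NULL;
}

/* Set the k points for this key to 1. */
void bloom_add(bloom_t *bf, const void *key, size_t len)
{
    for (uint32_t i = 0; i < bf->k; i++) {
        uint64_t pos = fnv1a64(key, len, i) % bf->m;  /* seed i = i-th hash */
        bf->bits[pos / 8] |= (uint8_t)(1u << (pos % 8));
    }
}

/* false: definitely absent; true: possibly present (may be a false positive). */
bool bloom_might_contain(const bloom_t *bf, const void *key, size_t len)
{
    for (uint32_t i = 0; i < bf->k; i++) {
        uint64_t pos = fnv1a64(key, len, i) % bf->m;
        if (!(bf->bits[pos / 8] & (1u << (pos % 8))))
            return false;
    }
    return true;
}
```

A caller would initialise the filter with `bloom_init(&bf, m, k)`, insert keys with `bloom_add(&bf, "str1", 4)`, and query with `bloom_might_contain`. There is deliberately no delete function, since clearing a bit could also remove other elements that share it.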

4. The design of Bloom filter
How is a Bloom filter used in a real application? How many hash functions should be chosen, how much space should be allocated for the bitmap, and how many elements will be stored? And how is the false positive rate controlled? (A Bloom filter can say that an element definitely does not exist but not that it definitely exists, so its judgment can be wrong; the false positive rate is the probability of such a wrong judgment.)
We usually use the following four parameters to answer these questions:
- n – the number of elements in the Bloom filter. In the figure above there are only two elements, str1 and str2, so n = 2.
- p – the false positive rate, a decimal between 0 and 1.
- m – the space occupied by the bitmap (in bits).
- k – the number of hash functions.
The formulas are as follows (log is the natural logarithm):
n = ceil(m / (-k / log(1 - exp(log(p) / k))))
p = pow(1 - exp(-k / (m / n)), k)
m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
k = round((m / n) * log(2))
The last two parameters, m and k, are calculated from the first two, n and p. You can work them out yourself, or enter the first two values on a calculator website and let it compute the last two.

k hash functions
In the calculation results above, k = 23. In practice we would not actually pick 23 separate hash functions; instead, double hashing is used to simulate 23 hash functions:
// Use a single hash function and pass it different seed offset values
// MIX_UINT64 folds a 64-bit value into 32 bits:
// #define MIX_UINT64(v) ((uint32_t)((v>>32)^(v)))
uint64_t hash1 = MurmurHash2_x64(key, len, Seed);
uint64_t hash2 = MurmurHash2_x64(key, len, MIX_UINT64(hash1));
for (i = 0; i < k; i++) // k is the number of hash functions
{
    Pos[i] = (hash1 + i * hash2) % m; // m is the bitmap size
}
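The fragment above depends on a MurmurHash2_x64 implementation that is not shown. As a self-contained sketch of the same double-hashing idea, the version below swaps in a seeded FNV-1a hash; that substitution, the function name `bloom_positions`, and the choice of 0 as the first seed are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Seeded 64-bit FNV-1a, an assumed stand-in for MurmurHash2_x64. */
static uint64_t fnv1a64(const void *key, size_t len, uint64_t seed)
{
    const uint8_t *p = key;
    uint64_t h = 14695981039346656037ULL;
    h ^= seed;                 /* mix the seed in before the key bytes */
    h *= 1099511628211ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Fill pos[0..k-1] with bit positions in [0, m) using double hashing:
 * the i-th simulated hash function is hash1 + i * hash2 (mod m). */
void bloom_positions(const void *key, size_t len,
                     uint64_t m, uint32_t k, uint64_t *pos)
{
    assert(m > 0 && pos != NULL);
    uint64_t hash1 = fnv1a64(key, len, 0);      /* first hash, fixed seed */
    uint64_t hash2 = fnv1a64(key, len, hash1);  /* second hash, seeded by the first */
    for (uint32_t i = 0; i < k; i++) {
        /* unsigned arithmetic wraps modulo 2^64 before the reduction mod m */
        pos[i] = (hash1 + (uint64_t)i * hash2) % m;
    }
}
```

Only two real hash computations are performed per key, however large k is. One caveat of this scheme: if hash2 happens to be a multiple of m, all k positions collapse to the same bit, which slightly raises the false positive rate for that key.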