Bloom filter
2022-07-03 00:17:00 【JunesFour】
1. Use cases
- When editing a document in Word, how does Word tell whether a word is spelled correctly?
- How does a web crawler avoid crawling the same URL twice? Some error is allowed.
- How should a spam (email or SMS) filtering algorithm be designed? Some error is allowed.
- When the police handle a case, how do they check whether a suspect is on the online fugitive list? The false positive rate must be controlled.
- How can the cache penetration problem be solved? Some error is allowed.
2. Basic idea
Several hash functions map an element to several points in a bitmap, and those points are marked as true. At query time, checking whether all of those points are true tells you whether the element may be in the Bloom filter. Compared with traditional query structures (for example hash, set, map and so on), a Bloom filter is therefore more efficient and takes up less space.
Note:
- A Bloom filter cannot delete elements, because it does not store the elements themselves. It only stores the points each element maps to, and each point may be shared by the mappings of several elements.
- A Bloom filter can only tell that an element definitely does not exist or that it may exist. When all the points produced by the hash functions are true, the element may exist; as soon as one of them is not true, the element definitely does not exist.
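The idea above can be sketched in a few lines of C. This is a minimal illustration, not a production implementation: the bitmap size `BLOOM_BITS`, the hash count `BLOOM_K`, and the seeded FNV-1a hash `fnv1a_seeded` are all assumptions made for this sketch and do not come from the article.

```c
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024u   /* bitmap size in bits (illustrative value) */
#define BLOOM_K    3u      /* number of hash functions (illustrative value) */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom_t;

/* Seeded FNV-1a: a stand-in hash chosen for this sketch only. */
uint64_t fnv1a_seeded(const char *key, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (; *key; key++) {
        h ^= (uint8_t)*key;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Clear every bit: an empty filter contains nothing. */
void bloom_init(bloom_t *bf)
{
    memset(bf->bits, 0, sizeof bf->bits);
}

/* Map the key to BLOOM_K points and mark each one as true. */
void bloom_add(bloom_t *bf, const char *key)
{
    for (uint64_t i = 0; i < BLOOM_K; i++) {
        uint64_t pos = fnv1a_seeded(key, i) % BLOOM_BITS;
        bf->bits[pos / 8] |= (uint8_t)(1u << (pos % 8));
    }
}

/* Returns 0 if the key is definitely absent, 1 if it may be present. */
int bloom_may_contain(const bloom_t *bf, const char *key)
{
    for (uint64_t i = 0; i < BLOOM_K; i++) {
        uint64_t pos = fnv1a_seeded(key, i) % BLOOM_BITS;
        if (!(bf->bits[pos / 8] & (1u << (pos % 8))))
            return 0;   /* one unset point is enough to rule the key out */
    }
    return 1;
}
```

After `bloom_init(&bf)` and `bloom_add(&bf, "str1")`, the call `bloom_may_contain(&bf, "str1")` returns 1. For a key that was never added the result is usually 0, but it can be 1 when all of its points happen to have been set by other keys.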
3. Implementation of the Bloom filter
Composition
A bitmap (bit array) plus k hash functions.

Principle
When an element is added to the bitmap, k hash functions map it to k points in the bitmap, and those points are set to 1. On lookup, the same k hash functions are applied again to check whether all k points are 1. If any of them is not 1, the element is considered absent; if all of them are 1, the element may exist (with some error).
Each slot in the bitmap has only two states (0 or 1). Once a slot has been set to 1, there is no record of how many times it was set, nor of which elements (such as str1) mapped to it or through which hash function. Deletion is therefore not supported.
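A small sketch shows why clearing bits is unsafe. The bit positions below are hypothetical, picked by hand so that `str1` and `str2` share one slot; they are not the output of any real hash function.

```c
#include <stdint.h>

/* Hypothetical positions for two elements; slot 4 is shared by both. */
const unsigned STR1_POS[3] = {1, 4, 7};
const unsigned STR2_POS[3] = {4, 9, 12};

/* Set the three slots of an element in a 16-bit bitmap. */
void set_bits(uint16_t *bitmap, const unsigned pos[3])
{
    for (int i = 0; i < 3; i++)
        *bitmap |= (uint16_t)(1u << pos[i]);
}

/* Naive "delete": clear the three slots of an element. */
void clear_bits(uint16_t *bitmap, const unsigned pos[3])
{
    for (int i = 0; i < 3; i++)
        *bitmap &= (uint16_t)~(1u << pos[i]);
}

/* Returns 1 if all three slots are set, 0 otherwise. */
int all_set(uint16_t bitmap, const unsigned pos[3])
{
    for (int i = 0; i < 3; i++)
        if (!(bitmap & (1u << pos[i])))
            return 0;
    return 1;
}

/* Returns 1 when naively deleting str1 makes str2 look absent. */
int naive_delete_breaks_lookup(void)
{
    uint16_t bitmap = 0;
    set_bits(&bitmap, STR1_POS);
    set_bits(&bitmap, STR2_POS);
    clear_bits(&bitmap, STR1_POS);      /* also clears shared slot 4 */
    return !all_set(bitmap, STR2_POS);  /* str2 is now a false negative */
}
```

With these positions, clearing the slots of `str1` also clears slot 4, so a later lookup of `str2` fails even though `str2` was never removed. A single bit per slot cannot tell the two elements apart.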

4. Designing a Bloom filter
In practical applications, how should a Bloom filter be used? How many hash functions should be chosen, how much space should be allocated for the bitmap, and how many elements will be stored? And how is the false positive rate controlled? (A Bloom filter can determine that an element definitely does not exist, but cannot determine that it definitely exists, so misjudgments occur; the false positive rate is the probability of such a misjudgment.)
These questions are usually answered with the following four parameters:
- n – the number of elements in the Bloom filter; in the figure above there are only two elements, str1 and str2, so n = 2.
- p – the false positive rate, a value between 0 and 1.
- m – the size of the bitmap, in bits.
- k – the number of hash functions.
The formulas are as follows:
n = ceil(m / (-k / log(1 - exp(log(p) / k))))
p = pow(1 - exp(-k / (m / n)), k)
m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
k = round((m / n) * log(2))
The last two parameters, m and k, are derived from the first two, n and p. You can calculate them yourself, or enter the first two values on a Bloom filter calculator website to get the last two:

k hash functions
In the calculation result above, k = 23. In practice we do not really pick 23 different hash functions; instead, double hashing is used to simulate 23 hash functions:
// Use a single hash function and pass it different seed values
// MIX_UINT64 folds a 64-bit hash into a 32-bit seed
#define MIX_UINT64(v) ((uint32_t)(((v) >> 32) ^ (v)))
uint64_t hash1 = MurmurHash2_x64(key, len, Seed);
uint64_t hash2 = MurmurHash2_x64(key, len, MIX_UINT64(hash1));
for (i = 0; i < k; i++) // k is the number of hash functions
{
    Pos[i] = (hash1 + i*hash2) % m; // m is the bitmap size
}
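The fragment above depends on `MurmurHash2_x64`, `Seed`, `Pos`, `m` and `k` being defined elsewhere. A self-contained variant might look like the following sketch, in which `hash64` (a seeded FNV-1a) is only a stand-in for `MurmurHash2_x64` and `bloom_positions` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit value into 32 bits, as in the snippet above. */
#define MIX_UINT64(v) ((uint32_t)(((v) >> 32) ^ (v)))

/* Stand-in for MurmurHash2_x64: seeded 64-bit FNV-1a (assumption). */
uint64_t hash64(const void *key, size_t len, uint64_t seed)
{
    const uint8_t *p = key;
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Fill pos[0..k-1] with the k bitmap positions of key (bitmap size m). */
void bloom_positions(const void *key, size_t len, uint64_t seed,
                     uint64_t m, unsigned k, uint64_t *pos)
{
    uint64_t hash1 = hash64(key, len, seed);
    uint64_t hash2 = hash64(key, len, MIX_UINT64(hash1));
    for (unsigned i = 0; i < k; i++)
        pos[i] = (hash1 + (uint64_t)i * hash2) % m;  /* i-th simulated hash */
}
```

Only two real hash evaluations are needed per key, however large k is; the i-th position is `hash1 + i*hash2` reduced modulo m, with the 64-bit addition allowed to wrap around.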