Bloom filter
2022-07-03 00:17:00 【JunesFour】
1. Use cases
- When you use a word processor such as Word, how does it tell whether a word is spelled correctly?
- How does a web crawler avoid crawling the same URL page twice? Errors are tolerable.
- How should a spam (email or SMS) filtering algorithm be designed? Errors are tolerable.
- When the police handle a case, how do they check whether a suspect is on the online fugitive list? The false positive rate must be controlled.
- How can the cache penetration problem be solved? Errors are tolerable.
2. The basic idea
Several hash functions map an element to several points in a bitmap, and those points are marked as true. At query time, checking whether those points are all true tells you whether the element is in the Bloom filter. Compared with traditional lookup structures (for example hash, set, map and so on), a Bloom filter is therefore more efficient and uses less space.
Note:
- A Bloom filter cannot delete elements, because it does not store the elements themselves. It stores only the points each element maps to, and each point may be shared by the mappings of several elements.
- A Bloom filter can only tell you that an element definitely does not exist or that it may exist. When all points that the hash functions map to are true, the element may exist; as soon as one of them is not true, the element definitely does not exist.
3. Implementation of the Bloom filter
Composition
Bitmap (bit array) + k hash functions.

Principle
When an element is added to the bitmap, k hash functions map it to k points in the bitmap, and those points are set to 1. On lookup, the same k hash functions are applied and the k points are checked: if any of them is not 1, the element is considered absent; if all of them are 1, the element may be present (with some error).
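The add and lookup steps described here can be sketched in C roughly as follows. This is a minimal illustration, not the article's implementation: the type `bloom_t`, the function names, and the seeded FNV-1a hash standing in for the k hash functions are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal Bloom filter: a bitmap of m bits plus k hash functions. */
typedef struct {
    uint8_t *bits; /* bitmap storage, m bits rounded up to whole bytes */
    uint64_t m;    /* number of bits in the bitmap */
    unsigned k;    /* number of hash functions */
} bloom_t;

/* Stand-in hash (seeded 64-bit FNV-1a); seed i plays the role of the i-th hash function. */
static uint64_t fnv1a64(const void *key, size_t len, uint64_t seed)
{
    const uint8_t *p = key;
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Allocate a zeroed bitmap of m bits; returns false if allocation fails. */
static bool bloom_init(bloom_t *bf, uint64_t m, unsigned k)
{
    assert(m > 0 && k > 0);
    bf->bits = calloc((size_t)((m + 7) / 8), 1);
    bf->m = m;
    bf->k = k;
    return bf->bits != NULL;
}

static void bloom_free(bloom_t *bf)
{
    free(bf->bits);
    bf->bits = NULL;
}

/* Add: set the k bits that the key maps to. */
static void bloom_add(bloom_t *bf, const void *key, size_t len)
{
    for (unsigned i = 0; i < bf->k; i++) {
        uint64_t pos = fnv1a64(key, len, i) % bf->m;
        bf->bits[pos / 8] |= (uint8_t)(1u << (pos % 8));
    }
}

/* Lookup: false means definitely absent, true means possibly present. */
static bool bloom_maybe_contains(const bloom_t *bf, const void *key, size_t len)
{
    for (unsigned i = 0; i < bf->k; i++) {
        uint64_t pos = fnv1a64(key, len, i) % bf->m;
        if (!(bf->bits[pos / 8] & (1u << (pos % 8))))
            return false;
    }
    return true;
}
```

A caller would run `bloom_init(&bf, m, k)`, add keys with `bloom_add(&bf, "str1", 4)`, and treat a `false` result from `bloom_maybe_contains` as a definite miss. A `true` result only means the key may have been added.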
Each slot in the bitmap has only two states (0 or 1). Once a slot is set to 1, there is no record of how many times it has been set, that is, how many elements (such as str1) hash to it or which hash functions mapped them there. Deletion is therefore not supported.
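To make the deletion problem concrete, here is a small sketch in C. The bit positions for str1 and str2 are hand-picked, hypothetical values chosen so that the two elements share one slot; a real filter would derive them from hash functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, hand-picked bit positions; both elements share slot 4. */
static const unsigned STR1_POS[3] = {1, 4, 7};
static const unsigned STR2_POS[3] = {4, 9, 12};

static void set_bits(uint16_t *bitmap, const unsigned pos[3])
{
    for (int i = 0; i < 3; i++)
        *bitmap |= (uint16_t)(1u << pos[i]);
}

static void clear_bits(uint16_t *bitmap, const unsigned pos[3])
{
    for (int i = 0; i < 3; i++)
        *bitmap &= (uint16_t)~(1u << pos[i]);
}

static bool all_bits_set(uint16_t bitmap, const unsigned pos[3])
{
    for (int i = 0; i < 3; i++)
        if (!(bitmap & (1u << pos[i])))
            return false;
    return true;
}

/* Returns true if "deleting" str1 by clearing its bits also makes str2 look absent. */
static bool naive_delete_breaks_str2(void)
{
    uint16_t bitmap = 0;
    set_bits(&bitmap, STR1_POS);
    set_bits(&bitmap, STR2_POS);
    assert(all_bits_set(bitmap, STR2_POS));
    clear_bits(&bitmap, STR1_POS);          /* also clears the shared slot 4 */
    return !all_bits_set(bitmap, STR2_POS); /* str2 now reads as absent */
}
```

Clearing str1's slots also clears slot 4, so a later lookup of str2 reports it as absent even though it was never removed. A plain 0/1 bitmap has no way to know that slot 4 was still needed.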

4. Designing a Bloom filter
How is a Bloom filter used in a real application? How many hash functions should be chosen, how much space should the bitmap be given, and how many elements will be stored? And how is the false positive rate controlled? (A Bloom filter can say that an element definitely does not exist, but it cannot say that an element definitely exists, so its answers can be wrong; the false positive rate is the probability of such a wrong answer.)
Four parameters are usually used to answer these questions:
- n – the number of elements in the Bloom filter. In the figure above, only the two elements str1 and str2 are stored, so n = 2.
- p – the false positive rate, a value between 0 and 1.
- m – the space occupied by the bitmap, in bits.
- k – the number of hash functions.
The formulas are as follows (log is the natural logarithm):
n = ceil(m / (-k / log(1 - exp(log(p) / k))))
p = pow(1 - exp(-k / (m / n)), k)
m = ceil((n * log(p)) / log(1 / pow(2, log(2))))
k = round((m / n) * log(2))
The last two parameters, m and k, are derived from the first two, n and p. You can calculate them yourself, or enter the first two values on a Bloom filter calculator website and have it compute the last two.

k hash functions
In the calculation result above, k = 23. In practice we would not really choose 23 separate hash functions; instead, double hashing is used to simulate 23 hash functions:
// Use a single hash function and pass it different seed values (double hashing).
// MIX_UINT64 folds a 64-bit value into 32 bits:
// #define MIX_UINT64(v) ((uint32_t)(((v) >> 32) ^ (v)))
uint64_t hash1 = MurmurHash2_x64(key, len, Seed);
uint64_t hash2 = MurmurHash2_x64(key, len, MIX_UINT64(hash1));
for (i = 0; i < k; i++) // k is the number of hash functions
{
    Pos[i] = (hash1 + i * hash2) % m; // m is the bitmap size in bits
}
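The fragment above relies on MurmurHash2_x64, which is not shown here. A self-contained sketch of the same double-hashing scheme might look like the following. Note that `seeded_hash64` is a seeded FNV-1a used purely as a stand-in for MurmurHash2_x64, and `bloom_positions` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit value into 32 bits, as in the snippet above. */
#define MIX_UINT64(v) ((uint32_t)(((v) >> 32) ^ (v)))

/* Stand-in for MurmurHash2_x64: seeded 64-bit FNV-1a (an assumption for this sketch). */
static uint64_t seeded_hash64(const void *key, size_t len, uint64_t seed)
{
    const uint8_t *p = key;
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Fill pos[0..k-1] with the k bit positions of key in a bitmap of m bits. */
static void bloom_positions(const void *key, size_t len, uint64_t seed,
                            unsigned k, uint64_t m, uint64_t *pos)
{
    assert(m > 0);
    uint64_t hash1 = seeded_hash64(key, len, seed);
    uint64_t hash2 = seeded_hash64(key, len, MIX_UINT64(hash1));
    for (unsigned i = 0; i < k; i++)
        pos[i] = (hash1 + i * hash2) % m; /* unsigned overflow wraps, which is harmless here */
}
```

Only two hash evaluations are needed per key regardless of k. The i-th position is obtained by stepping from hash1 in increments of hash2, so the same key always yields the same k positions.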