当前位置:网站首页>Bloom filter
Bloom filter
2022-06-25 19:09:00 【Daiyuanpei】
One 、 Introduce
The bloom filter is a binary array , Its function is to judge whether a number exists in this array ,0 Does not exist ,1 Indicates presence , As shown in the figure below :

Two 、 add to
If you want to deposit “ Hello ” This string , First of all, it will go through n A hash function calculates , The accountant calculates different hash values , Then map these hash values into the array , The binary number of the corresponding position is modified to 1, As shown in the figure below :

Subscript to be 3、5、7 The position of was changed to 1
3、 ... and 、 Inquire about
For example, query “ Hello ” This string , First, through n A hash function calculates that the corresponding array position is 3、5、7, The binary number of these three positions must be 1 To express “ Hello ” This string does not exist , As long as one of these three positions is not 1, Then this string does not exist .
Four 、 Delete
The bloon filter has a false deletion in the deletion operation .
For example, now there are two strings “ Hello ” and “hello”, If you go through a series of hash operations , Finally, it is concluded that the position of both in the array is subscript 2 The location of , Then this subscript position represents two data at the same time ,“ Hello ” and “hello”, As shown in the figure below :

When you want to delete “ Hello ” When this string , Will subscript 2 The position of the binary number is changed to 0, Said to delete , But it will also delete “hello” This string , Therefore, the bloom filter has the phenomenon of false deletion .
5、 ... and 、 Advantages and disadvantages
1、 advantage
1、 The bloom filter consists of binary arrays , So it takes up very little space
2、 Query and insert operations are fast , The position calculated according to the hash value can be found in the array
The corresponding time complexity is O(n),n Depends on the number of hash functions , If there is only one hash function , So time complexity is O(1)
3、 Confidentiality is good. , All stored are binary data , Don't store the data itself
2、 shortcoming
1、 There is a false deletion during the deletion operation
2、 Due to different data, the calculated hash value may be the same , So there are misjudgments
hypothesis “hello” and “ Hello ” The hash value calculated from the two data is the same , But at this time, only... Exists in the bloom filter “ Hello ” This data , When you want to query “hello” when , When the calculated hash value looks for a position in the array, it is found that the position is 1, The bloom filter will tell “hello” This data exists , But it doesn't really exist .
6、 ... and 、 Miscalculation rate
When using Bloom filter, you can set a parameter error rate :
1、 The lower the misjudgment rate , The less likely it is to misjudge , But the more hash functions , The more space it takes up
2、 The greater the misjudgment rate , The more likely it is to misjudge , But the fewer hash functions , The smaller the occupied space
Why use more than one hash function , And to use multiple hash functions ?
1、 If there is only one hash function , Then the hash values calculated from different data have a great probability of being the same , It is easy to misjudge
2、 If you use multiple hash functions , Each hash function uses a different hash algorithm , In this case , The calculated subscript values are not easy to be the same
So the more hash functions , The more hash values are calculated , The lower the error rate , But the larger the corresponding space
7、 ... and 、 solve Redis Cache penetration problem
1、 Cache penetration refers to , The requested data does not exist in Redis in , Causes the request to reach the database directly , The database cannot carry too much traffic, resulting in downtime .
2、 The bloom filter can be used to store all the data that can be accessed key, There is no the key Directly filtered , There is key Then further query the cache and database
When the request arrives, first look for Redis The bloom filter , If you don't hit , Go straight back to ( If the data is not in the bloom filter, the probability is not in the database ), In this way, a large number of invalid requests will not reach the database
If it hits , Just to go Redis And the database
边栏推荐
- 揭秘GES超大规模图计算引擎HyG:图切分
- User management and permissions
- PHP数据库连接version1.1
- Favorite PHP debugging methods
- Jump jump games auxiliary (manual version) py code implementation
- 削足适履 - 谈谈赛道上的坡道改造
- Is CICC wealth safe? How long does it take to open an account
- 网络安全检测与防范 练习题(三)
- LeetCode-101-对称二叉树
- [C language practice - print the upper triangle and its deformation (with blank version)]
猜你喜欢

Ruffian Heng embedded semimonthly issue 57

一晚上做了一个xpath终结者:xpath-helper-plus

Cutting feet to fit shoes - talking about the ramp reconstruction on the track

What are Baidu collection skills? 2022 Baidu article collection skills

Leetcode-78-subset

Elastic high-performance computing on the cloud supports the rapid development of the life science industry, reducing costs and increasing efficiency

Detailed explanation of oauth2 - Introduction (I)

Can GoogleSEO only do content without external chain? (e6zzseo)

ECS 7-day practical training camp (Advanced route) -- day01 -- setting up FTP service based on ECS

JVM | runtime data area (heap space)
随机推荐
mysql my. Understanding CNF depends on configuration
R语言plotly可视化:plotly可视化二维直方图等高线图(Basic 2D Histogram Contour)
Training of long and difficult sentences in postgraduate entrance examination day92
Analysis on development status and development suggestions of e-commerce industry in Xinjiang in 2020 [figure]
如何快速关闭8080端口
Training of long and difficult sentences in postgraduate entrance examination day90
Electronic package to generate EXE file
网络安全检测与防范 测试题(一)
Analysis on the development trend of China's intense pulsed light equipment industry in 2021: the market scale is growing, and the proportion of imported brands is large [figure]
LeetCode-78-子集
JVM|运行时数据区(堆空间)
Why are life science enterprises on the cloud in succession?
Elastic high-performance computing on the cloud supports the rapid development of the life science industry, reducing costs and increasing efficiency
Sorting out the latest data mining competition scheme!
揭秘GES超大规模图计算引擎HyG:图切分
TCP/IP 测试题(三)
How to quickly close port 8080
LeetCode-101-对称二叉树
GenICam GenTL 标准 ver1.5(1)
【历史上的今天】6 月 25 日:笔记本之父诞生;Windows 98 发布;通用产品代码首次商用