当前位置:网站首页>Bloom filter
Bloom filter
2022-06-25 19:09:00 【Daiyuanpei】
One 、 Introduce
The bloom filter is a binary array , Its function is to judge whether a number exists in this array ,0 Does not exist ,1 Indicates presence , As shown in the figure below :

Two 、 add to
If you want to deposit “ Hello ” This string , First of all, it will go through n A hash function calculates , The accountant calculates different hash values , Then map these hash values into the array , The binary number of the corresponding position is modified to 1, As shown in the figure below :

Subscript to be 3、5、7 The position of was changed to 1
3、 ... and 、 Inquire about
For example, query “ Hello ” This string , First, through n A hash function calculates that the corresponding array position is 3、5、7, The binary number of these three positions must be 1 To express “ Hello ” This string does not exist , As long as one of these three positions is not 1, Then this string does not exist .
Four 、 Delete
The bloon filter has a false deletion in the deletion operation .
For example, now there are two strings “ Hello ” and “hello”, If you go through a series of hash operations , Finally, it is concluded that the position of both in the array is subscript 2 The location of , Then this subscript position represents two data at the same time ,“ Hello ” and “hello”, As shown in the figure below :

When you want to delete “ Hello ” When this string , Will subscript 2 The position of the binary number is changed to 0, Said to delete , But it will also delete “hello” This string , Therefore, the bloom filter has the phenomenon of false deletion .
5、 ... and 、 Advantages and disadvantages
1、 advantage
1、 The bloom filter consists of binary arrays , So it takes up very little space
2、 Query and insert operations are fast , The position calculated according to the hash value can be found in the array
The corresponding time complexity is O(n),n Depends on the number of hash functions , If there is only one hash function , So time complexity is O(1)
3、 Confidentiality is good. , All stored are binary data , Don't store the data itself
2、 shortcoming
1、 There is a false deletion during the deletion operation
2、 Due to different data, the calculated hash value may be the same , So there are misjudgments
hypothesis “hello” and “ Hello ” The hash value calculated from the two data is the same , But at this time, only... Exists in the bloom filter “ Hello ” This data , When you want to query “hello” when , When the calculated hash value looks for a position in the array, it is found that the position is 1, The bloom filter will tell “hello” This data exists , But it doesn't really exist .
6、 ... and 、 Miscalculation rate
When using Bloom filter, you can set a parameter error rate :
1、 The lower the misjudgment rate , The less likely it is to misjudge , But the more hash functions , The more space it takes up
2、 The greater the misjudgment rate , The more likely it is to misjudge , But the fewer hash functions , The smaller the occupied space
Why use more than one hash function , And to use multiple hash functions ?
1、 If there is only one hash function , Then the hash values calculated from different data have a great probability of being the same , It is easy to misjudge
2、 If you use multiple hash functions , Each hash function uses a different hash algorithm , In this case , The calculated subscript values are not easy to be the same
So the more hash functions , The more hash values are calculated , The lower the error rate , But the larger the corresponding space
7、 ... and 、 solve Redis Cache penetration problem
1、 Cache penetration refers to , The requested data does not exist in Redis in , Causes the request to reach the database directly , The database cannot carry too much traffic, resulting in downtime .
2、 The bloom filter can be used to store all the data that can be accessed key, There is no the key Directly filtered , There is key Then further query the cache and database
When the request arrives, first look for Redis The bloom filter , If you don't hit , Go straight back to ( If the data is not in the bloom filter, the probability is not in the database ), In this way, a large number of invalid requests will not reach the database
If it hits , Just to go Redis And the database
边栏推荐
- PHP数据库连接version1.1
- The meanings of /32, /48, /64 in IPv6 addresses
- PHP database connection version1.1
- QQ机器人闪照转发/撤回消息转发【最新beta2版本】
- [C language practice - print the upper triangle and its deformation (with blank version)]
- Android Development Notes - Quick Start (from sqllite to room licentiousness) 2
- 2017 reading (word memory)
- Command records of common data types for redis cli operations
- Ali visual AI training camp -day03- construction of electronic photo album (face and expression recognition)
- 削足适履 - 谈谈赛道上的坡道改造
猜你喜欢

两轮市场红海,利尔达芯智行如何乘风破浪?

Kotlin Compose 终结toDo项目 点击可以编辑修改todo

Under what circumstances do you need to manually write the @bean to the container to complete the implementation class

云上弹性高性能计算,支持生命科学产业高速发展、降本增效

158 Bar _ Modèle Power bi utilise Dax + SVG pour créer des diagrammes d'affaires presque toutes les possibilités

Analysis on employment compensation of 2021 college graduates: the average monthly starting salary of doctors, masters, undergraduates and junior colleges is 14823 yuan, 10113 yuan, 5825 yuan and 3910

Divine reversion EA

Huawei released two promotion plans to promote AI talent development and scientific research innovation

Sorting out the latest data mining competition scheme!

广州华锐互动VR全景为各行各业带来发展
随机推荐
Guangzhou Sinovel interactive creates VR Exhibition Hall panoramic online virtual exhibition hall
R语言使用DALEX包的model_profile函数基于条件依赖CDP方法解释多个分类模型中某个连续特征和目标值y的关系(Conditional Dependence Plots)
Guangzhou Sinovel interactive VR panorama brings development to all walks of life
Elastic high-performance computing on the cloud supports the rapid development of the life science industry, reducing costs and increasing efficiency
Favorite PHP debugging methods
Solidity get quarterly time
如何快速关闭8080端口
What should I pay attention to in GoogleSEO content station optimization?
Analysis on the development trend of China's intense pulsed light equipment industry in 2021: the market scale is growing, and the proportion of imported brands is large [figure]
mysql事务讲解
Current situation and trend analysis of China's glass packaging containers in 2021: the revenue of glass packaging containers increases year by year [figure]
158_ Model_ Power Bi uses DAX + SVG to open up almost all possibilities for making business charts
electron 基础项目搭建 &&主线程和渲染线程的通信
The meanings of /32, /48, /64 in IPv6 addresses
【C语言练习——打印上三角及其变形(带空格版)】
Sorting out the latest data mining competition scheme!
JS get data
Huawei released two promotion plans to promote AI talent development and scientific research innovation
【历史上的今天】6 月 25 日:笔记本之父诞生;Windows 98 发布;通用产品代码首次商用
Tcp/ip test questions (II)