当前位置:网站首页>Bloom filter
Bloom filter
2022-06-30 21:36:00 【atwdy】
The bloon filter , It can be simply understood as Bloom filter = One digit group + A series of hash functions , The number of bits in the array can only be 0 or 1, Each value takes only one bit Array of , All values in the initial time digit group are set to 0.
Bloom filter is suitable for judging whether an element exists in a scene with a large amount of data , The basis of judgment is : Use each... Separately for this element hash function , According to the calculation hash The value of the corresponding position in the value check digit group is 0 still 1, If all hash The values of the value positions are 1, It means that this element may exist ; If there is any hash Value position is 0, It means that this element must not exist .
The reason for this judgment is , The bloom filter saves data , It is to put the elements to be saved into each hash Function calculation , And according to each hash The function calculates hash Value sets the value of the corresponding position of the bit array to 1. Because of this , So when querying, if this element exists , Then the value of each position must be 1, because hash A function can only have one input corresponding to one output . In turn, , Even if the value of each position is 1 Nor does it mean that this element must exist , because hash One output of a function can correspond to multiple inputs .
You can also see from the data saving process , Bloom filter does not save data itself when saving data , Only the data is marked as existing in the bit array .
Because the bloom filter is based on a fixed... When saving and querying data hash Function calculation lookup , So the time complexity is of constant order , In terms of spatial efficiency, the number of bits is the same , And every position in the bit array is well reused , So the space efficiency is also very high .
Bloom filters are difficult to delete , At the logical level, the premise of deleting an element is that the element exists , But the bloom filter does not guarantee the existence of this element . On the technical level, it can be seen from some materials that bit arrays can be changed into integer arrays , When data is saved and deleted, it is realized by cumulative counting , However, if the data volume is too large, the loop count may occur , Loop count means that when the count value is accumulated to the maximum value that can be represented by the computer, it will increase again from 0 Start , This is related to the computer's low-level numerical operation .
Therefore, summarize the characteristics of Bloom filter :
1. Inaccurate prediction . It is determined that if you are not there, you must not be there , It is judged that the probability is , Or maybe not .
2. Do not save the data itself .
3. Efficient data storage and query , But it is difficult to delete .
Demo
java The digit group is provided in BitSet This class ,python There is no concept of array in , The following is an implementation of a list analog array python Version of the bloom filter , You can judge a given url Does it already exist .
class BloomFilterDemo():
__MAX_SIZE = 2 << 20
__bits = []
__seeds = [3, 5, 7, 11, 13, 31, 37, 61]
__functions = []
def __init__(self):
self.__bits = [0 for i in range(self.__MAX_SIZE)]
self.__functions = [HashFunction(self.__MAX_SIZE, self.__seeds[j]) for j in range(len(self.__seeds))]
def add(self, url):
if url is not None and url != "":
for fun in self.__functions:
self.__bits[fun.hash_code(url)] = 1
def existed(self, url):
if url is not None and url != "":
result = True
for fun in self.__functions:
result = result & (True if self.__bits[fun.hash_code(url)] == 1 else False)
return result
class HashFunction():
__size = 0
__seed = 0
def __init__(self, size, seed):
self.__size = size
self.__seed = seed
def hash_code(self, value):
result = 0
for i in range(len(value)):
result = self.__seed * result + ord(value[i])
return self.__size - 1 & result
if __name__ == '__main__':
filter = BloomFilterDemo()
print(filter.existed("www.baidu.com"))
filter.add("www.baidu.com")
print(filter.existed("www.baidu.com"))

边栏推荐
- What happens when word encounters an error while trying to open a file?
- Two skylines
- MySQL batch update
- Encryption and decryption and the application of OpenSSL
- ceshi deces
- 1-18 创建最基本的express服务器&创建路由的API模块
- Ten security measures against unauthorized access attacks
- 【回溯】全排列 leetcode46
- 激发新动能 多地发力数字经济
- It is urgent for enterprises to protect API security
猜你喜欢

Prediction and regression of stacking integrated model
![[untitled]](/img/42/47a8c8faaed33a1d9e864cb2ef7b72.png)
[untitled]

How to move forward when facing confusion in scientific research? How to give full play to women's advantages in scientific research?

qsort函数和模拟实现qsort函数

《ClickHouse原理解析与应用实践》读书笔记(3)

Oracle 数据库表结构 Excel 导出

PyTorch量化实践(1)

多态在代码中的体现

Jupyterbook clear console output

What about degradation of text generation model? Simctg tells you the answer
随机推荐
Auto-created primary key used when not defining a primary key
Multi table operation - foreign key constraint
1-21 JSONP接口
Adobe Photoshop (PS) - script development - remove file bloated script
开源实习经验分享:openEuler软件包加固测试
USBCAN分析仪的配套CAN和CANFD综合测试软件LKMaster软件解决工程师CAN总线测试难题
布隆过滤器
Deployment and use of Nacos
jenkins下载插件下载不了,解决办法
将el-table原样导出为excel表格
1-11 创建线上文件服务
申请Vector 总线协议彩图壁纸挂画,非常棒哦!
1-3 使用SQL管理数据库
1-7 Path路径模块
twelve thousand three hundred and forty-five
Metauniverse may become a new direction of Internet development
笔记【JUC包以及Future介绍】
Fletter nested hell? No, constraintlayout to save!
AKK菌——下一代有益菌
本地浏览器打开远程服务器上的Jupyter Notebook/Lab以及常见问题&设置