当前位置:网站首页>Bloom filter
Bloom filter
2022-06-30 21:36:00 【atwdy】
The bloon filter , It can be simply understood as Bloom filter = One digit group + A series of hash functions , The number of bits in the array can only be 0 or 1, Each value takes only one bit Array of , All values in the initial time digit group are set to 0.
Bloom filter is suitable for judging whether an element exists in a scene with a large amount of data , The basis of judgment is : Use each... Separately for this element hash function , According to the calculation hash The value of the corresponding position in the value check digit group is 0 still 1, If all hash The values of the value positions are 1, It means that this element may exist ; If there is any hash Value position is 0, It means that this element must not exist .
The reason for this judgment is , The bloom filter saves data , It is to put the elements to be saved into each hash Function calculation , And according to each hash The function calculates hash Value sets the value of the corresponding position of the bit array to 1. Because of this , So when querying, if this element exists , Then the value of each position must be 1, because hash A function can only have one input corresponding to one output . In turn, , Even if the value of each position is 1 Nor does it mean that this element must exist , because hash One output of a function can correspond to multiple inputs .
You can also see from the data saving process , Bloom filter does not save data itself when saving data , Only the data is marked as existing in the bit array .
Because the bloom filter is based on a fixed... When saving and querying data hash Function calculation lookup , So the time complexity is of constant order , In terms of spatial efficiency, the number of bits is the same , And every position in the bit array is well reused , So the space efficiency is also very high .
Bloom filters are difficult to delete , At the logical level, the premise of deleting an element is that the element exists , But the bloom filter does not guarantee the existence of this element . On the technical level, it can be seen from some materials that bit arrays can be changed into integer arrays , When data is saved and deleted, it is realized by cumulative counting , However, if the data volume is too large, the loop count may occur , Loop count means that when the count value is accumulated to the maximum value that can be represented by the computer, it will increase again from 0 Start , This is related to the computer's low-level numerical operation .
Therefore, summarize the characteristics of Bloom filter :
1. Inaccurate prediction . It is determined that if you are not there, you must not be there , It is judged that the probability is , Or maybe not .
2. Do not save the data itself .
3. Efficient data storage and query , But it is difficult to delete .
Demo
java The digit group is provided in BitSet This class ,python There is no concept of array in , The following is an implementation of a list analog array python Version of the bloom filter , You can judge a given url Does it already exist .
class BloomFilterDemo():
__MAX_SIZE = 2 << 20
__bits = []
__seeds = [3, 5, 7, 11, 13, 31, 37, 61]
__functions = []
def __init__(self):
self.__bits = [0 for i in range(self.__MAX_SIZE)]
self.__functions = [HashFunction(self.__MAX_SIZE, self.__seeds[j]) for j in range(len(self.__seeds))]
def add(self, url):
if url is not None and url != "":
for fun in self.__functions:
self.__bits[fun.hash_code(url)] = 1
def existed(self, url):
if url is not None and url != "":
result = True
for fun in self.__functions:
result = result & (True if self.__bits[fun.hash_code(url)] == 1 else False)
return result
class HashFunction():
__size = 0
__seed = 0
def __init__(self, size, seed):
self.__size = size
self.__seed = seed
def hash_code(self, value):
result = 0
for i in range(len(value)):
result = self.__seed * result + ord(value[i])
return self.__size - 1 & result
if __name__ == '__main__':
filter = BloomFilterDemo()
print(filter.existed("www.baidu.com"))
filter.add("www.baidu.com")
print(filter.existed("www.baidu.com"))

边栏推荐
- sdfsdf
- Sqlserver gets the data of numbers, Chinese and characters in the string
- CA I ah, several times Oh, ah, a sentence IU home Oh
- Electronic scheme development - Intelligent rope skipping scheme
- Coefficient of variation method matlab code [easy to understand]
- 开发属于自己的包
- 激发新动能 多地发力数字经济
- 的撒啊苏丹看老司机
- NCAT detailed introduction (Reprint)
- Open source internship experience sharing: openeuler software package reinforcement test
猜你喜欢
随机推荐
Nacos部署及使用
Dm8: generate DM AWR Report
ArcGIS构建发布简单路网Network数据服务及Rest调用测试
1-3 使用SQL管理数据库
1-20 预检请求
What about degradation of text generation model? Simctg tells you the answer
Double solid histogram / double y-axis
twelve thousand three hundred and forty-five
1-15 nodemon
【回溯】全排列 II leetcode47
Encryption and decryption and the application of OpenSSL
How to move forward when facing confusion in scientific research? How to give full play to women's advantages in scientific research?
1-10 respond to client content according to different URLs
Jupyterbook clear console output
1-19 利用CORS解决接口跨域问题
兴奋神经递质——谷氨酸与大脑健康
Phoenix architecture: an architect's perspective
升级kube出现unknown flag: --network-plugin
. NETCORE redis geo type
你我他是谁







