当前位置:网站首页>Bloom filter
Bloom filter
2022-06-30 21:36:00 【atwdy】
The bloon filter , It can be simply understood as Bloom filter = One digit group + A series of hash functions , The number of bits in the array can only be 0 or 1, Each value takes only one bit Array of , All values in the initial time digit group are set to 0.
Bloom filter is suitable for judging whether an element exists in a scene with a large amount of data , The basis of judgment is : Use each... Separately for this element hash function , According to the calculation hash The value of the corresponding position in the value check digit group is 0 still 1, If all hash The values of the value positions are 1, It means that this element may exist ; If there is any hash Value position is 0, It means that this element must not exist .
The reason for this judgment is , The bloom filter saves data , It is to put the elements to be saved into each hash Function calculation , And according to each hash The function calculates hash Value sets the value of the corresponding position of the bit array to 1. Because of this , So when querying, if this element exists , Then the value of each position must be 1, because hash A function can only have one input corresponding to one output . In turn, , Even if the value of each position is 1 Nor does it mean that this element must exist , because hash One output of a function can correspond to multiple inputs .
You can also see from the data saving process , Bloom filter does not save data itself when saving data , Only the data is marked as existing in the bit array .
Because the bloom filter is based on a fixed... When saving and querying data hash Function calculation lookup , So the time complexity is of constant order , In terms of spatial efficiency, the number of bits is the same , And every position in the bit array is well reused , So the space efficiency is also very high .
Bloom filters are difficult to delete , At the logical level, the premise of deleting an element is that the element exists , But the bloom filter does not guarantee the existence of this element . On the technical level, it can be seen from some materials that bit arrays can be changed into integer arrays , When data is saved and deleted, it is realized by cumulative counting , However, if the data volume is too large, the loop count may occur , Loop count means that when the count value is accumulated to the maximum value that can be represented by the computer, it will increase again from 0 Start , This is related to the computer's low-level numerical operation .
Therefore, summarize the characteristics of Bloom filter :
1. Inaccurate prediction . It is determined that if you are not there, you must not be there , It is judged that the probability is , Or maybe not .
2. Do not save the data itself .
3. Efficient data storage and query , But it is difficult to delete .
Demo
java The digit group is provided in BitSet This class ,python There is no concept of array in , The following is an implementation of a list analog array python Version of the bloom filter , You can judge a given url Does it already exist .
class BloomFilterDemo():
__MAX_SIZE = 2 << 20
__bits = []
__seeds = [3, 5, 7, 11, 13, 31, 37, 61]
__functions = []
def __init__(self):
self.__bits = [0 for i in range(self.__MAX_SIZE)]
self.__functions = [HashFunction(self.__MAX_SIZE, self.__seeds[j]) for j in range(len(self.__seeds))]
def add(self, url):
if url is not None and url != "":
for fun in self.__functions:
self.__bits[fun.hash_code(url)] = 1
def existed(self, url):
if url is not None and url != "":
result = True
for fun in self.__functions:
result = result & (True if self.__bits[fun.hash_code(url)] == 1 else False)
return result
class HashFunction():
__size = 0
__seed = 0
def __init__(self, size, seed):
self.__size = size
self.__seed = seed
def hash_code(self, value):
result = 0
for i in range(len(value)):
result = self.__seed * result + ord(value[i])
return self.__size - 1 & result
if __name__ == '__main__':
filter = BloomFilterDemo()
print(filter.existed("www.baidu.com"))
filter.add("www.baidu.com")
print(filter.existed("www.baidu.com"))

边栏推荐
- Understanding polymorphism
- Radar data processing technology
- Adobe Photoshop (PS) - script development - remove file bloated script
- 1-18 创建最基本的express服务器&创建路由的API模块
- vim 常用快捷键
- 【回溯】全排列 leetcode46
- Understand what MySQL index push down (ICP) is in one article
- Ssh server configuration file parameter permitrootlogin introduction
- Can flinksql two Kafka streams join?
- The 16th Heilongjiang Provincial Collegiate Programming Contest
猜你喜欢

pytorch geometric torch-scatter和torch-sparse安装报错问题解决

《ClickHouse原理解析与应用实践》读书笔记(2)

Deployment and use of Nacos

asp. Net core JWT delivery

Open source internship experience sharing: openeuler software package reinforcement test

qsort函数和模拟实现qsort函数

Oracle 数据库表结构 Excel 导出

jupyter notebook/lab 切换conda环境

Dm8: generate DM AWR Report

申请Vector 总线协议彩图壁纸挂画,非常棒哦!
随机推荐
【无标题】
1-10 respond to client content according to different URLs
Double solid histogram / double y-axis
Upgrade Kube with unknown flag: --network plugin
1-15 nodemon
FreeRTOS record (IX. an example of a bare metal project transferring to FreeRTOS)
Clickhouse native monitoring item, system table description
pytorch geometric torch-scatter和torch-sparse安装报错问题解决
《ClickHouse原理解析与应用实践》读书笔记(2)
开发属于自己的包
1-11 创建线上文件服务
1-7 path module
clickhouse原生監控項,系統錶描述
网络营销之四大误解
1-14 express managed static resources
qsort函数和模拟实现qsort函数
[grade evaluator] how to register a grade evaluator? How many passes?
asp. Net core JWT delivery
布隆过滤器
What about degradation of text generation model? Simctg tells you the answer