当前位置：网站首页>Bloom filter

Bloom filter

2022-06-30 21:36:00 【atwdy】

The bloon filter , It can be simply understood as Bloom filter = One digit group + A series of hash functions , The number of bits in the array can only be 0 or 1, Each value takes only one bit Array of , All values in the initial time digit group are set to 0.

Bloom filter is suitable for judging whether an element exists in a scene with a large amount of data , The basis of judgment is ： Use each... Separately for this element hash function , According to the calculation hash The value of the corresponding position in the value check digit group is 0 still 1, If all hash The values of the value positions are 1, It means that this element may exist ; If there is any hash Value position is 0, It means that this element must not exist .

The reason for this judgment is , The bloom filter saves data , It is to put the elements to be saved into each hash Function calculation , And according to each hash The function calculates hash Value sets the value of the corresponding position of the bit array to 1. Because of this , So when querying, if this element exists , Then the value of each position must be 1, because hash A function can only have one input corresponding to one output . In turn, , Even if the value of each position is 1 Nor does it mean that this element must exist , because hash One output of a function can correspond to multiple inputs .

You can also see from the data saving process , Bloom filter does not save data itself when saving data , Only the data is marked as existing in the bit array .

Because the bloom filter is based on a fixed... When saving and querying data hash Function calculation lookup , So the time complexity is of constant order , In terms of spatial efficiency, the number of bits is the same , And every position in the bit array is well reused , So the space efficiency is also very high .

Bloom filters are difficult to delete , At the logical level, the premise of deleting an element is that the element exists , But the bloom filter does not guarantee the existence of this element . On the technical level, it can be seen from some materials that bit arrays can be changed into integer arrays , When data is saved and deleted, it is realized by cumulative counting , However, if the data volume is too large, the loop count may occur , Loop count means that when the count value is accumulated to the maximum value that can be represented by the computer, it will increase again from 0 Start , This is related to the computer's low-level numerical operation .

Therefore, summarize the characteristics of Bloom filter ：
1. Inaccurate prediction . It is determined that if you are not there, you must not be there , It is judged that the probability is , Or maybe not .
2. Do not save the data itself .
3. Efficient data storage and query , But it is difficult to delete .

Demo

java The digit group is provided in BitSet This class ,python There is no concept of array in , The following is an implementation of a list analog array python Version of the bloom filter , You can judge a given url Does it already exist .

class BloomFilterDemo():
    __MAX_SIZE = 2 << 20
    __bits = []
    __seeds = [3, 5, 7, 11, 13, 31, 37, 61]
    __functions = []

    def __init__(self):
        self.__bits = [0 for i in range(self.__MAX_SIZE)]
        self.__functions = [HashFunction(self.__MAX_SIZE, self.__seeds[j]) for j in range(len(self.__seeds))]

    def add(self, url):
        if url is not None and url != "":
            for fun in self.__functions:
                self.__bits[fun.hash_code(url)] = 1

    def existed(self, url):
        if url is not None and url != "":
            result = True
            for fun in self.__functions:
                result = result & (True if self.__bits[fun.hash_code(url)] == 1 else False)
            return result


class HashFunction():
    __size = 0
    __seed = 0

    def __init__(self, size, seed):
        self.__size = size
        self.__seed = seed

    def hash_code(self, value):
        result = 0
        for i in range(len(value)):
            result = self.__seed * result + ord(value[i])
        return self.__size - 1 & result


if __name__ == '__main__':
    filter = BloomFilterDemo()
    print(filter.existed("www.baidu.com"))
    filter.add("www.baidu.com")
    print(filter.existed("www.baidu.com"))

Insert picture description here

原网站

版权声明
本文为[atwdy]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/181/202206302121122835.html

当前位置：网站首页>Bloom filter

Bloom filter

Demo

边栏推荐

猜你喜欢

随机推荐