当前位置:网站首页>Bloom filter
Bloom filter
2022-07-26 09:16:00 【Or turn around】
Prior to Distributed cache In the article , As mentioned, one solution for cache penetration is the bloan filter . This article will give a detailed introduction to bloom filter .
Principle of bloon filter
The bloon filter (BloomFilter) It is actually a data structure , It includes a long binary vector and a series of random mapping functions . It is mainly used to retrieve whether an element is in a collection .
The advantage is that both space efficiency and query time efficiency are very high , The disadvantage is that there is a certain misjudgment rate , And element deletion is not supported .
Inside the bron filter , A single digit group is maintained bitArray. At the beginning, all bits are 0.
When inserting an element , Calculate the hash value of this element through multiple hash functions , At the index of the array corresponding to the hash value , take 0 Set as 1.
Here is an example to illustrate . The initial array is as follows :
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|---|
Insert the first element a, Suppose three hash functions are used to calculate , The hash values obtained are 1,2,3, Then the array ( from 1 Start ) On the corresponding position 0 Set up 1, Get the new array as follows :
| 1 | 1 | 1 | 0 | 0 | 0 | 0 |
|---|
Then insert the second element b, Suppose three hash functions are used to calculate , The hash values obtained are 4,5,6, Then... At the corresponding position of the array 0 Set up 1, Get the new array as follows :
| 1 | 1 | 1 | 1 | 1 | 1 | 0 |
|---|
Now judge the element a Is it in the filter , Then view the array 1,2,3 The number in position , All for 1 explain a In the filter .
To judge elements c Is it in the bloom filter , After three hash functions, the hash value is obtained 1,5,7. Check the array 1,5,7 Is there 0, Location of discovery 7 The number is 0, So it was decided c be not in .
So if the element c The hash value calculated by three hash functions is 1,5,6 Well ? At this point, the result is the element c In this filter , This is miscalculation .
Anyone who knows the hash table knows ,c The reason why elements are misjudged is calculation c Hash collision occurred when the hash value of , And the three hash values collided .
A conclusion can be drawn from the above example : The bloom filter can accurately determine that an element does not exist , But you can't judge the existence of an element 100% .
When more and more elements are added to the bloom filter , Occupied bit More and more , At this time, the probability of hash collision is higher , The probability of miscarriage of justice increases .
When all bit All bits are set to 1 when , At this time, the bloom filter loses its filtering function , At this point, any element will be judged to exist . therefore , It is very important to choose a suitable filter length .
It is also easy to understand that element deletion is not supported , Because the hash value of the element may have hash collision ( Only when all hash values collide will it be misjudged ).
Such as element c The hash value of 1,5,7. among ,1 And element a Hash value collision ,5 And element b Hash value collision ,7 No collision .
If you delete an element c, Also is to 1,5,7 Set as 0, Obviously, at this time, the element a And elements b Your judgment will be wrong .
accuracy
To improve the accuracy of Bloom filter , There are two main points :
- Multiple hash functions , Reduce hash The probability of collision .
- Expand the array range , send hash Values are evenly distributed , Reduce hash The probability of collision .
Although Bloom filters have been reduced as much as possible hash The probability of collision , But it can't be completely eliminated . About the array size of Bloom filter and the corresponding hash The choice of the number of functions , The specific derivation process is not detailed here , Give a direct conclusion :
(1) set up m Is the size of the array ,n Is the number of elements , For fixed m/n, The best number of hash functions is :k = (m/n)ln2( The more the number, the more the bloom filter bit The position is 1 The faster , And the lower the efficiency of the bloom filter ).
(2) The misjudgment rate and the values of each parameter are as follows :

stay redis Application in
Redis stay 4.0 In later versions, the bloom filter plug-in is provided , Its functions can be used after installation .
stay https://github.com/RedisBloom/RedisBloom Download the latest release Source code , Decompress and compile in the compilation server to get the dynamic library :redisbloom.so
Copy the dynamic library to redis Under the table of contents . modify redis.conf:
################################## MODULES #####################################
# Load modules at startup. If the server is not able to load modules
# it will abort. It is possible to use multiple loadmodule directives.
loadmodule /usr/local/redis-5.0.5/redisbloom.so
# loadmodule /path/to/my_module.so
# loadmodule /path/to/other_module.so
Load... In the configuration file redisbloom This module.
stay redis Specify the configuration file at startup :nohup ./redis-server /usr/local/redis-5.0.5/redis.conf &
Of course , In production environment redis You can't just restart , You can hot load directly through commands redis plug-in unit , As shown in the figure below :
127.0.0.1:6379> module list
(empty list or set)
127.0.0.1:6379> module load /usr/local/redis-5.0.5/redisbloom.so
OK
127.0.0.1:6379> module list
1) 1) "name"
2) "bf"
3) "ver"
4) (integer) 999999
The module is called bf, The module version number is 999999.
The operation command of the bron filter is as follows :
- bf.add: Additive elements
- bf.exists: Query whether the element exists
- bf.madd: Add multiple elements
- bf.mexists: Query whether multiple elements exist
Add elements and determine whether they exist :
127.0.0.1:6379> bf.add users 2
(integer) 1
127.0.0.1:6379> bf.exists users 1
(integer) 0
127.0.0.1:6379> bf.exists users 2
(integer) 1
Detailed command :
Java Bloom filter in
stay google Of guava The toolkit implements BloomFilter:
import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
public class GuavaBloomFilter {
public static void main(String[] args) {
BloomFilter<String> bloomFilter = BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_8), 100000, 0.01);
bloomFilter.put("10086");
System.out.println(bloomFilter.mightContain("123456"));
System.out.println(bloomFilter.mightContain("10086"));
}
}
Reference material
[1]. https://www.zhihu.com/question/38573286
[2]. https://www.jianshu.com/p/f04edbded1b9
边栏推荐
- 谷粒学院的全部学习源码
- Mutual transformation of array structure and tree structure
- Original root and NTT 5000 word explanation
- 【final关键字的使用】
- Use of off heap memory
- The idea shortcut key ALT realizes the whole column operation
- Error: Cannot find module ‘umi‘ 问题处理
- Thread Join 和Object wait 的区别
- Laravel框架日志文件存放在哪里?怎么用?
- 【ARKit、RealityKit】把图片转为3D模型
猜你喜欢
随机推荐
Go intelligent robot alpha dog, alpha dog robot go
Hbuilderx runs the wechat developer tool "fail to open ide" to solve the error
838. 堆排序
2022年上海市安全员C证考试试题及模拟考试
Error: cannot find module 'UMI' problem handling
十大蓝筹NFT近半年数据横向对比
volatile 靠的是MESI协议解决可见性问题?(下)
(2006,Mysql Server has gone away)问题处理
服务器内存故障预测居然可以这样做!
Announcement | FISCO bcos v3.0-rc4 is released, and the new Max version can support massive transactions on the chain
Horizontal comparison of the data of the top ten blue chip NFTs in the past half year
PHP page value transfer
Babbitt | metauniverse daily must read: does the future of metauniverse belong to large technology companies or to the decentralized Web3 world
839. 模拟堆
Server memory failure prediction can actually do this!
Simple message mechanism of unity
Pytoch realizes logistic regression
Polynomial open root
NFT与数字藏品到底有何区别?
Introduction to excellent verilog/fpga open source project (30) - brute force MD5









