当前位置:网站首页>Understand bloomfilter in one article
Understand bloomfilter in one article
2022-07-04 12:40:00 【Red hat spongebob】
1 What is a bloon filter
The bloon filter (Bloom Filter) It was by bloon (Burton Howard Bloom) stay 1970 Put forward in . It's actually a very long binary vector and a series of random mapping functions , The bloom filter essentially consists of a length of m The bit vector of .
The basic bloom filter supports two operations : test and add to .
purpose : The bloom filter can be used to retrieve whether an element is in a collection .
Its advantage is that the space efficiency and query time are far more than the general algorithm , The disadvantage is that it has certain error recognition rate and deletion difficulty .
( Misidentification : The element doesn't exist but is , There will be no element in and not in , That is, it must be , No, it can't be )
2 Bloon filter and HashSet What's the difference
The bloon filter :
The basic idea : Through one Hash Function maps an element to a Bit array (Bit Array) A point in . thus , We just need to see if this point is 1 We'll know if it's in the set . This is the basic idea of bloon filter .
advantage : Small footprint , Efficient .
shortcoming : There is an error rate , Cannot delete element .
HashSet:
- The basic idea : utilize Hash The function converts the HashCode Take out , Put it in an internal array or linked list , When adding, deleting, modifying and checking elements, they are generally based on the elements HashCode To operate .
- advantage : Easy to use , The principle of simple .
- shortcoming : Large memory consumption , Low performance .
Sum up , Bloom filter is mainly applicable to the de duplication of a large amount of data , and HashSet Such data structures are suitable for the de duplication and inspection of small and medium-sized data .
3 Using the bloon filter
There are many ways to implement Bloom filters , such as :
- Guava Realization
- Redis Realization
- Realize it by yourself
- …
Here we use Google Guava Demonstrate the built-in bloom filter :
rely on :
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>23.0</version>
</dependency>
Code :
public class Test {
/** * Filter default capacity */
private static final int DEFAULT_SIZE = 1000000;
/** * Custom error rate , Default 0.03, Section (0,1) */
private static final double FPP = 0.02;
public static void main(String[] args) {
// Initialize a store string Bloom filters for data , The default error rate is 0.03
BloomFilter<String> bf = BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_8), DEFAULT_SIZE, FPP);
// Used to store all actual key, For the existence of
Set<String> sets = new HashSet<>(DEFAULT_SIZE);
// Used to store all actual key, Used for removal
List<String> lists = new ArrayList<>(DEFAULT_SIZE);
// Insert a random string
for (int i = 0; i < DEFAULT_SIZE; i++) {
String uuid = UUID.randomUUID().toString();
// Add data
bf.put(uuid);
sets.add(uuid);
lists.add(uuid);
}
int rightNum = 0;
int wrongNum = 0;
for (int i = 0; i < 10000; i++) {
// 0-10000 Between , Can be 100 The number of dividers is 100 individual (100 Multiple )
String data = i % 100 == 0 ? lists.get(i / 100) : UUID.randomUUID().toString();
// This is used here. might, It doesn't look very confident , So if the bloom filter determines that it exists , We still have to go. sets Zhongshi hammer
if (bf.mightContain(data)) {
if (sets.contains(data)) {
rightNum++;
continue;
}
wrongNum++;
}
}
BigDecimal percent = new BigDecimal(wrongNum).divide(new BigDecimal(9900), 2, RoundingMode.HALF_UP);
BigDecimal bingo = new BigDecimal(9900 - wrongNum).divide(new BigDecimal(9900), 2, RoundingMode.HALF_UP);
System.out.println(" stay 100W Of the elements , Judge 100 It's an actual element , The bloon filter thinks there is :" + rightNum);
System.out.println(" stay 100W Of the elements , Judge 9900 An element that doesn't actually exist , Mistakenly believed to exist :" + wrongNum + ", shooting :" + bingo + ", Miscalculation rate :" + percent);
}
}
Running results :
[ Failed to transfer the external chain picture , The origin station may have anti-theft chain mechanism , It is suggested to save the pictures and upload them directly (img-e97FaPon-1644722500048)( Understand bloom filter (BloomFilter)].assets/image-20220212151410175.png)
principle :

The core principle of Bloom filter is to use digit group , Use elements many times Hash Map to bit , When judging whether it exists , Map the searched elements in the same way , When an element may have multiple bits , When every bit is 1 It is proved that the element exists , On the contrary, it does not exist .
demonstration :
# Storage elements :
abc ---> {2,3,4} All for 1
def ---> {5,6,7} All for 1
# Look for the element :
abc ---> {2,3,4} All for 1 1==1&&1==1&&1==1 --->true There is
aaa ---> {2,3,3} All for 1 1==1&&1==1&&0==1 --->false non-existent
# There is an error :
acf ---> {2,3,6} All for 1 1==1&&1==1&&1==1 --->true There is
4 Handwritten simple bloom filter
public class MyBloomFilter {
/** * Use addition hash Algorithm , So it defines a 3 A prime array of elements */
private static final int[] primes = new int[]{
2, 3, 5};
/** * Use eight different prime numbers , It's equivalent to building 8 Different algorithms */
private Hash[] hashList = new Hash[primes.length];
/** * Create a length of 10 Billion bits */
private BitSet bits = new BitSet(1000000000);
public MyBloomFilter() {
for (int i = 0; i < primes.length; i++) {
// Use 3 A prime number , establish 3 Species algorithm
hashList[i] = new Hash(primes[i]);
}
}
/** * Additive elements * @param value */
public void add(String value) {
for (Hash f : hashList) {
bits.set(f.hash(value), true);
}
}
/** * Determine whether it is in the bloom filter * @param value * @return */
public boolean contains(String value) {
if (value == null) {
return false;
}
boolean ret = true;
for (Hash f : hashList) {
// see 3 Values on bits
ret = ret && bits.get(f.hash(value));
}
return ret;
}
/** * Add hash Algorithm */
public static class Hash {
private int prime;
public Hash(int prime) {
this.prime = prime;
}
public int hash(String key) {
int hash, i;
for (hash = key.length(), i = 0; i < key.length(); i++) {
hash += key.charAt(i);
}
return (hash % prime);
}
}
public static void main(String[] args) {
MyBloomFilter bloom = new MyBloomFilter();
System.out.println(bloom.contains("2222"));
bloom.add("2222");
// maintain 1 Billion online users
for (int i = 1; i < 100000000; i++) {
bloom.add(String.valueOf(i));
}
long begin = System.currentTimeMillis();
System.out.println(begin);
System.out.println(bloom.contains("2222"));
long end = System.currentTimeMillis();
System.out.println(end);
System.out.println(" Judge 2222 Is there any use :" + (end - begin));
}
}
Running results :
Reference article :
https://blog.csdn.net/wuzhiwei549/article/details/106714765
https://www.jasondavies.com/bloomfilter/
https://www.cnblogs.com/liyulong1982/p/6013002.html
https://www.jianshu.com/p/7634eaea3e26
边栏推荐
- [Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 23
- 17.内存分区与分页
- [Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 17
- Practice of retro SOAP Protocol
- Wechat video Number launches "creator traffic package"
- A few words explain redis cache penetration, breakdown, avalanche, and redis sentinel
- [Yu Yue education] 233 pre school children's language education reference questions in the spring of 2019 of the National Open University
- 'using an alias column in the where clause in PostgreSQL' - using an alias column in the where clause in PostgreSQL
- Global and Chinese markets for soluble suture 2022-2028: Research Report on technology, participants, trends, market size and share
- Global and Chinese market of ice water machines 2022-2028: Research Report on technology, participants, trends, market size and share
猜你喜欢

When synchronized encounters this thing, there is a big hole, pay attention!
![[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 11](/img/6a/398d9cceecdd9d7c9c4613d8b5ca27.jpg)
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 11

Fastlane one click package / release app - usage record and stepping on pit

Introduction to the button control elevatedbutton of the fleet tutorial (the tutorial includes the source code)
![Entitas learning [iv] other common knowledge points](/img/1c/f899f4600fef07ce39189e16afc44a.jpg)
Entitas learning [iv] other common knowledge points

Complementary knowledge of auto encoder
![[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 19](/img/7c/f728e88ca36524f92c56213370399b.jpg)
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 19

C语言数组

MySQL advanced review
![[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 8](/img/16/33f5623625ba817e6e022b5cb7ff5d.jpg)
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 8
随机推荐
vim 出现 Another program may be editing the same file. If this is the case 的解决方法
Clion configuration of opencv
Fastlane one click package / release app - usage record and stepping on pit
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 6
[notes] in depth explanation of assets, resources and assetbundles
Kivy教程之 08 倒计时App实现timer调用(教程含源码)
Global and Chinese markets of digital PCR and real-time PCR 2022-2028: Research Report on technology, participants, trends, market size and share
What if the chat record is gone? How to restore wechat chat records on Apple Mobile
MySQL performance optimization index
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 24
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 11
'using an alias column in the where clause in PostgreSQL' - using an alias column in the where clause in PostgreSQL
How to realize the function of Sub Ledger of applet?
Mongodb vs mysql, which is more efficient
Global and Chinese market of ice water machines 2022-2028: Research Report on technology, participants, trends, market size and share
Flet教程之 按钮控件 ElevatedButton入门(教程含源码)
asp. Core is compatible with both JWT authentication and cookies authentication
Global and Chinese markets for soluble suture 2022-2028: Research Report on technology, participants, trends, market size and share
2022, 6G is heating up
C语言数组