当前位置:网站首页>Understand bloomfilter in one article
Understand bloomfilter in one article
2022-07-04 12:40:00 【Red hat spongebob】
1 What is a bloon filter
The bloon filter (Bloom Filter) It was by bloon (Burton Howard Bloom) stay 1970 Put forward in . It's actually a very long binary vector and a series of random mapping functions , The bloom filter essentially consists of a length of m The bit vector of .
The basic bloom filter supports two operations : test and add to .
purpose : The bloom filter can be used to retrieve whether an element is in a collection .
Its advantage is that the space efficiency and query time are far more than the general algorithm , The disadvantage is that it has certain error recognition rate and deletion difficulty .
( Misidentification : The element doesn't exist but is , There will be no element in and not in , That is, it must be , No, it can't be )
2 Bloon filter and HashSet What's the difference
The bloon filter :
The basic idea : Through one Hash Function maps an element to a Bit array (Bit Array) A point in . thus , We just need to see if this point is 1 We'll know if it's in the set . This is the basic idea of bloon filter .
advantage : Small footprint , Efficient .
shortcoming : There is an error rate , Cannot delete element .
HashSet:
- The basic idea : utilize Hash The function converts the HashCode Take out , Put it in an internal array or linked list , When adding, deleting, modifying and checking elements, they are generally based on the elements HashCode To operate .
- advantage : Easy to use , The principle of simple .
- shortcoming : Large memory consumption , Low performance .
Sum up , Bloom filter is mainly applicable to the de duplication of a large amount of data , and HashSet Such data structures are suitable for the de duplication and inspection of small and medium-sized data .
3 Using the bloon filter
There are many ways to implement Bloom filters , such as :
- Guava Realization
- Redis Realization
- Realize it by yourself
- …
Here we use Google Guava Demonstrate the built-in bloom filter :
rely on :
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>23.0</version>
</dependency>
Code :
public class Test {
/** * Filter default capacity */
private static final int DEFAULT_SIZE = 1000000;
/** * Custom error rate , Default 0.03, Section (0,1) */
private static final double FPP = 0.02;
public static void main(String[] args) {
// Initialize a store string Bloom filters for data , The default error rate is 0.03
BloomFilter<String> bf = BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_8), DEFAULT_SIZE, FPP);
// Used to store all actual key, For the existence of
Set<String> sets = new HashSet<>(DEFAULT_SIZE);
// Used to store all actual key, Used for removal
List<String> lists = new ArrayList<>(DEFAULT_SIZE);
// Insert a random string
for (int i = 0; i < DEFAULT_SIZE; i++) {
String uuid = UUID.randomUUID().toString();
// Add data
bf.put(uuid);
sets.add(uuid);
lists.add(uuid);
}
int rightNum = 0;
int wrongNum = 0;
for (int i = 0; i < 10000; i++) {
// 0-10000 Between , Can be 100 The number of dividers is 100 individual (100 Multiple )
String data = i % 100 == 0 ? lists.get(i / 100) : UUID.randomUUID().toString();
// This is used here. might, It doesn't look very confident , So if the bloom filter determines that it exists , We still have to go. sets Zhongshi hammer
if (bf.mightContain(data)) {
if (sets.contains(data)) {
rightNum++;
continue;
}
wrongNum++;
}
}
BigDecimal percent = new BigDecimal(wrongNum).divide(new BigDecimal(9900), 2, RoundingMode.HALF_UP);
BigDecimal bingo = new BigDecimal(9900 - wrongNum).divide(new BigDecimal(9900), 2, RoundingMode.HALF_UP);
System.out.println(" stay 100W Of the elements , Judge 100 It's an actual element , The bloon filter thinks there is :" + rightNum);
System.out.println(" stay 100W Of the elements , Judge 9900 An element that doesn't actually exist , Mistakenly believed to exist :" + wrongNum + ", shooting :" + bingo + ", Miscalculation rate :" + percent);
}
}
Running results :
[ Failed to transfer the external chain picture , The origin station may have anti-theft chain mechanism , It is suggested to save the pictures and upload them directly (img-e97FaPon-1644722500048)( Understand bloom filter (BloomFilter)].assets/image-20220212151410175.png)
principle :

The core principle of Bloom filter is to use digit group , Use elements many times Hash Map to bit , When judging whether it exists , Map the searched elements in the same way , When an element may have multiple bits , When every bit is 1 It is proved that the element exists , On the contrary, it does not exist .
demonstration :
# Storage elements :
abc ---> {2,3,4} All for 1
def ---> {5,6,7} All for 1
# Look for the element :
abc ---> {2,3,4} All for 1 1==1&&1==1&&1==1 --->true There is
aaa ---> {2,3,3} All for 1 1==1&&1==1&&0==1 --->false non-existent
# There is an error :
acf ---> {2,3,6} All for 1 1==1&&1==1&&1==1 --->true There is
4 Handwritten simple bloom filter
public class MyBloomFilter {
/** * Use addition hash Algorithm , So it defines a 3 A prime array of elements */
private static final int[] primes = new int[]{
2, 3, 5};
/** * Use eight different prime numbers , It's equivalent to building 8 Different algorithms */
private Hash[] hashList = new Hash[primes.length];
/** * Create a length of 10 Billion bits */
private BitSet bits = new BitSet(1000000000);
public MyBloomFilter() {
for (int i = 0; i < primes.length; i++) {
// Use 3 A prime number , establish 3 Species algorithm
hashList[i] = new Hash(primes[i]);
}
}
/** * Additive elements * @param value */
public void add(String value) {
for (Hash f : hashList) {
bits.set(f.hash(value), true);
}
}
/** * Determine whether it is in the bloom filter * @param value * @return */
public boolean contains(String value) {
if (value == null) {
return false;
}
boolean ret = true;
for (Hash f : hashList) {
// see 3 Values on bits
ret = ret && bits.get(f.hash(value));
}
return ret;
}
/** * Add hash Algorithm */
public static class Hash {
private int prime;
public Hash(int prime) {
this.prime = prime;
}
public int hash(String key) {
int hash, i;
for (hash = key.length(), i = 0; i < key.length(); i++) {
hash += key.charAt(i);
}
return (hash % prime);
}
}
public static void main(String[] args) {
MyBloomFilter bloom = new MyBloomFilter();
System.out.println(bloom.contains("2222"));
bloom.add("2222");
// maintain 1 Billion online users
for (int i = 1; i < 100000000; i++) {
bloom.add(String.valueOf(i));
}
long begin = System.currentTimeMillis();
System.out.println(begin);
System.out.println(bloom.contains("2222"));
long end = System.currentTimeMillis();
System.out.println(end);
System.out.println(" Judge 2222 Is there any use :" + (end - begin));
}
}
Running results :
Reference article :
https://blog.csdn.net/wuzhiwei549/article/details/106714765
https://www.jasondavies.com/bloomfilter/
https://www.cnblogs.com/liyulong1982/p/6013002.html
https://www.jianshu.com/p/7634eaea3e26
边栏推荐
- In 2022, financial products are not guaranteed?
- The detailed installation process of Ninja security penetration system (Ninjitsu OS V3). Both old and new VM versions can be installed through personal testing, with download sources
- [Android reverse] function interception instance (③ refresh CPU cache | ④ process interception function | ⑤ return specific results)
- C language: find the palindrome number whose 100-999 is a multiple of 7
- Global and Chinese markets for environmental disinfection robots 2022-2028: Research Report on technology, participants, trends, market size and share
- C語言函數
- 01. Basics - MySQL overview
- Entity framework calls Max on null on records - Entity Framework calling Max on null on records
- R language -- readr package reads and writes data
- [Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 10
猜你喜欢

DDS-YYDS

Servlet learning notes

Abnormal mode of ARM processor
![[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 20](/img/d5/4bce239b522696b5312b1346336b5f.jpg)
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 20

轻松玩转三子棋

Ml and NLP are still developing rapidly in 2021. Deepmind scientists recently summarized 15 bright research directions in the past year. Come and see which direction is suitable for your new pit
![[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 18](/img/1a/94ef8be5c06c2d1c52fc8ce7f03ea7.jpg)
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 18

16.内存使用与分段

Leetcode day 17

Single spa, Qiankun, Friday access practice
随机推荐
[notes] streamingassets
Possible to restore a backup of SQL Server 2014 on SQL Server 2012?
Talk about "in C language"
昨天的事情想说一下
IIS error, unable to start debugging on the webserver
[Chongqing Guangdong education] National Open University spring 2019 2727 tax basis reference questions
Globalsign's SSL certificate products
Fastlane 一键打包/发布APP - 使用记录及踩坑
World document to picture
轻松玩转三子棋
When synchronized encounters this thing, there is a big hole, pay attention!
Global and Chinese markets for soluble suture 2022-2028: Research Report on technology, participants, trends, market size and share
Translation D29 (with AC code POJ 27:mode of sequence)
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 9
The latest idea activation cracking tutorial, idea permanent activation code, the strongest in history
VBA, JSON interpretation, table structure -json string conversion
Kivy教程之 08 倒计时App实现timer调用(教程含源码)
Star leap plan | new projects are continuously being recruited! MSR Asia MSR Redmond joint research program invites you to apply!
C language function
01. Basics - MySQL overview