当前位置:网站首页>Bloom filter [proposed by bloom in 1970; redis cache penetration solution]
Bloom filter [proposed by bloom in 1970; redis cache penetration solution]
2022-07-03 18:07:00 【popofzk】
What is a bloon filter
The bloon filter (Bloom Filter), yes 1970 year , It was proposed by a young man named bloom , Fifty years ago .
It's actually a very long binary vector and a series of random mapping functions , You should all know , The data stored is not 0 Namely 1, The default is 0.
It is mainly used to judge whether an element is in a set ,0 It means that there is no data ,1 Represents the existence of a data .
Bloom filter uses
solve Redis Cache penetration
give an example : In reptiles , Filter the web address of the crawler , It's already in bloom , No more crawling .
give an example : Spam filtering , Judge whether each email address is on Bloom's blacklist , If it's spam, it's spam .
Principle of bloon filter
Storage process
The bloom filter says , It's a collection of binary data . When a data is added to this set , Experience the baptism of ( There are shortcomings here , Here can speak ):
- adopt K The hash function calculates , return K A calculated hash value
- these K individual hash Values are mapped to the corresponding K Two binary array subscripts
- take K The binary data corresponding to a subscript is changed to 1.
for example , The first hash function returns x, The second and third hash functions return y And z, that : X、Y、Z The corresponding binary is changed to 1.
The query process
The main function of Bloom filter is to query a data , Not in this binary set , The query process is as follows :
adopt K The hash function calculates , Corresponding to the calculated K individual hash value
adopt hash Value to find the corresponding binary array subscript
Judge : If there is a location where the binary data is 0, So the data doesn't exist . If it's all 1, The data exists in the collection .( There are shortcomings here , Here can speak )
Delete process
Generally, you can't delete the data in the bloom filter , This is one of the drawbacks , We're going to analyze that .
Advantages and disadvantages of Bloom filter
advantage
Because it stores binary data , So it takes up very little space
Its insertion and query speed is very fast , The time complexity is O(K), Think about it HashMap The process of
It's very confidential , Because it doesn't store any raw data , Only binary data
shortcoming
This is going back to the shortcomings we mentioned above .
Data is added by calculating data hash value , So it's very likely that this situation exists : Two different data compute the same hash value .
For example “ Hello ” and “hello”, If it turns out that hash Same value , Then they will change the binary data of the same subscript to 1. This is the time , You don't know what the subscript is 2 Binary system , It represents “ Hello ” still “hello”.
The disadvantages are as follows :
There is a miscalculation
If there is no picture above "hello", Only saved " Hello ", Then use "hello" When I came to check , Will judge "hello" In the collection .
because “ Hello ” and “hello” Of hash The values are the same , By the same hash value , The binary data found is the same , All are 1.Delete difficult
Take the example above , because “ Hello ” and “hello” Of hash Same value , The corresponding array subscript is the same .
At this time, I want to delete “ Hello ”, even “hello” All deleted together .(0 It means there's this data ,1 It means that there is no such data )
Implement the bloon filter
There are many ways to do it , One of them is Guava The implementation provided .
- introduce Guava pom To configure
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>29.0-jre</version>
</dependency>
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
public class BloomFilterCase {
/** * How much data is expected to be inserted */
private static int size = 1000000;
/** * Expected miscalculation rate */
private static double fpp = 0.01;
/** * The bloon filter */
private static BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), size, fpp);
public static void main(String[] args) {
// Insert 10 Ten thousand sample data
for (int i = 0; i < size; i++) {
bloomFilter.put(i);
}
// With another 100000 test data , Test error rate
int count = 0;
for (int i = size; i < size + 100000; i++) {
if (bloomFilter.mightContain(i)) {
count++;
System.out.println(i + " misjudged ");
}
}
System.out.println(" The total number of misjudgments :" + count);
}
}
In depth analysis of the code
@VisibleForTesting
static <T> BloomFilter<T> create(
Funnel<? super T> funnel, long expectedInsertions, double fpp, Strategy strategy) {
....
}
Parameters
- funnel: data type ( It's usually called Funnels In a tool class )
- expectedInsertions: The number of values expected to be inserted
- fpp: Miscalculation rate ( The default value is 0.03)
- strategy: The hash algorithm
adjustment fpp Miscalculation rate
The situation a :fpp = 0.01
- The number of misjudgments :947
- Memory size :9585058 digit
Scenario summary
- The misjudgment rate can be determined by fpp Adjust the parameters
- fpp The smaller it is , The more memory space you need :0.01 need 900 Tens of thousands of digits ,0.03 need 700 Tens of thousands of digits .
- fpp The smaller it is , When you add data to a collection , More is needed hash Function operations are more hash value , To store in the corresponding array subscript .( Forget to look at the above process of bloom filtering stored data )
Redis Cache penetration solution : The bloon filter
First , The perforated filter is similar to a white list 、 The blacklist , The main element is to judge whether the element exists in a filter , Core here .
White list
scene :
The front end sends a query request , Through parameters key, First, pass the bron filter , If there is no filter ( White list ) in , Will be intercepted by the filter , Then directly return the empty data to the front end ; If key In the filter , The normal check will be carried out down Redis、mysql The process of ;
Here is a process 4: If mysql You can't find this in key, Explain what ? explain key On the white list , singular MySQL We can't find it , That is to say, bloom filter misjudged ! So this 4 The route is to delete this misjudgment , To prevent repeating the same request next time , However, due to the hash collision of Bloom filter, it is difficult to delete ( Prone to accidental injury ), So this line 4 It's practically impossible !
Be careful
- As a whole , As we mentioned earlier, the probability of misjudgment of Bloom filter is relatively small , So call directly MySQL The probability is small , Don't worry too much about this !
- All legal parameters key All should be put into the Bronx filter 、Redis in .
The blacklist
scene : A video website pushes videos to users
Bron filter function : When blacklist is used .
requirement : Videos that have been pushed , No longer push to users
technological process : When pushing a batch of videos to users , First determine whether these videos exist in the filter , If it exists, it will not be pushed to users , If it doesn't exist, push it to the user , At the same time, store the pushed video into the filter blacklist . Prevent the next push .
Code :
User entity class :
@Data
public class User implements Serializable {
public static String maYunPhone = "18890019390";
private Long id;
private String userName;
private Integer age;
}
/** * Solve cache penetration — White list */
public class RedissonBloomFilter {
/** * structure Redisson */
static RedissonClient redisson = null;
static RBloomFilter<String> bloomFilter = null;
static {
Config config = new Config();
config.useSingleServer().setAddress("redis://127.0.0.1:6379");
// structure Redisson
redisson = Redisson.create(config);
// Construct the bloom filter
bloomFilter = redisson.getBloomFilter("userIdFilter");
// Put the query data into Redis Cache and bloom filter
initData(redisson, bloomFilter);
}
public static void main(String[] args) {
User user = getUserById(2L);
System.out.println("user The object is :" + JSON.toJSONString(user));
}
private static void initData(RedissonClient redisson, RBloomFilter<String> bloomFilter) {
// Initialize the bloon filter : The expected element is 100000000L, The error rate is 3%
bloomFilter.tryInit(100000000L,0.01);
// take id by 1 The data of , Insert it into the bloon filter
bloomFilter.add("1");
bloomFilter.add("2");
// take id by 1 Corresponding user data , Insert into Redis In cache
redisson.getBucket("1").set("{id:1, userName:' Zhang San ', age:18}");
}
public static User getUserById(Long id) {
if (null == id) {
return null;
}
String idKey = id.toString();
// Start simulating cache penetration
// Front end query request key
if (bloomFilter.contains(idKey)) {
// Passed the filter white list verification , Go to Redis Query real data in
RBucket<Object> bucket = redisson.getBucket(idKey);
Object object = bucket.get();
// If Redis There's data , Return the data directly
if (null != object) {
System.out.println(" from Redis Found in the ");
String userStr = object.toString();
return JSON.parseObject(userStr, User.class);
}
// If Redis It's empty , To query the database
User user = selectByDb(idKey);
if (null == user) {
return null;
} else {
// Swipe the data back into the cache
redisson.getBucket(id.toString()).set(JSON.toJSONString(user));
}
return user;
}
return null;
}
private static User selectByDb(String id) {
System.out.println(" from MySQL Found in the ");
User user = new User();
user.setId(1L);
user.setUserName(" Zhang San ");
user.setAge(18);
return user;
}
}
Some of them are reproduced in :
https://www.cnblogs.com/itlaoge/p/14219693.html
边栏推荐
- How to expand the capacity of golang slice slice
- Write a program to process a list container of string type. Find a special value in the container 9.27: and delete it if found. Rewrite the above procedure with deque container.
- Interviewer: why is the value nil not equal to nil?
- On Data Mining
- [combinatorics] generating function (linear property | product property)
- 模块九作业
- [combinatorics] generating function (property summary | important generating function)*
- (8) HS corner detection
- 分布式的任务分发框架-Gearman
- win32:堆破壞的dump文件分析
猜你喜欢
Talk about the design and implementation logic of payment process
Computer graduation project PHP library book borrowing management system
Win32: dump file analysis of heap corruption
Interviewer: why is the value nil not equal to nil?
On Data Mining
BFS - topology sort
Valentine's day, send you a little red flower~
Theoretical description of linear equations and summary of methods for solving linear equations by eigen
Discussion sur la logique de conception et de mise en oeuvre du processus de paiement
Redis core technology and practice - learning notes (IX): slicing cluster
随机推荐
ES6类的继承
Closure and closure function
Create a new file from templates with bash script - create new file from templates with bash script
This diversion
BFS - topology sort
Image 24 bit depth to 8 bit depth
Analysis report on production and marketing demand and investment forecast of China's PVC industry from 2021 to 2026
统计图像中各像素值的数量
PHP MySQL inserts multiple pieces of data
On Data Mining
面试官:值为 nil 为什么不等于 nil ?
微服务组件Sentinel控制台调用
[combinatorics] generating function (summation property)
Win32: analyse du fichier dump pour la défaillance du tas
Kotlin的協程:上下文
Keepalived 设置不抢占资源
[combinatorics] generating function (use generating function to solve the number of solutions of indefinite equation example 2 | extended to integer solution)
[Tongxin UOS] scanner device management driver installation
Research Report on market demand and investment planning for the development of China's office chair industry, 2022-2028
[教程]在 CoreOS 上构建你的第一个应用