当前位置:网站首页>Bloom filter [proposed by bloom in 1970; redis cache penetration solution]
Bloom filter [proposed by bloom in 1970; redis cache penetration solution]
2022-07-03 18:07:00 【popofzk】
What is a bloon filter
The bloon filter (Bloom Filter), yes 1970 year , It was proposed by a young man named bloom , Fifty years ago .
It's actually a very long binary vector and a series of random mapping functions , You should all know , The data stored is not 0 Namely 1, The default is 0.
It is mainly used to judge whether an element is in a set ,0 It means that there is no data ,1 Represents the existence of a data .
Bloom filter uses
solve Redis Cache penetration
give an example : In reptiles , Filter the web address of the crawler , It's already in bloom , No more crawling .
give an example : Spam filtering , Judge whether each email address is on Bloom's blacklist , If it's spam, it's spam .
Principle of bloon filter
Storage process
The bloom filter says , It's a collection of binary data . When a data is added to this set , Experience the baptism of ( There are shortcomings here , Here can speak ):
- adopt K The hash function calculates , return K A calculated hash value
- these K individual hash Values are mapped to the corresponding K Two binary array subscripts
- take K The binary data corresponding to a subscript is changed to 1.
for example , The first hash function returns x, The second and third hash functions return y And z, that : X、Y、Z The corresponding binary is changed to 1.
The query process
The main function of Bloom filter is to query a data , Not in this binary set , The query process is as follows :
adopt K The hash function calculates , Corresponding to the calculated K individual hash value
adopt hash Value to find the corresponding binary array subscript
Judge : If there is a location where the binary data is 0, So the data doesn't exist . If it's all 1, The data exists in the collection .( There are shortcomings here , Here can speak )
Delete process
Generally, you can't delete the data in the bloom filter , This is one of the drawbacks , We're going to analyze that .
Advantages and disadvantages of Bloom filter
advantage
Because it stores binary data , So it takes up very little space
Its insertion and query speed is very fast , The time complexity is O(K), Think about it HashMap The process of
It's very confidential , Because it doesn't store any raw data , Only binary data
shortcoming
This is going back to the shortcomings we mentioned above .
Data is added by calculating data hash value , So it's very likely that this situation exists : Two different data compute the same hash value .
For example “ Hello ” and “hello”, If it turns out that hash Same value , Then they will change the binary data of the same subscript to 1. This is the time , You don't know what the subscript is 2 Binary system , It represents “ Hello ” still “hello”.
The disadvantages are as follows :
There is a miscalculation
If there is no picture above "hello", Only saved " Hello ", Then use "hello" When I came to check , Will judge "hello" In the collection .
because “ Hello ” and “hello” Of hash The values are the same , By the same hash value , The binary data found is the same , All are 1.Delete difficult
Take the example above , because “ Hello ” and “hello” Of hash Same value , The corresponding array subscript is the same .
At this time, I want to delete “ Hello ”, even “hello” All deleted together .(0 It means there's this data ,1 It means that there is no such data )
Implement the bloon filter
There are many ways to do it , One of them is Guava The implementation provided .
- introduce Guava pom To configure
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>29.0-jre</version>
</dependency>
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
public class BloomFilterCase {
/** * How much data is expected to be inserted */
private static int size = 1000000;
/** * Expected miscalculation rate */
private static double fpp = 0.01;
/** * The bloon filter */
private static BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), size, fpp);
public static void main(String[] args) {
// Insert 10 Ten thousand sample data
for (int i = 0; i < size; i++) {
bloomFilter.put(i);
}
// With another 100000 test data , Test error rate
int count = 0;
for (int i = size; i < size + 100000; i++) {
if (bloomFilter.mightContain(i)) {
count++;
System.out.println(i + " misjudged ");
}
}
System.out.println(" The total number of misjudgments :" + count);
}
}
In depth analysis of the code
@VisibleForTesting
static <T> BloomFilter<T> create(
Funnel<? super T> funnel, long expectedInsertions, double fpp, Strategy strategy) {
....
}
Parameters
- funnel: data type ( It's usually called Funnels In a tool class )
- expectedInsertions: The number of values expected to be inserted
- fpp: Miscalculation rate ( The default value is 0.03)
- strategy: The hash algorithm
adjustment fpp Miscalculation rate
The situation a :fpp = 0.01
- The number of misjudgments :947
- Memory size :9585058 digit
Scenario summary
- The misjudgment rate can be determined by fpp Adjust the parameters
- fpp The smaller it is , The more memory space you need :0.01 need 900 Tens of thousands of digits ,0.03 need 700 Tens of thousands of digits .
- fpp The smaller it is , When you add data to a collection , More is needed hash Function operations are more hash value , To store in the corresponding array subscript .( Forget to look at the above process of bloom filtering stored data )
Redis Cache penetration solution : The bloon filter
First , The perforated filter is similar to a white list 、 The blacklist , The main element is to judge whether the element exists in a filter , Core here .
White list
scene :
The front end sends a query request , Through parameters key, First, pass the bron filter , If there is no filter ( White list ) in , Will be intercepted by the filter , Then directly return the empty data to the front end ; If key In the filter , The normal check will be carried out down Redis、mysql The process of ;
Here is a process 4: If mysql You can't find this in key, Explain what ? explain key On the white list , singular MySQL We can't find it , That is to say, bloom filter misjudged ! So this 4 The route is to delete this misjudgment , To prevent repeating the same request next time , However, due to the hash collision of Bloom filter, it is difficult to delete ( Prone to accidental injury ), So this line 4 It's practically impossible !
Be careful
- As a whole , As we mentioned earlier, the probability of misjudgment of Bloom filter is relatively small , So call directly MySQL The probability is small , Don't worry too much about this !
- All legal parameters key All should be put into the Bronx filter 、Redis in .
The blacklist
scene : A video website pushes videos to users
Bron filter function : When blacklist is used .
requirement : Videos that have been pushed , No longer push to users
technological process : When pushing a batch of videos to users , First determine whether these videos exist in the filter , If it exists, it will not be pushed to users , If it doesn't exist, push it to the user , At the same time, store the pushed video into the filter blacklist . Prevent the next push .
Code :
User entity class :
@Data
public class User implements Serializable {
public static String maYunPhone = "18890019390";
private Long id;
private String userName;
private Integer age;
}
/** * Solve cache penetration — White list */
public class RedissonBloomFilter {
/** * structure Redisson */
static RedissonClient redisson = null;
static RBloomFilter<String> bloomFilter = null;
static {
Config config = new Config();
config.useSingleServer().setAddress("redis://127.0.0.1:6379");
// structure Redisson
redisson = Redisson.create(config);
// Construct the bloom filter
bloomFilter = redisson.getBloomFilter("userIdFilter");
// Put the query data into Redis Cache and bloom filter
initData(redisson, bloomFilter);
}
public static void main(String[] args) {
User user = getUserById(2L);
System.out.println("user The object is :" + JSON.toJSONString(user));
}
private static void initData(RedissonClient redisson, RBloomFilter<String> bloomFilter) {
// Initialize the bloon filter : The expected element is 100000000L, The error rate is 3%
bloomFilter.tryInit(100000000L,0.01);
// take id by 1 The data of , Insert it into the bloon filter
bloomFilter.add("1");
bloomFilter.add("2");
// take id by 1 Corresponding user data , Insert into Redis In cache
redisson.getBucket("1").set("{id:1, userName:' Zhang San ', age:18}");
}
public static User getUserById(Long id) {
if (null == id) {
return null;
}
String idKey = id.toString();
// Start simulating cache penetration
// Front end query request key
if (bloomFilter.contains(idKey)) {
// Passed the filter white list verification , Go to Redis Query real data in
RBucket<Object> bucket = redisson.getBucket(idKey);
Object object = bucket.get();
// If Redis There's data , Return the data directly
if (null != object) {
System.out.println(" from Redis Found in the ");
String userStr = object.toString();
return JSON.parseObject(userStr, User.class);
}
// If Redis It's empty , To query the database
User user = selectByDb(idKey);
if (null == user) {
return null;
} else {
// Swipe the data back into the cache
redisson.getBucket(id.toString()).set(JSON.toJSONString(user));
}
return user;
}
return null;
}
private static User selectByDb(String id) {
System.out.println(" from MySQL Found in the ");
User user = new User();
user.setId(1L);
user.setUserName(" Zhang San ");
user.setAge(18);
return user;
}
}
Some of them are reproduced in :
https://www.cnblogs.com/itlaoge/p/14219693.html
边栏推荐
- BFS - topology sort
- [untitled]
- 毕业总结
- List的stream中Long对象与long判等问题记录
- Analyse ArrayList 3: suppression d'éléments
- Sensor 调试流程
- Baiwen.com 7 days Internet of things smart home learning experience punch in the next day
- Servlet specification Part II
- [combinatorics] generating function (generating function application scenario | using generating function to solve recursive equation)
- Draw some simple graphics with MFC
猜你喜欢
win32:堆破壞的dump文件分析
Five problems of database operation in commodity supermarket system
[combinatorics] generating function (summation property)
Win32: analyse du fichier dump pour la défaillance du tas
win32:堆破坏的dump文件分析
PHP MySQL create database
(9) Opencv Canny edge detection
List的stream中Long对象与long判等问题记录
Redis core technology and practice - learning notes (VI) how to achieve data consistency between master and slave Libraries
PHP MySQL inserts multiple pieces of data
随机推荐
Redis core technology and practice - learning notes (11): why not just string
Global and Chinese pediatric palliative care drug market development research and investment planning recommendations report 2022-2028
On Data Mining
[combinatorics] generating function (summation property)
Interviewer: why is the value nil not equal to nil?
远程办公工具分享|社区征文
Analysis report on production and marketing demand and investment forecast of China's PVC industry from 2021 to 2026
Write a program to process a list container of string type. Find a special value in the container 9.27: and delete it if found. Rewrite the above procedure with deque container.
PHP MySQL where clause
Draw some simple graphics with MFC
Graduation summary
Lesson 13 of the Blue Bridge Cup -- tree array and line segment tree [exercise]
Redis on local access server
Leetcode540: a single element in an ordered array
聊聊支付流程的設計與實現邏輯
PHP MySQL inserts data
Line by line explanation of yolox source code of anchor free series network (6) -- mixup data enhancement
Bidding procurement scheme management of Oracle project management system
[LINUX]CentOS 7 安装MYSQL时报错“No package mysql-server available“No package zabbix-server-mysql availabl
[enumeration] annoying frogs always step on my rice fields: (who is the most hateful? (POJ hundred practice 2812)