Redis' bloom filter
2022-07-28 05:29:00 【wang0112233】
1、 Use cases for the Bloom filter
Consider the following requirements:
①、 We already have 1 billion numbers, and 100,000 new numbers arrive. We need to determine quickly and accurately whether each of these 100,000 numbers is among the 1 billion.
Solution one: store the 1 billion numbers in a database and query it. This is accurate, but slow.
Solution two: keep the 1 billion numbers in memory, for example in a Redis cache. The memory required is 1 billion * 8 bytes = 8 GB. In-memory lookups are both accurate and fast, but spending about 8 GB on this is a waste of memory.
②、 Anyone who has worked with web crawlers will recognize this one: thousands of sites need to be crawled, so for a newly discovered URL, how do we tell whether it has already been crawled?
The options are still the two above, and clearly neither is a good fit.
③、 Spam filtering is a similar case.
So, for a large data set, how can we determine accurately and quickly whether an item is in it without using much memory? This is the problem the Bloom filter was created to solve.
2、 A brief introduction to the Bloom filter
With these questions in mind, let's look at what a Bloom filter is.
Bloom filter: a data structure made up of a long binary vector, which you can think of as a binary array. Because it is binary, each position holds either 0 or 1, and every position is 0 initially.
[Figure: a bit array with every position initialized to 0]

①、 Adding data
Above we treated the Bloom filter as a kind of container, so how do we add an item to it?
To add an element key to the Bloom filter, we run it through several hash functions. Each function produces a value, and the cell at that position is set to 1.
For example, if hash1(key) = 1, the 2nd cell changes from 0 to 1 (array positions are counted from 0); if hash2(key) = 7, the 8th cell is set to 1, and so on.
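The add step can be sketched in plain Java as follows. This is only an illustration, not how Redis or Redisson do it: the filter size M, the number of hash functions K, and the way the positions are derived from String.hashCode() are all assumptions chosen to keep the example short.

```java
import java.util.BitSet;

public class BloomAddSketch {
    // Illustrative values (assumptions): 64 cells and 3 hash functions.
    static final int M = 64;
    static final int K = 3;

    // Derive the i-th cell index for a key from two base hashes.
    static int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the K positions differ when M is 64
        return Math.floorMod(h1 + i * h2, M);
    }

    // Adding a key means setting every cell chosen by the hash functions to 1.
    static void add(BitSet bits, String key) {
        for (int i = 0; i < K; i++) {
            bits.set(position(key, i));
        }
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(M); // all cells start at 0
        add(bits, "10086");
        System.out.println(bits);    // prints the indexes of the cells that are now 1
    }
}
```

Nothing about the key itself is stored; only the K cells it maps to are switched on.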

②、 Checking whether data exists
Now that we know how to add an item to the Bloom filter, how do we tell whether a new item is already in it?
It's simple: run the new item through the same hash functions and check whether every corresponding position is 1. If any one of them is not 1, the item is definitely not in the Bloom filter.
Conversely, if every position computed by the hash functions is 1, can we be sure that the item is in the Bloom filter?
The answer is no. Many different items can hash to the same positions, so a position may have been set to 1 by other items.
The conclusion: a Bloom filter can tell you that an item is definitely absent, but it cannot tell you that an item is definitely present.
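A toy example makes the "definitely absent, only possibly present" behaviour concrete. The sketch below assumes a 10-cell filter and two deliberately simple hash functions (the last digit and the tens digit of a number); both are hypothetical and chosen only so that the collision can be followed by hand.

```java
import java.util.BitSet;

public class TinyBloomFilter {
    private final BitSet bits = new BitSet(10); // 10 cells, all 0 at first

    // Toy hash functions for small numbers (hypothetical, for illustration only).
    static int hash1(int key) { return key % 10; }        // last digit
    static int hash2(int key) { return (key / 10) % 10; } // tens digit

    public void add(int key) {
        bits.set(hash1(key));
        bits.set(hash2(key));
    }

    // True only if every cell chosen by the hash functions is 1.
    public boolean mightContain(int key) {
        return bits.get(hash1(key)) && bits.get(hash2(key));
    }

    public static void main(String[] args) {
        TinyBloomFilter filter = new TinyBloomFilter();
        filter.add(12); // sets cells 2 and 1
        filter.add(34); // sets cells 4 and 3
        System.out.println(filter.mightContain(12)); // true: it was added
        System.out.println(filter.mightContain(56)); // false: cells 6 and 5 are 0, so 56 is definitely absent
        System.out.println(filter.mightContain(21)); // true, although 21 was never added: a false positive
    }
}
```

The number 21 maps to cells 1 and 2, which were already set when 12 was added, so the filter cannot distinguish it from a real member.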
③、 Advantages and disadvantages of the Bloom filter
Advantages: it is a binary array, so it uses very little memory, and insertion and lookup are fast.
Disadvantages: the false-positive rate rises as more data is added; it cannot confirm that an item is definitely present; and, importantly, items cannot be deleted, because a cell may be shared by several items and clearing it could erase them as well.
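To put a number on "very little memory", the standard Bloom filter sizing formulas can be applied to the 1 billion numbers from section 1: m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions. The sketch below is only an estimate under an assumed target false-positive rate of 1%; the class and method names are made up for this example.

```java
public class BloomSizing {
    // Optimal number of bits for n items and false-positive rate p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash functions: k = (m / n) * ln 2
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000_000L; // 1 billion numbers, as in section 1
        double p = 0.01;         // assumed target false-positive rate
        long m = optimalBits(n, p);
        int k = optimalHashes(n, m);
        System.out.printf("bits=%d (about %.2f GB), hash functions=%d%n", m, m / 8.0 / 1e9, k);
    }
}
```

With these inputs the filter needs roughly 1.2 GB and 7 hash functions, compared with the 8 GB needed to hold the raw numbers. Lowering p or raising n makes the bit array grow, which is the trade-off behind the rising false-positive rate mentioned above.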
3、 Implementing a Bloom filter with Redis
①、bitmaps
Computers use the binary bit as the basic unit of underlying storage, and one byte equals 8 bits.
For example, the string "big" consists of three characters whose ASCII codes are 98, 105 and 103. The corresponding binary storage is 01100010 01101001 01100111.

In Redis, Bitmaps provide a set of commands for manipulating the individual bits of a string like the one above.
One、 Setting a value
setbit key offset value

The binary representation of "b" is 0110 0010. If we set bit 7 (counting from 0) to 1, we get 0110 0011, which is the character "c", so the string "big" becomes "cig".
Two、 Getting a value
getbit key offset

Three、 Counting the bits set to 1 in a given range
bitcount key [start end]
If no range is given, the count covers the whole value.
Note: start and end are byte indexes, not bit offsets.
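The bit numbering used by these commands can be tried out without a Redis server. The sketch below imitates setbit, getbit and bitcount on a plain Java byte array; the helper names are made up for this example, and it is a simplification (for instance, it does not handle the negative indexes that the real bitcount accepts).

```java
import java.nio.charset.StandardCharsets;

public class BitmapSketch {
    // Bit 0 is the most significant bit of the first byte, as in Redis.
    static void setBit(byte[] data, int offset, boolean value) {
        int mask = 0x80 >> (offset % 8);
        if (value) {
            data[offset / 8] |= mask;
        } else {
            data[offset / 8] &= ~mask;
        }
    }

    static boolean getBit(byte[] data, int offset) {
        return (data[offset / 8] & (0x80 >> (offset % 8))) != 0;
    }

    // Count the 1 bits in bytes start..end (inclusive byte indexes, not bit offsets).
    static int bitCount(byte[] data, int start, int end) {
        int count = 0;
        for (int i = start; i <= end; i++) {
            count += Integer.bitCount(data[i] & 0xFF);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] value = "big".getBytes(StandardCharsets.US_ASCII);
        System.out.println(bitCount(value, 0, 2)); // 12 one-bits in "big"
        setBit(value, 7, true);                    // 0110 0010 -> 0110 0011
        System.out.println(new String(value, StandardCharsets.US_ASCII)); // cig
        System.out.println(getBit(value, 7));      // true
    }
}
```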

②、Redisson
Under the hood, a Bloom filter on Redis is built on the bitmap data structure. There is no need to reinvent the wheel for the implementation; instead we can use Redisson, a client that is widely used in the industry.
Redisson is a library for working with Redis from Java programs, and it makes Redis easy to use.
Let's use Redisson to construct a Bloom filter.
package com.ys.rediscluster.bloomfilter.redisson;

import org.redisson.Redisson;
import org.redisson.api.RBloomFilter;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonBloomFilter {

    public static void main(String[] args) {
        // Connect to a single Redis node
        Config config = new Config();
        config.useSingleServer().setAddress("redis://192.168.14.104:6379");
        config.useSingleServer().setPassword("123");
        // Build the Redisson client
        RedissonClient redisson = Redisson.create(config);

        RBloomFilter<String> bloomFilter = redisson.getBloomFilter("phoneList");
        // Initialize the Bloom filter: 100,000,000 expected elements, 3% false-positive rate
        bloomFilter.tryInit(100000000L, 0.03);
        // Insert the number 10086 into the Bloom filter
        bloomFilter.add("10086");

        // Check whether the following numbers are in the Bloom filter
        System.out.println(bloomFilter.contains("123456")); // false
        System.out.println(bloomFilter.contains("10086"));  // true

        // Release the client's connections so the program can exit
        redisson.shutdown();
    }
}

This is a single-node Redis implementation. If the data volume is large and the expected error rate must be very low, the memory of a single node may not be enough. In that case you can use a distributed Bloom filter, which can also be built with Redisson. I won't demonstrate the code here; try it if you are interested.
4、 The Guava toolkit
Finally, here is how to implement a Bloom filter without Redis.
Most readers have probably used the Guava toolkit from Google; it also provides a Bloom filter implementation.
package com.ys.rediscluster.bloomfilter;

import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class GuavaBloomFilter {

    public static void main(String[] args) {
        // 100,000 expected insertions, 1% false-positive probability
        BloomFilter<String> bloomFilter =
                BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_8), 100000, 0.01);
        bloomFilter.put("10086");

        System.out.println(bloomFilter.mightContain("123456")); // false unless a false positive occurs
        System.out.println(bloomFilter.mightContain("10086"));  // true
    }
}