Interviewer: how do you stop a flood of requests for data that does not exist in Redis from crushing the database?
2022-06-27 06:51:00 【xy29981】
So how do we keep these requests out? This is where filters come in!
The Bloom filter
The general idea of the Bloom filter is this: when a request arrives, first ask the filter whether the requested data has ever been stored. If it has, pass the request on to the database; if not, return immediately. How does the filter know?

A bitmap is used for the record keeping. Every bit of the bitmap starts at 0. When a piece of data is saved, three hash functions compute three hash values, and the corresponding bitmap positions are set to 1.
Suppose positions 1, 3 and 6 of the bitmap have been set to 1. When a request arrives, the same three hash functions compute its hash values. If it is the same data, it must map to positions 1, 3 and 6 again, so the filter can conclude that the data was stored before. If even one of the three positions does not match, for example the new data maps to positions 1, 3 and 7, then because position 7 is 0 this data was never added to the database, and the request returns directly.
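The save-and-check flow described above can be sketched roughly as follows. This is only an illustration: the class name `BloomFilter`, the bitmap size of 64, and the trick of deriving three hash functions by salting SHA-256 with an index are all assumptions, not something the article specifies.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: one bitmap plus three hash functions."""

    def __init__(self, size=64, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size  # every position starts at 0

    def _positions(self, item):
        # Assumption: simulate several hash functions by salting SHA-256
        # with the index i. A real filter may use faster hashes.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        # Saving data sets every mapped position to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # A single 0 proves the item was never added;
        # all 1s only means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("user:1001")
print(bf.might_contain("user:1001"))
```

A request is forwarded to the database only when `might_contain` returns true; otherwise it can be rejected right away.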
Problems with the Bloom filter
By now you have probably noticed that the Bloom filter has some problems.
First, the Bloom filter can misjudge:
Consider this scenario. Saving data item 1 sets bits 1, 3 and 6 of the bitmap to 1, and saving data item 2 sets bits 3, 6 and 7 to 1. Now a request arrives for data item 3, which was never saved, and its three hashes map to bits 1, 6 and 7. The data was never saved, but because saving items 1 and 2 already set those bits to 1, request 3 passes the filter and still hits the database. This happens more and more often as the amount of stored data grows.

Second, the Bloom filter cannot delete data. Deletion is hard for two reasons:
One: because misjudgment is possible, the filter cannot be sure the data really exists in the database, for example data item 3.
Two: clearing the bitmap flags that belong to one data item may affect other items. In the example above, deleting data item 1 means setting bits 1, 3 and 6 of the bitmap to 0. If a request for data item 2 then arrives, the filter reports that it does not exist, because bits 3 and 6 have been set to 0.
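Both problems can be reproduced with the exact bit positions from the example. In the sketch below the positions are hard-coded instead of being computed by hash functions, purely as an assumption to keep the demonstration deterministic; the helper names `save`, `clear` and `might_exist` are made up for this illustration.

```python
bitmap = [0] * 8  # positions 0..7, all 0 at the start


def save(positions):
    for p in positions:
        bitmap[p] = 1


def clear(positions):
    # Naive "delete": reset the bits that belong to one item.
    for p in positions:
        bitmap[p] = 0


def might_exist(positions):
    return all(bitmap[p] == 1 for p in positions)


data1, data2, data3 = [1, 3, 6], [3, 6, 7], [1, 6, 7]

save(data1)
save(data2)

# Problem 1: data3 was never saved, yet all of its bits are 1.
false_positive = might_exist(data3)

# Problem 2: deleting data1 clears bits 3 and 6, which data2 also needs.
clear(data1)
data2_still_found = might_exist(data2)

print(false_positive, data2_still_found)
```

The first flag shows the misjudgment, the second shows how a naive delete makes a stored item disappear.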
Enhanced Bloom filter
To solve the deletion problem above, an enhanced Bloom filter appeared: the Counting Bloom Filter. The idea is to replace the bitmap with an array of counters. Each time an element maps to a position, that counter is incremented by 1, and on deletion it is decremented by 1. This avoids having to recompute the hashes of all the remaining data after a deletion, as an ordinary Bloom filter would require, but it still cannot avoid misjudgment.
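A minimal sketch of the counter idea, reusing the hard-coded positions of the earlier example (an assumption made for clarity; a real filter would compute them with hash functions). The class name `CountingBloomFilter` and its methods are illustrative.

```python
class CountingBloomFilter:
    """Counters instead of bits, so one item can be removed safely."""

    def __init__(self, size=8):
        self.counts = [0] * size

    def add(self, positions):
        for p in positions:
            self.counts[p] += 1

    def remove(self, positions):
        # Only valid for items that really were added earlier;
        # removing an item that was never added corrupts the counters.
        for p in positions:
            if self.counts[p] > 0:
                self.counts[p] -= 1

    def might_contain(self, positions):
        return all(self.counts[p] > 0 for p in positions)


cbf = CountingBloomFilter()
data1, data2 = [1, 3, 6], [3, 6, 7]
cbf.add(data1)
cbf.add(data2)
cbf.remove(data1)  # counters 3 and 6 drop from 2 to 1, not to 0

print(cbf.might_contain(data2), cbf.might_contain(data1))
```

After the removal, data item 2 is still found because the shared counters stay above zero. The price is memory: each position now needs several bits instead of one.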

Cuckoo filter
Poor query performance: the Bloom filter needs multiple hash functions to probe several different positions in the bitmap, and these positions are spread widely across memory, so the CPU cache line hit rate is low.
The cuckoo filter claims to have solved this problem and to support deletion effectively. It makes this an important selling point to tempt you to give up the Bloom filter and switch to the cuckoo filter.
Why is it called cuckoo?
There is an idiom about a bird that takes over another bird's nest, and that is exactly what cuckoos do. Cuckoos never build their own nests. They lay their eggs in other birds' nests and let the hosts hatch them. After the cuckoo chick hatches, because cuckoos are relatively large, it pushes the foster mother's other chicks (or eggs) out of the nest, and they fall to their death.
Cuckoo hashing
The simplest cuckoo hash structure is a one-dimensional array. Two hash functions map a new element to two positions in the array. If either of the two positions is empty, the element is placed there directly.
But if both positions are full, the new element has to take over a nest: it kicks out one occupant at random and takes that position itself.
p1 = hash1(x) % l
p2 = hash2(x) % l
Unlike real cuckoos, the cuckoo hash algorithm helps the victims (the evicted eggs) find another nest. Because every element can live in two positions, it can be placed as long as either one is free.
So the unlucky egg checks whether its other position is free. If it is, the egg moves over and everyone is happy. But what if that position is occupied too? Then it takes over that nest in turn and passes the victim role on to someone else. The new victim repeats the process until every egg has found a nest.
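The eviction chain described above might look like the following sketch. Several details are assumptions rather than part of the article: the two hash functions are simulated by salting SHA-256, the table has 16 slots, and the chain is cut off after `MAX_KICKS` evictions so that a cycle cannot loop forever.

```python
import hashlib
import random


def _hash(seed, key, size):
    # Assumption: two hash functions are simulated with different salts.
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


class CuckooHash:
    MAX_KICKS = 50  # assumed cut-off for the eviction chain

    def __init__(self, size=16):
        self.size = size
        self.slots = [None] * size

    def _candidates(self, key):
        return _hash("h1", key, self.size), _hash("h2", key, self.size)

    def contains(self, key):
        return any(self.slots[p] == key for p in self._candidates(key))

    def insert(self, key):
        if self.contains(key):
            return True
        p1, p2 = self._candidates(key)
        for p in (p1, p2):
            if self.slots[p] is None:  # a free nest: move straight in
                self.slots[p] = key
                return True
        pos = random.choice((p1, p2))  # both full: kick one at random
        for _ in range(self.MAX_KICKS):
            # Swap: the new key takes the slot, the old occupant is the victim.
            key, self.slots[pos] = self.slots[pos], key
            a, b = self._candidates(key)
            pos = b if pos == a else a  # the victim's other position
            if self.slots[pos] is None:
                self.slots[pos] = key
                return True
        # Chain too long: the key still in hand has no nest.
        # A real table would grow or rehash at this point.
        return False


table = CuckooHash()
first_insert = table.insert("egg-1")
print(first_insert, table.contains("egg-1"))
```

Note that when `insert` gives up, the element left without a slot is the last victim, not necessarily the one that was originally inserted.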
Cuckoo hash problem
Cuckoo filter
The cuckoo filter has the same structure as the cuckoo hash: a one-dimensional array. The difference is that cuckoo hashing stores the whole element, while the cuckoo filter stores only a fingerprint of the element (a few bits, similar to a Bloom filter). The filter trades accuracy for space efficiency. Precisely because only fingerprints are stored, there is a misjudgment rate, just as with the Bloom filter.
First, the cuckoo filter also uses only two hash functions, but each position can hold multiple slots. The choice of these two hash functions is special, because only fingerprints are stored in the filter. When a fingerprint is kicked out of a position, its other position (the dual position) has to be computed, and with ordinary cuckoo hashing that computation needs the element itself. Recall the earlier formulas for the hash positions.
fp = fingerprint(x)
p1 = hash1(x) % l
p2 = hash2(x) % l
Knowing only p1 and the fingerprint of x, there is no way to compute p2 directly.
A special hash function
The clever part of the cuckoo filter is a specially designed hash function that makes it possible to compute p2 directly from p1 and the element's fingerprint, without needing the complete element x.
fp = fingerprint(x)
p1 = hash(x)
p2 = p1 ^ hash(fp) // XOR
As the formula shows, once we know fp and p1 we can compute p2 directly. Likewise, knowing p2 and fp, we can compute p1 directly. This is the duality.
p1 = p2 ^ hash(fp)
So we do not need to know whether the current position is p1 or p2: XOR the current position with hash(fp) and we get the dual position. As long as hash(fp) != 0, p1 != p2 is guaranteed, so an element can never kick itself out and cause an infinite loop.
You may ask why the hash functions here are not taken modulo the array length. Actually they are, but the cuckoo filter requires the array length to be a power of 2, so taking a hash modulo the array length is equivalent to keeping its lowest n bits. For the XOR, simply ignore every bit other than the lowest n: keeping the lowest n bits of the computed position p gives the final dual position.
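Putting the fingerprint, the XOR duality and the power-of-2 masking together gives roughly the sketch below. It is not a reference implementation: the 8-bit fingerprint, 4 slots per position, SHA-256 as the hash, and forcing hash(fp) to be odd (one simple way to keep its lowest n bits from being all zero) are all assumptions made for the illustration.

```python
import hashlib
import random


def _h(data):
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=100):
        # The array length must be a power of 2 so "% length" is a bit mask.
        assert num_buckets >= 2 and num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1  # keeps the lowest n bits
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item):
        return _h(b"fp:" + item.encode()) & 0xFF  # assumed 8-bit fingerprint

    def _alt(self, pos, fp):
        # Dual position: pos ^ hash(fp), reduced to the lowest n bits.
        # "| 1" keeps the masked hash non-zero, so the result differs from pos.
        return (pos ^ (_h(bytes([fp])) | 1)) & self.mask

    def _positions(self, item):
        fp = self._fingerprint(item)
        p1 = _h(item.encode()) & self.mask
        return fp, p1, self._alt(p1, fp)

    def insert(self, item):
        fp, p1, p2 = self._positions(item)
        for p in (p1, p2):
            if len(self.buckets[p]) < self.bucket_size:
                self.buckets[p].append(fp)
                return True
        pos = random.choice((p1, p2))
        for _ in range(self.max_kicks):
            victim = random.randrange(self.bucket_size)
            # Swap fingerprints; only fp and pos are needed to relocate.
            fp, self.buckets[pos][victim] = self.buckets[pos][victim], fp
            pos = self._alt(pos, fp)
            if len(self.buckets[pos]) < self.bucket_size:
                self.buckets[pos].append(fp)
                return True
        return False  # filter is considered full

    def might_contain(self, item):
        fp, p1, p2 = self._positions(item)
        return fp in self.buckets[p1] or fp in self.buckets[p2]

    def delete(self, item):
        # Only delete items that were actually inserted.
        fp, p1, p2 = self._positions(item)
        for p in (p1, p2):
            if fp in self.buckets[p]:
                self.buckets[p].remove(fp)
                return True
        return False


cf = CuckooFilter()
inserted = cf.insert("user:1001")
found = cf.might_contain("user:1001")
fp, p1, p2 = cf._positions("user:1001")
print(inserted, found, cf._alt(p2, fp) == p1)
```

Because `_alt` needs only the current position and the fingerprint, an evicted fingerprint can be relocated without ever seeing the original element, which is what makes both eviction and deletion possible.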
Please look forward to my next article. Thank you.
