Bloom Filter
2022-08-01 14:31:00 【IABQL】
Use a Bloom filter to filter out requests for data that does not exist in the DB. This effectively reduces the risk of cache penetration, where requests for nonexistent keys always miss the cache and fall through to the DB.
Let's briefly describe the process:
Hash each record in the DB (usually with several hash functions) and set the bit at each computed position to 1. When a request comes in, it first checks the Redis cache. If the data is not found there, the request then queries the Bloom filter: the requested key is hashed in the same way and the bits at the resulting positions are read. If they are all 1, the data probably exists in the DB, so the DB is queried for it; if any bit is 0, the data is definitely absent and the DB is not accessed. This reduces the number of DB accesses.
Now let's look at how a Bloom filter works in detail:
A Bloom filter consists of a bitmap array whose bits are all initially 0 and N hash functions. Whenever we write data to the database, we also mark that data in the Bloom filter. The next time we need to know whether some data is in the database, we only need to query the Bloom filter: if the data is not marked, it is not in the database.
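The "bitmap array whose initial value is 0" is normally stored as packed bits rather than one integer per slot, which is where much of a Bloom filter's memory saving comes from. A minimal sketch of such a bitmap in Python (the `Bitmap` class is hypothetical, for illustration only):

```python
class Bitmap:
    """A bitmap array of m bits, all initially 0, packed 8 bits per byte."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.data = bytearray((m + 7) // 8)  # bytearray(n) starts zero-filled

    def set(self, pos: int) -> None:
        # Byte index is pos // 8; bit within that byte is pos % 8
        self.data[pos // 8] |= 1 << (pos % 8)

    def get(self, pos: int) -> int:
        return (self.data[pos // 8] >> (pos % 8)) & 1


bm = Bitmap(8)
for p in (1, 4, 6):
    bm.set(p)
print([bm.get(i) for i in range(8)])  # [0, 1, 0, 0, 1, 0, 1, 0]
```

With this layout, m bits cost about m/8 bytes; for example, a 1,000,000-bit filter fits in 125,000 bytes.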
A Bloom filter marks a piece of data in 3 steps:
The first step is to hash the data with each of the N hash functions to obtain N hash values;
The second step is to take each of the N hash values modulo the length of the bitmap array to get that hash value's position in the array;
The third step is to set the bit at each of those positions in the bitmap array to 1.
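The three steps map directly onto code. A minimal Python sketch, assuming the N hash functions are simulated by salting SHA-256 with the function index (the function names `hash_values`, `to_positions`, and `mark` are hypothetical):

```python
import hashlib


def hash_values(data: str, n: int) -> list[int]:
    """Step 1: hash the data with N hash functions.

    Assumption: the N different hash functions are simulated by salting
    SHA-256 with the function index i.
    """
    values = []
    for i in range(n):
        digest = hashlib.sha256(f"{i}:{data}".encode("utf-8")).digest()
        values.append(int.from_bytes(digest[:8], "big"))  # first 8 bytes as an integer
    return values


def to_positions(values: list[int], m: int) -> list[int]:
    """Step 2: take each hash value modulo the bitmap length m."""
    return [v % m for v in values]


def mark(bits: list[int], data: str, n: int) -> list[int]:
    """Step 3: set the bit at each computed position to 1; returns the positions."""
    positions = to_positions(hash_values(data, n), len(bits))
    for pos in positions:
        bits[pos] = 1
    return positions


bits = [0] * 8                  # bitmap array of length 8, all 0
positions = mark(bits, "x", 3)  # mark data "x" using 3 hash functions
print(positions, bits)
```

Because the positions come from a modulo, they always fall inside the array, and two different hash values can land on the same position.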
For example, suppose a Bloom filter has a bitmap array of length 8 and 3 hash functions.

After the database writes data x, x is marked in the Bloom filter: it is run through the 3 hash functions to get 3 hash values, and each hash value is taken modulo 8. Suppose the results are 1, 4, and 6; the bits at positions 1, 4, and 6 of the bitmap array are then set to 1. When the application later wants to know whether x is in the database, the Bloom filter only needs to check whether the bits at positions 1, 4, and 6 are all 1. If any one of them is 0, x is not in the database.
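The example above can be replayed directly, taking the article's assumed modulo results 1, 4, and 6 as array indexes instead of computing real hashes. The helper names and the positions chosen for y are hypothetical.

```python
def mark(bits: list[int], positions: list[int]) -> None:
    """Set the bit at every given position to 1."""
    for pos in positions:
        bits[pos] = 1


def maybe_in_db(bits: list[int], positions: list[int]) -> bool:
    """True only if every position is 1; a single 0 means 'definitely not in the DB'."""
    return all(bits[pos] == 1 for pos in positions)


bits = [0] * 8           # bitmap array of length 8, initially all 0
x_positions = [1, 4, 6]  # assumed results of hashing x and taking mod 8
mark(bits, x_positions)

print(bits)                            # [0, 1, 0, 0, 1, 0, 1, 0]
print(maybe_in_db(bits, x_positions))  # True
print(maybe_in_db(bits, [1, 4, 5]))    # False: position 5 is still 0

# Hypothetical data y that was never written but hashes to the same positions
y_positions = [6, 1, 4]
print(maybe_in_db(bits, y_positions))  # True: a false positive
```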
Because the Bloom filter is built on hash functions, lookups are fast but hash collisions are possible. For example, data x and data y may both map to positions 1, 4, and 6 even though y is not in the database; the filter then wrongly reports y as present (a false positive). Using more hash functions can reduce such misjudgments, because the more positions a lookup checks, the less likely it is that all of them are 1 by chance. However, more hash functions also set more bits per item, so the bitmap array must be longer and each operation is slower; past a certain point, adding hash functions fills the array faster and raises the false-positive rate instead. The number of hash functions and the array length should therefore be chosen according to actual business needs.
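For a rough sense of this trade-off, the standard approximation for the false-positive probability p, with n inserted items, an m-bit array, and k hash functions (assuming the hash functions are independent and uniformly distributed, which real ones only approximate), is:

```latex
% probability that one particular bit is still 0 after inserting n items
\Pr[\text{bit} = 0] = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}

% false-positive probability: all k probed bits happen to be 1
p \approx \left(1 - e^{-kn/m}\right)^{k}

% k that minimizes p for fixed m and n, and the resulting rate
k^{*} = \frac{m}{n}\ln 2, \qquad p^{*} \approx \left(\tfrac{1}{2}\right)^{k^{*}}

% array length needed for a target false-positive rate p
m = -\frac{n \ln p}{(\ln 2)^{2}}
```

For the article's example (m = 8, k = 3), one inserted item gives p ≈ (1 - e^{-3/8})^3 ≈ 3.1%, and two items already give about 14.7%, which is why real filters use much larger arrays.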
So if the Bloom filter reports that some data exists, the data is not necessarily in the database; but if it reports that the data does not exist, the data is definitely not in the database.
Original link: https://blog.csdn.net/qq_34827674/article/details/123463175