当前位置:网站首页>09 bloom filter
09 bloom filter
2022-07-29 07:26:00 【51CTO】
reflection
If you want to judge often 1 Whether the elements exist , What would you do ?
It's easy to think of using hash tables (HashSet、HashMap), Take the element as key Go find * Time complexity :O(1), But space utilization is not high , Need to occupy more memory resources
If you need to write a web crawler to climb 10 100 million website data , To avoid climbing to duplicate websites , How to judge whether a website has been crawled ?
Obviously ,HashSet、HashMap Not a very good choice
Whether there is low time complexity 、 Solutions that use less memory ?
The bloon filter (Bloom Filter)
The bloon filter (Bloom Filter)
1970 It was proposed by bron
It is a probabilistic data structure with high spatial efficiency , It can be used to tell you : An element must not exist or may exist
Advantages and disadvantages
advantage : Space efficiency and query time are far more than general algorithms
shortcoming : There is a certain rate of miscalculation 、 Delete difficult
It is essentially a long binary vector and a series of random mapping functions (Hash function )
Common applications
Web blacklist system 、 Spam filtering system 、 Crawler's web address duplication system 、 Solve the problem of cache penetration
The principle of Bloom filter
Suppose the bloom filter consists of 20 Bit binary 、 3 It's a hash function , Each element can be processed by hash function to generate an index position
Additive elements : Set the index position generated by each hash function to 1
Query whether the element exists
* If there is a hash function, the generated index position is not 1, It means that there is no (100% accuracy )
* If the index position generated by each hash function is 1, It means existence ( There is a certain rate of miscalculation )

add to 、 The time complexity of query is :O(k) ,k Is the number of hash functions . The space complexity is :O(m) ,m Is the number of binary bits
The misjudgment rate of Bloom filter

Implementation of the bloom filter

Guava: Google Core Libraries For Java
https://mvnrepository.com/artifact/com.google.guava/guava
边栏推荐
- 使用自定义注解校验list的大小
- 作业7.28 文件IO与标准IO
- 我想问一下,我flink作业是以upsert-kafka的方式写入数据的,但是我在mysql里面去更
- 一篇长文---深入理解synchronized
- Vite3.0 has been released, can you still roll it (list of new features)
- WPF nested layout case
- zip gzip tar压缩进阶版
- Use vscode to configure Mysql to realize connection, query, and other functions
- Zabbix 其他基础监控项
- SpingBoot整合Quartz框架实现动态定时任务(支持实时增删改查任务)
猜你喜欢

微服务远程调用

个人博客系统(附源码)

ETL为什么经常变成ELT甚至LET?

Why does ETL often become ELT or even let?

Remote invocation of microservices

Error 1045 (28000) access denied for user 'root' @ 'localhost' solution

MySQL----多表查询

Operator3 - design an operator

Excel file reading and writing (creation and parsing)

Practice of online problem feedback module (XVII): realize the online download function of excel template
随机推荐
2022-07-28: what is the output of the following go language code? A:AA; B:AB; C:BA; D:BB。 package main import ( “fmt“ ) func main() { f
Introduction to logback filter
Leetcode buckle classic problem -- 4. Find the median of two positively ordered arrays
以太网接口介绍
MySQL - multi table query
Variables and encryption in ansible
MySQL uses the client and select methods to view the summary of blob type fields
Operator3-设计一个operator
[100 cases of unity practice] the single choice multiple choice judgment questions of unity universal question answering system are all common
logback 中FileAppender具有什么功能呢?
Gin routing, parameters, output
Synchronous / asynchronous, blocking / non blocking and IO
Gin template
Gin service exit
树莓派的启动流程
如何与斯堪尼亚SCANIA建立EDI连接?
Does Flink support sqlserver databases? Get the changes of SQLSERVER database
mysql 单表最多能存多少数据?
微服务远程调用
Paper reading (62):pointer networks