当前位置:网站首页>Hash table, hash function, bloom filter, consistency hash
Hash table, hash function, bloom filter, consistency hash
2022-07-04 23:28:00 【wrdoct】
List of articles
Preface
Hashtable 、 hash function 、 The bloon filter 、 Consistent Hashing .
One 、 Hashtable 、 hash function
The data stored in the hash table is key,value Type of , Hash function is also known as hash function .
For classic hash functions , It has the following 5 Some properties :
1、 The input field is infinite
2、 The output field is exhausted
3、 The input is the same, and the output must be the same
4、 When the input is different, the output may also be the same ( Hash collision )
5、 Different inputs will be evenly distributed in the output field ( Hash property of hash function )
Hashtable hash table(key,value) It's actually very simple , Is to put Key
Through a fixed algorithm function, it is called hash function Convert to a integer , Then put the number Remainder the length of the array , The remainder result is treated as an array Subscript , take value
Stored under the number Array space in .
And when you use a hash table for a query , Is to make it again Use hash function to convert key Convert to the corresponding array subscript , And locate in the space to get value, In this way , You can make full use of Positioning of arrays Able to locate data .
【 notes 】key The design of should be There are many kinds of guarantees , Make the low frequency 、 Intermediate frequency 、 There are quantities of high frequencies , In this way, it can be done according to the nature of the hash function Divide equally .
【 notes 】 for example , We have a 10TB Large files of exist on the distributed file system , Deposit is 100 Billion lines of string , And the strings are arranged out of order , Now we will count the repeated strings in this file .
hypothesis , We can call 100 Machine to calculate the file . that , Now how do we count repeated strings through hash function .
First , We need to separate these 100 machines from 0-99 Mark the number , Then we read the file line by line in the distributed file system ( Multiple machines read in parallel ), Calculated by hash function hashcode, What will be calculated hashcode Moyi 100, According to the simulated value , Store the line in the corresponding machine .
According to the properties of hash function , It's easy for us to see , The same string will be stored in the same machine .
Then we can go parallel 100 Taiwan machine , Calculate the corresponding data respectively , Greatly accelerate the speed of Statistics .
Be careful : this 10TB Documents are not divided equally 100GB, among 100 Taiwan machine , It's this 10TB In file Types of different strings , Divide equally To 100 Of the machines . If the data processed by a single machine is too large , You can follow the same method , Parallel multiple processes in one machine , Processing data .
Two 、 The bloon filter
If you want to Judge whether an element is in a set , The general idea is to save all the elements , And then through comparison to determine . But still You can use a hash table , Through a Hash Function maps an element to a bit array (Bit array) A point in . thus , We just need to see if this point is 1 You can see if it's in the collection . This is the basic idea of bloon filter .
The bloon filter :k A hash function calculates k Hash values , after modulus Black tracing ( The corresponding bits in the array Set as 1), Judgment is still k A hash function takes modulus to make judgment , Only all black ( Is full of 1) It belongs to , If one is white, it doesn't belong to . therefore It is possible to judge white as black , But it is impossible to judge black as white .
Three formulas :
n = sample size ;
p = Error rate ;
m = Optimal bit group size ;
k = Hash Select the optimal number of functions ;
The algorithm mistakenly believes that a Originally not in the set The elements of It is detected as being in the set (False Positives), This probability ( Error rate p) Determined by the following formula :
3、 ... and 、 Consistent Hashing
The above methods are Number of servers Take the mold , And the consistent hash algorithm is to 2^32 modulus .
Think of the 32nd power of two as a round , It's like a clock , The circle of a watch can be understood as being made by 60 A circle of points , And here we think of this circle as being made of 2^32 A circle of points , The schematic diagram is as follows :
The point directly above the circle represents 0,0 The first point on the right represents 1, And so on ,2、3、4、5、6…… until 2^32-1 , in other words 0 The first point on the left represents 2^32-1.
We put this by 2 Of 32 A ring of points to the power of is called hash Ring .
After hashing the objects to be cached , Find the nearest one in the hash ring clockwise , As the home server of cache .
This is how the consistent hash algorithm works , Determine which server an object should be cached on , Map both the cache server and the cached object to hash On the ring in the future , Starting from the location of the cached object , The first server encountered in a clockwise direction , Is the server where the current object will be cached .
Because the cached object and the server hash The value after is fixed , therefore , With the server unchanged , An image must be cached on a fixed server .
that , The next time you want to access this image , Just use the same algorithm again , You can figure out which server this image is cached on , Go directly to the corresponding server to find the corresponding image .
边栏推荐
- HMS core unified scanning service
- Redis: redis message publishing and subscription (understand)
- OSEK标准ISO_17356汇总介绍
- Docker镜像的缓存特性和Dockerfile
- CTF竞赛题解之stm32逆向入门
- In the enterprise, win10 turns on BitLocker to lock the disk, how to back up the system, how to recover when the system has problems, and how to recover quickly while taking into account system securi
- 使用快解析搭建自己的minecraft服务器
- Network namespace
- 初试为锐捷交换机跨设备型号升级版本(以RG-S2952G-E为例)
- The small program vant tab component solves the problem of too much text and incomplete display
猜你喜欢
Selected cutting-edge technical articles of Bi Ren Academy of science and technology
phpcms付费阅读功能支付宝支付
C language to quickly solve the reverse linked list
【js】-【排序-相关】-笔记
初试为锐捷交换机跨设备型号升级版本(以RG-S2952G-E为例)
Observable time series data downsampling practice in Prometheus
Redis getting started complete tutorial: hash description
【二叉树】节点与其祖先之间的最大差值
PMP证书续证流程
The difference between cout/cerr/clog
随机推荐
C语言快速解决反转链表
香港珠宝大亨,22亿“抄底”佐丹奴
[JS] - [dynamic planning] - Notes
字体设计符号组合多功能微信小程序源码
How to reduce the stock account Commission and stock speculation commission? Is it safe to open an online account
华泰证券低佣金的开户链接安全吗?
MIT-6.824-lab4B-2022(万字思路讲解-代码构建)
数据库基础知识
智力考验看成语猜古诗句微信小程序源码
微软禁用IE浏览器后,打开IE浏览器闪退解决办法
ECS settings SSH key login
Actual combat simulation │ JWT login authentication
CTF競賽題解之stm32逆向入門
VIM editor knowledge summary
C language to quickly solve the reverse linked list
【js】-【排序-相关】-笔记
Mit-6.824-lab4b-2022 (10000 word idea explanation - code construction)
The initial arrangement of particles in SPH (solved by two pictures)
【ODX Studio编辑PDX】-0.3-如何删除/修改Variant变体中继承的(Inherited)元素
Redis:Redis消息的发布与订阅(了解)