当前位置:网站首页>Hash table, hash function, bloom filter, consistency hash
Hash table, hash function, bloom filter, consistency hash
2022-07-04 23:28:00 【wrdoct】
List of articles
Preface
Hashtable 、 hash function 、 The bloon filter 、 Consistent Hashing .
One 、 Hashtable 、 hash function
The data stored in the hash table is key,value Type of , Hash function is also known as hash function .
For classic hash functions , It has the following 5 Some properties :
1、 The input field is infinite
2、 The output field is exhausted
3、 The input is the same, and the output must be the same
4、 When the input is different, the output may also be the same ( Hash collision )
5、 Different inputs will be evenly distributed in the output field ( Hash property of hash function )
Hashtable hash table(key,value) It's actually very simple , Is to put Key
Through a fixed algorithm function, it is called hash function Convert to a integer , Then put the number Remainder the length of the array , The remainder result is treated as an array Subscript , take value
Stored under the number Array space in .
And when you use a hash table for a query , Is to make it again Use hash function to convert key Convert to the corresponding array subscript , And locate in the space to get value, In this way , You can make full use of Positioning of arrays Able to locate data .
【 notes 】key The design of should be There are many kinds of guarantees , Make the low frequency 、 Intermediate frequency 、 There are quantities of high frequencies , In this way, it can be done according to the nature of the hash function Divide equally .
【 notes 】 for example , We have a 10TB Large files of exist on the distributed file system , Deposit is 100 Billion lines of string , And the strings are arranged out of order , Now we will count the repeated strings in this file .
hypothesis , We can call 100 Machine to calculate the file . that , Now how do we count repeated strings through hash function .
First , We need to separate these 100 machines from 0-99 Mark the number , Then we read the file line by line in the distributed file system ( Multiple machines read in parallel ), Calculated by hash function hashcode, What will be calculated hashcode Moyi 100, According to the simulated value , Store the line in the corresponding machine .
According to the properties of hash function , It's easy for us to see , The same string will be stored in the same machine .
Then we can go parallel 100 Taiwan machine , Calculate the corresponding data respectively , Greatly accelerate the speed of Statistics .
Be careful : this 10TB Documents are not divided equally 100GB, among 100 Taiwan machine , It's this 10TB In file Types of different strings , Divide equally To 100 Of the machines . If the data processed by a single machine is too large , You can follow the same method , Parallel multiple processes in one machine , Processing data .
Two 、 The bloon filter
If you want to Judge whether an element is in a set , The general idea is to save all the elements , And then through comparison to determine . But still You can use a hash table , Through a Hash Function maps an element to a bit array (Bit array) A point in . thus , We just need to see if this point is 1 You can see if it's in the collection . This is the basic idea of bloon filter .
The bloon filter :k A hash function calculates k Hash values , after modulus Black tracing ( The corresponding bits in the array Set as 1), Judgment is still k A hash function takes modulus to make judgment , Only all black ( Is full of 1) It belongs to , If one is white, it doesn't belong to . therefore It is possible to judge white as black , But it is impossible to judge black as white .
Three formulas :
n = sample size ;
p = Error rate ;
m = Optimal bit group size ;
k = Hash Select the optimal number of functions ;
The algorithm mistakenly believes that a Originally not in the set The elements of It is detected as being in the set (False Positives), This probability ( Error rate p) Determined by the following formula :
3、 ... and 、 Consistent Hashing
The above methods are Number of servers Take the mold , And the consistent hash algorithm is to 2^32 modulus .
Think of the 32nd power of two as a round , It's like a clock , The circle of a watch can be understood as being made by 60 A circle of points , And here we think of this circle as being made of 2^32 A circle of points , The schematic diagram is as follows :
The point directly above the circle represents 0,0 The first point on the right represents 1, And so on ,2、3、4、5、6…… until 2^32-1 , in other words 0 The first point on the left represents 2^32-1.
We put this by 2 Of 32 A ring of points to the power of is called hash Ring .
After hashing the objects to be cached , Find the nearest one in the hash ring clockwise , As the home server of cache .
This is how the consistent hash algorithm works , Determine which server an object should be cached on , Map both the cache server and the cached object to hash On the ring in the future , Starting from the location of the cached object , The first server encountered in a clockwise direction , Is the server where the current object will be cached .
Because the cached object and the server hash The value after is fixed , therefore , With the server unchanged , An image must be cached on a fixed server .
that , The next time you want to access this image , Just use the same algorithm again , You can figure out which server this image is cached on , Go directly to the corresponding server to find the corresponding image .
边栏推荐
- 初试为锐捷交换机跨设备型号升级版本(以RG-S2952G-E为例)
- Excel shortcut keys - always add
- 取得PMP证书需要多长时间?
- MariaDB's Galera cluster application scenario -- multi master and multi active databases
- [graph theory] topological sorting
- Financial markets, asset management and investment funds
- Qualcomm WLAN framework learning (30) -- components supporting dual sta
- VIM editor knowledge summary
- [JS] - [sort related] - Notes
- C language to quickly solve the reverse linked list
猜你喜欢
Hong Kong Jewelry tycoon, 2.2 billion "bargain hunting" Giordano
高通WLAN框架学习(30)-- 支持双STA的组件
Redis introduction complete tutorial: List explanation
C language to quickly solve the reverse linked list
Redis: redis message publishing and subscription (understand)
S32 Design Studio for ARM 2.2 快速入门
colResizable. JS auto adjust table width plug-in
CTF competition problem solution STM32 reverse introduction
Redis introduction complete tutorial: detailed explanation of ordered collection
phpcms付费阅读功能支付宝支付
随机推荐
The difference between debug and release
Is the account opening link of Huatai Securities with low commission safe?
PICT 生成正交测试用例教程
Tweenmax emoticon button JS special effect
Chinese verification of JS regular expressions (turn)
SPH中的粒子初始排列问题(两张图解决)
A complete tutorial for getting started with redis: understanding and using APIs
【kotlin】第三天
PMP证书续证流程
D3.js+Three. JS data visualization 3D Earth JS special effect
French scholars: the explicability of counter attack under optimal transmission theory
【js】-【动态规划】-笔记
Solve the problem that the virtual machine cannot be remotely connected through SSH service
Qt加法计算器(简单案例)
Servlet服务器端和客户端中文输出乱码问题
QT personal learning summary
phpcms付费阅读功能支付宝支付
C language to quickly solve the reverse linked list
LIst 相关待整理的知识点
机器人强化学习——Learning Synergies between Pushing and Grasping with Self-supervised DRL (2018)