当前位置:网站首页>Take you to resolve hash conflicts and implement a simple hash table,
Take you to resolve hash conflicts and implement a simple hash table,
2022-06-26 18:11:00 【A goblin】
List of articles
- 1. Concept
- 2, Conflict — Concept
- 3, Conflict — avoid
- 4, Conflict — avoid — Hash function design
- 5, Conflict - avoid - Load factor regulation ( Focus on mastering )
- 6, Conflict — solve
- 7, Hash — solve — Closed hash
- 8, Conflict — solve — Hash / Hash bucket ( Focus on mastering )
- 9, A solution to a serious conflict
- 10, Implement a hash table
- 11, Performance analysis
1. Concept
Sequence structure and balance tree , There is no corresponding relationship between the element key and its storage location , So when looking for an element , You have to go through key
Multiple comparisons of codes . The time complexity of sequential search is O(N), Balance the height of the tree in the tree , namely O( l o g 2 N log_2 N log2N), The efficiency of a search depends on having searched
Comparison times of elements in the process .
Ideal search method : Without any comparison , Get the elements to search directly from the table at a time . If you construct a storage structure , By some kind of letter
Count (hashFunc) A one-to-one mapping relationship can be established between the storage location of an element and its key , Then when searching, you can quickly
Find the element .
When in this structure :
Insert elements
According to the key of the element to be inserted , This function calculates the storage location of the element and stores it according to this location
Search element
Do the same calculation for the key of the element , Take the function value as the storage location of the element , Take the element comparison in this position in the structure , if
The keys are equal , Then the search is successful
This is called hash ( hash ) Method , The conversion function used in the hash method is called hash ( hash ) function , The constructed structure is called a hash table (Hash
Table)( Or a hash table )
for example : Data set {1,7,6,4,5,9};
The hash function is set to :hash(key) = key % capacity; capacity Is the total size of the underlying space of the storage element .

Using this method to search does not need to compare multiple keys , So the search speed is faster
problem : According to the above hash method , Insert a meta... Into the collection plain 44, What will happen ?
Obviously 44%10=4 , In this case, the array subscript is 4 An element already exists at the location of , This is the problem we face Hash Collisions ( Hash collision )
2, Conflict — Concept
For the keywords of two data elements k i k_i ki and k j k_j kj(i != j), Yes k i k_i ki != k j k_j kj, But there are :Hash( k i k_i ki) == Hash( k j k_j kj), namely : Different Keyword calculate the same hash address through the same hash hash number , This phenomenon is called hash collision or hash collision .
Data elements with different keys and the same hash address are called “ A synonym for ”.
3, Conflict — avoid
First , We need to be clear , Because the capacity of the underlying array of our hash table is often less than the actual number of keywords to be stored , This leads to a A question , Conflict is inevitable , But what we can do is try our best Reduce the conflict rate .
4, Conflict — avoid — Hash function design
One possible cause of hash conflict is : Hash function design is not reasonable . Hash function design principles :
- The definition field of hash function must include all keys to be stored , And if the hash table allows m An address , Its value range must be in 0 To m-1 Between
- The address calculated by hash function can be evenly distributed in the whole space
- Hash functions should be simpler
Common hash functions
1). Direct customization –( Commonly used )
Take a linear function of keyword as hash address :Hash(Key)= A*Key + B advantage : Simple 、 uniform shortcoming : You need to know in advance Distribution of key words Use scenarios : It is suitable for finding relatively small and continuous cases
2). Division and remainder –( Commonly used )
Set the number of addresses allowed in the hash table to m, Take one not greater than m, But close to or equal to m The prime number of p As divisor , According to the hash function : Hash(key) = key% p(p<=m), Convert key to hash address
3). Square with the middle method –( understand )
Assume keyword is 1234, Yes, its square is 1522756, Extract the middle 3 position 227 As a hash address ; Another example is that the keyword is 4321, Yes It's Square 18671041, Extract the middle 3 position 671( or 710) As a hash address The square middle method is more suitable for : I don't know the score of keywords cloth , And the number of digits is not very big
4). Folding method –( understand )
Folding is to divide keywords from left to right into several parts with equal digits ( The last part of the digits can be shorter ), And then add these parts together and sum it up , And press the hash table length , Take the last few bits as the hash address . Folding method is suitable for not knowing the distribution of keywords in advance , It is suitable for the situation with more keyword digits
5). Random number method –( understand )
Choose a random function , Take the random function value of a keyword as its hash address , namely H(key) = random(key), among random Is a random number function . This method is usually used when the keyword length is different
6). Mathematical analysis –( understand )
Equipped with n individual d digit , Everyone may have r Different symbols , this r Different symbols don't always appear the same frequency on you , Maybe somewhere These positions are evenly distributed , Every symbol has an equal chance of appearing , It is unevenly distributed on some bits, and only some symbols often appear . According to the The size of the hash table , Select a number of bits in which various symbols are evenly distributed as the hash address .
Digital analysis method is usually suitable for dealing with large number of key words , If the distribution of keywords is known in advance and several bits of keywords are evenly distributed Even conditions
Be careful : The more sophisticated the hash function is , The lower the likelihood of hash conflicts , But there is no way to avoid hash conflicts
5, Conflict - avoid - Load factor regulation ( Focus on mastering )
The load factor of the hash table is defined as : a = The number of elements in the table / The length of the hash table
a It is a marker factor of hash table fullness . Since the meter length is a fixed value ,
a And “ The number of elements in the table ” In direct proportion to , therefore ,a The bigger it is , Indicates that the more elements to fill in the table , The more likely there is to be a conflict : conversely ,a The smaller it is , Indicate that the fewer elements to fill in the table , The less likely there is a conflict . actually , The average lookup length of the hash table is the load factor a Function of , It's just that different methods of dealing with conflicts have different functions .For open addressing , The load factor is a particularly important factor , It should be strictly limited to 0.7-0.8 following . exceed 0. 8, Look up the table CPU Cache miss (cache,
missing) Follow the exponential curve . therefore , Some use open addressing hash library ,
Such as Java The system library limits the load factor to 0.75, Exceeding this value will resize Hash table .
The relationship between load factor and conflict rate is roughly demonstrated 
So when the conflict rate is unbearable , We need to reduce the conflict rate by reducing the load factor .
It is known that the number of existing keywords in the hash table is immutable , Then all we can adjust is the size of the array in the hash table .
6, Conflict — solve
Two common ways to resolve hash conflicts are : Closed hash and Hash
7, Hash — solve — Closed hash
Closed hash : It's also called open addressing , When a hash conflict occurs , If the hash table is not full , It means that there must be a space in the hash table , Then you can. hold key Stored in a conflict location “ next ” Go to an empty place . So how to find the next empty position ?
1) Linear detection
For example, the scene above , Now you need to insert the element 44, First calculate the hash address through the hash function , Subscript to be 4, therefore 44 In theory, it should be inserted in this Location , But the position has been set to the value 4 The elements of , That is, a hash conflict occurs .
Linear detection : Start from where the conflict occurred , Sequential backward detection , Until we find the next empty position .
Insert :
- Get the location of the element to be inserted in the hash table through the hash function
- If there is no element in the location, insert the new element directly , If there are elements in the location that have hash conflicts , Use linear detection to find Next empty position , Insert new elements

- When using closed hash to handle hash conflicts , The existing elements in the hash table cannot be deleted randomly , If you delete the element directly, it will affect other
Element search . For example, delete elements 4, If delete directly ,44 The search may be affected . Therefore, the linear detection adopts the standard Remember the pseudo deletion method to delete an element
2). Second detection
The drawback of linear detection is that conflicting data are stacked together , This has something to do with finding the next empty location , Because the way to find an empty position is to Look back one by one , Therefore, in order to avoid this problem , The way to find the next empty location is : H i H_i Hi = ( H 0 H_0 H0 + i 2 i^2 i2 )% m, perhaps : H i H_i Hi = ( H 0 H_0 H0 - i 2 i^2 i2 )% m. among :i = 1,2,3…, H 0 H_0 H0 Is through the hash function Hash(x) Key to element key The calculated position ,m Is the size of the table . about 2.1 If you want to insert 44, Conflict , The situation after use is :
Studies have shown that : When the length of the table is prime and the table loading factor a No more than 0.5 when , New entries must be able to insert , And not in any position Will be probed twice . So as long as there are half empty positions in the table , There will be no table full problem . When searching, you can not consider the situation that the table is full condition , However, when inserting, you must ensure that the load factor of the table a No more than 0.5, If it exceeds the limit, we must consider increasing capacity .
therefore : The biggest disadvantage of hash ratio is that the space utilization is relatively low , This is also the defect of hash .
8, Conflict — solve — Hash / Hash bucket ( Focus on mastering )
Hash method is also called chain address method ( Open chain method ), First, we use hash function to calculate hash address for key set , Keys with the same address belong to the same child aggregate , Each subset is called a bucket , The elements in each bucket are linked by a single chain table , The head nodes of each linked list are stored in the hash table .

As can be seen from the above figure , Each bucket in the hash is filled with elements with hash conflicts .
Hash , It can be considered that a search problem in a large set is transformed into a search problem in a small set .
9, A solution to a serious conflict
We just mentioned , In fact, hash bucket can be regarded as transforming the search problem of large set into the search problem of small set , If the conflict is serious , It means In fact, the search performance of small collections is not good , At this time, we can continue to transform the so-called small set search problem , for example :
1). Behind each bucket is another hash table
2). Behind each bucket is a search tree
10, Implement a hash table
I use the open hash method to solve the hash conflict problem
I use hash The method of dividing a function and leaving a remainder –( Commonly used )
Here we mainly implement three operations of hash table
1) newly added ( With capacity expansion operation )
2) lookup
3) Delete
// Hash conflicts are handled by hashing
public class MyHashMap {
static class Node{
public int key;
public int value;
public Node next;
public Node(int key, int value) {
this.key = key;
this.value = value;
}
}
private static final double LOAD_FACTOR = 0.75; // Define a load shadow constant / It is used to judge whether capacity expansion is needed
//array Namely hash Table body , Each element of the array is also the head node of a linked list
private Node[] array = new Node[101];
private int size = 0; // At present hash The number of elements in the table
private int hashFunc( int key){
// Seek the present key stay hash Storage location in the table , The actual solution is much more complicated than this
return key%array.length;
}
// If hash It already exists in the table key Just modify the original key Of value
// Insert new key value pairs if they do not exist
public void put(int key,int value){
//1, Need to put key Map to array subscript
int index = hashFunc(key);
// Get the node at this position in the current array ( That is, the head node of the linked list )
Node list = array[index];
// Judgment is key Whether it exists in the linked list
for (Node cur = list; cur != null; cur = cur.next){
if(cur.key == value){
// If it exists, modify the current node value
cur.value = value;
return;
}
}
// If the cycle ends , Did not find , It is inserted into the subscript of the corresponding array ,
// Here, the head plug is used , You can also tail insert
Node newNode = new Node(key,value);
newNode.next = list;
array[index] = newNode;
index++;
if(size/array.length > LOAD_FACTOR){
resize(); // Larger than the load factor, the capacity will be expanded
}
}
private void resize(){
// Scale operation
Node[] newArray = new Node[array.length*2];
for (int i = 0; i < array.length; i++) {
for(Node cur = array[i]; cur != null; cur = cur.next){
int index = cur.key%newArray.length;
Node newNode = new Node(cur.key,cur.value);
newNode.next = newArray[index].next;
newArray[index] = newNode;
}
}
array = newArray;
}
// Search operation , lookup key Corresponding value, If you find it , Just go back to value, Otherwise return to null
public Integer get(int key){
// First we have to take key Map to array subscript
int index = hashFunc(key);
// Take out the linked list of the subscript position
Node list = array[index];
// Traverse the linked list to find
for (Node cur = list; cur != null; cur = cur.next){
if(cur.key == key){
return cur.value; // eureka
}
}
// Did not find
return null;
}
// Delete key
public void remove(int key){
int index = hashFunc(key);
Node list = array[index];
if(key == list.value){
// Handle header deletion
array[index] = list.next;
}
// In other locations, you need to find the previous node to delete the node
Node prveNode = serch(key);
Node toRemove = prveNode.next;
prveNode.next = toRemove.next;
}
public Node serch(int key){
// Find the node before the node to be deleted
int index = hashFunc(key);
Node list = array[index];
for (Node cur = list; cur != null && cur.next != null; cur = cur.next ){
if(cur.next.key == key){
return cur;
}
}
return null;
}
}11, Performance analysis
Although hashtables have been struggling with conflicts , But in actual use , We think the conflict rate of hash table is not high , The number of conflicts is controllable , That is, the length of the list in each bucket is a constant , therefore , In the usual sense , We think that hash table insertion / Delete / The complexity of searching time is O(1) .
边栏推荐
- Digital signature standard (DSS)
- PC端录制扫515地机器人/scan数据
- Hello, is it safe to open an online stock account and buy stocks now?
- Idea collection code, quickly open favorites collection window
- Several key points in divorce agreement
- js强制转换
- 行锁分析和死锁
- RSA encryption and decryption details
- How about opening an account at Guojin securities? Is it safe to open an account?
- 我想知道,我在肇庆,到哪里开户比较好?网上开户是否安全么?
猜你喜欢

idea中文插件chinese(simplified) language pack

Properties file garbled

JVM entry door (1)

next(iter(dataloader))的一点点体会
![[uniapp] the uniapp mobile terminal uses uni Troubleshooting of navigateback failure](/img/26/da00e70d0955bcdef3362474bc5fc7.png)
[uniapp] the uniapp mobile terminal uses uni Troubleshooting of navigateback failure
![[buuctf.reverse] 126-130](/img/df/e35633d85caeff1dece62a66cb7804.png)
[buuctf.reverse] 126-130

MYSQL的下载与配置 mysql远程操控

零时科技 | 智能合约安全系列文章之反编译篇

Chinese (Simplified) language pack

RuntimeError: CUDA error: out of memory自己的解决方法(情况比较特殊估计对大部分人不适用)
随机推荐
Hello, is it safe to open an online stock account and buy stocks now?
深层次安全定义剖析及加密技术
RSA encryption and decryption details
How about opening a flush account? Is it safe? How to open a stock trading account
MYSQL的下载与配置 mysql远程操控
Padding percentage operation
(必须掌握的多线程知识点)认识线程,创建线程,使用Thread的常见方法及属性,以及线程的状态和状态转移的意义
DoS及攻击方法详解
wm_concat()和group_concat()函数
Binary search-2
How about opening an account at Guojin securities? Is it safe to open an account?
Using redis for user access data statistics hyperloglog and bitmap advanced data types
手写promise.all
KDD 2022 | how to use comparative learning in cross domain recommendation?
Connected to surface test questions
sqlite数据库的系统表sqlite_master
[buuctf.reverse] 126-130
Temporarily turn off MySQL cache
CD-CompactDisk
Tsinghua & Shangtang & Shanghai AI & CUHK proposed Siamese image modeling, which has both linear probing and intensive prediction performance!