当前位置:网站首页>Take you to resolve hash conflicts and implement a simple hash table,
Take you to resolve hash conflicts and implement a simple hash table,
2022-06-26 18:11:00 【A goblin】
List of articles
- 1. Concept
- 2, Conflict — Concept
- 3, Conflict — avoid
- 4, Conflict — avoid — Hash function design
- 5, Conflict - avoid - Load factor regulation ( Focus on mastering )
- 6, Conflict — solve
- 7, Hash — solve — Closed hash
- 8, Conflict — solve — Hash / Hash bucket ( Focus on mastering )
- 9, A solution to a serious conflict
- 10, Implement a hash table
- 11, Performance analysis
1. Concept
Sequence structure and balance tree , There is no corresponding relationship between the element key and its storage location , So when looking for an element , You have to go through key
Multiple comparisons of codes . The time complexity of sequential search is O(N), Balance the height of the tree in the tree , namely O( l o g 2 N log_2 N log2N), The efficiency of a search depends on having searched
Comparison times of elements in the process .
Ideal search method : Without any comparison , Get the elements to search directly from the table at a time . If you construct a storage structure , By some kind of letter
Count (hashFunc) A one-to-one mapping relationship can be established between the storage location of an element and its key , Then when searching, you can quickly
Find the element .
When in this structure :
Insert elements
According to the key of the element to be inserted , This function calculates the storage location of the element and stores it according to this location
Search element
Do the same calculation for the key of the element , Take the function value as the storage location of the element , Take the element comparison in this position in the structure , if
The keys are equal , Then the search is successful
This is called hash ( hash ) Method , The conversion function used in the hash method is called hash ( hash ) function , The constructed structure is called a hash table (Hash
Table)( Or a hash table )
for example : Data set {1,7,6,4,5,9};
The hash function is set to :hash(key) = key % capacity; capacity Is the total size of the underlying space of the storage element .

Using this method to search does not need to compare multiple keys , So the search speed is faster
problem : According to the above hash method , Insert a meta... Into the collection plain 44, What will happen ?
Obviously 44%10=4 , In this case, the array subscript is 4 An element already exists at the location of , This is the problem we face Hash Collisions ( Hash collision )
2, Conflict — Concept
For the keywords of two data elements k i k_i ki and k j k_j kj(i != j), Yes k i k_i ki != k j k_j kj, But there are :Hash( k i k_i ki) == Hash( k j k_j kj), namely : Different Keyword calculate the same hash address through the same hash hash number , This phenomenon is called hash collision or hash collision .
Data elements with different keys and the same hash address are called “ A synonym for ”.
3, Conflict — avoid
First , We need to be clear , Because the capacity of the underlying array of our hash table is often less than the actual number of keywords to be stored , This leads to a A question , Conflict is inevitable , But what we can do is try our best Reduce the conflict rate .
4, Conflict — avoid — Hash function design
One possible cause of hash conflict is : Hash function design is not reasonable . Hash function design principles :
- The definition field of hash function must include all keys to be stored , And if the hash table allows m An address , Its value range must be in 0 To m-1 Between
- The address calculated by hash function can be evenly distributed in the whole space
- Hash functions should be simpler
Common hash functions
1). Direct customization –( Commonly used )
Take a linear function of keyword as hash address :Hash(Key)= A*Key + B advantage : Simple 、 uniform shortcoming : You need to know in advance Distribution of key words Use scenarios : It is suitable for finding relatively small and continuous cases
2). Division and remainder –( Commonly used )
Set the number of addresses allowed in the hash table to m, Take one not greater than m, But close to or equal to m The prime number of p As divisor , According to the hash function : Hash(key) = key% p(p<=m), Convert key to hash address
3). Square with the middle method –( understand )
Assume keyword is 1234, Yes, its square is 1522756, Extract the middle 3 position 227 As a hash address ; Another example is that the keyword is 4321, Yes It's Square 18671041, Extract the middle 3 position 671( or 710) As a hash address The square middle method is more suitable for : I don't know the score of keywords cloth , And the number of digits is not very big
4). Folding method –( understand )
Folding is to divide keywords from left to right into several parts with equal digits ( The last part of the digits can be shorter ), And then add these parts together and sum it up , And press the hash table length , Take the last few bits as the hash address . Folding method is suitable for not knowing the distribution of keywords in advance , It is suitable for the situation with more keyword digits
5). Random number method –( understand )
Choose a random function , Take the random function value of a keyword as its hash address , namely H(key) = random(key), among random Is a random number function . This method is usually used when the keyword length is different
6). Mathematical analysis –( understand )
Equipped with n individual d digit , Everyone may have r Different symbols , this r Different symbols don't always appear the same frequency on you , Maybe somewhere These positions are evenly distributed , Every symbol has an equal chance of appearing , It is unevenly distributed on some bits, and only some symbols often appear . According to the The size of the hash table , Select a number of bits in which various symbols are evenly distributed as the hash address .
Digital analysis method is usually suitable for dealing with large number of key words , If the distribution of keywords is known in advance and several bits of keywords are evenly distributed Even conditions
Be careful : The more sophisticated the hash function is , The lower the likelihood of hash conflicts , But there is no way to avoid hash conflicts
5, Conflict - avoid - Load factor regulation ( Focus on mastering )
The load factor of the hash table is defined as : a = The number of elements in the table / The length of the hash table
a It is a marker factor of hash table fullness . Since the meter length is a fixed value ,
a And “ The number of elements in the table ” In direct proportion to , therefore ,a The bigger it is , Indicates that the more elements to fill in the table , The more likely there is to be a conflict : conversely ,a The smaller it is , Indicate that the fewer elements to fill in the table , The less likely there is a conflict . actually , The average lookup length of the hash table is the load factor a Function of , It's just that different methods of dealing with conflicts have different functions .For open addressing , The load factor is a particularly important factor , It should be strictly limited to 0.7-0.8 following . exceed 0. 8, Look up the table CPU Cache miss (cache,
missing) Follow the exponential curve . therefore , Some use open addressing hash library ,
Such as Java The system library limits the load factor to 0.75, Exceeding this value will resize Hash table .
The relationship between load factor and conflict rate is roughly demonstrated 
So when the conflict rate is unbearable , We need to reduce the conflict rate by reducing the load factor .
It is known that the number of existing keywords in the hash table is immutable , Then all we can adjust is the size of the array in the hash table .
6, Conflict — solve
Two common ways to resolve hash conflicts are : Closed hash and Hash
7, Hash — solve — Closed hash
Closed hash : It's also called open addressing , When a hash conflict occurs , If the hash table is not full , It means that there must be a space in the hash table , Then you can. hold key Stored in a conflict location “ next ” Go to an empty place . So how to find the next empty position ?
1) Linear detection
For example, the scene above , Now you need to insert the element 44, First calculate the hash address through the hash function , Subscript to be 4, therefore 44 In theory, it should be inserted in this Location , But the position has been set to the value 4 The elements of , That is, a hash conflict occurs .
Linear detection : Start from where the conflict occurred , Sequential backward detection , Until we find the next empty position .
Insert :
- Get the location of the element to be inserted in the hash table through the hash function
- If there is no element in the location, insert the new element directly , If there are elements in the location that have hash conflicts , Use linear detection to find Next empty position , Insert new elements

- When using closed hash to handle hash conflicts , The existing elements in the hash table cannot be deleted randomly , If you delete the element directly, it will affect other
Element search . For example, delete elements 4, If delete directly ,44 The search may be affected . Therefore, the linear detection adopts the standard Remember the pseudo deletion method to delete an element
2). Second detection
The drawback of linear detection is that conflicting data are stacked together , This has something to do with finding the next empty location , Because the way to find an empty position is to Look back one by one , Therefore, in order to avoid this problem , The way to find the next empty location is : H i H_i Hi = ( H 0 H_0 H0 + i 2 i^2 i2 )% m, perhaps : H i H_i Hi = ( H 0 H_0 H0 - i 2 i^2 i2 )% m. among :i = 1,2,3…, H 0 H_0 H0 Is through the hash function Hash(x) Key to element key The calculated position ,m Is the size of the table . about 2.1 If you want to insert 44, Conflict , The situation after use is :
Studies have shown that : When the length of the table is prime and the table loading factor a No more than 0.5 when , New entries must be able to insert , And not in any position Will be probed twice . So as long as there are half empty positions in the table , There will be no table full problem . When searching, you can not consider the situation that the table is full condition , However, when inserting, you must ensure that the load factor of the table a No more than 0.5, If it exceeds the limit, we must consider increasing capacity .
therefore : The biggest disadvantage of hash ratio is that the space utilization is relatively low , This is also the defect of hash .
8, Conflict — solve — Hash / Hash bucket ( Focus on mastering )
Hash method is also called chain address method ( Open chain method ), First, we use hash function to calculate hash address for key set , Keys with the same address belong to the same child aggregate , Each subset is called a bucket , The elements in each bucket are linked by a single chain table , The head nodes of each linked list are stored in the hash table .

As can be seen from the above figure , Each bucket in the hash is filled with elements with hash conflicts .
Hash , It can be considered that a search problem in a large set is transformed into a search problem in a small set .
9, A solution to a serious conflict
We just mentioned , In fact, hash bucket can be regarded as transforming the search problem of large set into the search problem of small set , If the conflict is serious , It means In fact, the search performance of small collections is not good , At this time, we can continue to transform the so-called small set search problem , for example :
1). Behind each bucket is another hash table
2). Behind each bucket is a search tree
10, Implement a hash table
I use the open hash method to solve the hash conflict problem
I use hash The method of dividing a function and leaving a remainder –( Commonly used )
Here we mainly implement three operations of hash table
1) newly added ( With capacity expansion operation )
2) lookup
3) Delete
// Hash conflicts are handled by hashing
public class MyHashMap {
static class Node{
public int key;
public int value;
public Node next;
public Node(int key, int value) {
this.key = key;
this.value = value;
}
}
private static final double LOAD_FACTOR = 0.75; // Define a load shadow constant / It is used to judge whether capacity expansion is needed
//array Namely hash Table body , Each element of the array is also the head node of a linked list
private Node[] array = new Node[101];
private int size = 0; // At present hash The number of elements in the table
private int hashFunc( int key){
// Seek the present key stay hash Storage location in the table , The actual solution is much more complicated than this
return key%array.length;
}
// If hash It already exists in the table key Just modify the original key Of value
// Insert new key value pairs if they do not exist
public void put(int key,int value){
//1, Need to put key Map to array subscript
int index = hashFunc(key);
// Get the node at this position in the current array ( That is, the head node of the linked list )
Node list = array[index];
// Judgment is key Whether it exists in the linked list
for (Node cur = list; cur != null; cur = cur.next){
if(cur.key == value){
// If it exists, modify the current node value
cur.value = value;
return;
}
}
// If the cycle ends , Did not find , It is inserted into the subscript of the corresponding array ,
// Here, the head plug is used , You can also tail insert
Node newNode = new Node(key,value);
newNode.next = list;
array[index] = newNode;
index++;
if(size/array.length > LOAD_FACTOR){
resize(); // Larger than the load factor, the capacity will be expanded
}
}
private void resize(){
// Scale operation
Node[] newArray = new Node[array.length*2];
for (int i = 0; i < array.length; i++) {
for(Node cur = array[i]; cur != null; cur = cur.next){
int index = cur.key%newArray.length;
Node newNode = new Node(cur.key,cur.value);
newNode.next = newArray[index].next;
newArray[index] = newNode;
}
}
array = newArray;
}
// Search operation , lookup key Corresponding value, If you find it , Just go back to value, Otherwise return to null
public Integer get(int key){
// First we have to take key Map to array subscript
int index = hashFunc(key);
// Take out the linked list of the subscript position
Node list = array[index];
// Traverse the linked list to find
for (Node cur = list; cur != null; cur = cur.next){
if(cur.key == key){
return cur.value; // eureka
}
}
// Did not find
return null;
}
// Delete key
public void remove(int key){
int index = hashFunc(key);
Node list = array[index];
if(key == list.value){
// Handle header deletion
array[index] = list.next;
}
// In other locations, you need to find the previous node to delete the node
Node prveNode = serch(key);
Node toRemove = prveNode.next;
prveNode.next = toRemove.next;
}
public Node serch(int key){
// Find the node before the node to be deleted
int index = hashFunc(key);
Node list = array[index];
for (Node cur = list; cur != null && cur.next != null; cur = cur.next ){
if(cur.next.key == key){
return cur;
}
}
return null;
}
}11, Performance analysis
Although hashtables have been struggling with conflicts , But in actual use , We think the conflict rate of hash table is not high , The number of conflicts is controllable , That is, the length of the list in each bucket is a constant , therefore , In the usual sense , We think that hash table insertion / Delete / The complexity of searching time is O(1) .
边栏推荐
- [npoi] C copy sheet template across workbooks to export Excel
- 解决pycharm里面每个字母占一格空格的问题
- Using redis for user access data statistics hyperloglog and bitmap advanced data types
- Please advise tonghuashun which securities firm to choose for opening an account? Is it safe to open an account online now?
- LeetCode 238 除自身以外数组的乘积
- Introduction to Ethereum Technology Architecture
- wm_concat()和group_concat()函数
- 必须要掌握的面试重点——索引和事务(附讲B-树与B+树)
- 17.13 supplementary knowledge, thread pool discussion, quantity discussion and summary
- Handwritten promise all
猜你喜欢
随机推荐
Map和List<Map>转相应的对象
idea中文插件chinese(simplified) language pack
VCD video disc
Deep understanding of MySQL lock and transaction isolation level
JVM entry Door (1)
#25class的类继承
LeetCode 238 除自身以外数组的乘积
17.13 supplementary knowledge, thread pool discussion, quantity discussion and summary
Paging query and join Association query optimization
Digital signature standard (DSS)
股票开账户如何优惠开户?现在在线开户安全么?
Bayesian network explanation
二分查找法-1
非对称密码体制详解
How does Guosen Securities open an account? Is it safe to open a stock account through the link
Row lock analysis and deadlock
Lm06 the mystery of constructing the bottom and top trading strategy only by trading volume
你好,现在网上股票开户买股票安全吗?
图像二值化处理
力扣每日一题-第28天-566.重塑矩阵








