当前位置:网站首页>[STL source code analysis] summary notes (10): hashtable exploration

[STL source code analysis] summary notes (10): hashtable exploration

2022-06-11 07:17:00 Janvis of the stark family

00 Write it at the front

Balanced binary search tree RB-tree As a structure with balanced efficiency and complexity , Has been used for STL set and map Underlying support for .
【STL Source analysis 】 Summary notes (8): Red and black trees (RB-tree) To explore the
【STL Source analysis 】 Summary notes (9):set/multiset And map/multimap

And there is another structure , Hashtable ( Hereinafter referred to as hashtable), When the data has enough randomness , It is also able to maintain the function of inserting, deleting and other operations “ Constant average time ” performance . It's also unordered The foundation of the structure .

hashtable More is experience design

01 summary

hashtable Is a very common data structure , We also have contact in the data structure course .hash table It is a dictionary structure .

Suppose we have a lot of elements to store in an array , Then you can construct a simple array Can be realized .

But when the amount of data becomes larger , It may be unrealistic to apply for huge space , This is the time to consider another storage method .

Mapping function is a method , Map large numbers to decimals .

PNG Images 31

If the array length is tablesize, that x%tablesize The resulting number can be used as this number x The index of .

Solve the collision

such hash function There will be a problem , When the number of elements increases, it is inevitable that multiple elements are mapped to the same location . This situation is also called “ Collision ”. There are several common methods to solve collision :

  1. Linear detection

    When using hash function After calculating the position, it is found that the position has no space , We will continue to look down one by one for the positions where we can put the elements . If you reach the tail, start from the beginning .

    This method is easy to understand , But there is also a very intuitive problem : After many vacant seats are occupied , There will be a phenomenon that new elements crash into the wall crazily and finally insert successfully , Increased average insertion cost .

  2. Second detection

    The second detection is about , Since one by one detection will cause problems , Then we will probe in the form of quadratic . Turned out to be h+1,h+2… Now it becomes h+(1 The square of ),h+(2 The square of )…

    It is obvious that this method of detection is jumping , There will be no continuous impact on the wall , The effect will be much better . But it still happens when the calculated positions of two elements are the same , The efficiency waste caused by the same detection position .

  3. Open chain

    Open chain refers to maintaining a list, Let the elements with the same calculation position form a chain . This is also SGI STL in hash table Implementation method . We put forward separately that .

02 Open chain at hashtable The use of

hash table The form of is shown in the figure :

 remarks 2021 year 2 month 18 Japan

There are a lot of things in this form “ Wise remark of an experienced person ”( A design specification that everyone abides by, but the basis is unknown )

Like the whole buckets This vector The size of is usually a prime number .

And we can see that when there are more overlapping elements , This chain structure needs to maintain a long list . Then the search efficiency will be very low at this time .

So one rule is : When the number of elements is greater than buckets The number of spaces , It will buckets become 2 Double amplification , Then adjust it to prime number . In this way, the long chain can be shortened after recalculation .

In fact, it's not so easy STL These numbers are specified , We can have a look .

static const int _stl_num_primes=28;
static const unsigned long _stl_prime_list[_stl_num_primes]={
    53,97,193,389,769,1543,3079,6151,12289,24593,1572869,3145739,6291469...
}

03 hashtable Structure

hash table The structure is as follows :

template <class Value,class Key,class HashFcn,class ExtractKey,class EqualKey,class Alloc=alloc>
class hashtable{
public:
    typedef HashFcn hasher;
    typedef EqualKey key_equal;
    typedef size_t size_type;
private:
    hasher hash;
    key_equal equals;
    ExtratKey get_key;
    
    typedef _hashtable_node<Value>node;
    
    vector<node*,Alloc>buckets;
    size_type num_elements;
    ...
}

hashtable There are many template parameters , Take a look at .

  1. Value: The real value of the node
  2. Key: The key value of the node
  3. HashFcn:hash function, Used to get the element number ( That is, where to put it )
  4. ExtractKey: Take out key Methods , It is similar to the red and black tree getofvalue.
  5. EqualKey: Judge key Whether the same method

Below _hashtable_node yes hashtable The node structure of , contain next Pointers and value

template <class Value>
struct _hashtable_node{
    _hashtable_node* next;
    Value val;
}

Continue down to see buckets Is the use of vector Realized .

We can also calculate hashtable Size :

Three methods ( Function or functor ) Its size is 3.vector There are three pointers in the , Size is 12.size_type Size is 4. Total size is 19, For alignment, the system will be adjusted to 20.

04 hash function

hash function The important task of is to find the foothold of the element , That is, where to put it after calculation .

We all know that the calculation process is a modular process , But there will be some differences . For example, if you encounter characters and strings, you can't perform simple calculations .

stay STL Wrapped a function bkt_num(), Call in this function hash function.

size_type bkt_num_key(const key_type& key,size_t n)const
{
    return hash(key)%n;
}

there hash The design method of partial specialization is used ( stay traits The design method I have seen )

Set different methods according to the passed in value .( Most don't need to be changed , Directly return the original value and then take the module )

// Generalized version 
template <class Key> struct hash{};

// special version 
_STL_TEMPLATE_NULL struct hash<char>{
    size_t operator()(char x)const {return x;}
}
_STL_TEMPLATE_NULL struct hash<short>{
    size_t operator()(short x)const {return x;}
}
_STL_TEMPLATE_NULL struct hash<unsigned short>{
    size_t operator()(unsigned short x)const {return x;}
}
_STL_TEMPLATE_NULL struct hash<int>{
    size_t operator()(int x)const {return x;}
}

...

None of the above needs to be changed , Return the original value directly .

Let's look at the practices that need to be changed ( For character strings ):

_STL_TEMPLATE_NULL struct hash<char*>{
    size_t operator()(const char* s)const {return _stl_hash_string(s);}
}

_STL_TEMPLATE_NULL struct hash<const char*>{
    size_t operator()(const char* s)const {return _stl_hash_string(s);}
}

And then jump to _stl_hash_string(s)

inline size_t _stl_hash_string(const char* s){
    unsigned long h=0;
    for(;*s;++s){
        h=5*h+*s;
        return size_t(h);
    }
}

It is normal to fail to understand this practice , Because this itself is a very random piece of code .

for instance , We now have strings abc, Then the result will be calculated 5*(5*a+b)+c, there abc yes acsii code .

because hash function It is to ask for the code of an element to find the appropriate location , So this method is to calculate a value that is messy enough and not easy to repeat .

05 hashtable The iterator

Finally, let's talk about hashtable The iterator .

Because there are next The pointer , So you can easily jump to the next position .

node* cur;
hashtable* ht;

stay iterator There are two pointers in , The first is to point to the current list Pointer to the node in .

The second pointer is used in buckets Jump between , Because when we arrive list You need to jump to the next bucket. It's a bit like deque Control center .

Another digression , because hashtable You can only move forward ( Because only next), So it is easy for us to know its type .

typedef forward_iterator_tag iterator_category;

06 unordered

about unordered For containers , The underlying implementation is hashtable. The original non-standard name is hash set/multiset and hash map/multimap.

Let's take a look here hash set/multiset and hash map/multimap The implementation of the . And the one in front set/multiset,map/multimap similar , Can be seen as a container adapter .

hash set/multiset

hashset The bottom layer of depends on hashtable Realized , But unlike red and black trees hashtable No automatic sorting function , This is also unordered The origin of .

hashset How to use and set identical .

template <class Value,class HashFcn=hash<Value>,class EqualKey=equal_to<Value>,class Alloc=alloc>class hash_set{private:    typedef hashtable<Value,Value,HashFcn.identity<Value>,EqualKey,Alloc>ht;    ht rep;    ...}

You can see hashset take key and value Set to the same value , Then call directly at the bottom hashtable Realization .

Empathy ,hash multiset And hashset The difference is that the insert function , The former uses insert_equal(), The latter uses insert_unique().

hash multiset And multiset The only difference is hash multiset The elements of are not automatically sorted .

hash map/multimap

map and set The difference is the key value , It is similar to the red black tree in terms of parameters .

template<class Key,class T,class HashFcn=hash<Key>,class EqualKey=equal_to<key>,class Alloc=alloc>class hash_map{private:    typedef hashtable<pair<const Key,T>,Key,HashFcn,select1st<pair<const Key,T>>,EqualKey,Alloc>ht;    ht rep;}

This is the same as what we said in the red and black tree const The usage is consistent , final map Save two parts , Part of it is key, The other part is key and data Composed of value.

hashmap How to use and map identical .

hash multimap And multimap The only difference is hash multiset The elements of are not automatically sorted .

原网站

版权声明
本文为[Janvis of the stark family]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/03/202203020522174027.html