LRU cache for leveldb source code analysis
2022-07-01 05:17:00 【Ethan97】
Preface
Because random access to memory is several orders of magnitude faster than access to disk, memory can be used to cache data read from disk: when the same data is accessed again, no disk I/O is issued and the data is served straight from the cache. The principle of locality also applies: data that will be needed soon tends to be close, in terms of address, to data being used right now, so a disk read can fetch more than was strictly requested, on the expectation that the extra data will be needed shortly.
LRU stands for Least Recently Used: the cache entries that have gone unused for the longest time are the ones swapped out. LRUCache reference-counts every cache entry and maintains two linked lists:
- in_use_, containing the entries currently in use (refs >= 2 && in_cache == true);
- lru_, containing the idle entries (refs == 1 && in_cache == true).
Its working mechanism is as follows:
- When a new cache entry enters the LRUCache, its reference count is increased (refs++) and it is appended to the in_use_ list;
- When a cache entry is no longer referenced externally, i.e. refs == 1 (the only remaining reference is the LRUCache itself), it is removed from the in_use_ list and appended to the lru_ list;
- When the total charge cached exceeds the LRUCache's rated capacity, the oldest entry in lru_ (at lru_.next, the front of the list) is evicted, making room for new entries.
LRUCache also contains a hash table, used to index a cache entry quickly by its key.
Cache class
Before analyzing the concrete implementation class LRUCache, let us first look at LevelDB's cache interface. The contents of include/leveldb/cache.h are as follows:
namespace leveldb {
class LEVELDB_EXPORT Cache;
LEVELDB_EXPORT Cache* NewLRUCache(size_t capacity);
class LEVELDB_EXPORT Cache {
public:
Cache() = default;
Cache(const Cache&) = delete;
Cache& operator=(const Cache&) = delete;
// Destroys all remaining entries by calling the deleter.
virtual ~Cache();
// Opaque handle to an entry stored in the cache.
struct Handle {
};
// Insert a key/value mapping into the cache, assigning it the specified "cost" charge against the cache capacity.
// Insert returns the corresponding handle; the caller must call Release(handle) when the mapping is no longer needed.
// When the inserted entry is eventually dropped, the key and value are passed to deleter.
virtual Handle* Insert(const Slice& key, void* value, size_t charge,
void (*deleter)(const Slice& key, void* value)) = 0;
// Lookup returns the handle for the given key, or nullptr if there is none.
// The caller must call Release(handle) when the returned mapping is no longer needed.
virtual Handle* Lookup(const Slice& key) = 0;
// Release releases a mapping.
// The handle must not have been released yet, and must have been returned by this object.
virtual void Release(Handle* handle) = 0;
// Value returns the value encapsulated in a handle returned by Lookup().
// The handle must not have been released yet, and must have been returned by this object.
virtual void* Value(Handle* handle) = 0;
// Erase deletes an entry. The underlying entry is only destroyed once all handles to it have been released.
virtual void Erase(const Slice& key) = 0;
// NewId returns a new numeric id. It may be used by multiple clients sharing the same cache to partition the key space.
// Typically a client allocates a new id at startup and prepends it to its cache keys.
virtual uint64_t NewId() = 0;
// Prune removes all entries that are not actively in use.
virtual void Prune() {
}
// TotalCharge returns an estimate of the combined charge of the stored entries.
virtual size_t TotalCharge() const = 0;
};
} // namespace leveldb
The methods of the Cache class show roughly how a cache is used. The main ones are:
- Insert puts an entry into the cache;
- Lookup finds an entry in the cache by key;
- Release tells the cache that the caller no longer uses an entry.
LRUHandle Structure
LRUHandle is the basic unit managed by LRUCache; each one is allocated on the heap.
struct LRUHandle {
void* value;
void (*deleter)(const Slice&, void* value);
LRUHandle* next_hash; // Next entry in the same hash-table bucket
LRUHandle* next; // Doubly-linked list pointers (lru_ or in_use_ list)
LRUHandle* prev;
size_t charge; // TODO(opt): Only allow uint32_t?
size_t key_length;
bool in_cache; // Whether entry is in the cache.
uint32_t refs; // References, including cache reference, if present.
uint32_t hash; // Hash of key(); used for fast sharding and comparisons
char key_data[1]; // Beginning of key
Slice key() const {
// next is only equal to this if the LRU handle is the list head of an
// empty list. List heads never have meaningful keys.
assert(next != this);
return Slice(key_data, key_length);
}
};
HandleTable class
HandleTable is LevelDB's own hash table implementation, used by LRUCache. It is a very simple chained hash table; its main difference from the C++ standard library's hash containers is the resizing policy: as soon as the number of nodes exceeds the length of the bucket array, HandleTable grows.
HandleTable Constructors
HandleTable's main member variables are:
// The table consists of an array of buckets where each bucket is
// a linked list of cache entries that hash into the bucket.
uint32_t length_; // Length of the bucket array
uint32_t elems_; // Number of entries in the table
LRUHandle** list_; // The bucket array itself
As the comment says, the hash table consists of an array of buckets, and each bucket is a linked list of the cache entries that hash into it. The constructor simply calls the resizing method Resize:
HandleTable() : length_(0), elems_(0), list_(nullptr) { Resize(); }
Resize allocates (or reallocates) the underlying array of LRUHandle pointers. The code is as follows:
void Resize() {
uint32_t new_length = 4;
// Find the smallest power of two that can hold elems_ elements.
while (new_length < elems_) {
new_length *= 2;
}
// Allocate the new bucket array of LRUHandle pointers.
LRUHandle** new_list = new LRUHandle*[new_length];
memset(new_list, 0, sizeof(new_list[0]) * new_length);
uint32_t count = 0;
// Walk every bucket of the old array.
for (uint32_t i = 0; i < length_; i++) {
// h iterates over the chain hanging off old bucket i.
LRUHandle* h = list_[i];
while (h != nullptr) {
LRUHandle* next = h->next_hash;
uint32_t hash = h->hash;
// ptr points at the head of the new bucket this entry maps to.
LRUHandle** ptr = &new_list[hash & (new_length - 1)];
// Head insertion: splice the old node onto the front of the new chain.
h->next_hash = *ptr;
*ptr = h;
h = next;
count++;
}
}
assert(elems_ == count);
delete[] list_;
list_ = new_list;
length_ = new_length;
}
As the code shows, HandleTable resolves collisions by chaining: all nodes whose hashes map to the same bucket are linked together. On expansion (i.e. when Resize is called), a new LRUHandle* array is created and every pointer from the old table is relinked into its slot in the new table.
HandleTable lookup
First comes FindPointer, the method used to locate an entry:
// FindPointer returns a pointer to the link that points at the node with the given key/hash.
// If there is no such node, it returns a pointer to the trailing nullptr link, i.e. the slot where the node should be inserted.
LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
// Locate the bucket for this hash value.
LRUHandle** ptr = &list_[hash & (length_ - 1)];
// Walk the bucket's chain until the node is found or the end is reached.
while (*ptr != nullptr && ((*ptr)->hash != hash || key != (*ptr)->key())) {
ptr = &(*ptr)->next_hash;
}
return ptr;
}
With FindPointer in hand, lookup is trivial:
LRUHandle* Lookup(const Slice& key, uint32_t hash) {
return *FindPointer(key, hash);
}
HandleTable Insert
Insert adds a handle to the hash table. The code is as follows:
LRUHandle* Insert(LRUHandle* h) {
LRUHandle** ptr = FindPointer(h->key(), h->hash);
LRUHandle* old = *ptr;
// If an entry with the same key already exists, h adopts its next_hash and takes its place in the chain.
h->next_hash = (old == nullptr ? nullptr : old->next_hash);
// Redirect the link that pointed at old (or at the chain's end) to h.
*ptr = h;
// Only count brand-new nodes (i.e. no existing node was replaced),
// then decide whether the table needs to grow.
if (old == nullptr) {
++elems_;
if (elems_ > length_) {
// Grow as soon as there are more entries than buckets, keeping the average chain length <= 1.
Resize();
}
}
// Return the replaced node, or nullptr.
return old;
}
HandleTable Delete
Remove deletes a key/value pair from the hash table by unlinking its node from the bucket chain.
LRUHandle* Remove(const Slice& key, uint32_t hash) {
// Find the link that points at the node for this key.
LRUHandle** ptr = FindPointer(key, hash);
LRUHandle* result = *ptr;
if (result != nullptr) {
// Unlink the node from the chain.
*ptr = result->next_hash;
--elems_;
}
return result;
}
LRUCache class
LRUCache Member variables of
With its member variables understood, LRUCache's working mechanism is easy to follow:
// Initialized before use.
size_t capacity_; // Total capacity of the LRUCache
// mutex_ protects the following state.
mutable port::Mutex mutex_;
size_t usage_ GUARDED_BY(mutex_); // Combined charge of all cached entries
// Dummy head of LRU list.
// lru.prev is newest entry, lru.next is oldest entry.
// Entries have refs==1 and in_cache==true.
LRUHandle lru_ GUARDED_BY(mutex_); // Dummy head of the list of entries that are in the cache but not referenced by any client
// Dummy head of in-use list.
// Entries are in use by clients, and have refs >= 2 and in_cache==true.
LRUHandle in_use_ GUARDED_BY(mutex_); // Dummy head of the list of entries currently referenced by clients
HandleTable table_ GUARDED_BY(mutex_); // Hash table for locating cache entries by key
Ref
Calling Ref records one more reference to a handle. This may involve moving the node from lru_ to in_use_.
void LRUCache::Ref(LRUHandle* e) {
if (e->refs == 1 && e->in_cache) {
// If the entry sits on the idle lru_ list (refs == 1 and in_cache), move it to the in_use_ list.
LRU_Remove(e);
LRU_Append(&in_use_, e);
}
e->refs++;
}
Unref
Unref drops one reference to a cache entry, and frees the entry's storage once nobody references it. This may involve moving the node from in_use_ to lru_.
void LRUCache::Unref(LRUHandle* e) {
assert(e->refs > 0);
e->refs--;
if (e->refs == 0) {
// Nobody references this entry anymore: destroy it.
assert(!e->in_cache); // Clearing in_cache is the caller's responsibility.
(*e->deleter)(e->key(), e->value); // Invoke the caller-supplied deleter.
free(e); // Free the heap space allocated in Insert().
} else if (e->in_cache && e->refs == 1) {
// Only the LRUCache itself still references this entry: move it to the idle lru_ list.
LRU_Remove(e);
LRU_Append(&lru_, e);
}
}
Insert
Insert creates an LRUHandle for the new cache entry and tries to add it to the LRUCache.
Cache::Handle* LRUCache::Insert(const Slice& key, uint32_t hash, void* value,
size_t charge,
void (*deleter)(const Slice& key,
void* value)) {
MutexLock l(&mutex_);
// Allocate the LRUHandle on the heap. Note the size: -1 because key_data already occupies one byte of the key.
LRUHandle* e =
reinterpret_cast<LRUHandle*>(malloc(sizeof(LRUHandle) - 1 + key.size()));
e->value = value;
e->deleter = deleter;
e->charge = charge;
e->key_length = key.size();
e->hash = hash;
e->in_cache = false;
e->refs = 1; // One reference for the handle returned to the caller.
// Append key to end .
std::memcpy(e->key_data, key.data(), key.size());
if (capacity_ > 0) {
e->refs++; // A second reference, because the LRUCache itself now references it.
e->in_cache = true;
LRU_Append(&in_use_, e); // Put the new entry on the in_use_ list.
usage_ += charge; // usage_ may temporarily exceed the total capacity here.
// table_.Insert returns the displaced node if this key already existed in the LRUCache (or nullptr otherwise);
// FinishErase then evicts that displaced node.
FinishErase(table_.Insert(e));
} else {
// A capacity of 0 means caching is disabled.
// next must still be initialized, because key() asserts on it.
e->next = nullptr;
}
// While usage exceeds capacity, keep evicting the oldest entry (lru_.next) until usage fits or no idle entry remains.
while (usage_ > capacity_ && lru_.next != &lru_) {
LRUHandle* old = lru_.next;
assert(old->refs == 1);
bool erased = FinishErase(table_.Remove(old->key(), old->hash));
if (!erased) {
// to avoid unused variable when compiled NDEBUG
assert(erased);
}
}
// Return to newly created handle.
return reinterpret_cast<Cache::Handle*>(e);
}
FinishErase finishes the removal of an entry:
// Removes e from the LRUCache, and returns whether e was non-null.
bool LRUCache::FinishErase(LRUHandle* e) {
if (e != nullptr) {
assert(e->in_cache);
// Unlink the node from whichever list it is on.
LRU_Remove(e);
// The entry is no longer in the LRUCache.
e->in_cache = false;
usage_ -= e->charge;
Unref(e); // Frees the node if this drops the last reference.
}
return e != nullptr;
}
Lookup
Lookup is very simple: just find the corresponding handle in LRUCache's hash table.
Cache::Handle* LRUCache::Lookup(const Slice& key, uint32_t hash) {
MutexLock l(&mutex_);
LRUHandle* e = table_.Lookup(key, hash);
if (e != nullptr) {
Ref(e); // Increase the reference count .
}
return reinterpret_cast<Cache::Handle*>(e);
}
Release
Release simply drops the caller's reference by calling Unref; the entry itself is only freed once its last reference is gone.
void LRUCache::Release(Cache::Handle* handle) {
MutexLock l(&mutex_);
Unref(reinterpret_cast<LRUHandle*>(handle));
}
Erase
Erase removes the handle for the given key from the hash table, then lets FinishErase do the finishing work.
void LRUCache::Erase(const Slice& key, uint32_t hash) {
MutexLock l(&mutex_);
FinishErase(table_.Remove(key, hash));
}
ShardedLRUCache
Because an LRUCache allows only one thread to operate on it at a time, it can become a bottleneck in multithreaded environments. This motivates ShardedLRUCache, which supports concurrent access by multiple threads. ShardedLRUCache simply delegates each Cache operation to one of several LRUCache shards; concretely, the high bits of the key's hash value determine which LRUCache handles the operation.
static const int kNumShardBits = 4;
static const int kNumShards = 1 << kNumShardBits;
class ShardedLRUCache : public Cache {
private:
LRUCache shard_[kNumShards];
port::Mutex id_mutex_;
uint64_t last_id_;
static inline uint32_t HashSlice(const Slice& s) {
return Hash(s.data(), s.size(), 0);
}
// Use the high order of the hash value to map to a shard.
static uint32_t Shard(uint32_t hash) { return hash >> (32 - kNumShardBits); }
public:
explicit ShardedLRUCache(size_t capacity) : last_id_(0) {
// capacity + (kNumShards - 1) rounds up, so the shards together never hold less than the requested capacity.
const size_t per_shard = (capacity + (kNumShards - 1)) / kNumShards;
// Initialize each LRUCache shard.
for (int s = 0; s < kNumShards; s++) {
shard_[s].SetCapacity(per_shard);
}
}
~ShardedLRUCache() override {
}
Handle* Insert(const Slice& key, void* value, size_t charge,
void (*deleter)(const Slice& key, void* value)) override {
const uint32_t hash = HashSlice(key);
return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter);
}
Handle* Lookup(const Slice& key) override {
const uint32_t hash = HashSlice(key);
return shard_[Shard(hash)].Lookup(key, hash);
}
void Release(Handle* handle) override {
LRUHandle* h = reinterpret_cast<LRUHandle*>(handle);
shard_[Shard(h->hash)].Release(handle);
}
void Erase(const Slice& key) override {
const uint32_t hash = HashSlice(key);
shard_[Shard(hash)].Erase(key, hash);
}
void* Value(Handle* handle) override {
return reinterpret_cast<LRUHandle*>(handle)->value;
}
uint64_t NewId() override {
MutexLock l(&id_mutex_);
return ++(last_id_);
}
void Prune() override {
for (int s = 0; s < kNumShards; s++) {
shard_[s].Prune();
}
}
size_t TotalCharge() const override {
size_t total = 0;
for (int s = 0; s < kNumShards; s++) {
total += shard_[s].TotalCharge();
}
return total;
}
};
Where the Cache is used
LevelDB uses this cache in two places:
- to cache the Data Blocks inside SSTable files;
- to cache Table, the in-memory data structure representing an SSTable.
Review
Why introduce LRUHandle
Why wrap key/value pairs in LRUHandle rather than operate on them directly inside LRUCache:
- LRUHandle encapsulates a key/value pair as a node that is simultaneously a doubly-linked-list node and a hash-table node, so it can be moved between lists and inserted into the hash table cheaply;
- it also carries a cache entry's metadata, such as the reference count, the charge, and the hash value.
When refs increases and decreases
refs increases:
- a newly created handle starts with refs set to 1 (for the handle returned to the caller), and refs is set to 2 if the entry is also inserted into the LRUCache;
- the reference count is incremented whenever a handle is found via Lookup.
refs decreases:
- FinishErase decrements it as part of removal;
- Release decrements it when the caller is done with a handle;
- when the LRUCache is destroyed, it drops its own reference to every entry.
Why use a deleter
Cleaning up the stored key/value pair is delegated to a callback supplied by the caller, because LRUCache does not know the concrete type behind value and therefore cannot know what to free. Here is an example:
struct TableAndFile {
RandomAccessFile* file;
Table* table;
};
// DeleteEntry is a deleter.
static void DeleteEntry(const Slice& key, void* value) {
TableAndFile* tf = reinterpret_cast<TableAndFile*>(value);
delete tf->table;
delete tf->file;
delete tf;
}