当前位置：网站首页>LevelDB源码分析之LRU Cache

LevelDB源码分析之LRU Cache

2022-07-01 05:06:00 【Ethan97】

前言

因为内存的随机访问速度要比磁盘的随机访问速度快上几个数量级，可以用内存将访问硬盘得到的数据缓存起来，当再次访问同样的数据时，不必再请求磁盘IO，而是可以直接从Cache中获取数据。并且根据局部性原理：在最近的将来将用到的信息很可能与正在使用的信息在空间地址上是临近的，可以在读磁盘时读入比需要的数据更多的数据，这些数据可能就是不久的将来需要的。

LRU全称Least Recently Used，意即最近最少使用的缓存项将被换出。LRUCache会对每个缓存项进行引用计数，它维护两个链表：

in_use_，包含正在被使用的缓存项（即 refs >= 2 && in_cache == true）；
lru_，包含未被使用的缓存项（即 refs == 1 && in_cache == true）。

其工作机制如下：

当一个新的缓存项进入LRUCache时，增加其引用计数refs++，并将它被插入到in_use_链表的头部；
当一个缓存项不再被外部引用，即refs == 1（1代表只被LRUCache引用）时，将它从in_use_链表移除，并插入到 lru_链表的尾部 。
当LRUCache缓存的总数据大小大于其额定容量时，它将从lru_链表拿出最老的缓存项（位于lru_链表的首部），淘汰之，并将新的缓存项装入。

LRUCache还包含一个哈希表，用于通过键快速索引到对应的缓存项。

Cache类

在分析实现类LRUCache前，先分析LevelDB的缓存接口。include/leveldb/cache.h 内容如下：

namespace leveldb {
    

class LEVELDB_EXPORT Cache;

LEVELDB_EXPORT Cache* NewLRUCache(size_t capacity);

class LEVELDB_EXPORT Cache {
    
 public:
  Cache() = default;

  Cache(const Cache&) = delete;
  Cache& operator=(const Cache&) = delete;

  // 调用deleter销毁所有entry.
  virtual ~Cache();

  // cache中的entry的句柄。
  struct Handle {
    };

  // Insert将一个键值对映射插入到cache中，并赋予相应的“费用”charge.
  // Insert返回该映射对应的handle.调用者不再使用时必须调用Release(handle)方法。
  // 当不再需要此entry时，键值对会被传到deleter.
  virtual Handle* Insert(const Slice& key, void* value, size_t charge,
                         void (*deleter)(const Slice& key, void* value)) = 0;

  // Lookup返回对应key的handle，如果没有则返回nullptr.
  // 调用者不再使用时必须调用Release(handle)方法。
  virtual Handle* Lookup(const Slice& key) = 0;

  // Release释放一个mapping.
  // handle必须还未被释放，且该handle由该对象返回。
  virtual void Release(Handle* handle) = 0;

  // Value封装Lookup()返回的handle中的值并返回之。
  // handle必须还未被释放，且该handle由该对象返回。
  virtual void* Value(Handle* handle) = 0;

  // Erase将一个entry删去。只有在所有与之关联的handle释放后才会被真正删去。
  virtual void Erase(const Slice& key) = 0;

  // NewId返回一个新的id. 该id可能被多个client使用，他们共享同一个cache来划分key space. 
  // 通常client会在启动时分配一个新的id，并将其id放到缓存的键里。
  virtual uint64_t NewId() = 0;

  // Prune移除所有不在使用中的entry.
  virtual void Prune() {
    }

  // TotalCharge返回总charge的估计。
  virtual size_t TotalCharge() const = 0;
};

}  // namespace leveldb

通过Cache类的方法可以大致了解应该怎样使用一个cache。主要方法有：

通过Insert将entry放到cache中；
通过Lookup查找cache中的一个entry；
当不再使用一个entry时，调用Release方法，表明该entry不再被调用者使用。

`LRUHandle`结构体

LRUHandle是LRUCache管理的基本单位，它在堆上分配内存空间。

struct LRUHandle {
    
  void* value;
  void (*deleter)(const Slice&, void* value);
  LRUHandle* next_hash;	// 哈希表中的链表指针
  LRUHandle* next;	// 双向链表中的链表指针
  LRUHandle* prev;
  size_t charge;  // TODO(opt): Only allow uint32_t?
  size_t key_length;
  bool in_cache;     // Whether entry is in the cache.
  uint32_t refs;     // References, including cache reference, if present.
  uint32_t hash;     // Hash of key(); used for fast sharding and comparisons
  char key_data[1];  // Beginning of key

  Slice key() const {
    
    // next is only equal to this if the LRU handle is the list head of an
    // empty list. List heads never have meaningful keys.
    assert(next != this);

    return Slice(key_data, key_length);
  }
};

HandleTable类

HandleTable是LevelDB自己实现的哈希表，它被LRUCache使用。HandleTable实现的是一个非常简单的哈希表，其区别于C++内置哈希表的主要方面在于，当链表节点数>桶数组长度时，HandleTable就开始进行扩容。

`HandleTable`构造函数

HandleTable主要的成员变量为：

  // The table consists of an array of buckets where each bucket is
  // a linked list of cache entries that hash into the bucket.
  uint32_t length_;	// LRUHandle*数组长度
  uint32_t elems_;	// 表内结点数
LRUHandle** list_;

由注释可知，哈希表包含了一个桶数组，每个桶是一个cache entry链表。
HandleTable的构造函数直接调用了扩容Resize方法：

HandleTable() : length_(0), elems_(0), list_(nullptr) {
     Resize(); }

Resize方法为哈希表的底层LRUHandle指针数组分配了空间。
Resize代码如下：

  void Resize() {
    
    uint32_t new_length = 4;
    // 找出可以容纳elems_个元素所需空间。
    while (new_length < elems_) {
    
      new_length *= 2;
    }
    // 分配指向LRUHandle的指针数组。
    LRUHandle** new_list = new LRUHandle*[new_length];
    memset(new_list, 0, sizeof(new_list[0]) * new_length);
    uint32_t count = 0;
    // 遍历旧的底层数组。
    for (uint32_t i = 0; i < length_; i++) {
    
      // h即旧数组的LRUHandle指针。
      LRUHandle* h = list_[i];
      // 遍历旧表中位置i的链表。
      while (h != nullptr) {
    
        LRUHandle* next = h->next_hash;
        uint32_t hash = h->hash;
        // ptr指向元素应该放到的位置。
        LRUHandle** ptr = &new_list[hash & (new_length - 1)];
        // 头插法，将旧表的链表结点插入到新表的链表中。
        h->next_hash = *ptr;
        *ptr = h;
        h = next;
        count++;
      }
    }
    assert(elems_ == count);
    delete[] list_;
    list_ = new_list;
    length_ = new_length;
  }

通过上述方法可以看出，HandleTable使用了拉链法，所有哈希冲突的结点都被链接在一起。当需要扩容时（即调用Resize），创建新的LRUHandle*数组，并将旧表中的所有指针放到新表的相应位置。

`HandleTable`查找

首先介绍用于查找entry的FindPointer方法：

  // FindPointer返回包含给定key/hash的结点。如果没有对应结点，则返回应该插入的位置。
  LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
    
    // 根据哈希值定位到对应的槽位。
    LRUHandle** ptr = &list_[hash & (length_ - 1)];
    // 遍历此槽位的链表，直至找到或到达链表尾部。
    while (*ptr != nullptr && ((*ptr)->hash != hash || key != (*ptr)->key())) {
    
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }

其示意图大致如下：
请添加图片描述

有了FindPointer，便可以轻易实现查找功能：

LRUHandle* Lookup(const Slice& key, uint32_t hash) {
    
    return *FindPointer(key, hash);
  }

HandleTable插入

Insert将一个handle插入到哈希表中。代码如下：

LRUHandle* Insert(LRUHandle* h) {
    
    LRUHandle** ptr = FindPointer(h->key(), h->hash);
    LRUHandle* old = *ptr;
    // 将h的next_hash指针指向ptr位置handle的next_hash.
    h->next_hash = (old == nullptr ? nullptr : old->next_hash);
    // 将链表表尾指针指向此插入结点。
    *ptr = h;
    // 如果是新增结点（即不是覆盖已有结点），进行计数。
    // 判断是否需要扩容。
    if (old == nullptr) {
    
      ++elems_;
      if (elems_ > length_) {
    
        // 只要哈希表中的结点数大于桶数组长度，就开始扩容。
        Resize();
      }
    }
    // 返回旧的链表结点。
    return old;
  }

`HandleTable`删除

Remove将哈希表内的键值对移除，它将键值对对应的链表节点移除。

  LRUHandle* Remove(const Slice& key, uint32_t hash) {
    
  	// 找到指向LRUHandle*的指针。
    LRUHandle** ptr = FindPointer(key, hash);
    LRUHandle* result = *ptr;
    if (result != nullptr) {
    
      // 直接将结点从链表中移除。
      *ptr = result->next_hash;
      --elems_;
    }
    return result;
  }

`LRUCache`类

`LRUCache`的成员变量

了解了LRUCache的工作机制后比较容易理解：

  // Initialized before use.
  size_t capacity_;	// LRUCache的总容量

  // mutex_ protects the following state.
  mutable port::Mutex mutex_;
  size_t usage_ GUARDED_BY(mutex_);	// 缓存项已占用的容量

  // Dummy head of LRU list.
  // lru.prev is newest entry, lru.next is oldest entry.
  // Entries have refs==1 and in_cache==true.
  LRUHandle lru_ GUARDED_BY(mutex_);	// 在LRUCache中但未被引用的缓存项链表头哑结点

  // Dummy head of in-use list.
  // Entries are in use by clients, and have refs >= 2 and in_cache==true.
  LRUHandle in_use_ GUARDED_BY(mutex_);	// 正在被使用的缓存项链表链表头哑结点


  HandleTable table_ GUARDED_BY(mutex_);	// 用于查找缓存项的哈希表

`Ref`

调用Ref方法表示对一个handle进行引用。Ref可能涉及将结点从lru_移动到in_use_.

void LRUCache::Ref(LRUHandle* e) {
    
  if (e->refs == 1 && e->in_cache) {
      // If on lru_ list, move to in_use_ list.
  	// 如果refs == 1且缓存项在LRUCache中，说明缓存项在闲置链表lru_中，将它放到in_use_链表。
    LRU_Remove(e);
    LRU_Append(&in_use_, e);
  }
  e->refs++;
}

`Unref`

Unref取消引用一个缓存项，并在没有人引用时将handle的空间释放。Unref可能涉及将结点从in_use_移动到lru_.

void LRUCache::Unref(LRUHandle* e) {
    
  assert(e->refs > 0);
  e->refs--;
  if (e->refs == 0) {
    
  	// 没有人在引用此缓存项了，释放它。
    assert(!e->in_cache);	// in_cache的设置由调用者完成。
    (*e->deleter)(e->key(), e->value);	// 回调deleter.
    free(e);	// 释放堆空间。
  } else if (e->in_cache && e->refs == 1) {
    
    // 除了LRUCache本身没有人在引用这个缓存项了，将结点插入到闲置链表lru_.
    LRU_Remove(e);
    LRU_Append(&lru_, e);
  }
}

`Insert`

Insert将创建一个缓存项LRUHandle，并尝试将它加入到LRUCache中。

Cache::Handle* LRUCache::Insert(const Slice& key, uint32_t hash, void* value,
                                size_t charge,
                                void (*deleter)(const Slice& key,
                                                void* value)) {
    
  MutexLock l(&mutex_);

  // 在堆上分配一个LRUHandle，注意其空间大小，-1是因为e->key_data已经占用一个byte.
  LRUHandle* e =
      reinterpret_cast<LRUHandle*>(malloc(sizeof(LRUHandle) - 1 + key.size()));
  e->value = value;
  e->deleter = deleter;
  e->charge = charge;
  e->key_length = key.size();
  e->hash = hash;
  e->in_cache = false;
  e->refs = 1;  // 此处refs = 1是因为最后要将handle返回。
  // 将键追加在末尾。
  std::memcpy(e->key_data, key.data(), key.size());

  if (capacity_ > 0) {
    
    e->refs++;  // 此处refs++是因为LRUCache引用了它。
    e->in_cache = true;
    LRU_Append(&in_use_, e);	// 追加到in_use_链表首部。
    usage_ += charge;	// 这时usage_可能超出总容量。
    // Insert将结点e插入插入哈希表，并返回被替换的结点（被覆盖的情况，即LRUCache中已有此键）或者nullptr.
    // FinishErase
    FinishErase(table_.Insert(e));
  } else {
    
    // 已经没有容量了，不进行缓存。
    // 对next进行初始化，因为key()会读取next。
    e->next = nullptr;
  }
  // 当用量超出了总容量，不断从lru_移除结点，直至用量在容量范围内，或已经移除所有闲置结点。
  while (usage_ > capacity_ && lru_.next != &lru_) {
    
    LRUHandle* old = lru_.next;
    assert(old->refs == 1);
    bool erased = FinishErase(table_.Remove(old->key(), old->hash));
    if (!erased) {
      // to avoid unused variable when compiled NDEBUG
      assert(erased);
    }
  }
  // 返回新创建的handle.
  return reinterpret_cast<Cache::Handle*>(e);
}

FinishErase完成了移除的收尾工作：

// 从LRUCache中移除e，并返回e是否为nullptr.
bool LRUCache::FinishErase(LRUHandle* e) {
    
  if (e != nullptr) {
    
    assert(e->in_cache);
    // 从链表中移除该结点。
    LRU_Remove(e);
    // 此缓存项将不再在LRUCache中。
    e->in_cache = false;
    usage_ -= e->charge;
    Unref(e);	// 如果没有人引用该结点，它将在Unref函数中被释放。
  }
  return e != nullptr;
}

`Lookup`

查找方法Lookup的实现非常简单，只需在LRUCache的哈希表中查找对应的handle即可。

Cache::Handle* LRUCache::Lookup(const Slice& key, uint32_t hash) {
    
  MutexLock l(&mutex_);
  LRUHandle* e = table_.Lookup(key, hash);
  if (e != nullptr) {
    
    Ref(e);	// 增加引用计数。
  }
  return reinterpret_cast<Cache::Handle*>(e);
}

`Release`

因为只能在没有人引用时释放一个缓存项，所以Release仅是简单地调用Unref方法。

void LRUCache::Release(Cache::Handle* handle) {
    
  MutexLock l(&mutex_);
  Unref(reinterpret_cast<LRUHandle*>(handle));
}

`Erase`

Erase从哈希表移除键对应的handle，并进行一些收尾工作。

void LRUCache::Erase(const Slice& key, uint32_t hash) {
    
  MutexLock l(&mutex_);
  FinishErase(table_.Remove(key, hash));
}

`ShardedLRUCache`

因为LRUCache一次只能允许一个线程进行操作，多线程环境下可能会形成性能瓶颈，由此引入了ShardedLRUCache，它支持多个线程同时访问Cache。
ShardedLRUCache只是将Cache工作委派到它持有的多个LRUCache之一，具体地，通过key的哈希值高位确定委派给哪个LRUCache.

static const int kNumShardBits = 4;
static const int kNumShards = 1 << kNumShardBits;

class ShardedLRUCache : public Cache {
    
 private:
  LRUCache shard_[kNumShards];
  port::Mutex id_mutex_;
  uint64_t last_id_;

  static inline uint32_t HashSlice(const Slice& s) {
    
    return Hash(s.data(), s.size(), 0);
  }

  // 用哈希值的高位映射到某个shard.
  static uint32_t Shard(uint32_t hash) {
     return hash >> (32 - kNumShardBits); }

 public:
  explicit ShardedLRUCache(size_t capacity) : last_id_(0) {
    
    // 这里capacity + (kNumShards - 1)为了避免舍入误差导致实际容量不足。
    const size_t per_shard = (capacity + (kNumShards - 1)) / kNumShards;
    // 创建各LRUCache.
    for (int s = 0; s < kNumShards; s++) {
    
      shard_[s].SetCapacity(per_shard);
    }
  }
  ~ShardedLRUCache() override {
    }
  Handle* Insert(const Slice& key, void* value, size_t charge,
                 void (*deleter)(const Slice& key, void* value)) override {
    
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter);
  }
  Handle* Lookup(const Slice& key) override {
    
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Lookup(key, hash);
  }
  void Release(Handle* handle) override {
    
    LRUHandle* h = reinterpret_cast<LRUHandle*>(handle);
    shard_[Shard(h->hash)].Release(handle);
  }
  void Erase(const Slice& key) override {
    
    const uint32_t hash = HashSlice(key);
    shard_[Shard(hash)].Erase(key, hash);
  }
  void* Value(Handle* handle) override {
    
    return reinterpret_cast<LRUHandle*>(handle)->value;
  }
  uint64_t NewId() override {
    
    MutexLock l(&id_mutex_);
    return ++(last_id_);
  }
  void Prune() override {
    
    for (int s = 0; s < kNumShards; s++) {
    
      shard_[s].Prune();
    }
  }
  size_t TotalCharge() const override {
    
    size_t total = 0;
    for (int s = 0; s < kNumShards; s++) {
    
      total += shard_[s].TotalCharge();
    }
    return total;
  }
};

哪里用到Cache

LevelDB的缓存使用在两个地方：

缓存SSTable里的Data Block；
缓存SSTable在内存中的数据结构Table。

回顾

为什么要设置LRUHandle

为什么要设置LRUHandle，而不是直接在LRUCache中操作键值对：

LRUHandle实际是封装了键值对的链表结点+哈希表结点，可以方便地在链表间移动，或者插入到哈希表中；
保存了一个缓存项的一些元数据，如引用计数、占用空间、哈希值等。

引用计数`refs`增减的时机

refs增加：

新创建的handle，如果被插入LRUCache中，refs设为2，否则设为1并返回；
handle被查找时引用计数自增。

refs减少：

移除的收尾工作FinishErase中refs--；
调用Release释放时，计数减一；
LRUCache销毁时，计数减一。

为什么要用`deleter`

应该将清理传入的键值对的工作交给调用者提供的回调函数，因为LRUCache不知道value的类型，也不清楚要释放哪些空间，如下例：

struct TableAndFile {
    
  RandomAccessFile* file;
  Table* table;
};

// DeleteEntry是一个deleter.
static void DeleteEntry(const Slice& key, void* value) {
    
  TableAndFile* tf = reinterpret_cast<TableAndFile*>(value);
  delete tf->table;
  delete tf->file;
  delete tf;
}

原网站

版权声明
本文为[Ethan97]所创，转载请带上原文链接，感谢
https://blog.csdn.net/weixin_43116322/article/details/125269114

当前位置：网站首页>LevelDB源码分析之LRU Cache

LevelDB源码分析之LRU Cache

前言

Cache类

`LRUHandle`结构体

HandleTable类

`HandleTable`构造函数

`HandleTable`查找

HandleTable插入

`HandleTable`删除

`LRUCache`类

`LRUCache`的成员变量

`Ref`

`Unref`

`Insert`

`Lookup`

`Release`

`Erase`

`ShardedLRUCache`

哪里用到Cache

回顾

为什么要设置LRUHandle

引用计数`refs`增减的时机

为什么要用`deleter`

边栏推荐

猜你喜欢

随机推荐

当前位置：网站首页>LevelDB源码分析之LRU Cache

LevelDB源码分析之LRU Cache

前言

Cache类

LRUHandle结构体

HandleTable类

HandleTable构造函数

HandleTable查找

HandleTable插入

HandleTable删除

LRUCache类

LRUCache的成员变量

Ref

Unref

Insert

Lookup

Release

Erase

ShardedLRUCache

哪里用到Cache

回顾

为什么要设置LRUHandle

引用计数refs增减的时机

为什么要用deleter

边栏推荐

猜你喜欢

随机推荐

`LRUHandle`结构体

`HandleTable`构造函数

`HandleTable`查找

`HandleTable`删除

`LRUCache`类

`LRUCache`的成员变量

`Ref`

`Unref`

`Insert`

`Lookup`

`Release`

`Erase`

`ShardedLRUCache`

引用计数`refs`增减的时机

为什么要用`deleter`