LRU cache for leveldb source code analysis
2022-07-01 05:17:00 【Ethan97】
Preface
Because random access to memory is several orders of magnitude faster than access to disk, memory can be used to cache data read from disk: when the same data is accessed again, no disk I/O is issued and the data is served straight from the cache. The principle of locality also applies: data that will be needed soon tends to be close, in terms of address, to data being used right now, so a disk read can fetch more than was strictly requested, on the expectation that the extra data will be needed shortly.
LRU stands for Least Recently Used: the cache entries that have gone unused for the longest time are the ones swapped out. LRUCache reference-counts every cache entry and maintains two linked lists:
- in_use_, containing the entries currently in use (refs >= 2 && in_cache == true);
- lru_, containing the idle entries (refs == 1 && in_cache == true).
Its working mechanism is as follows:
- When a new cache entry enters the LRUCache, its reference count is increased (refs++) and it is appended to the in_use_ list;
- When a cache entry is no longer referenced externally, i.e. refs == 1 (the only remaining reference is the LRUCache itself), it is removed from the in_use_ list and appended to the lru_ list;
- When the total charge cached exceeds the LRUCache's rated capacity, the oldest entry in lru_ (at lru_.next, the front of the list) is evicted, making room for new entries.
LRUCache also contains a hash table, used to index a cache entry quickly by its key.
Cache class
Before analyzing the concrete implementation class LRUCache, let us first look at LevelDB's cache interface. The contents of include/leveldb/cache.h are as follows:
namespace leveldb {
class LEVELDB_EXPORT Cache;
LEVELDB_EXPORT Cache* NewLRUCache(size_t capacity);
class LEVELDB_EXPORT Cache {
public:
Cache() = default;
Cache(const Cache&) = delete;
Cache& operator=(const Cache&) = delete;
// Destroys all remaining entries by calling the deleter.
virtual ~Cache();
// Opaque handle to an entry stored in the cache.
struct Handle {
};
// Insert a key/value mapping into the cache, assigning it the specified "cost" charge against the cache capacity.
// Insert returns the corresponding handle; the caller must call Release(handle) when the mapping is no longer needed.
// When the inserted entry is eventually dropped, the key and value are passed to deleter.
virtual Handle* Insert(const Slice& key, void* value, size_t charge,
void (*deleter)(const Slice& key, void* value)) = 0;
// Lookup returns the handle for the given key, or nullptr if there is none.
// The caller must call Release(handle) when the returned mapping is no longer needed.
virtual Handle* Lookup(const Slice& key) = 0;
// Release releases a mapping.
// The handle must not have been released yet, and must have been returned by this object.
virtual void Release(Handle* handle) = 0;
// Value returns the value encapsulated in a handle returned by Lookup().
// The handle must not have been released yet, and must have been returned by this object.
virtual void* Value(Handle* handle) = 0;
// Erase deletes an entry. The underlying entry is only destroyed once all handles to it have been released.
virtual void Erase(const Slice& key) = 0;
// NewId returns a new numeric id. It may be used by multiple clients sharing the same cache to partition the key space.
// Typically a client allocates a new id at startup and prepends it to its cache keys.
virtual uint64_t NewId() = 0;
// Prune removes all entries that are not actively in use.
virtual void Prune() {
}
// TotalCharge returns an estimate of the combined charge of the stored entries.
virtual size_t TotalCharge() const = 0;
};
} // namespace leveldb
The methods of the Cache class show roughly how a cache is used. The main ones are:
- Insert puts an entry into the cache;
- Lookup finds an entry in the cache by key;
- Release tells the cache that the caller no longer uses an entry.
LRUHandle Structure
LRUHandle is the basic unit managed by LRUCache; each one is allocated on the heap.
struct LRUHandle {
void* value;
void (*deleter)(const Slice&, void* value);
LRUHandle* next_hash; // Next entry in the same hash-table bucket
LRUHandle* next; // Doubly-linked list pointers (lru_ or in_use_ list)
LRUHandle* prev;
size_t charge; // TODO(opt): Only allow uint32_t?
size_t key_length;
bool in_cache; // Whether entry is in the cache.
uint32_t refs; // References, including cache reference, if present.
uint32_t hash; // Hash of key(); used for fast sharding and comparisons
char key_data[1]; // Beginning of key
Slice key() const {
// next is only equal to this if the LRU handle is the list head of an
// empty list. List heads never have meaningful keys.
assert(next != this);
return Slice(key_data, key_length);
}
};
HandleTable class
HandleTable is LevelDB's own hash table implementation, used by LRUCache. It is a very simple chained hash table; its main difference from the C++ standard library's hash containers is the resizing policy: as soon as the number of nodes exceeds the length of the bucket array, HandleTable grows.
HandleTable Constructors
HandleTable's main member variables are:
// The table consists of an array of buckets where each bucket is
// a linked list of cache entries that hash into the bucket.
uint32_t length_; // Length of the bucket array
uint32_t elems_; // Number of entries in the table
LRUHandle** list_; // The bucket array itself
As the comment says, the hash table consists of an array of buckets, and each bucket is a linked list of the cache entries that hash into it. The constructor simply calls the resizing method Resize:
HandleTable() : length_(0), elems_(0), list_(nullptr) { Resize(); }
Resize allocates (or reallocates) the underlying array of LRUHandle pointers. The code is as follows:
void Resize() {
uint32_t new_length = 4;
// Find the smallest power of two that can hold elems_ elements.
while (new_length < elems_) {
new_length *= 2;
}
// Allocate the new bucket array of LRUHandle pointers.
LRUHandle** new_list = new LRUHandle*[new_length];
memset(new_list, 0, sizeof(new_list[0]) * new_length);
uint32_t count = 0;
// Walk every bucket of the old array.
for (uint32_t i = 0; i < length_; i++) {
// h iterates over the chain hanging off old bucket i.
LRUHandle* h = list_[i];
while (h != nullptr) {
LRUHandle* next = h->next_hash;
uint32_t hash = h->hash;
// ptr points at the head of the new bucket this entry maps to.
LRUHandle** ptr = &new_list[hash & (new_length - 1)];
// Head insertion: splice the old node onto the front of the new chain.
h->next_hash = *ptr;
*ptr = h;
h = next;
count++;
}
}
assert(elems_ == count);
delete[] list_;
list_ = new_list;
length_ = new_length;
}
As the code shows, HandleTable resolves collisions by chaining: all nodes whose hashes map to the same bucket are linked together. On expansion (i.e. when Resize is called), a new LRUHandle* array is created and every pointer from the old table is relinked into its slot in the new table.
HandleTable lookup
First comes FindPointer, the method used to locate an entry:
// FindPointer returns a pointer to the link that points at the node with the given key/hash.
// If there is no such node, it returns a pointer to the trailing nullptr link, i.e. the slot where the node should be inserted.
LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
// Locate the bucket for this hash value.
LRUHandle** ptr = &list_[hash & (length_ - 1)];
// Walk the bucket's chain until the node is found or the end is reached.
while (*ptr != nullptr && ((*ptr)->hash != hash || key != (*ptr)->key())) {
ptr = &(*ptr)->next_hash;
}
return ptr;
}
With FindPointer in hand, lookup is trivial:
LRUHandle* Lookup(const Slice& key, uint32_t hash) {
return *FindPointer(key, hash);
}
HandleTable Insert
Insert adds a handle to the hash table. The code is as follows:
LRUHandle* Insert(LRUHandle* h) {
LRUHandle** ptr = FindPointer(h->key(), h->hash);
LRUHandle* old = *ptr;
// If an entry with the same key already exists, h adopts its next_hash and takes its place in the chain.
h->next_hash = (old == nullptr ? nullptr : old->next_hash);
// Redirect the link that pointed at old (or at the chain's end) to h.
*ptr = h;
// Only count brand-new nodes (i.e. no existing node was replaced),
// then decide whether the table needs to grow.
if (old == nullptr) {
++elems_;
if (elems_ > length_) {
// Grow as soon as there are more entries than buckets, keeping the average chain length <= 1.
Resize();
}
}
// Return the replaced node, or nullptr.
return old;
}
HandleTable Delete
Remove deletes a key/value pair from the hash table by unlinking its node from the bucket chain.
LRUHandle* Remove(const Slice& key, uint32_t hash) {
// Find the link that points at the node for this key.
LRUHandle** ptr = FindPointer(key, hash);
LRUHandle* result = *ptr;
if (result != nullptr) {
// Unlink the node from the chain.
*ptr = result->next_hash;
--elems_;
}
return result;
}
LRUCache class
LRUCache Member variables of
With its member variables understood, LRUCache's working mechanism is easy to follow:
// Initialized before use.
size_t capacity_; // Total capacity of the LRUCache
// mutex_ protects the following state.
mutable port::Mutex mutex_;
size_t usage_ GUARDED_BY(mutex_); // Combined charge of all cached entries
// Dummy head of LRU list.
// lru.prev is newest entry, lru.next is oldest entry.
// Entries have refs==1 and in_cache==true.
LRUHandle lru_ GUARDED_BY(mutex_); // Dummy head of the list of entries that are in the cache but not referenced by any client
// Dummy head of in-use list.
// Entries are in use by clients, and have refs >= 2 and in_cache==true.
LRUHandle in_use_ GUARDED_BY(mutex_); // Dummy head of the list of entries currently referenced by clients
HandleTable table_ GUARDED_BY(mutex_); // Hash table for locating cache entries by key
Ref
Calling Ref records one more reference to a handle. This may involve moving the node from lru_ to in_use_.
void LRUCache::Ref(LRUHandle* e) {
if (e->refs == 1 && e->in_cache) {
// If the entry sits on the idle lru_ list (refs == 1 and in_cache), move it to the in_use_ list.
LRU_Remove(e);
LRU_Append(&in_use_, e);
}
e->refs++;
}
Unref
Unref drops one reference to a cache entry, and frees the entry's storage once nobody references it. This may involve moving the node from in_use_ to lru_.
void LRUCache::Unref(LRUHandle* e) {
assert(e->refs > 0);
e->refs--;
if (e->refs == 0) {
// Nobody references this entry anymore: destroy it.
assert(!e->in_cache); // Clearing in_cache is the caller's responsibility.
(*e->deleter)(e->key(), e->value); // Invoke the caller-supplied deleter.
free(e); // Free the heap space allocated in Insert().
} else if (e->in_cache && e->refs == 1) {
// Only the LRUCache itself still references this entry: move it to the idle lru_ list.
LRU_Remove(e);
LRU_Append(&lru_, e);
}
}
Insert
Insert creates an LRUHandle for the new cache entry and tries to add it to the LRUCache.
Cache::Handle* LRUCache::Insert(const Slice& key, uint32_t hash, void* value,
size_t charge,
void (*deleter)(const Slice& key,
void* value)) {
MutexLock l(&mutex_);
// Allocate the LRUHandle on the heap. Note the size: -1 because key_data already occupies one byte of the key.
LRUHandle* e =
reinterpret_cast<LRUHandle*>(malloc(sizeof(LRUHandle) - 1 + key.size()));
e->value = value;
e->deleter = deleter;
e->charge = charge;
e->key_length = key.size();
e->hash = hash;
e->in_cache = false;
e->refs = 1; // One reference for the handle returned to the caller.
// Append key to end .
std::memcpy(e->key_data, key.data(), key.size());
if (capacity_ > 0) {
e->refs++; // A second reference, because the LRUCache itself now references it.
e->in_cache = true;
LRU_Append(&in_use_, e); // Put the new entry on the in_use_ list.
usage_ += charge; // usage_ may temporarily exceed the total capacity here.
// table_.Insert returns the displaced node if this key already existed in the LRUCache (or nullptr otherwise);
// FinishErase then evicts that displaced node.
FinishErase(table_.Insert(e));
} else {
// A capacity of 0 means caching is disabled.
// next must still be initialized, because key() asserts on it.
e->next = nullptr;
}
// While usage exceeds capacity, keep evicting the oldest entry (lru_.next) until usage fits or no idle entry remains.
while (usage_ > capacity_ && lru_.next != &lru_) {
LRUHandle* old = lru_.next;
assert(old->refs == 1);
bool erased = FinishErase(table_.Remove(old->key(), old->hash));
if (!erased) {
// to avoid unused variable when compiled NDEBUG
assert(erased);
}
}
// Return to newly created handle.
return reinterpret_cast<Cache::Handle*>(e);
}
FinishErase finishes the removal of an entry:
// Removes e from the LRUCache, and returns whether e was non-null.
bool LRUCache::FinishErase(LRUHandle* e) {
if (e != nullptr) {
assert(e->in_cache);
// Unlink the node from whichever list it is on.
LRU_Remove(e);
// The entry is no longer in the LRUCache.
e->in_cache = false;
usage_ -= e->charge;
Unref(e); // Frees the node if this drops the last reference.
}
return e != nullptr;
}
Lookup
Lookup is very simple: just find the corresponding handle in LRUCache's hash table.
Cache::Handle* LRUCache::Lookup(const Slice& key, uint32_t hash) {
MutexLock l(&mutex_);
LRUHandle* e = table_.Lookup(key, hash);
if (e != nullptr) {
Ref(e); // Increase the reference count .
}
return reinterpret_cast<Cache::Handle*>(e);
}
Release
Release simply drops the caller's reference by calling Unref; the entry itself is only freed once its last reference is gone.
void LRUCache::Release(Cache::Handle* handle) {
MutexLock l(&mutex_);
Unref(reinterpret_cast<LRUHandle*>(handle));
}
Erase
Erase removes the handle for the given key from the hash table, then lets FinishErase do the finishing work.
void LRUCache::Erase(const Slice& key, uint32_t hash) {
MutexLock l(&mutex_);
FinishErase(table_.Remove(key, hash));
}
ShardedLRUCache
Because an LRUCache allows only one thread to operate on it at a time, it can become a bottleneck in multithreaded environments. This motivates ShardedLRUCache, which supports concurrent access by multiple threads. ShardedLRUCache simply delegates each Cache operation to one of several LRUCache shards; concretely, the high bits of the key's hash value determine which LRUCache handles the operation.
static const int kNumShardBits = 4;
static const int kNumShards = 1 << kNumShardBits;
class ShardedLRUCache : public Cache {
private:
LRUCache shard_[kNumShards];
port::Mutex id_mutex_;
uint64_t last_id_;
static inline uint32_t HashSlice(const Slice& s) {
return Hash(s.data(), s.size(), 0);
}
// Use the high order of the hash value to map to a shard.
static uint32_t Shard(uint32_t hash) { return hash >> (32 - kNumShardBits); }
public:
explicit ShardedLRUCache(size_t capacity) : last_id_(0) {
// capacity + (kNumShards - 1) rounds up, so the shards together never hold less than the requested capacity.
const size_t per_shard = (capacity + (kNumShards - 1)) / kNumShards;
// Initialize each LRUCache shard.
for (int s = 0; s < kNumShards; s++) {
shard_[s].SetCapacity(per_shard);
}
}
~ShardedLRUCache() override {
}
Handle* Insert(const Slice& key, void* value, size_t charge,
void (*deleter)(const Slice& key, void* value)) override {
const uint32_t hash = HashSlice(key);
return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter);
}
Handle* Lookup(const Slice& key) override {
const uint32_t hash = HashSlice(key);
return shard_[Shard(hash)].Lookup(key, hash);
}
void Release(Handle* handle) override {
LRUHandle* h = reinterpret_cast<LRUHandle*>(handle);
shard_[Shard(h->hash)].Release(handle);
}
void Erase(const Slice& key) override {
const uint32_t hash = HashSlice(key);
shard_[Shard(hash)].Erase(key, hash);
}
void* Value(Handle* handle) override {
return reinterpret_cast<LRUHandle*>(handle)->value;
}
uint64_t NewId() override {
MutexLock l(&id_mutex_);
return ++(last_id_);
}
void Prune() override {
for (int s = 0; s < kNumShards; s++) {
shard_[s].Prune();
}
}
size_t TotalCharge() const override {
size_t total = 0;
for (int s = 0; s < kNumShards; s++) {
total += shard_[s].TotalCharge();
}
return total;
}
};
Where the Cache is used
LevelDB uses this cache in two places:
- to cache the Data Blocks inside SSTable files;
- to cache Table, the in-memory data structure representing an SSTable.
Review
Why introduce LRUHandle
Why wrap key/value pairs in LRUHandle rather than operate on them directly inside LRUCache:
- LRUHandle encapsulates a key/value pair as a node that is simultaneously a doubly-linked-list node and a hash-table node, so it can be moved between lists and inserted into the hash table cheaply;
- it also carries a cache entry's metadata, such as the reference count, the charge, and the hash value.
When refs increases and decreases
refs increases:
- a newly created handle starts with refs set to 1 (for the handle returned to the caller), and refs is set to 2 if the entry is also inserted into the LRUCache;
- the reference count is incremented whenever a handle is found via Lookup.
refs decreases:
- FinishErase decrements it as part of removal;
- Release decrements it when the caller is done with a handle;
- when the LRUCache is destroyed, it drops its own reference to every entry.
Why use a deleter
Cleaning up the stored key/value pair is delegated to a callback supplied by the caller, because LRUCache does not know the concrete type behind value and therefore cannot know what to free. Here is an example:
struct TableAndFile {
RandomAccessFile* file;
Table* table;
};
// DeleteEntry is a deleter.
static void DeleteEntry(const Slice& key, void* value) {
TableAndFile* tf = reinterpret_cast<TableAndFile*>(value);
delete tf->table;
delete tf->file;
delete tf;
}