当前位置:网站首页>Memtable for leveldb source code analysis
Memtable for leveldb source code analysis
2022-07-01 05:18:00 【Ethan97】
Preface
LevelDB It's a google Very efficient kv database .
LevelDB Adopted LSM tree It achieves efficient write operation and good read performance .LSM tree namely The Log-Structured Merge-Tree, It fully considers the slow random read and write speed of the disk , As far as possible, it ensures that the key value is only written to the disk in sequence during the insertion process , Without performing seek 、 Expensive operations such as moving heads .LSM tree It mainly consists of two data structures , One is memory resident memtable, The second is saved to disk sstable.LSM tree Introduction to , See also introduction to LSM.
- LevelDB Maintain in memory
memtableRecord recent writes (kTypeValue) And delete operations (kTypeDeletion); - When in memory
memtableAfter reaching a certain size , It will be saved assstable.
In this paper, LevelDB Of memtable Realization .
memtable
memtable Maintenance recently occurred in LevelDB Write operations for . Its bottom layer adopts a kind of O(logn) Find and insert data structures with time complexity :Skip List, The following is a brief introduction to its concept .

Skip List There are multiple layers , At the bottom is a well ordered linked list .
Skip list Lookup
- As shown in the figure above , Suppose you want to find 9, When looking for , From the top , Find the first one that is just less than 9 The node of , Here for 1;
- Because of the node 1 There are no more nodes after , Go down one floor , Repeat the process , Arrival node 6;
- Go on down one floor , And then look forward to , arrive 9, That is, find the target node .
Skip list Insertion
Skip list The process of inserting is similar to that of searching , But in the process of finding the insertion location , You need to record the nodes you pass “ route ”, When the insertion position is found , Walk vertically to the ground floor , Insert the new node into . Then go back layer by layer “ Node path ”, The first i+1 The nodes of the layer have a fixed probability p Extend to the upper layer .
On average , Each node appears in 1/(1-p) In layers .
memtable Realization
MemTable Class provides the API
// Auto increment reference count .
void Ref() {
++refs_; }
// Drop reference count. Delete if no more references exist.
void Unref() {
--refs_;
assert(refs_ >= 0);
if (refs_ <= 0) {
delete this;
}
}
// Back here memtable About the amount of memory space used .
size_t ApproximateMemoryUsage();
// memtable The iterator .
// The caller needs to ensure that when the iterator is alive memtable Also alive .
// This iterator returns the encoded internal key (internal key).
Iterator* NewIterator();
// Go to memtable Add a key value pair to the , Specify the serial number SequenceNumber And type .
// The type is kTypeValue Indicates that the key value pair is inserted , The type is kTypeDeletion Indicates that the key value pair is deleted .
void Add(SequenceNumber seq, ValueType type, const Slice& key,
const Slice& value);
// According to the key from memtable Get value , The value will be saved in value variable , And back to true.
// If memtable Contains a key Delete tag for , Save one NotFound() To variable s And in return true.
// Other cases return to false.
bool Get(const LookupKey& key, std::string* value, Status* s);
MemTable The member variables of are as follows :
typedef SkipList<const char*, KeyComparator> Table;
// Key comparator
KeyComparator comparator_;
// Reference counter
int refs_;
// Memory allocator
Arena arena_;
// At the bottom skip list
Table table_;
Add
Add Will a entry Insert into memtable in .
One entry The memory layout of is as follows :

here key The type is Internal Key.LevelDB There are three key:
- Lookup Key
- Internal Key
- User Key
User Key It is directly used for reading and writing databases key, Usually use one Slice Express , Such as :
db->Put(leveldb::WriteOptions(), "hello", "LevelDB"); // here "hello" namely User Key.
Internal Key It is in User Key Two fields are expanded on the basis of .Lookup Key And in the Internal Key A field is added to the header of the . The relationship among them is shown in the figure below :

Add The code implementation is as follows :
void MemTable::Add(SequenceNumber s, ValueType type, const Slice& key,
const Slice& value) {
// Format of an entry is concatenation of:
// key_size : varint32 of internal_key.size()
// key bytes : char[internal_key.size()]
// tag : uint64((sequence << 8) | type)
// value_size : varint32 of value.size()
// value bytes : char[value.size()]
size_t key_size = key.size();
size_t val_size = value.size();
size_t internal_key_size = key_size + 8; // Internal Key The occupied memory size is User Key + 8 bytes, See the figure above for details
const size_t encoded_len = VarintLength(internal_key_size) +
internal_key_size + VarintLength(val_size) +
val_size;
// to entry Allocate space .
char* buf = arena_.Allocate(encoded_len);
// take key_size Write it here entry in .
char* p = EncodeVarint32(buf, internal_key_size);
// take key Write it here entry in .
std::memcpy(p, key.data(), key_size);
p += key_size;
// take tag Write it here entry in ,tag Low with serial number 56 Bit and operation type ( Insert / Delete , Octet ).
EncodeFixed64(p, (s << 8) | type);
p += 8;
// take val_size Write it here entry in .
p = EncodeVarint32(p, val_size);
// take value Write it here entry in .
std::memcpy(p, value.data(), val_size);
assert(p + val_size == buf + encoded_len);
// entry Build complete , Insert it into skip list in .
table_.Insert(buf);
}
Add Actually according to entry The memory layout of puts the fields in the allocated buf In memory space , Then call it. skip list Methods of member variables table_.Insert(buf) Will construct entry Insert into skip list in .
Get
Get According to the key LookupKey stay memtable Search for entry.
Get The code is as follows :
bool MemTable::Get(const LookupKey& key, std::string* value, Status* s) {
Slice memkey = key.memtable_key();
// Declared a skip list Iterator on .
Table::Iterator iter(&table_);
iter.Seek(memkey.data());
if (iter.Valid()) {
// entry format is:
// klength varint32
// userkey char[klength]
// tag uint64
// vlength varint32
// value char[vlength]
// Check that it belongs to the same user key.
// Do not check the serial number , because Seek() The sequence number is too large entry.
// Iterate over the next key .
const char* entry = iter.key();
// from entry Middle readout key_length.
uint32_t key_length;
// GetVarint32Ptr from entry read out key_length, And return the position of the pointer after reading , namely key The address of .
const char* key_ptr = GetVarint32Ptr(entry, entry + 5, &key_length);
// Yes entry Medium user key and lookup key Medium user key Compare .
if (comparator_.comparator.user_comparator()->Compare(
Slice(key_ptr, key_length - 8), key.user_key()) == 0) {
// Found the right one user key.
// Take it out and follow it user key After tag, Serial number and value type .
const uint64_t tag = DecodeFixed64(key_ptr + key_length - 8);
// Take out tag low 8 The value of a , namely value type.
switch (static_cast<ValueType>(tag & 0xff)) {
case kTypeValue: {
// key_ptr + key_length That is, the position of the value length field ,GetLengthPrefixedSlice from entry It takes out the value v.
Slice v = GetLengthPrefixedSlice(key_ptr + key_length);
// Will value v Assign to value After the return true.
value->assign(v.data(), v.size());
return true;
}
case kTypeDeletion:
// The key has been deleted , return not found.
*s = Status::NotFound(Slice());
return true;
}
}
}
return false;
}
take memtable writes sstable
When one memtable Beyond a certain size ( Default 4MB) when , It will create a new one memtable And log files , And direct future updates to this new memtable And log files .
Perform the following work in the background :
- Put the previous
memtableIt sayssstable; - Discard old
memtable; - Delete old log files and
memtable; - New
sstableAdd to level-0.
The call chain is as follows :
Put() -> Write() -> MakeRoomForWrite() -> MaybeScheduleCompaction() ->
env_->Schedule(&DBImpl::BGWork, this) Asynchronous start BGWork() -> BackgroundCall() ->
BackgroundCompaction() -> CompactMemTable()
Reference material
边栏推荐
- 0xc000007b应用程序无法正常启动解决方案(亲测有效)
- Rainbow combines neuvector to practice container safety management
- Causes of short circuit of conductive slip ring and Countermeasures
- Daily question -leetcode1175- permutation of prime numbers - Mathematics
- Application and principle of ThreadPoolExecutor thread pool
- Global and Chinese markets for business weather forecasting 2022-2028: Research Report on technology, participants, trends, market size and share
- QDataStream的簡單讀寫驗證
- One click deployment of highly available emqx clusters in rainbow
- Unit testing with mongodb
- 3D建模与处理软件简介 刘利刚 中国科技大学
猜你喜欢

Distributed architecture system splitting principles, requirements and microservice splitting steps

LeetCode522-最长特殊序列II-哈希表-字符串-双指针

Rainbow combines neuvector to practice container safety management

STM32 expansion board digital tube display

Leetcode522- longest special sequence ii- hash table - String - double pointer

eBPF Cilium实战(2) - 底层网络可观测性

实战:redux的基本使用

CockroachDB: The Resilient Geo-Distributed SQL Database 论文阅读笔记

Detailed explanation of distributed global unique ID solution

Go learning notes (5) basic types and declarations (4)
随机推荐
LevelDB源码分析之LRU Cache
What can the points mall Games bring to businesses? How to build a points mall?
[data recovery in North Asia] a data recovery case of raid crash caused by hard disk drop during data synchronization of hot spare disk of RAID5 disk array
Thread process foundation of JUC
Spanner 论文小结
【暑期每日一题】洛谷 P1568 赛跑
Global and Chinese market of mainboard 2022-2028: Research Report on technology, participants, trends, market size and share
C WPF uses dockpanel to realize screenshot box
每日一题-LeetCode1175-质数排列-数学
Use and principle of wait notify
Fluentd is easy to use. Combined with the rainbow plug-in market, log collection is faster
Global and Chinese markets of Ethernet communication modules 2022-2028: Research Report on technology, participants, trends, market size and share
担心侵权?必备无版权素材网站分享,不用担心视频剪辑缺素材
如何选择导电滑环材料
Introduction to 3D modeling and processing software Liu Ligang University of science and technology of China
Global and Chinese market of enterprise wireless LAN 2022-2028: Research Report on technology, participants, trends, market size and share
Unit testing with mongodb
Global and Chinese market of solder wire 2022-2028: Research Report on technology, participants, trends, market size and share
el-form表单新增表单项动态校验;el-form校验动态表单v-if不生效;
LeetCode316-去除重复字母-栈-贪心-字符串