Project 1 - Buffer Pool [CMU 15-445/645] Notes
2022-06-29 23:11:00 【Learning is so boring】
PROJECT #1 - BUFFER POOL 15-445/645 note

Because it is impossible to keep all disk blocks in main memory, we need to manage the allocation of main-memory space used to hold blocks. The buffer is the part of main memory used to store copies of disk blocks.
Buffer manager
When a program in the database system needs a block that is on disk, it issues a request to the buffer manager (that is, it makes a call). If the block is already in the buffer, the buffer manager passes the block's main-memory address to the requester. If the block is not in the buffer, the buffer manager first allocates space for it in the buffer; if necessary, some other block may be moved out of main memory to make room for the new one. An evicted block is written back to disk only if it has been modified since it was last written to disk. The buffer manager then reads the requested block from disk into the buffer and passes its main-memory address to the requester. The internal actions of the buffer manager are transparent to the program that issued the request.
If you are familiar with operating-system concepts, you will notice that the buffer manager looks much like the virtual-memory manager in most operating systems. One difference is that the database can be larger than the machine's hardware address space, so memory addresses alone are not enough to address every disk block. In addition, to better serve the database system, the buffer manager must use techniques that are more sophisticated than a typical virtual-memory scheme.
TASK #1 - LRU REPLACEMENT POLICY
Implementing LRUReplacer
The maximum number of frames in the LRUReplacer is the same as the size of the buffer pool, because it needs room for every frame in the BufferPoolManager. However, not all frames are in the LRUReplacer at any given time: the LRUReplacer starts out empty, and only frames that have just been unpinned are added to it.

Victim(frame_id_t*): use the LRU scheme (least recently used) to choose the least recently used frame, remove it from the replacer, and store its id in the output parameter; the BufferPoolManager then writes that page back to disk if it is dirty and removes it from the buffer. If the Replacer is empty, return false.

Pin(frame_id_t): called after a frame is pinned in the BufferPoolManager. It should remove the corresponding frame from the LRUReplacer, because a pinned frame is being read and must not be evicted:

lru_cache.erase(lru_map[frame_id]); lru_map.erase(frame_id);

Unpin(frame_id_t): called when a frame's pin_count drops to 0. This function should add the frame to the LRUReplacer. When a process has finished reading data it performs an Unpin, which makes the frame eligible for eviction, so the frame is added to the LRUReplacer. Multiple processes may read data from the same page in the buffer; each process must Pin before accessing the data and Unpin when it is done. Only after every process has unpinned the frame can it be evicted, which is why the pin_count counter is needed.

Size(): return the size of the LRUReplacer.

Since thread safety must be guaranteed, a lock is used. Because Unpin calls Size(), and (as I thought at the time) it was not clear whether Size() would also be called from other places, both functions must take the lock. Using std::mutex here would cause a deadlock, so a recursive lock, std::recursive_mutex, is used to solve the problem.

std::recursive_mutex allows the same thread to lock the same mutex instance multiple times. Sometimes, during an operation, one public function needs to call another public function; in that case the inner function also tries to lock the mutex, which with std::mutex leads to undefined behavior. Using a recursive mutex instead of an ordinary mutex solves this, as the small example below shows.
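A minimal, self-contained illustration of the point above (the class and names here are purely illustrative, not part of the project): Add() locks the mutex and then calls the public Size(), which locks it again on the same thread. With std::mutex this double lock is undefined behavior; with std::recursive_mutex it is allowed.

#include <cstddef>
#include <mutex>

class Counter {
 public:
  size_t Size() {
    std::lock_guard<std::recursive_mutex> guard(lock_);  // first (or nested) lock
    return n_;
  }

  void Add() {
    std::lock_guard<std::recursive_mutex> guard(lock_);  // lock taken here...
    if (Size() < kCap) {  // ...and taken again inside Size() on the same thread -- fine with a recursive mutex
      ++n_;
    }
  }

 private:
  static constexpr size_t kCap = 10;
  std::recursive_mutex lock_;
  size_t n_{0};
};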
lru_replacer.h
#pragma once

#include <list>
#include <mutex>  // NOLINT
#include <unordered_map>
#include <vector>

#include "buffer/replacer.h"
#include "common/config.h"

namespace bustub {

/**
 * LRUReplacer implements the Least Recently Used replacement policy.
 */
class LRUReplacer : public Replacer {
 public:
  /**
   * Create a new LRUReplacer.
   * @param num_pages the maximum number of pages the LRUReplacer will be required to store
   */
  explicit LRUReplacer(size_t num_pages);

  /** Destroys the LRUReplacer. */
  ~LRUReplacer() override;

  bool Victim(frame_id_t *frame_id) override;

  void Pin(frame_id_t frame_id) override;

  void Unpin(frame_id_t frame_id) override;

  size_t Size() override;

 private:
  std::recursive_mutex lru_lock;  // recursive lock, see the note above
  std::list<frame_id_t> lru_cache;
  std::unordered_map<frame_id_t, std::list<frame_id_t>::iterator> lru_map;
  size_t capacity;
};

}  // namespace bustub
lru_replacer.cpp
#include "buffer/lru_replacer.h"
namespace bustub {
LRUReplacer::LRUReplacer(size_t num_pages):capacity(num_pages) {
};
LRUReplacer::~LRUReplacer() = default;
bool LRUReplacer::Victim(frame_id_t *frame_id) {
std::lock_guard latch(lru_lock);
if (lru_map.empty()) return false;
frame_id_t frame = lru_cache.back(); // Return the page of drug sacrifice , And from cache Delete
lru_cache.pop_back();
lru_map.erase(frame);
*frame_id = frame;
return true;
}
void LRUReplacer::Pin(frame_id_t frame_id) {
std::lock_guard latch(lru_lock);
if (lru_map.count(frame_id) != 0) {
lru_cache.erase(lru_map[frame_id]); // from cache Delete in , So it won't be deleted , Equivalent to nailing
lru_map.erase(frame_id);
}
}
void LRUReplacer::Unpin(frame_id_t frame_id) {
// Add to cache in , frame Can be deleted
std::lock_guard latch(lru_lock);
if (lru_map.count(frame_id) != 0) return;
while (this->Size() >= capacity) {
// Normally this would not happen , After writing, refer to other people's , I found that many people wrote this , And added ...
frame_id_t del_frame = lru_cache.front();
lru_cache.pop_front();
lru_map.erase(del_frame);
}
lru_cache.push_front(frame_id);
lru_map[frame_id] = lru_cache.begin();
}
size_t LRUReplacer::Size() {
std::lock_guard latch(lru_lock);
return lru_cache.size();
}
} // namespace bustub
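A quick usage sketch of the replacer above (this assumes the bustub build environment from the project and is only meant to illustrate the intended behavior, not to be part of the required code): frames become eviction candidates only after Unpin, and Victim returns them in least-recently-unpinned order.

#include "buffer/lru_replacer.h"

int main() {
  bustub::LRUReplacer replacer(3);
  replacer.Unpin(1);
  replacer.Unpin(2);
  replacer.Unpin(3);  // eviction order is now 1, 2, 3 (1 is the oldest)
  replacer.Pin(2);    // frame 2 is in use again, so it is removed as a candidate

  bustub::frame_id_t victim;
  replacer.Victim(&victim);  // victim == 1, the least recently unpinned frame
  replacer.Victim(&victim);  // victim == 3, since frame 2 was pinned
  return 0;
}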

TASK #2 - BUFFER POOL MANAGER INSTANCE
Next, implement the buffer pool manager (BufferPoolManagerInstance). The BufferPoolManagerInstance is responsible for fetching database pages from the DiskManager and storing them in memory. It writes dirty pages out to disk when it is explicitly told to do so, or when it needs to evict a page to make room for a new one.
All in-memory pages in the system are represented by Page objects. The BufferPoolManagerInstance does not need to understand the contents of these pages. However, as a system developer you must understand that Page objects are just containers for memory in the buffer pool and are not tied to one specific page. That is, every Page object contains a block of memory that the DiskManager uses as the location into which it copies the contents of a physical page read from disk. The BufferPoolManagerInstance reuses the same Page objects to store data as it moves back and forth to disk, which means the same Page object may contain different physical pages over the lifetime of the system. The Page object's identifier (page_id) keeps track of the physical page it contains; if a Page object does not contain a physical page, its page_id must be set to INVALID_PAGE_ID.
Each Page object also maintains a counter of the number of threads that have "pinned" the page. The BufferPoolManagerInstance must not free a pinned Page. Each Page object also tracks whether it is dirty; your job is to record whether a page was modified before it is unpinned. The BufferPoolManagerInstance must write the contents of a dirty Page back to disk before that object can be reused.
The BufferPoolManagerInstance implementation uses the LRUReplacer class you created in the previous step of this assignment. It uses the LRUReplacer to keep track of when Page objects are accessed, so that it can decide which one to evict when it must free a frame to make room for copying a new physical page from disk.
First, a few concepts:

page: the database splits a file into many small parts, so a file is composed of pages; a page is the smallest unit in which data is read. In this project each page is 4 KB.

frame: a page-sized slot in memory used to hold a page read from disk; a page can be placed in any empty frame.

When the buffer pool manager needs a frame to store a new page, it first checks the free_list (which holds the idle frames). If there is a free frame, the page is placed in it. If there is no free frame, the manager asks the Replacer for an unpinned frame; if that frame's page is dirty, it is written back to disk first, and then the new page is read into the frame. If the Replacer cannot return a replaceable (victim) frame either, an error is returned.
For the concrete implementation, see the comments and the code; a sketch of the core logic is given below.
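What follows is not the full buffer_pool_manager_instance.cpp, only a minimal sketch of the fetch/evict path just described: look the page up in the page table; otherwise take a frame from the free list or ask the replacer for a victim; if the victim page is dirty, write it back first; then read the requested page from disk and pin it. It assumes the member names and friend access from the course skeleton (pages_, page_table_, free_list_, replacer_, disk_manager_, latch_); adjust it to your own header. NewPgImp, DeletePgImp, and the flush methods follow the same pattern of finding or releasing a frame.

#include "buffer/buffer_pool_manager_instance.h"

namespace bustub {

Page *BufferPoolManagerInstance::FetchPgImp(page_id_t page_id) {
  std::lock_guard<std::mutex> guard(latch_);

  // 1. The page is already in the buffer pool: pin it and return it.
  auto it = page_table_.find(page_id);
  if (it != page_table_.end()) {
    frame_id_t frame_id = it->second;
    Page *page = &pages_[frame_id];
    page->pin_count_++;
    replacer_->Pin(frame_id);
    return page;
  }

  // 2. Otherwise find a frame for it: prefer the free list, then ask the replacer for a victim.
  frame_id_t frame_id;
  if (!free_list_.empty()) {
    frame_id = free_list_.front();
    free_list_.pop_front();
  } else if (!replacer_->Victim(&frame_id)) {
    return nullptr;  // every frame is pinned, nothing can be evicted
  }

  // 3. If the chosen frame holds a dirty page, write it back to disk before reusing the frame.
  Page *page = &pages_[frame_id];
  if (page->IsDirty()) {
    disk_manager_->WritePage(page->GetPageId(), page->GetData());
  }
  page_table_.erase(page->GetPageId());

  // 4. Read the requested page from disk into the frame, record it in the page table, and pin it.
  page_table_[page_id] = frame_id;
  page->page_id_ = page_id;
  page->pin_count_ = 1;
  page->is_dirty_ = false;
  disk_manager_->ReadPage(page_id, page->GetData());
  replacer_->Pin(frame_id);
  return page;
}

}  // namespace bustub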

TASK #3 - PARALLEL BUFFER POOL MANAGER
As you may have noticed in the previous task, a single buffer pool manager instance needs a latch to be thread safe. This can cause a lot of contention, because every thread fights over that one latch when it interacts with the buffer pool. One potential solution is to have multiple buffer pools in the system, each with its own latch.
ParallelBufferPoolManager is a class that holds multiple BufferPoolManagerInstances. For every operation, the ParallelBufferPoolManager selects one BufferPoolManagerInstance and delegates the operation to that instance.
We use the given page id to determine which BufferPoolManagerInstance is responsible. If we have num_instances BufferPoolManagerInstances, we need a way to map a given page id to a number in the range [0, num_instances). For this project we use the modulo operator: page_id mod num_instances maps the given page_id into the correct range. For example, with 5 instances, page 12 is handled by instance 12 mod 5 = 2.
When the ParallelBufferPoolManager is first instantiated, its starting index should be 0. Every time you create a new page, you try each BufferPoolManagerInstance, starting from the starting index, until one succeeds, and then you increase the starting index by 1.
Make sure that when you create the individual BufferPoolManagerInstances, you use the constructor that takes uint32_t num_instances and uint32_t instance_index, so that page ids are allocated correctly.
parallel_buffer_pool_manager.h
std::vector<BufferPoolManagerInstance*> parallel_mannager_instances;
size_t num_instances_;
size_t pool_size_;
size_t start_index_;
parallel_buffer_pool_manager.cpp
#include "buffer/parallel_buffer_pool_manager.h"
namespace bustub {
ParallelBufferPoolManager::ParallelBufferPoolManager(size_t num_instances, size_t pool_size, DiskManager *disk_manager,
LogManager *log_manager)
: num_instances_(num_instances), pool_size_(pool_size), start_index_(0) {
// Allocate and create individual BufferPoolManagerInstances
for (size_t i = 0; i < num_instances_; i++) {
parallel_mannager_instances.emplace_back(
new BufferPoolManagerInstance(pool_size_, num_instances_, i, disk_manager, log_manager));
}
}
// Update constructor to destruct all BufferPoolManagerInstances and deallocate any associated memory
ParallelBufferPoolManager::~ParallelBufferPoolManager() {
for (auto instance_pointer : parallel_mannager_instances) {
delete instance_pointer;
}
};
size_t ParallelBufferPoolManager::GetPoolSize() {
// Get size of all BufferPoolManagerInstances
size_t all_buffer_size{
0};
for (auto instance_pointer : parallel_mannager_instances) {
all_buffer_size += instance_pointer->GetPoolSize();
}
return all_buffer_size;
}
BufferPoolManager *ParallelBufferPoolManager::GetBufferPoolManager(page_id_t page_id) {
// Get BufferPoolManager responsible for handling given page id. You can use this method in your other methods.
size_t index = page_id % num_instances_;
return parallel_mannager_instances[index];
}
Page *ParallelBufferPoolManager::FetchPgImp(page_id_t page_id) {
// Fetch page for page_id from responsible BufferPoolManagerInstance
return GetBufferPoolManager(page_id)->FetchPage(page_id);
}
bool ParallelBufferPoolManager::UnpinPgImp(page_id_t page_id, bool is_dirty) {
// Unpin page_id from responsible BufferPoolManagerInstance
return GetBufferPoolManager(page_id)->UnpinPage(page_id, is_dirty);
}
bool ParallelBufferPoolManager::FlushPgImp(page_id_t page_id) {
// Flush page_id from responsible BufferPoolManagerInstance
return GetBufferPoolManager(page_id)->FlushPage(page_id);
}
Page *ParallelBufferPoolManager::NewPgImp(page_id_t *page_id) {
// create new page. We will request page allocation in a round robin manner from the underlying
// BufferPoolManagerInstances
// 1. From a starting index of the BPMIs, call NewPageImpl until either 1) success and return 2) looped around to
// starting index and return nullptr
Page *p{
nullptr};
size_t index{
start_index_ % num_instances_};
size_t start(index);
start_index_++;
while (true) {
p = parallel_mannager_instances[index]->NewPage(page_id);
if (p != nullptr) return p;
index = (index + 1) % num_instances_;
if (index == start) return nullptr;
}
// 2. Bump the starting index (mod number of instances) to start search at a different BPMI each time this function
// is called
return nullptr;
}
bool ParallelBufferPoolManager::DeletePgImp(page_id_t page_id) {
// Delete page_id from responsible BufferPoolManagerInstance
return GetBufferPoolManager(page_id)->DeletePage(page_id);
}
void ParallelBufferPoolManager::FlushAllPgsImp() {
// flush all pages from all BufferPoolManagerInstances
for (auto instance_pointer : parallel_mannager_instances) {
instance_pointer->FlushAllPages();
}
}
} // namespace bustub
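A small usage sketch of the parallel manager (assuming the DiskManager and the public wrapper methods NewPage/UnpinPage/FlushAllPages from the course skeleton; the file name test.db is arbitrary): with 5 instances, each page id is routed to instance page_id % 5, and new pages are allocated round-robin across the instances.

#include "buffer/parallel_buffer_pool_manager.h"
#include "storage/disk/disk_manager.h"

int main() {
  bustub::DiskManager disk_manager("test.db");
  // 5 instances, each with a pool of 10 frames, so 50 frames in total.
  bustub::ParallelBufferPoolManager bpm(5, 10, &disk_manager, nullptr);

  bustub::page_id_t page_id;
  bustub::Page *page = bpm.NewPage(&page_id);  // allocated round-robin across the 5 instances
  if (page != nullptr) {
    bpm.UnpinPage(page_id, /*is_dirty=*/false);  // routed to instance page_id % 5
  }
  bpm.FlushAllPages();
  disk_manager.ShutDown();
  return 0;
}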

Write carefully, otherwise debugging will be painful…