Project 1 - Buffer Pool [CMU 15-445/645] Notes
2022-06-29 23:11:00 【Learning is so boring】
PROJECT #1 - BUFFER POOL 15-445/645 note

Because it is impossible to keep all disk blocks in main memory, we need to manage the allocation of main-memory space used to hold blocks. The buffer is the part of main memory used to store copies of disk blocks.
Buffer manager
When a program in the database system needs a block that is on disk, it issues a request to the buffer manager (that is, it makes a call). If the block is already in the buffer, the buffer manager passes the block's main-memory address to the requester. If the block is not in the buffer, the buffer manager first allocates space for it in the buffer; if necessary, some other block may be moved out of main memory to make room for the new one. An evicted block is written back to disk only if it has been modified since it was last written to disk. The buffer manager then reads the requested block from disk into the buffer and passes its main-memory address to the requester. The internal actions of the buffer manager are transparent to the program that issued the request.
If you are familiar with operating-system concepts, you will notice that the buffer manager looks much like the virtual-memory manager in most operating systems. One difference is that the database can be larger than the machine's hardware address space, so memory addresses alone are not enough to address every disk block. In addition, to better serve the database system, the buffer manager must use techniques that are more sophisticated than a typical virtual-memory scheme.
TASK #1 - LRU REPLACEMENT POLICY
Implementing LRUReplacer
The maximum number of frames in the LRUReplacer is the same as the size of the buffer pool, because it needs room for every frame in the BufferPoolManager. However, not all frames are in the LRUReplacer at any given time: the LRUReplacer starts out empty, and only frames that have just been unpinned are added to it.

Victim(frame_id_t*): use the LRU scheme (least recently used) to choose the least recently used frame, remove it from the replacer, and store its id in the output parameter; the BufferPoolManager then writes that page back to disk if it is dirty and removes it from the buffer. If the Replacer is empty, return false.

Pin(frame_id_t): called after a frame is pinned in the BufferPoolManager. It should remove the corresponding frame from the LRUReplacer, because a pinned frame is being read and must not be evicted:

lru_cache.erase(lru_map[frame_id]); lru_map.erase(frame_id);

Unpin(frame_id_t): called when a frame's pin_count drops to 0. This function should add the frame to the LRUReplacer. When a process has finished reading data it performs an Unpin, which makes the frame eligible for eviction, so the frame is added to the LRUReplacer. Multiple processes may read data from the same page in the buffer; each process must Pin before accessing the data and Unpin when it is done. Only after every process has unpinned the frame can it be evicted, which is why the pin_count counter is needed.

Size(): return the size of the LRUReplacer.

Since thread safety must be guaranteed, a lock is used. Because Unpin calls Size(), and (as I thought at the time) it was not clear whether Size() would also be called from other places, both functions must take the lock. Using std::mutex here would cause a deadlock, so a recursive lock, std::recursive_mutex, is used to solve the problem.

std::recursive_mutex allows the same thread to lock the same mutex instance multiple times. Sometimes, during an operation, one public function needs to call another public function; in that case the inner function also tries to lock the mutex, which with std::mutex leads to undefined behavior. Using a recursive mutex instead of an ordinary mutex solves this, as the small example below shows.
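A minimal, self-contained illustration of the point above (the class and names here are purely illustrative, not part of the project): Add() locks the mutex and then calls the public Size(), which locks it again on the same thread. With std::mutex this double lock is undefined behavior; with std::recursive_mutex it is allowed.

#include <cstddef>
#include <mutex>

class Counter {
 public:
  size_t Size() {
    std::lock_guard<std::recursive_mutex> guard(lock_);  // first (or nested) lock
    return n_;
  }

  void Add() {
    std::lock_guard<std::recursive_mutex> guard(lock_);  // lock taken here...
    if (Size() < kCap) {  // ...and taken again inside Size() on the same thread -- fine with a recursive mutex
      ++n_;
    }
  }

 private:
  static constexpr size_t kCap = 10;
  std::recursive_mutex lock_;
  size_t n_{0};
};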
lru_replacer.h
#pragma once

#include <list>
#include <mutex>  // NOLINT
#include <unordered_map>
#include <vector>

#include "buffer/replacer.h"
#include "common/config.h"

namespace bustub {

/**
 * LRUReplacer implements the Least Recently Used replacement policy.
 */
class LRUReplacer : public Replacer {
 public:
  /**
   * Create a new LRUReplacer.
   * @param num_pages the maximum number of pages the LRUReplacer will be required to store
   */
  explicit LRUReplacer(size_t num_pages);

  /** Destroys the LRUReplacer. */
  ~LRUReplacer() override;

  bool Victim(frame_id_t *frame_id) override;

  void Pin(frame_id_t frame_id) override;

  void Unpin(frame_id_t frame_id) override;

  size_t Size() override;

 private:
  std::recursive_mutex lru_lock;  // recursive lock, see the note above
  std::list<frame_id_t> lru_cache;
  std::unordered_map<frame_id_t, std::list<frame_id_t>::iterator> lru_map;
  size_t capacity;
};

}  // namespace bustub
lru_replacer.cpp
#include "buffer/lru_replacer.h"
namespace bustub {
LRUReplacer::LRUReplacer(size_t num_pages):capacity(num_pages) {
};
LRUReplacer::~LRUReplacer() = default;
bool LRUReplacer::Victim(frame_id_t *frame_id) {
std::lock_guard latch(lru_lock);
if (lru_map.empty()) return false;
frame_id_t frame = lru_cache.back(); // Return the page of drug sacrifice , And from cache Delete
lru_cache.pop_back();
lru_map.erase(frame);
*frame_id = frame;
return true;
}
void LRUReplacer::Pin(frame_id_t frame_id) {
std::lock_guard latch(lru_lock);
if (lru_map.count(frame_id) != 0) {
lru_cache.erase(lru_map[frame_id]); // from cache Delete in , So it won't be deleted , Equivalent to nailing
lru_map.erase(frame_id);
}
}
void LRUReplacer::Unpin(frame_id_t frame_id) {
// Add to cache in , frame Can be deleted
std::lock_guard latch(lru_lock);
if (lru_map.count(frame_id) != 0) return;
while (this->Size() >= capacity) {
// Normally this would not happen , After writing, refer to other people's , I found that many people wrote this , And added ...
frame_id_t del_frame = lru_cache.front();
lru_cache.pop_front();
lru_map.erase(del_frame);
}
lru_cache.push_front(frame_id);
lru_map[frame_id] = lru_cache.begin();
}
size_t LRUReplacer::Size() {
std::lock_guard latch(lru_lock);
return lru_cache.size();
}
} // namespace bustub
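A quick usage sketch of the replacer above (this assumes the bustub build environment from the project and is only meant to illustrate the intended behavior, not to be part of the required code): frames become eviction candidates only after Unpin, and Victim returns them in least-recently-unpinned order.

#include "buffer/lru_replacer.h"

int main() {
  bustub::LRUReplacer replacer(3);
  replacer.Unpin(1);
  replacer.Unpin(2);
  replacer.Unpin(3);  // eviction order is now 1, 2, 3 (1 is the oldest)
  replacer.Pin(2);    // frame 2 is in use again, so it is removed as a candidate

  bustub::frame_id_t victim;
  replacer.Victim(&victim);  // victim == 1, the least recently unpinned frame
  replacer.Victim(&victim);  // victim == 3, since frame 2 was pinned
  return 0;
}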

TASK #2 - BUFFER POOL MANAGER INSTANCE
Next, implement the buffer pool manager (BufferPoolManagerInstance). The BufferPoolManagerInstance is responsible for fetching database pages from the DiskManager and storing them in memory. It writes dirty pages out to disk when it is explicitly told to do so, or when it needs to evict a page to make room for a new one.
All in-memory pages in the system are represented by Page objects. The BufferPoolManagerInstance does not need to understand the contents of these pages. However, as a system developer you must understand that Page objects are just containers for memory in the buffer pool and are not tied to one specific page. That is, every Page object contains a block of memory that the DiskManager uses as the location into which it copies the contents of a physical page read from disk. The BufferPoolManagerInstance reuses the same Page objects to store data as it moves back and forth to disk, which means the same Page object may contain different physical pages over the lifetime of the system. The Page object's identifier (page_id) keeps track of the physical page it contains; if a Page object does not contain a physical page, its page_id must be set to INVALID_PAGE_ID.
Each Page object also maintains a counter of the number of threads that have "pinned" the page. The BufferPoolManagerInstance must not free a pinned Page. Each Page object also tracks whether it is dirty; your job is to record whether a page was modified before it is unpinned. The BufferPoolManagerInstance must write the contents of a dirty Page back to disk before that object can be reused.
The BufferPoolManagerInstance implementation uses the LRUReplacer class you created in the previous step of this assignment. It uses the LRUReplacer to keep track of when Page objects are accessed, so that it can decide which one to evict when it must free a frame to make room for copying a new physical page from disk.
First, a few concepts:

page: the database splits a file into many small parts, so a file is composed of pages; a page is the smallest unit in which data is read. In this project each page is 4 KB.

frame: a page-sized slot in memory used to hold a page read from disk; a page can be placed in any empty frame.

When the buffer pool manager needs a frame to store a new page, it first checks the free_list (which holds the idle frames). If there is a free frame, the page is placed in it. If there is no free frame, the manager asks the Replacer for an unpinned frame; if that frame's page is dirty, it is written back to disk first, and then the new page is read into the frame. If the Replacer cannot return a replaceable (victim) frame either, an error is returned.
For the concrete implementation, see the comments and the code; a sketch of the core logic is given below.
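What follows is not the full buffer_pool_manager_instance.cpp, only a minimal sketch of the fetch/evict path just described: look the page up in the page table; otherwise take a frame from the free list or ask the replacer for a victim; if the victim page is dirty, write it back first; then read the requested page from disk and pin it. It assumes the member names and friend access from the course skeleton (pages_, page_table_, free_list_, replacer_, disk_manager_, latch_); adjust it to your own header. NewPgImp, DeletePgImp, and the flush methods follow the same pattern of finding or releasing a frame.

#include "buffer/buffer_pool_manager_instance.h"

namespace bustub {

Page *BufferPoolManagerInstance::FetchPgImp(page_id_t page_id) {
  std::lock_guard<std::mutex> guard(latch_);

  // 1. The page is already in the buffer pool: pin it and return it.
  auto it = page_table_.find(page_id);
  if (it != page_table_.end()) {
    frame_id_t frame_id = it->second;
    Page *page = &pages_[frame_id];
    page->pin_count_++;
    replacer_->Pin(frame_id);
    return page;
  }

  // 2. Otherwise find a frame for it: prefer the free list, then ask the replacer for a victim.
  frame_id_t frame_id;
  if (!free_list_.empty()) {
    frame_id = free_list_.front();
    free_list_.pop_front();
  } else if (!replacer_->Victim(&frame_id)) {
    return nullptr;  // every frame is pinned, nothing can be evicted
  }

  // 3. If the chosen frame holds a dirty page, write it back to disk before reusing the frame.
  Page *page = &pages_[frame_id];
  if (page->IsDirty()) {
    disk_manager_->WritePage(page->GetPageId(), page->GetData());
  }
  page_table_.erase(page->GetPageId());

  // 4. Read the requested page from disk into the frame, record it in the page table, and pin it.
  page_table_[page_id] = frame_id;
  page->page_id_ = page_id;
  page->pin_count_ = 1;
  page->is_dirty_ = false;
  disk_manager_->ReadPage(page_id, page->GetData());
  replacer_->Pin(frame_id);
  return page;
}

}  // namespace bustub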

TASK #3 - PARALLEL BUFFER POOL MANAGER
As you may have noticed in the previous task, a single buffer pool manager instance needs a latch to be thread safe. This can cause a lot of contention, because every thread fights over that one latch when it interacts with the buffer pool. One potential solution is to have multiple buffer pools in the system, each with its own latch.
ParallelBufferPoolManager is a class that holds multiple BufferPoolManagerInstances. For every operation, the ParallelBufferPoolManager selects one BufferPoolManagerInstance and delegates the operation to that instance.
We use the given page id to determine which BufferPoolManagerInstance is responsible. If we have num_instances BufferPoolManagerInstances, we need a way to map a given page id to a number in the range [0, num_instances). For this project we use the modulo operator: page_id mod num_instances maps the given page_id into the correct range. For example, with 5 instances, page 12 is handled by instance 12 mod 5 = 2.
When the ParallelBufferPoolManager is first instantiated, its starting index should be 0. Every time you create a new page, you try each BufferPoolManagerInstance, starting from the starting index, until one succeeds, and then you increase the starting index by 1.
Make sure that when you create the individual BufferPoolManagerInstances, you use the constructor that takes uint32_t num_instances and uint32_t instance_index, so that page ids are allocated correctly.
parallel_buffer_pool_manager.h
std::vector<BufferPoolManagerInstance*> parallel_mannager_instances;
size_t num_instances_;
size_t pool_size_;
size_t start_index_;
parallel_buffer_pool_manager.cpp
#include "buffer/parallel_buffer_pool_manager.h"
namespace bustub {
ParallelBufferPoolManager::ParallelBufferPoolManager(size_t num_instances, size_t pool_size, DiskManager *disk_manager,
LogManager *log_manager)
: num_instances_(num_instances), pool_size_(pool_size), start_index_(0) {
// Allocate and create individual BufferPoolManagerInstances
for (size_t i = 0; i < num_instances_; i++) {
parallel_mannager_instances.emplace_back(
new BufferPoolManagerInstance(pool_size_, num_instances_, i, disk_manager, log_manager));
}
}
// Update constructor to destruct all BufferPoolManagerInstances and deallocate any associated memory
ParallelBufferPoolManager::~ParallelBufferPoolManager() {
for (auto instance_pointer : parallel_mannager_instances) {
delete instance_pointer;
}
};
size_t ParallelBufferPoolManager::GetPoolSize() {
// Get size of all BufferPoolManagerInstances
size_t all_buffer_size{
0};
for (auto instance_pointer : parallel_mannager_instances) {
all_buffer_size += instance_pointer->GetPoolSize();
}
return all_buffer_size;
}
BufferPoolManager *ParallelBufferPoolManager::GetBufferPoolManager(page_id_t page_id) {
// Get BufferPoolManager responsible for handling given page id. You can use this method in your other methods.
size_t index = page_id % num_instances_;
return parallel_mannager_instances[index];
}
Page *ParallelBufferPoolManager::FetchPgImp(page_id_t page_id) {
// Fetch page for page_id from responsible BufferPoolManagerInstance
return GetBufferPoolManager(page_id)->FetchPage(page_id);
}
bool ParallelBufferPoolManager::UnpinPgImp(page_id_t page_id, bool is_dirty) {
// Unpin page_id from responsible BufferPoolManagerInstance
return GetBufferPoolManager(page_id)->UnpinPage(page_id, is_dirty);
}
bool ParallelBufferPoolManager::FlushPgImp(page_id_t page_id) {
// Flush page_id from responsible BufferPoolManagerInstance
return GetBufferPoolManager(page_id)->FlushPage(page_id);
}
Page *ParallelBufferPoolManager::NewPgImp(page_id_t *page_id) {
// create new page. We will request page allocation in a round robin manner from the underlying
// BufferPoolManagerInstances
// 1. From a starting index of the BPMIs, call NewPageImpl until either 1) success and return 2) looped around to
// starting index and return nullptr
Page *p{
nullptr};
size_t index{
start_index_ % num_instances_};
size_t start(index);
start_index_++;
while (true) {
p = parallel_mannager_instances[index]->NewPage(page_id);
if (p != nullptr) return p;
index = (index + 1) % num_instances_;
if (index == start) return nullptr;
}
// 2. Bump the starting index (mod number of instances) to start search at a different BPMI each time this function
// is called
return nullptr;
}
bool ParallelBufferPoolManager::DeletePgImp(page_id_t page_id) {
// Delete page_id from responsible BufferPoolManagerInstance
return GetBufferPoolManager(page_id)->DeletePage(page_id);
}
void ParallelBufferPoolManager::FlushAllPgsImp() {
// flush all pages from all BufferPoolManagerInstances
for (auto instance_pointer : parallel_mannager_instances) {
instance_pointer->FlushAllPages();
}
}
} // namespace bustub
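A small usage sketch of the parallel manager (assuming the DiskManager and the public wrapper methods NewPage/UnpinPage/FlushAllPages from the course skeleton; the file name test.db is arbitrary): with 5 instances, each page id is routed to instance page_id % 5, and new pages are allocated round-robin across the instances.

#include "buffer/parallel_buffer_pool_manager.h"
#include "storage/disk/disk_manager.h"

int main() {
  bustub::DiskManager disk_manager("test.db");
  // 5 instances, each with a pool of 10 frames, so 50 frames in total.
  bustub::ParallelBufferPoolManager bpm(5, 10, &disk_manager, nullptr);

  bustub::page_id_t page_id;
  bustub::Page *page = bpm.NewPage(&page_id);  // allocated round-robin across the 5 instances
  if (page != nullptr) {
    bpm.UnpinPage(page_id, /*is_dirty=*/false);  // routed to instance page_id % 5
  }
  bpm.FlushAllPages();
  disk_manager.ShutDown();
  return 0;
}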

Write carefully, otherwise debugging will be painful…