当前位置:网站首页>Youth training camp -- database operation project
Youth training camp -- database operation project
2022-07-02 05:20:00 【Where's huluwa's father】
Time : 5 month 28 Japan 16:00
Course link :https://live.juejin.cn/4354/yc_teacher-hands/preload
PPT: nothing
Learning manual :https://juejin.cn/post/7101135488974585870#heading-21
After class questions :https://juejin.cn/post/7102399708902981663/
Project address :https://github.com/thuyy/ByteYoungDB
:::
Background knowledge ( Study the contents of the manual )
Storage & database
The storage system
- Block storage : Store the underlying system in the software stack , The interface is too simple
- File store : The most widely used storage system , The interface is very friendly , Realize various
- Object storage : Trump products on the public cloud ,immutable Semantic blessing
- key-value Storage : The most flexible form , There are a lot of open source / Black box products
Database system
- Relational type : Based on relation and relational algebra , General support services and SQL visit , Use experience friendly storage products
- Non relational : Flexible structure , Flexible access , There are different targeted products for different scenarios
Distributed architecture
- Data distribution strategy : Determines how data is distributed to multiple physical nodes of the cluster , Is it even , Whether to achieve high performance
- Data replication protocol : influence IO Path performance 、 How to deal with machine failure scenarios
- Distributed transaction algorithm : Multiple database nodes work together to ensure the security of a transaction ACID Characteristic algorithm , Usually based on 2pc The idea of design
2pc: Two-phase commit , The first stage is the voting stage , The second stage is submission / Execution phase . In the first stage, the coordinator sends a transaction preprocessing request to all participants , Participants implement , Return to coordinator yes/no, If it's all yes be commit, If there is one or more no Or there are timeout uncommitted , be rollback
database structure
You can review MySQL 45. The first section , Note links :https://www.yuque.com/shenjingwa-ev2rj/wfwfwb/xbe3h6
- SQL engine
- Parser: Query resolution , Generate syntax tree , And verify the legitimacy ( Lexical analysis 、 Syntax analysis 、 Semantic analysis )
- Optimizer: Select the optimal execution path according to the syntax tree
- Executor: Query execution process , Real data processing
- Transaction engine
- Implementation of transactions ACID
- Storage engine
- Store the data 、 Indexes 、 journal
- Store the data 、 Indexes 、 journal
Project requirement
Implement a memory database ByteYoungDB, It can support the following operations :
- Create/Drop Table
- Insert、Delete、Update、Select(CRUD)
- Simple equivalent matching conditions
where col = XXX
- Support simple transactions Commit,Rollback
Project design
Project breakdown
- Parse user input SQL sentence , And verify the legitimacy ( Lexical analysis 、 Syntax analysis 、 Semantic analysis ):Parser
- User created tables 、 Management of column and other meta information :MetaData
- according to SQL Syntax tree of statements , Generate plan tree :Optimizer
- Query and execute according to the plan tree , Choose here Models of volcanoes :Executor
Volcanic model abstracts every operation in relational algebra into a Operator, Will the whole SQL Build into a Operator Trees , Top down call of query tree next() Interface , Data is pulled from the bottom up .
- Memory data storage structure design :Storage
- Commot/Rollback Transaction support :Transaction
Project file structure :
SQL Engine design
Parser
SQL Language parsing is very complicated , So here “ lazy ”, Go to github Looking for an open source SQL Parser .https://github.com/hyrise/sql-parser
Input is SQL sentence , The output is a syntax tree :
but sql-parser The library can only provide lexical analysis and grammatical analysis , Generate a query tree , Semantic analysis is not possible , That is, validity verification . So we will sql-parser Libraries to encapsulate , Add semantic analysis function :
bool Parser::parseStatement(std::string query) {
result_ = new SQLParserResult;
SQLParser::parse(query, result_);
if (result_->isValid()) {
return checkStmtsMeta();
} else {
std::cout << "[BYDB-Error] Failed to parse sql statement." << std::endl;
}
return true;
}
Optimizer
According to the generated query tree , Generate the corresponding plan tree . The plan tree is composed of various basic operators , For the scenarios required in this project , The following basic operators are constructed :
Like one UPDATE Inquire about , The corresponding plan tree is as follows :( The project has not been realized index)
Executor
Use volcanic models :
Rely on the plan tree to generate the corresponding execution tree , Every Plan Generate a corresponding Operator.
Every Operator call next_.exec() To call the lower layer Operator Generate data :
class BaseOperator {
public:
BaseOperator(Plan* plan, BaseOperator* next) : plan_(plan), next_(next) {
}
~BaseOperator() {
}
virtual bool exec() = 0;
Plan* plan_;
BaseOperator* next_;
};
Storage engine
data structure
Because we are a memory database , So the data structure can be designed simply . Here, a batch of recorded memory is applied for each time , This can reduce the problem of memory fragmentation , Improve memory access efficiency . Then put the memory of this batch of records into FreeList in . When data is inserted , from FreeList Get a piece of memory for writing , And put in DataList. When data is deleted , Take data from DataList Return to FreeList in .
Transaction engine
Without considering concurrency , And the data does not need to be persistent , Our transaction engine design becomes relatively simple . Therefore, there is no need to implement complex MVCC Mechanism , Only need to be able to implement transactions Commit and Rollback Function can be .
Here we implement a undo stack The mechanism of , Update one row of data each time , Just take the old version of this line of data push To undo stack in . If the transaction rolls back , Then from undo stack The old version of the data one by one pop come out , Restore to the original data .
Index design
Because it only requires equivalent matching , So you can use the simplest hash Indexes .
Project library
Project structures,
Use large C/C++ The most commonly used CMake Tools .CMake It's a cross platform compilation tool .CMake Mainly write CMakeLists.txt File by cmake The order will CMakeLists.txt File to make The required Makefile file , Last use make Command compilation source code to generate executable program or library file .
cmake_minimum_required(VERSION 3.8)
project(ByteYoungDB)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_COMPILER "g++")
set(CMAKE_CXX_FLAGS "-g -Wall -Werror -std=c++17")
// O0 No optimization ,O2 Can do optimization
set(CMAKE_CXX_FLAGS_DEBUG "-O0")
set(CMAKE_CXX_FLAGS_RELEASE "-O2 -DNDEBUG ")
set(CMAKE_INSTALL_PREFIX "install")
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY
${CMAKE_BINARY_DIR}/lib)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY
${CMAKE_BINARY_DIR}/bin)
include_directories(${CMAKE_SOURCE_DIR}/sql-parser/include)
add_subdirectory(src/main)
add_subdirectory(src/sql-parser-test)
Reference study materials :
《CMake Cookbook Chinese version 》 - Bookset
Code store
compile
linux The system follows README.md Chinese compiler
Not linux Environmental Science , download sql-parser Library code ,make Generate libsqlparser.so, Replace the ByteYoungDB/sql-parser/lib/libsqlparser.so Under the table of contents
Harvest
parser
The parser supports lexical analysis 、 Syntax analysis 、 Semantic analysis . Lexical analysis and grammatical analysis rely on sql-parser Open source project implementation , This project realizes semantic analysis based on it
Optimzer
If there is where Conditions , Will generate filter A filtered plan tree
If the table structure is a,b,c The order of , But specify b,c,a Output , Then the optimizer will generate the corresponding plan tree as required
Because it does not support very complex queries , So there is nothing optimized
Executor
The actuator will be initialized first , According to the generated by the analyzer plan Structure generates corresponding operator Structure , Depth first traversal , Sir becomes the lowest , And then generate up
The lowest level actuator calls the interface of the storage engine to scan the table for table operation
The actuator executes according to the generated operator structure , Use volcanic models , Pass from top to bottom next() Function to call the lower level , Then return the result from the bottom up
Storage engine
The storage engine implements two linked lists , As in the data structure above
The structure of a piece of data is actually getting longer char Array , Then apply for the memory size according to the calculation
Every time you apply for memory, it will be saved to tupleGroups, It is convenient to release the memory in a unified way
Business
Design three undo type : Additions and deletions , It's a stack data structure
Input begin, Open transaction
With insert Operation as an example , In the storage engine , Will judge whether it is within the transaction , If it is , Then add undo
If the user makes rollback, Will undo Stack pop, To recover data
delete Operation carried out undo When restoring data, you need to put the corresponding memory into freelist For later operation
边栏推荐
- Fabric.js 将本地图像上传到画布背景
- Gee series: Unit 5 remote sensing image preprocessing [GEE grid preprocessing]
- Gee: remote sensing image composite and mosaic
- Gee data set: export the distribution and installed capacity of hydropower stations in the country to CSV table
- leetcode存在重复元素go实现
- Fabric.js 背景不受视口变换影响
- Ubuntu 20.04 installing mysql8
- 黑马笔记---Map集合体系
- Fabric. JS iText superscript and subscript
- 视差特效的原理和实现方法
猜你喜欢
运维工作的“本手、妙手、俗手”
Fabric. JS iText set italics manually
kmp思想及模板代码
Collectors.groupingBy 排序
Fabric.js 激活输入框
Fabric. JS iText sets the color and background color of the specified text
Gee: use of common mask functions in remote sensing image processing [updatemask]
Disable access to external entities in XML parsing
黑馬筆記---Set系列集合
Cubemx DMA notes
随机推荐
Fabric.js 将本地图像上传到画布背景
Global and Chinese market of pressure gauges 2022-2028: Research Report on technology, participants, trends, market size and share
Exercise notes 13 (effective letter ectopic words)
指针使用详解
Collectors. Groupingby sort
6.网络-基础
How matlab marks' a 'in the figure and how matlab marks points and solid points in the figure
Nodejs (03) -- custom module
Two implementation methods of delay queue
Fabric. JS right click menu
Feign realizes file uploading and downloading
Black Horse Notes - - set Series Collection
Gee series: Unit 2 explore datasets
Pycharm breakpoint management: temporarily cancel some breakpoints + run directly to a line
Collectors.groupingBy 排序
Fabric.js 居中元素
[opencv] image binarization
【pyinstaller】_get_sysconfigdata_name() missing 1 required positional argument: ‘check_exists‘
青训营--数据库实操项目
函数中使用sizeof(arr) / sizeof(arr[0])求数组长度不正确的原因