Storage Principles Inside MongoDB
2022-07-07 13:11:00 [cui_yonghua]
Storage engine
This article covers MongoDB's default storage engine, WiredTiger.
WiredTiger architecture
WiredTiger first writes each write operation into its cache and persists it to the WAL (write-ahead log). Every 60 seconds it takes a checkpoint, which persists the current data and produces a new snapshot. When a WiredTiger connection is initialized, it first restores the data to the latest snapshot. It then replays the WAL to recover the operations made after that checkpoint, which keeps the stored data reliable.
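The checkpoint-plus-WAL recovery scheme can be illustrated with a toy in-memory engine. This is a minimal sketch under stated assumptions, not WiredTiger code. `ToyEngine` and its methods are hypothetical names. The `wal` and `snapshot` fields stand in for durable storage and survive the simulated crash, while `cache` does not.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical engine: cache + write-ahead log + last checkpoint snapshot."""
    cache: dict = field(default_factory=dict)     # in-memory data (lost on crash)
    wal: list = field(default_factory=list)       # assumed durable: records since last checkpoint
    snapshot: dict = field(default_factory=dict)  # assumed durable: last checkpoint

    def put(self, key, value):
        self.wal.append(("put", key, value))  # log first (write-ahead) ...
        self.cache[key] = value                # ... then apply to the cache

    def delete(self, key):
        self.wal.append(("delete", key, None))
        self.cache.pop(key, None)

    def checkpoint(self):
        # Persist the current data as a new snapshot; older log records are no longer needed.
        self.snapshot = dict(self.cache)
        self.wal.clear()

    def crash(self):
        # Simulate losing everything that only lived in memory.
        self.cache = {}

    def recover(self):
        # 1) restore the latest snapshot, 2) replay the WAL written after it.
        self.cache = dict(self.snapshot)
        for op, key, value in self.wal:
            if op == "put":
                self.cache[key] = value
            else:
                self.cache.pop(key, None)


engine = ToyEngine()
engine.put("a", 1)
engine.checkpoint()
engine.put("b", 2)
engine.delete("a")
engine.crash()
engine.recover()
```

After recovery, the snapshot supplies `"a"` and the replayed log then adds `"b"` and removes `"a"`.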
B-tree vs. B+ tree
Although queries that traverse data are fairly common, MongoDB considers queries for a single record to be far more common than traversals. In a B-tree, non-leaf nodes can also store data, so finding one record needs fewer random I/Os on average than in a B+ tree. As a result, in comparable scenarios MongoDB, which uses a B-tree, answers such queries faster than MySQL.
This does not mean that MongoDB cannot traverse data. You can still use a range query to fetch a batch of records that meet some condition, although it takes longer than in MySQL. MySQL considers traversal queries to be common, so it chose the B+ tree as its underlying data structure.
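One way to see the I/O argument is to count node reads for a point lookup in idealised full trees. This sketch makes simplifying assumptions:

- Every B-tree node holds m - 1 keys and has m children.
- The B+ tree has the same height h, so every lookup reads h nodes.
- Lookups are uniform over existing keys.

Real fan-outs and heights differ, and the function names are hypothetical.

```python
def btree_expected_reads(m: int, h: int) -> float:
    """Full B-tree of height h: a key stored at level d (root = 0) costs d + 1 node reads."""
    total_keys = 0
    weighted_reads = 0
    for d in range(h):
        keys_at_level = (m - 1) * m ** d  # m**d nodes, each holding m - 1 keys
        total_keys += keys_at_level
        weighted_reads += keys_at_level * (d + 1)
    return weighted_reads / total_keys


def bplustree_reads(h: int) -> int:
    """B+ tree: data lives only in leaves, so every lookup reads all h levels."""
    return h


for m in (3, 100):
    print(m, btree_expected_reads(m, 3), bplustree_reads(3))
```

For m = 3 and h = 2, the B-tree averages 1.75 reads per lookup against 2 for the B+ tree. With a realistic fan-out such as m = 100 and h = 3, the B-tree average is already about 2.99, because almost all keys sit in the leaves.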
Cache
WiredTiger uses both its own internal cache and the file system cache. By default, the internal cache takes the larger of 50% of (RAM - 1 GB) and 256 MB. The file system cache uses all remaining free memory not used by the WiredTiger cache or other processes.
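The default internal cache size rule is a one-line calculation. This sketch uses binary units (GiB/MiB) as an assumption, and the function name is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(ram_bytes: int) -> int:
    """Larger of 50% of (RAM - 1 GB) and 256 MB."""
    half_of_ram_minus_1gb = (ram_bytes - GIB) // 2
    return max(half_of_ram_minus_1gb, 256 * MIB)


print(default_wiredtiger_cache_bytes(16 * GIB) / GIB)  # 7.5
print(default_wiredtiger_cache_bytes(1 * GIB) / MIB)   # 256.0
```

On small machines the 256 MB floor dominates. The default can be overridden with the `storage.wiredTiger.engineConfig.cacheSizeGB` setting.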
WiredTiger's cache is organized as B-trees, and each B-tree node is a page:

- The root page is the root node of the B-tree.
- Internal pages are its intermediate index nodes.
- Leaf pages are the leaf nodes that actually store the data.

B-tree data is loaded from disk or written to disk on demand, one page at a time. Each page is stored in the file as an extent, identified by a file offset plus a size.
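Page-at-a-time loading by extent can be sketched with an in-memory file. `Extent` and `PageCache` are hypothetical names, not WiredTiger's API. The sketch only shows that a page is read from its (offset, size) extent on the first access and served from memory afterwards.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    offset: int  # byte offset of the page image in the file
    size: int    # length of the page image in bytes


class PageCache:
    def __init__(self, data_file):
        self.data_file = data_file
        self.pages = {}       # Extent -> page bytes held in memory
        self.disk_reads = 0   # how many times we actually hit the file

    def get_page(self, extent: Extent) -> bytes:
        # Load from "disk" only on a cache miss.
        if extent not in self.pages:
            self.data_file.seek(extent.offset)
            self.pages[extent] = self.data_file.read(extent.size)
            self.disk_reads += 1
        return self.pages[extent]


f = io.BytesIO(b"ROOTPAGE" + b"LEAF-A" + b"LEAF-B")
cache = PageCache(f)
leaf_b = Extent(offset=14, size=6)
cache.get_page(leaf_b)
cache.get_page(leaf_b)  # second access is served from memory
print(cache.disk_reads)
```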
Page
An in-memory leaf page holds two arrays:
- ROW_ARRAY: each array element (WT_ROW) points into the page's disk-image buffer. It records the offset of the KV cell that stores this row, together with the cell's encoding (WT defines this location and encoding as a WT_CELL object). Using this offset, the row's K/V content can be read directly from the buffer.
- ROW_UPDATE_ARRAY: holds MVCC list objects that correspond one-to-one with WT_ROW entries. Each MVCC list stores the modifications made to its WT_ROW, including both value updates and deletions, as a lock-free singly linked list.
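The two arrays can be modelled in a few lines: rows point into the disk image by offset, and each row has a newest-first update list.

The sketch rests on these assumptions:

- The cell encoding (1-byte key length, key, 1-byte value length, value) is invented for illustration and is not the real WT_CELL format.
- The list is single-threaded rather than lock-free.
- Transaction visibility is ignored.
- All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TOMBSTONE = object()  # marks a deletion in an update list


def encode_cell(key: bytes, value: bytes) -> bytes:
    # Assumed toy encoding; lengths must be < 256.
    return bytes([len(key)]) + key + bytes([len(value)]) + value


def decode_cell(buf: bytes, offset: int):
    klen = buf[offset]
    key = buf[offset + 1: offset + 1 + klen]
    voff = offset + 1 + klen
    vlen = buf[voff]
    return key, buf[voff + 1: voff + 1 + vlen]


@dataclass
class Update:
    value: object              # new value, or TOMBSTONE for a delete
    next: Optional["Update"]   # older update (singly linked list)


@dataclass
class WTRow:
    cell_offset: int           # where this row's cell starts in the disk image


class LeafPage:
    def __init__(self, disk_image: bytes, offsets):
        self.disk_image = disk_image
        self.rows = [WTRow(o) for o in offsets]  # ROW_ARRAY
        self.updates = [None] * len(self.rows)   # ROW_UPDATE_ARRAY: one list per row

    def modify(self, slot: int, value) -> None:
        # Prepend, so the newest update is always at the head of the list.
        self.updates[slot] = Update(value, self.updates[slot])

    def read(self, slot: int):
        key, disk_value = decode_cell(self.disk_image, self.rows[slot].cell_offset)
        head = self.updates[slot]
        if head is None:
            return key, disk_value
        return key, (None if head.value is TOMBSTONE else head.value)


cells = [encode_cell(b"2", b"102"), encode_cell(b"10", b"110")]
image = b"".join(cells)
page = LeafPage(image, [0, len(cells[0])])
page.modify(0, b"402")
page.modify(0, b"502")
page.modify(1, TOMBSTONE)
```

Reading row 0 now yields the newest update, `(b"2", b"502")`. Row 1 reads as deleted, while its original bytes remain untouched in the disk image.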
Write operations
- Traverse the B-tree to find the page that needs to be updated.
- If the page is not in the cache, load it from disk; its key-value pairs are stored as WT_ROW entries.
- For an insert, record the change in WT_INSERT; for an update or delete, record it in WT_UPDATE.
- If required, write a record of the operation to the journal.
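The four write steps above can be traced through a toy writer. All names are hypothetical, and several stand-ins replace the real structures:

- A sorted list of page low keys plays the role of B-tree traversal.
- Plain dicts play the roles of WT_ROW, WT_INSERT and WT_UPDATE.
- A delete is recorded as a `None` update.

```python
import bisect


class ToyBTreeWriter:
    def __init__(self, disk_pages, journaling=True):
        # disk_pages: list of (low_key, {key: value}) sorted by low_key.
        self.disk_pages = disk_pages
        self.low_keys = [low for low, _ in disk_pages]
        self.cache = {}   # page index -> in-memory page
        self.journal = []
        self.journaling = journaling

    def _find_page(self, key) -> int:
        # Step 1: locate the page whose key range covers `key`.
        return max(bisect.bisect_right(self.low_keys, key) - 1, 0)

    def _load(self, idx):
        # Step 2: load the page from "disk" on a cache miss.
        if idx not in self.cache:
            _, rows = self.disk_pages[idx]
            self.cache[idx] = {"rows": dict(rows), "inserts": {}, "updates": {}}
        return self.cache[idx]

    def write(self, op, key, value=None):
        page = self._load(self._find_page(key))
        # Step 3: new keys go to the insert structure, existing keys get an update.
        if op == "insert":
            page["inserts"][key] = value
        elif op in ("update", "delete"):
            page["updates"].setdefault(key, []).append(value)  # None = delete
        else:
            raise ValueError(f"unknown operation: {op}")
        # Step 4: journal the operation if journaling is enabled.
        if self.journaling:
            self.journal.append((op, key, value))


w = ToyBTreeWriter([(0, {2: 102, 10: 110}), (100, {150: 250})])
w.write("insert", 3, 203)
w.write("update", 2, 402)
w.write("delete", 150)
```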
Let's illustrate with an example :
Suppose a page covers the key range [0, 100]. The rows originally stored on disk have keys 2, 10, 20, 30, 50, 80 and 90, with values 102, 110, 120, 130, 150, 180 and 190.
After the page is read from disk into memory, the changes are:
- The value for key=2 is modified twice, first to 402 and then to 502.
- The values for key=20 and key=50 are each modified once, to 122 and 155 respectively.
- New keys 3, 5, 41 and 99 are inserted, with values 203, 205, 241 and 299.
The in-memory page then organizes this data as shown in the figure below:
Adjacent WT_ROW entries are not necessarily consecutive keys, so new entries can be inserted between them. For example, keys 3 and 5 can be inserted between row1 (key=2) and row2 (key=10). The K/V pairs inserted between two rows must be kept in a sorted data structure (WT uses a skiplist). The page therefore only needs an array of skiplist objects, page_insert_array, that corresponds to the row array. Note skiplist8 in the red box in Figure 6: it stores inserted data that falls before row1 (key=2). If a record with key=1 were inserted, it would be added to skiplist8.
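A minimal skiplist shows why it suits these per-gap insert lists: it keeps keys sorted with expected O(log n) insertion and supports in-order iteration. This is a simplified, single-threaded sketch. WiredTiger's insert skiplists are lock-free and hold update chains rather than overwriting values. `SkipList` and `MAX_LEVEL` are hypothetical names.

```python
import random

MAX_LEVEL = 8


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # forward[i]: next node at level i


class SkipList:
    """Sorted map kept in a skiplist; keys must be mutually comparable."""

    def __init__(self, seed=None):
        self.head = _Node(None, None, MAX_LEVEL)  # sentinel, never holds data
        self.level = 1                              # number of levels in use
        self.rng = random.Random(seed)

    def _random_level(self):
        # Each extra level is kept with probability 1/2.
        lvl = 1
        while lvl < MAX_LEVEL and self.rng.random() < 0.5:
            lvl += 1
        return lvl

    def insert(self, key, value):
        # For each level, remember the last node whose key is < key.
        prev = [self.head] * MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            prev[i] = node
        candidate = node.forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # existing key: overwrite in place
            return
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = _Node(key, value, lvl)
        for i in range(lvl):
            new.forward[i] = prev[i].forward[i]
            prev[i].forward[i] = new

    def search(self, key):
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node.value if node is not None and node.key == key else None

    def items(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]


gap = SkipList(seed=42)  # e.g. the insert list between row1 (key=2) and row2 (key=10)
gap.insert(5, 205)
gap.insert(3, 203)
print(list(gap.items()))  # [(3, 203), (5, 205)]
```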
So the correspondence between rows and insert skiplists in the figure is:
- Range before row1: skiplist8
- Between row1 and row2: skiplist1
- Between row2 and row3: skiplist2
- …
- Range after row7: skiplist7
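The example page can be reproduced end to end. The sketch uses these stand-ins:

- Sorted Python lists stand in for the skiplists.
- The insert lists are numbered like the figure: list k follows row k, and list 8 holds keys before row1.
- Updates are appended oldest-first for simplicity.

Function names are hypothetical.

```python
import bisect

# On-disk rows of the example page (row1..row7).
row_keys = [2, 10, 20, 30, 50, 80, 90]
row_values = [102, 110, 120, 130, 150, 180, 190]
N = len(row_keys)

# Per-row update lists; here the newest update is appended last.
row_updates = [[] for _ in range(N)]
# Insert lists: k = after row k, N + 1 = before row1.
insert_lists = {k: [] for k in range(1, N + 2)}


def insert_list_number(key):
    rows_below = bisect.bisect_left(row_keys, key)  # rows whose key sorts before `key`
    return N + 1 if rows_below == 0 else rows_below


def apply_update(key, value):
    row_updates[row_keys.index(key)].append(value)


def apply_insert(key, value):
    bisect.insort(insert_lists[insert_list_number(key)], (key, value))


def scan():
    """Visible key/value pairs in key order: rows merged with insert lists."""
    result = list(insert_lists[N + 1])
    for i, key in enumerate(row_keys):
        latest = row_updates[i][-1] if row_updates[i] else row_values[i]
        result.append((key, latest))
        result.extend(insert_lists[i + 1])
    return result


for key, value in [(2, 402), (2, 502), (20, 122), (50, 155)]:
    apply_update(key, value)
for key, value in [(3, 203), (5, 205), (41, 241), (99, 299)]:
    apply_insert(key, value)
```

With this numbering, keys 3 and 5 land in list 1, key 41 in list 4 and key 99 in list 7. A key below 2 would go to list 8.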
Checkpoint
A checkpoint contains the following metadata:
- root page address: made up of the file offset, size, and content checksum
- alloc extent list address: the list of extents newly allocated since the last checkpoint
- discard extent list address: the list of extents discarded since the last checkpoint
- available extent list address: the list of extents available for allocation; only the latest checkpoint includes this list
- file size: to restore to this checkpoint's state, truncate the file to this size
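The metadata listed above, and restoring by truncation, can be sketched against an in-memory file. The field layout and the use of CRC32 as the checksum are assumptions, and the names are hypothetical. Because this toy file only ever grows by appending, truncating it to `file_size` alone is enough to return it to the checkpoint state.

```python
import io
import zlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Extent = Tuple[int, int]  # (file offset, size) of a block in the file


@dataclass
class PageAddress:
    offset: int    # file offset of the page image
    size: int      # length of the page image in bytes
    checksum: int  # CRC32 of the page contents (checksum choice is an assumption)


@dataclass
class CheckpointMeta:
    root: PageAddress
    alloc: List[Extent] = field(default_factory=list)    # allocated since previous checkpoint
    discard: List[Extent] = field(default_factory=list)  # discarded since previous checkpoint
    avail: Optional[List[Extent]] = None                 # only on the latest checkpoint
    file_size: int = 0                                   # file length at checkpoint time


def append_page(f, payload: bytes) -> PageAddress:
    """Append a page image to the end of the file and return its address."""
    offset = f.seek(0, io.SEEK_END)
    f.write(payload)
    return PageAddress(offset, len(payload), zlib.crc32(payload))


def read_root(f, meta: CheckpointMeta) -> bytes:
    """Read the checkpoint's root page and verify its checksum."""
    f.seek(meta.root.offset)
    data = f.read(meta.root.size)
    if zlib.crc32(data) != meta.root.checksum:
        raise ValueError("root page checksum mismatch")
    return data


def restore(f, meta: CheckpointMeta) -> None:
    """Drop everything appended after the checkpoint by truncating the file."""
    f.truncate(meta.file_size)


f = io.BytesIO()
root = append_page(f, b"root-v1")
ckpt = CheckpointMeta(root=root, alloc=[(root.offset, root.size)], avail=[],
                      file_size=f.seek(0, io.SEEK_END))
append_page(f, b"root-v2 (written after the checkpoint)")
restore(f, ckpt)
```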
WAL (journal)
The journal records the operations performed since the previous checkpoint. It is synced from its in-memory buffer to disk every 100 ms, or when the journal file reaches about 100 MB, at which point a new journal file is started.
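The sync policy above can be modelled with an injectable clock so that it is testable. This is a simplified sketch with hypothetical names:

- It checks the 100 ms timer only when a record is appended, whereas a real engine syncs from a background timer.
- It omits other sync triggers, such as writes issued with `j: true`.
- "Sync" here just moves buffered records into a list per journal file.

```python
import time

FLUSH_INTERVAL_S = 0.100            # sync at least every 100 ms
MAX_FILE_BYTES = 100 * 1024 * 1024  # ~100 MB per journal file


class ToyJournal:
    def __init__(self, clock=time.monotonic, max_file_bytes=MAX_FILE_BYTES):
        self.clock = clock
        self.max_file_bytes = max_file_bytes
        self.buffer = []     # records written but not yet synced
        self.file_bytes = 0  # bytes written to the current journal file
        self.files = [[]]    # synced records, one list per journal file
        self.last_sync = clock()

    def append(self, record: bytes) -> None:
        self.buffer.append(record)
        self.file_bytes += len(record)
        due = self.clock() - self.last_sync >= FLUSH_INTERVAL_S
        full = self.file_bytes >= self.max_file_bytes
        if due or full:
            self.sync()
        if full:
            # Start a new journal file once the current one reaches the size limit.
            self.files.append([])
            self.file_bytes = 0

    def sync(self) -> None:
        # Stand-in for writing the buffer to the journal file and fsync-ing it.
        self.files[-1].extend(self.buffer)
        self.buffer.clear()
        self.last_sync = self.clock()


now = [0.0]
journal = ToyJournal(clock=lambda: now[0], max_file_bytes=10)
journal.append(b"abc")  # too soon and too small: stays buffered
```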
Overall relationship
Distributed storage
Architecture
Architecture diagram:
Write data flow:
Read data flow: