Storage principles inside MongoDB
2022-07-07 13:11:00 【cui_yonghua】
Basics (enough to solve about 80% of everyday problems):
MongoDB data types, key concepts, and commonly used shell commands
Summary of insert, update, and delete operations on MongoDB documents
Advanced:
Other:
Storage engine
This article covers WiredTiger, the default storage engine.
WiredTiger architecture
A WiredTiger write goes to the cache first and is persisted to the WAL (write-ahead log). Every 60 s WiredTiger runs a checkpoint, which persists the current data and produces a new snapshot. When a WiredTiger connection is initialized, it first restores the data to the most recent snapshot and then replays the WAL on top of it, which is what guarantees storage reliability.
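The write and recovery path described above can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions, not WiredTiger's implementation: `ToyEngine` and all of its methods are hypothetical names, the "disk" is just Python objects, and the checkpoint is triggered by hand rather than every 60 s.

```python
import copy


class ToyEngine:
    """Toy model of cache + WAL + checkpoint (hypothetical, for illustration)."""

    def __init__(self):
        self.cache = {}     # volatile in-memory state
        self.wal = []       # durable log of operations since the last checkpoint
        self.snapshot = {}  # durable state captured by the last checkpoint

    def put(self, key, value):
        self.wal.append(("put", key, value))  # record the operation in the log
        self.cache[key] = value               # apply it to the cache

    def delete(self, key):
        self.wal.append(("del", key, None))
        self.cache.pop(key, None)

    def checkpoint(self):
        # Persist the current data as a new snapshot; the log can start over.
        self.snapshot = copy.deepcopy(self.cache)
        self.wal.clear()

    def crash(self):
        # Only the volatile cache is lost; snapshot and WAL survive.
        self.cache = {}

    def recover(self):
        # Step 1: restore the latest snapshot. Step 2: replay the WAL on top.
        self.cache = copy.deepcopy(self.snapshot)
        for op, key, value in self.wal:
            if op == "put":
                self.cache[key] = value
            else:
                self.cache.pop(key, None)
```

For example, after `put("a", 1)`, `checkpoint()`, `put("b", 2)`, `delete("a")`, a `crash()` followed by `recover()` leaves the cache holding only key "b": the snapshot contributes "a", and replaying the log adds "b" and removes "a".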
B-tree vs. B+ tree
Queries that traverse data are fairly common, but MongoDB assumes that looking up a single record is far more common than traversing. Because the non-leaf nodes of a B-tree can also store data, the average number of random I/Os needed to find one record is lower than with a B+ tree, so in such scenarios MongoDB, which uses a B-tree, answers queries faster than MySQL.
This does not mean that MongoDB cannot traverse data. A range query can still fetch a batch of records that match a condition; it just takes longer than in MySQL. MySQL assumes that traversal queries are common, so it chose the B+ tree as its underlying data structure.
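A small sketch can make the argument concrete: in a B-tree, a lookup may finish at an internal node because that node carries values too. The `BTreeNode` class, the `btree_search` function, and the hand-built two-level tree are all hypothetical and only illustrate the idea; they say nothing about how MongoDB lays out its pages.

```python
from bisect import bisect_left


class BTreeNode:
    """B-tree node: every node, leaf or not, stores keys AND their values."""

    def __init__(self, keys, values, children=None):
        self.keys = keys
        self.values = values
        self.children = children or []  # empty list for a leaf


def btree_search(node, key):
    """Return (value, nodes_visited); value is None when the key is absent."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i], visited  # may stop at an internal node
        node = node.children[i] if node.children else None
    return None, visited


# Hand-built example tree: one root with two leaf children.
root = BTreeNode(
    keys=[30],
    values=[130],
    children=[
        BTreeNode([2, 10, 20], [102, 110, 120]),
        BTreeNode([50, 80, 90], [150, 180, 190]),
    ],
)
```

Here `btree_search(root, 30)` visits one node, while `btree_search(root, 80)` visits two. In a B+ tree of the same height every lookup would have to descend to a leaf, which is the cost difference the paragraph above refers to.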
Cache
WiredTiger uses both an internal cache and the file system cache. By default the internal cache is the larger of 50% of (RAM - 1 GB) or 256 MB; the file system cache uses all of the RAM that is currently free.
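The default sizing rule is simple arithmetic, sketched here as a helper. The function name is hypothetical, and the sketch assumes that GB and MB in the rule mean binary units (GiB and MiB).

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_internal_cache_bytes(ram_bytes):
    """Larger of 50% of (RAM - 1 GiB) or 256 MiB, as stated in the text."""
    return max((ram_bytes - GIB) // 2, 256 * MIB)
```

On a machine with 16 GiB of RAM this gives 7.5 GiB; on a machine with 1 GiB of RAM the 256 MiB floor applies.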
WiredTiger's cache is organized as a btree in which every node is a page: the root page is the root of the btree, internal pages are the intermediate index nodes, and leaf pages are the leaf nodes that actually store data. Data is loaded from disk, or written to disk, on demand in units of pages, and each btree page is stored in the file as an extent (identified by file offset + size).
Page
ROW_ARRAY: each array element (wt_row) stores, for one K/V row, the position and encoding of that row's K/V cell in the on-disk page, expressed as an offset into the page's row data buffer (WiredTiger defines this position-and-encoding information as a wt_cell object). With this offset and encoding information, the corresponding K/V content in the buffer can be located and read.
ROW_UPDATE_ARRAY: an array of MVCC list objects. Each mvcc_list corresponds one-to-one with a wt_row and stores the values that have modified that row, covering both value updates and value deletions. It is a lock-free singly linked list.
Write operations
- Traverse the btree to find the page that needs to be updated
- If the page is not in the cache, load it from disk; its key-value pairs are stored in WT_ROW
- For an insert, update WT_INSERT; for an update or delete, update WT_UPDATE
- If necessary, write the operation record to the journal
Here is an example:
Suppose a page covers the key range [0, 100], and the rows originally stored on disk have key = 2, 10, 20, 30, 50, 80, 90 with value = 102, 110, 120, 130, 150, 180, 190 respectively.
After the page is read from disk into memory, the value for key = 2 is modified twice, to 402 and then 502. The values for key = 20 and key = 50 are each modified once, to 122 and 155. New keys 3, 5, 41, and 99 are then inserted with value = 203, 205, 241, 299.
The in-memory page then organizes the data as shown in the figure below:
Two adjacent wt_row entries are not necessarily contiguous in key space, so new entries can be inserted between them; for example, keys 3 and 5 fall between row1 (key = 2) and row2 (key = 10). A sorted data structure is needed between two rows to hold the inserted K/V pairs (WiredTiger uses a skiplist), so an array of skiplist objects, page_insert_array, is kept alongside the row array. Note skiplist8, marked with a red box in Figure 6: it holds inserts that fall in the range before row1 (key = 2). If data with key = 1 is inserted, it is added to skiplist8.
The correspondence between rows and insert skiplists in the figure is therefore:
- the range before row1 corresponds to insert skiplist8
- the range between row1 and row2 corresponds to insert skiplist1
- the range between row2 and row3 corresponds to insert skiplist3
- …
- the range after row7 corresponds to insert skiplist7
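The example above can be replayed with a small model of the in-memory page. This is a sketch under stated assumptions, not WiredTiger code: `Page` and its methods are hypothetical names, a sorted Python list stands in for each skiplist, update lists are plain lists rather than lock-free linked lists, deletes are left out, and the insert slots are numbered 0..N (slot 0 is the range before the first row, slot i the range after row i) instead of following the figure's numbering.

```python
from bisect import bisect_right, insort


class Page:
    """Toy in-memory page: row array + update array + insert array."""

    def __init__(self, disk_rows):
        # Row array: sorted (key, value) pairs as read from disk.
        self.keys = [k for k, _ in disk_rows]
        self.disk_values = [v for _, v in disk_rows]
        # Update array: one list per row, newest value first.
        self.updates = [[] for _ in disk_rows]
        # Insert array: one sorted list per gap between rows (N + 1 gaps).
        self.inserts = [[] for _ in range(len(disk_rows) + 1)]

    def _row_index(self, key):
        # Index of the last row whose key is <= key, or -1 if there is none.
        return bisect_right(self.keys, key) - 1

    def update(self, key, value):
        i = self._row_index(key)
        if i < 0 or self.keys[i] != key:
            raise KeyError(key)
        self.updates[i].insert(0, value)  # prepend: newest version first

    def insert(self, key, value):
        # Assumes the key is new; it lands in the gap after row i.
        i = self._row_index(key)
        insort(self.inserts[i + 1], (key, value))

    def get(self, key):
        i = self._row_index(key)
        if i >= 0 and self.keys[i] == key:
            return self.updates[i][0] if self.updates[i] else self.disk_values[i]
        for k, v in self.inserts[i + 1]:
            if k == key:
                return v
        return None


page = Page([(2, 102), (10, 110), (20, 120), (30, 130),
             (50, 150), (80, 180), (90, 190)])
for key, value in [(2, 402), (2, 502), (20, 122), (50, 155)]:
    page.update(key, value)
for key, value in [(3, 203), (5, 205), (41, 241), (99, 299)]:
    page.insert(key, value)
```

After these operations `page.get(2)` returns 502 (the newest update), `page.get(10)` still returns the on-disk value 110, keys 3 and 5 sit together in the gap after the first row, and key 99 sits in the gap after the last row.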
Checkpoint
A checkpoint contains the following metadata:
- root page address: made up of the file offset, the size, and a checksum of the content
- alloc extent list address: the list of extents newly allocated since the last checkpoint
- discard extent list address: the list of extents discarded since the last checkpoint
- available extent list address: the list of extents that can be allocated; only the latest checkpoint includes this list
- file size: to restore the state of this checkpoint, truncating the file to this size is enough
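The metadata fields listed above can be written down as a small record type, together with the truncate-based restore step. This is a simplified sketch: all class, field, and function names are hypothetical, and the extent lists are held inline here, whereas the text says the checkpoint stores the addresses of those lists.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    offset: int  # file offset of the extent
    size: int    # length in bytes


@dataclass
class PageAddress:
    offset: int    # file offset of the page
    size: int      # length in bytes
    checksum: int  # checksum of the page content


@dataclass
class Checkpoint:
    root_page: PageAddress
    alloc_extents: list = field(default_factory=list)      # allocated since last checkpoint
    discard_extents: list = field(default_factory=list)    # discarded since last checkpoint
    available_extents: list = field(default_factory=list)  # only in the latest checkpoint
    file_size: int = 0


def restore_to_checkpoint(path, checkpoint):
    """Drop everything written after the checkpoint by truncating the file."""
    with open(path, "r+b") as f:
        f.truncate(checkpoint.file_size)
```

If a data file was 4096 bytes long when the checkpoint was taken and has grown since, calling `restore_to_checkpoint(path, ckpt)` with `ckpt.file_size == 4096` cuts it back to 4096 bytes.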
WAL (journal)
The journal file records the operations performed since the last checkpoint. It is synchronized from the cache to disk every 100 ms, or whenever the file size reaches 100 MB.
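The two sync triggers (elapsed time and size) can be sketched as a small buffer class. Everything here is hypothetical and simplified: `JournalBuffer` is not a MongoDB API, a Python list stands in for the on-disk journal, the size check counts buffered bytes rather than the size of a real file, and the current time is passed in explicitly so that the behavior is deterministic.

```python
class JournalBuffer:
    """Toy journal buffer that syncs on a time interval or a size threshold."""

    def __init__(self, interval_s=0.1, max_bytes=100 * 1024 * 1024):
        self.interval_s = interval_s  # 100 ms by default
        self.max_bytes = max_bytes    # 100 MB by default
        self.pending = []             # records not yet synced
        self.pending_bytes = 0
        self.last_sync = 0.0
        self.synced = []              # stands in for the on-disk journal

    def should_sync(self, now):
        return (now - self.last_sync >= self.interval_s
                or self.pending_bytes >= self.max_bytes)

    def append(self, record, now):
        self.pending.append(record)
        self.pending_bytes += len(record)
        if self.should_sync(now):
            self.sync(now)

    def sync(self, now):
        # Move all buffered records to the "disk" and reset the counters.
        self.synced.extend(self.pending)
        self.pending.clear()
        self.pending_bytes = 0
        self.last_sync = now
```

With `JournalBuffer(max_bytes=10)`, two appends totalling 11 bytes within the first 100 ms trigger a size-based sync, and a later small append is synced once 100 ms have passed since that sync.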
How the pieces fit together
Further notes on storage engine internals
Reference 1
Reference 2
Distributed storage
Architecture
Architecture diagram:
Write path:
Read path: