Storage principles inside MongoDB
2022-07-07 13:11:00 【cui_ yonghua】
Basics (covers about 80% of common problems):
MongoDB data types, key concepts, and common shell commands
Summary of MongoDB document insert, update, and delete operations
Storage engine
This article introduces WiredTiger, the default storage engine.
WiredTiger architecture
A WiredTiger write operation is first written to the cache and persisted to the WAL (write-ahead log). Every 60 s WiredTiger performs a checkpoint, which persists the current data and generates a new snapshot. When a WiredTiger connection is initialized, it first restores the data to the latest snapshot (the last checkpoint) and then replays the WAL records written after that checkpoint, which ensures storage reliability.
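The write and recovery flow described above can be sketched with a toy key-value store. This is only an illustration under simplifying assumptions: the class `ToyStore` and its methods are hypothetical names, everything lives in memory, and nothing here is WiredTiger's actual API.

```python
import copy


class ToyStore:
    """Toy key-value store: cache + write-ahead log + checkpoint (hypothetical)."""

    def __init__(self):
        self.cache = {}      # in-memory state
        self.wal = []        # operations recorded since the last checkpoint
        self.snapshot = {}   # state persisted by the last checkpoint

    def write(self, key, value):
        # Apply the write to the cache and record it in the WAL.
        self.cache[key] = value
        self.wal.append((key, value))

    def checkpoint(self):
        # Persist the current data as a new snapshot; earlier WAL records
        # are no longer needed for recovery.
        self.snapshot = copy.deepcopy(self.cache)
        self.wal.clear()

    def recover(self):
        # Restore the latest snapshot, then replay the WAL written after it.
        self.cache = copy.deepcopy(self.snapshot)
        for key, value in self.wal:
            self.cache[key] = value


store = ToyStore()
store.write("a", 1)
store.checkpoint()
store.write("b", 2)   # only in the cache and the WAL, not in the snapshot
store.cache = {}      # simulate losing the in-memory state in a crash
store.recover()
print(store.cache)
```

After recovery the cache holds both `a` (from the snapshot) and `b` (replayed from the WAL).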
B-tree vs. B+ tree
Although queries that traverse data are fairly common, MongoDB assumes that querying a single record is far more common than traversing data. Because the non-leaf nodes of a B-tree can also store data, the average number of random I/Os needed to look up a single record is lower than with a B+ tree, so in such scenarios MongoDB, which uses a B-tree, queries faster than MySQL.
This does not mean that MongoDB cannot traverse data. MongoDB also supports range queries that return a batch of records matching a condition; they just take longer than in MySQL. MySQL assumes that queries that traverse data are common, so it chose the B+ tree as its underlying data structure.
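The difference in lookup cost can be illustrated with two tiny hand-built trees. This is a minimal sketch, not how either database implements its index: the `node` helper and both search functions are hypothetical, and the number of visited nodes stands in for the number of random I/Os.

```python
from bisect import bisect_left, bisect_right


def node(keys, values=None, children=None):
    # Hypothetical helper: a tree node represented as a plain dict.
    return {"keys": keys, "values": values or [], "children": children or []}


def btree_search(n, key, visited=1):
    """B-tree lookup: any node may hold the record, so the search can stop early."""
    i = bisect_left(n["keys"], key)
    if i < len(n["keys"]) and n["keys"][i] == key:
        return n["values"][i], visited
    if not n["children"]:
        return None, visited
    return btree_search(n["children"][i], key, visited + 1)


def bplus_search(n, key, visited=1):
    """B+ tree lookup: internal nodes only route; records live in the leaves."""
    if n["children"]:
        i = bisect_right(n["keys"], key)
        return bplus_search(n["children"][i], key, visited + 1)
    i = bisect_left(n["keys"], key)
    if i < len(n["keys"]) and n["keys"][i] == key:
        return n["values"][i], visited
    return None, visited


# The same records stored in a two-level B-tree and a two-level B+ tree.
btree = node([20], [120], [node([2, 10], [102, 110]), node([30, 50], [130, 150])])
bplus = node([20], children=[node([2, 10], [102, 110]),
                             node([20, 30, 50], [120, 130, 150])])

print(btree_search(btree, 20))  # found in the root: 1 node visited
print(bplus_search(bplus, 20))  # must descend to a leaf: 2 nodes visited
```

For keys that happen to sit in a leaf, both trees visit the same number of nodes; the B-tree only saves work for keys stored higher up, which is why the claim is about the average cost.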
Cache
WiredTiger uses both an internal cache and the file system cache. By default, the internal cache is the larger of 50% of (RAM - 1 GB) and 256 MB; the file system cache uses all RAM that is currently free.
WiredTiger organizes its cache as a B-tree, and each B-tree node is a page: the root page is the root node, internal pages are the intermediate index nodes, and leaf pages are the leaf nodes that actually store data. The B-tree loads data from disk, or writes it to disk, on demand in units of pages. On disk, each page is stored in the file as an extent, identified by its file offset and size.
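As a quick worked example of the default sizing rule, the helper below computes the larger of the two values. The function name is hypothetical, and treating GB and MB as binary units (GiB, MiB) is an assumption made for the sketch.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_internal_cache_bytes(ram_bytes: int) -> int:
    # The larger of 50% of (RAM - 1 GB) and 256 MB.
    return max((ram_bytes - GIB) // 2, 256 * MIB)


for ram_gib in (1, 4, 16):
    size = default_internal_cache_bytes(ram_gib * GIB)
    print(f"{ram_gib} GiB RAM -> {size / GIB:.2f} GiB internal cache")
```

On a small machine the 256 MB floor applies; on larger machines the 50% term dominates.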
Page
ROW_ARRAY: each array element (wt_row) stores the location and encoding of this K/V row within the page's on-disk row data buffer, that is, the offset of its K/V cell (WiredTiger defines this location and encoding information as a wt_cell object). Using this offset, the corresponding K/V content in the buffer can be accessed.
ROW_UPDATE_ARRAY: an array of MVCC list objects. Each mvcc_list corresponds one-to-one with a wt_row and stores the values modified for that row, covering both value updates and value deletions. It is a lock-free singly linked list.
Write operations
- Traverse the B-tree to find the page that needs to be updated
- If the page is not in the cache, load it from disk; the key-value pairs are stored in WT_ROW
- For an insert operation, update WT_INSERT; for an update or delete operation, update WT_UPDATE
- If necessary, write the operation record to the journal
Let's illustrate with an example:
Suppose a page stores keys in the range [0, 100], and the rows originally stored on disk have key = 2, 10, 20, 30, 50, 80, 90 with value = 102, 110, 120, 130, 150, 180, 190 respectively.
After the page is read from disk into memory, the value of key = 2 is modified twice, to 402 and then 502. The values of key = 20 and key = 50 are each modified once, to 122 and 155. New keys 3, 5, 41, and 99 are then inserted, with value = 203, 205, 241, 299.
The in-memory page then organizes the data as shown in the figure below:
Two adjacent wt_row entries are not necessarily contiguous in key space, and new entries can be inserted between them; for example, keys 3 and 5 can be inserted between row1 (key = 2) and row2 (key = 10). The K/V pairs inserted between two rows need a sorted data structure (WiredTiger uses a skip list), so all that is required is an array of skip list objects, page_insert_array, that corresponds to the row array. Note skiplist8 in the red box in Figure 6: it stores data inserted in the range before row1 (key = 2). If data with key = 1 is inserted, it is added to skiplist8.
So the correspondence between the rows and the insert skip lists in the figure is:
- The range before row1 corresponds to insert skiplist8
- The range between row1 and row2 corresponds to insert skiplist1
- The range between row2 and row3 corresponds to insert skiplist2
- …
- The range after row7 corresponds to insert skiplist7
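The in-memory page layout from this example can be sketched as follows. This is a simplified model, not WiredTiger's code: `MemPage` is a hypothetical class, a sorted Python list stands in for each skip list, deletes are not modeled, and `insert` assumes the key is not already a row on the page.

```python
import bisect


class MemPage:
    """Toy in-memory page: row array, per-row update lists, per-gap insert lists."""

    def __init__(self, disk_rows):
        # Row array: sorted keys and the values as read from disk.
        self.keys = [k for k, _ in disk_rows]
        self.disk_values = [v for _, v in disk_rows]
        # One update list per row, newest value first.
        self.updates = [[] for _ in self.keys]
        # One insert list per gap: index 0 is the range before row1,
        # index i (i >= 1) is the range after row i.
        self.inserts = [[] for _ in range(len(self.keys) + 1)]

    def _row_index(self, key):
        i = bisect.bisect_left(self.keys, key)
        return i if i < len(self.keys) and self.keys[i] == key else None

    def update(self, key, value):
        i = self._row_index(key)
        if i is None:
            raise KeyError(key)
        self.updates[i].insert(0, value)  # prepend: the head is the newest value

    def insert(self, key, value):
        gap = bisect.bisect_left(self.keys, key)  # number of rows with a smaller key
        bisect.insort(self.inserts[gap], (key, value))

    def get(self, key):
        i = self._row_index(key)
        if i is not None:
            return self.updates[i][0] if self.updates[i] else self.disk_values[i]
        gap = bisect.bisect_left(self.keys, key)
        for k, v in self.inserts[gap]:
            if k == key:
                return v
        return None


page = MemPage([(2, 102), (10, 110), (20, 120), (30, 130),
                (50, 150), (80, 180), (90, 190)])
page.update(2, 402)
page.update(2, 502)
page.update(20, 122)
page.update(50, 155)
for k, v in [(3, 203), (5, 205), (41, 241), (99, 299)]:
    page.insert(k, v)

print(page.get(2), page.get(10), page.get(41))
print(page.inserts[1])  # the gap between row1 and row2
```

Here `page.inserts[0]` plays the role of skiplist8 (the range before row1), and `page.inserts[i]` plays the role of the skip list after row i.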
Checkpoint
A checkpoint contains the following metadata:
root page address: consists of the file offset, size, and content checksum
alloc extent list address: stores the list of extents newly allocated since the last checkpoint
discard extent list address: stores the list of extents discarded since the last checkpoint
available extent list address: stores the list of allocatable extents; only the latest checkpoint includes this list
file size: to restore to the state of this checkpoint, truncating the file to file size is enough
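The metadata fields listed above can be modeled roughly as below. This is a sketch only: `Address`, `CheckpointMeta`, and `restore_to` are hypothetical names, reusing the offset/size/checksum form for the three extent list addresses is an assumption, and the on-disk values are made up for the demo.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    offset: int    # file offset
    size: int
    checksum: int  # content checksum


@dataclass
class CheckpointMeta:
    root_page: Address
    alloc_extents: Address                # extents allocated since the last checkpoint
    discard_extents: Address              # extents discarded since the last checkpoint
    available_extents: Optional[Address]  # only present in the latest checkpoint
    file_size: int


def restore_to(path: str, ckpt: CheckpointMeta) -> None:
    # Restoring to the checkpoint's state: truncate the file to file_size.
    with open(path, "r+b") as f:
        f.truncate(ckpt.file_size)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.wt")
    with open(path, "wb") as f:
        f.write(b"\x00" * 4096)           # file contents at checkpoint time
    ckpt = CheckpointMeta(Address(0, 512, 0), Address(512, 64, 0),
                          Address(576, 64, 0), Address(640, 64, 0),
                          file_size=4096)
    with open(path, "ab") as f:
        f.write(b"\x01" * 1024)           # data appended after the checkpoint
    size_before = os.path.getsize(path)
    restore_to(path, ckpt)
    size_after = os.path.getsize(path)

print(size_before, size_after)
```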
WAL (journal)
The journal file records the operations performed since the previous checkpoint. It is synced from the cache to disk every 100 ms, or whenever the file size reaches 100 MB.
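The two sync triggers can be written as a small predicate. The constant and function names are hypothetical, and reading 100 MB as 100 MiB is an assumption for the sketch.

```python
SYNC_INTERVAL_MS = 100               # sync at least every 100 ms
MAX_FILE_BYTES = 100 * 1024 * 1024   # or when the journal file reaches 100 MB


def should_sync(ms_since_last_sync: int, journal_file_bytes: int) -> bool:
    # Either condition alone is enough to trigger a sync to disk.
    return (ms_since_last_sync >= SYNC_INTERVAL_MS
            or journal_file_bytes >= MAX_FILE_BYTES)


print(should_sync(50, 10 * 1024))        # neither threshold reached
print(should_sync(100, 10 * 1024))       # time threshold reached
print(should_sync(5, 100 * 1024 * 1024)) # size threshold reached
```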
Overall relationship
Distributed storage
Architecture
Architecture diagram:
Write data flow:
Read data flow: