MongoDB learning notes: BSON structure analysis
2022-07-27 20:22:00 【Pengzhenyi】
What is BSON?
MongoDB, a popular document database, uses the BSON format to support its document model.
BSON stands for Binary JSON. It resembles JSON but is stored in a binary format. Compared with JSON it has the following advantages:
- Faster access. BSON stores the type of each value, so unlike plain-text storage there is no need to convert strings into other types. Take the integer 12345678: JSON has to convert a string to an integer, whereas BSON stores an integer type tag and then the integer value directly in 4 bytes;
- Less storage space. Taking the integer 12345678 again, JSON's plain-text storage needs 8 bytes, while BSON stores every Int32 value in 4 bytes and every Long or Double in 8 bytes. The saving depends on the data, though: for small integers BSON actually uses more space;
- More data types. Compared with JSON, BSON adds BinData, Timestamp, ObjectId, Decimal128, and other types.
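The size trade-off in the second point can be checked with a little arithmetic. The sketch below is only an illustration: `jsonTextSize` and `kBsonInt32ValueSize` are hypothetical names, and it compares only the value bytes, ignoring keys, type tags, and JSON punctuation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bytes needed to write the integer as JSON text (one byte per ASCII character).
std::size_t jsonTextSize(int32_t v) {
    return std::to_string(v).size();
}

// Bytes BSON uses for the value part of an Int32 element, whatever its magnitude.
constexpr std::size_t kBsonInt32ValueSize = 4;
```

`jsonTextSize(12345678)` is 8, twice the BSON size, while `jsonTextSize(7)` is 1, so for a small integer the fixed 4-byte BSON value is the larger one.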
The official MongoDB documentation gives a more authoritative and intuitive comparison, summarized below:
| | JSON | BSON |
|---|---|---|
| Encoding | UTF-8 string | Binary |
| Data types | String, Boolean, Number, Array | String, Boolean, Number (Integer, Float, Long, Decimal128...), Array, Date, Raw Binary |
| Readability | Human and machine | Machine only |
This article analyzes the BSON storage format in depth and walks through BSON construction and parsing at the code level, to give readers a deeper understanding of BSON.
BSON storage format
The simplest BSON document can be broken down, from front to back, into the following parts:
- The total length of the document, occupying 4 bytes;
- The value type (see the type definitions in the code), occupying 1 byte;
- The key as a string (keys can only be strings), of variable length, terminated by '\0', occupying len(Key)+1 bytes;
- The value in binary form: Int32 occupies 4 bytes, Long and Double occupy 8 bytes, and so on. The common types are analyzed one by one below;
- A terminating '\0' that marks the end of the document when traversing it, commonly called EOO (End Of Object), occupying 1 byte.
The following sections cover the commonly used Int32, Double, String, embedded document, and Array types and analyze their hexadecimal representation.
Int32 type
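As a rough sketch of the layout described above, the following hand-encodes a document with a single Int32 field. It is an illustration rather than MongoDB's implementation: `appendInt32LE` and `encodeInt32Doc` are hypothetical helper names, and 0x10 is the BSON type byte for a 32-bit integer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit integer in little-endian byte order.
void appendInt32LE(std::vector<uint8_t>& out, int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>((u >> (8 * i)) & 0xFF));
}

// Encode the document {key: value}, where value is an Int32.
std::vector<uint8_t> encodeInt32Doc(const std::string& key, int32_t value) {
    assert(key.find('\0') == std::string::npos);  // keys are NUL-terminated C strings
    std::vector<uint8_t> body;
    body.push_back(0x10);                             // value type: Int32
    body.insert(body.end(), key.begin(), key.end());  // key bytes
    body.push_back(0x00);                             // key terminator
    appendInt32LE(body, value);                       // 4-byte value
    body.push_back(0x00);                             // EOO

    std::vector<uint8_t> doc;
    // The total length counts its own 4 bytes.
    appendInt32LE(doc, static_cast<int32_t>(body.size() + 4));
    doc.insert(doc.end(), body.begin(), body.end());
    return doc;
}
```

Under these assumptions, `encodeInt32Doc("a", 1)` yields 12 bytes: `0c 00 00 00 | 10 | 61 00 | 01 00 00 00 | 00`, that is, length, type, key, value, and EOO.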
Double type
A Double occupies 8 bytes and is stored in binary according to the IEEE 754 standard.
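To see which 8 bytes end up in the document, the sketch below copies the IEEE 754 bit pattern of a `double` and emits it in little-endian order, then reverses the process. The function names are hypothetical, and the code assumes the platform's `double` is a 64-bit IEEE 754 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Return the 8 value bytes of a Double: its IEEE 754 bit pattern, little-endian.
std::vector<uint8_t> encodeDoubleLE(double d) {
    static_assert(sizeof(double) == 8, "assumes a 64-bit IEEE 754 double");
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // copy the bit pattern without aliasing issues
    std::vector<uint8_t> out;
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<uint8_t>((bits >> (8 * i)) & 0xFF));
    return out;
}

// Rebuild the double from its 8 little-endian bytes.
double decodeDoubleLE(const std::vector<uint8_t>& bytes) {
    assert(bytes.size() == 8);
    uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= static_cast<uint64_t>(bytes[i]) << (8 * i);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

For example, 1.0 has the bit pattern 0x3FF0000000000000, so `encodeDoubleLE(1.0)` gives `00 00 00 00 00 00 f0 3f`.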
String type, and multiple key-value pairs
A String value is prefixed with an additional 4-byte length and ends with '\0'.
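A minimal sketch of one string element follows. `encodeStringElement` is a hypothetical helper; 0x02 is the BSON type byte for a UTF-8 string, and the 4-byte length counts the string bytes plus the trailing '\0'.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode one string element: type byte, key, Int32 length, string bytes, NUL.
std::vector<uint8_t> encodeStringElement(const std::string& key,
                                         const std::string& value) {
    assert(key.find('\0') == std::string::npos);  // keys are NUL-terminated C strings
    std::vector<uint8_t> out;
    out.push_back(0x02);                             // value type: string
    out.insert(out.end(), key.begin(), key.end());   // key bytes
    out.push_back(0x00);                             // key terminator
    uint32_t len = static_cast<uint32_t>(value.size() + 1);  // counts the trailing NUL
    for (int i = 0; i < 4; ++i)                      // length, little-endian
        out.push_back(static_cast<uint8_t>((len >> (8 * i)) & 0xFF));
    out.insert(out.end(), value.begin(), value.end());
    out.push_back(0x00);                             // string terminator
    return out;
}
```

`encodeStringElement("a", "hi")` produces 10 bytes: `02 | 61 00 | 03 00 00 00 | 68 69 00`. A document with several key-value pairs is simply such elements laid end to end between the length header and the EOO byte.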
Nested documents
A nested document has the same layout as an ordinary document, so it also begins with its own 4-byte length. For example, the nested document {"b" : NumberInt(1)} occupies 12 bytes.
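The 12-byte figure can be reproduced by building the inner document first and then embedding it as a value. This is a sketch with hypothetical helpers (`appendHeader`, `wrapDocument`, `buildNestedExample`); the outer key "a" is an assumption made for illustration, and 0x03 is the BSON type byte for an embedded document.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

void appendInt32LE(Bytes& out, uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
}

// Append the type byte and NUL-terminated key that start every element.
void appendHeader(Bytes& out, uint8_t type, const std::string& key) {
    out.push_back(type);
    out.insert(out.end(), key.begin(), key.end());
    out.push_back(0x00);
}

// Wrap already-encoded elements into a document: length + elements + EOO.
Bytes wrapDocument(const Bytes& elements) {
    Bytes doc;
    appendInt32LE(doc, static_cast<uint32_t>(elements.size() + 5));  // 4 length bytes + EOO
    doc.insert(doc.end(), elements.begin(), elements.end());
    doc.push_back(0x00);
    return doc;
}

// Build {"a": {"b": NumberInt(1)}}.
Bytes buildNestedExample() {
    Bytes innerElems;
    appendHeader(innerElems, 0x10, "b");     // Int32 element
    appendInt32LE(innerElems, 1);
    Bytes inner = wrapDocument(innerElems);  // 4 + 1 + 2 + 4 + 1 = 12 bytes

    Bytes outerElems;
    appendHeader(outerElems, 0x03, "a");     // embedded document element
    outerElems.insert(outerElems.end(), inner.begin(), inner.end());
    return wrapDocument(outerElems);         // 4 + 1 + 2 + 12 + 1 = 20 bytes
}
```

The inner length (12) sits right after the outer element's type byte and key, and the outer length (20) counts the whole inner document.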
Array type
An array value also begins with a 4-byte length. Each element is stored under its index as the key, counting up from '0'.
For example, for an array field a, "a.0" denotes the first element, whose value is Double(1), and "a.3" denotes the fourth element, whose value is "4".
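The index-as-key scheme can be sketched as follows. To keep the example short it assumes an array holding only Int32 values (the array in the text mixes types), and `encodeInt32ArrayValue` is a hypothetical helper that returns the value part of an array element, which is laid out like a document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

void appendInt32LE(Bytes& out, uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
}

// Encode an array of Int32 values as the document {"0": v0, "1": v1, ...}.
Bytes encodeInt32ArrayValue(const std::vector<int32_t>& values) {
    Bytes elems;
    for (std::size_t i = 0; i < values.size(); ++i) {
        std::string key = std::to_string(i);              // "0", "1", "2", ...
        elems.push_back(0x10);                            // value type: Int32
        elems.insert(elems.end(), key.begin(), key.end());
        elems.push_back(0x00);                            // key terminator
        appendInt32LE(elems, static_cast<uint32_t>(values[i]));
    }
    Bytes doc;
    appendInt32LE(doc, static_cast<uint32_t>(elems.size() + 5));  // 4 length bytes + EOO
    doc.insert(doc.end(), elems.begin(), elems.end());
    doc.push_back(0x00);                                  // EOO
    return doc;
}
```

`encodeInt32ArrayValue({1, 2})` gives 19 bytes: `13 00 00 00 | 10 30 00 01 00 00 00 | 10 31 00 02 00 00 00 | 00`. Inside a parent document these bytes would follow the type byte 0x04 and the array's own key.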
BSON parsing and construction
Parsing process
When parsing a BSON document, first read the leading 4 bytes in little-endian order and convert them to an Int32 length, which gives the end position of the document.
Then, following the format described in the previous section, read the value type, the key, and the value. Repeating this through an iterator yields every key-value pair in the document.
This process is implemented by BSONObj and BSONObjIterator in the MongoDB source code.
Some key code is excerpted below:
// Construct an iterator over the given binary BSON data
explicit BSONObjIterator(const BSONObj& jso) {
    int sz = jso.objsize();  // read the leading 4 bytes as a little-endian int32 length
    if (MONGO_unlikely(sz == 0)) {
        _pos = _theend = 0;
        return;
    }
    _pos = jso.objdata() + 4;           // start of the first element
    _theend = jso.objdata() + sz - 1;   // position of the trailing EOO byte
}
// Whether the iterator has more elements before the end
bool more() {
    return _pos < _theend;
}
// Return the BSONElement the iterator points at (think of it as one key-value pair), then advance
BSONElement next() {
    verify(_pos <= _theend);
    BSONElement e(_pos);  // wrap the key-value data starting at the current position
    _pos += e.size();     // advance the iterator
    return e;
}
If you are new to BSON, you might expect that accessing a field is accelerated by a hash table, a skip list, or a similar structure, giving O(1) or O(log N) lookups. In fact, the iterator has to walk the elements from front to back, so the time complexity is O(N).
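The linear scan can be sketched outside MongoDB as follows. This is a simplified illustration, not the BSONObjIterator code: `readInt32LE`, `valueSize`, and `findField` are hypothetical names, only a handful of value types are handled, and the input is assumed to be a well-formed document (no bounds checking).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

int32_t readInt32LE(const uint8_t* p) {
    uint32_t u = p[0] | (p[1] << 8) | (p[2] << 16) | (static_cast<uint32_t>(p[3]) << 24);
    return static_cast<int32_t>(u);
}

// Size in bytes of the value that follows the key, for the types handled here.
std::size_t valueSize(uint8_t type, const uint8_t* value) {
    switch (type) {
        case 0x01: return 8;                                                 // Double
        case 0x02: return 4 + static_cast<std::size_t>(readInt32LE(value));  // String
        case 0x03:                                                           // embedded document
        case 0x04: return static_cast<std::size_t>(readInt32LE(value));      // Array
        case 0x08: return 1;                                                 // Boolean
        case 0x10: return 4;                                                 // Int32
        case 0x12: return 8;                                                 // Long
        default: throw std::runtime_error("type not handled in this sketch");
    }
}

// Return the offset of the element named `name`, or -1 if it is absent.
// Every lookup walks the elements from the front, so the cost is O(N).
long findField(const Bytes& doc, const std::string& name) {
    assert(doc.size() >= 5);
    std::size_t end = static_cast<std::size_t>(readInt32LE(doc.data())) - 1;  // EOO position
    std::size_t pos = 4;                                                      // first element
    while (pos < end) {
        uint8_t type = doc[pos];
        const char* key = reinterpret_cast<const char*>(&doc[pos + 1]);
        std::size_t keyLen = std::strlen(key);
        if (name == key) return static_cast<long>(pos);
        pos += 1 + keyLen + 1 + valueSize(type, &doc[pos + 1 + keyLen + 1]);
    }
    return -1;
}
```

To reach the last field, the loop has to decode the type and key of every element before it, which is where the O(N) cost comes from.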
Construction process
Constructing a BSON document can be seen as the reverse of parsing. First reserve 4 bytes at the head, then keep appending the value type, key, and value of each field in binary form, then append the '\0' EOO marker at the end of the document, and finally write the computed length (which includes the 4 length bytes themselves) into the 4 reserved bytes.
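These steps can be sketched with a miniature builder. `TinyBsonBuilder` is a hypothetical class written for illustration; it follows the same order of operations but is not MongoDB's BSONObjBuilder API and supports only Int32 and String fields.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical miniature builder: reserve the header, append elements, then finish.
class TinyBsonBuilder {
public:
    TinyBsonBuilder() : _buf(4, 0) {}  // reserve 4 bytes for the length

    TinyBsonBuilder& appendInt32(const std::string& key, int32_t value) {
        appendHeader(0x10, key);
        appendLE(static_cast<uint32_t>(value));
        return *this;
    }

    TinyBsonBuilder& appendString(const std::string& key, const std::string& value) {
        appendHeader(0x02, key);
        appendLE(static_cast<uint32_t>(value.size() + 1));  // length counts the trailing NUL
        _buf.insert(_buf.end(), value.begin(), value.end());
        _buf.push_back(0x00);
        return *this;
    }

    // Append EOO and back-fill the length; returns a finished copy of the document.
    std::vector<uint8_t> done() const {
        std::vector<uint8_t> out = _buf;
        out.push_back(0x00);                                // EOO
        uint32_t size = static_cast<uint32_t>(out.size());  // includes the 4 length bytes
        for (int i = 0; i < 4; ++i)
            out[i] = static_cast<uint8_t>((size >> (8 * i)) & 0xFF);
        return out;
    }

private:
    void appendHeader(uint8_t type, const std::string& key) {
        assert(key.find('\0') == std::string::npos);  // keys are NUL-terminated C strings
        _buf.push_back(type);
        _buf.insert(_buf.end(), key.begin(), key.end());
        _buf.push_back(0x00);
    }
    void appendLE(uint32_t v) {
        for (int i = 0; i < 4; ++i)
            _buf.push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
    }
    std::vector<uint8_t> _buf;
};
```

Chaining `appendInt32("a", 1).appendString("b", "hi")` and calling `done()` yields a 22-byte document whose first byte is 0x16. The length can only be written last because it is unknown until every element has been appended.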
This process is implemented by BSONObjBuilder in the MongoDB source code.
Some key code is excerpted below:
// Construct a BSONObjBuilder
BSONObjBuilder(int initsize = 512)
    : _b(_buf), _buf(initsize), _offset(0), _s(this), _tracker(0), _doneCalled(false) {
    // Skip over space for the object length. The length is filled in by _done.
    _b.skip(sizeof(int));  // reserve 4 bytes at the head
    // Reserve space for the EOO byte. This means _done() can't fail.
    _b.reserveBytes(1);    // reserve 1 byte at the tail for EOO
}
// Append a key-value pair whose value type is int32
BSONObjBuilder& append(StringData fieldName, int n) {
    _b.appendNum((char)NumberInt);  // first the value type, 1 byte
    _b.appendStr(fieldName);        // then the string key, terminated by '\0'
    _b.appendNum(n);                // then the 4-byte binary integer value
    return *this;
}
// Called after all key-value data has been appended, to finish the document
char* _done() {
    if (_doneCalled)
        return _b.buf() + _offset;
    _doneCalled = true;
    // TODO remove this or find some way to prevent it from failing. Since this is intended
    // for use with BSON() literal queries, it is less likely to result in oversized BSON.
    _s.endField();
    _b.claimReservedBytes(1);  // Prevents adding EOO from failing.
    _b.appendNum((char)EOO);
    // Determine the start position and size of the final data.
    // _offset is nonzero only when nested builders share one buffer; otherwise it is 0.
    char* data = _b.buf() + _offset;
    int size = _b.len() - _offset;
    DataView(data).write(tagLittleEndian(size));  // write size into the 4 header bytes, little-endian
    if (_tracker)
        _tracker->got(size);
    return data;  // return the final data
}
If you are new to BSON, you might expect that modifying a single field only updates that small piece of data in place, at little cost. That is not the case. As the description above shows, the key-value pairs are packed contiguously, so adding, deleting, or modifying a field means generating a new BSON document.
Besides generating BSON documents in a streaming fashion with BSONObjBuilder, the MongoDB code also provides a DOM interface for modifying, adding, or deleting a field, but a new BSON document is still generated after the modification.
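Why even a small change produces a new document can be illustrated with the simplest case, adding one Int32 field at the end. The sketch below is not MongoDB's DOM interface: `withInt32Field` is a hypothetical helper that assumes a well-formed input document.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Return a new document equal to `doc` plus one Int32 field appended at the end.
Bytes withInt32Field(const Bytes& doc, const std::string& key, int32_t value) {
    assert(doc.size() >= 5 && doc.back() == 0x00);  // header + EOO at minimum
    Bytes out(doc.begin(), doc.end() - 1);          // copy everything except the old EOO
    out.push_back(0x10);                            // value type: Int32
    out.insert(out.end(), key.begin(), key.end());
    out.push_back(0x00);                            // key terminator
    uint32_t u = static_cast<uint32_t>(value);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>((u >> (8 * i)) & 0xFF));
    out.push_back(0x00);                            // new EOO
    uint32_t size = static_cast<uint32_t>(out.size());
    for (int i = 0; i < 4; ++i)                     // the length header must be rewritten too
        out[i] = static_cast<uint8_t>((size >> (8 * i)) & 0xFF);
    return out;
}
```

Even in this best case the existing bytes are copied, the EOO byte moves, and the length header changes. Changing a field in the middle to a value of a different size would additionally shift every element after it.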
Summary
As an extended storage format based on JSON, BSON offers notable improvements in speed, storage space, and data types, and plays a key role in MongoDB's document model.
This article compared BSON with JSON, covering their differences, advantages, and disadvantages; analyzed how BSON data is organized through some typical examples; and walked through the BSON read and write paths in the code, along with some caveats.
The BSON data structure is clear and concise, but I think it falls short in some respects. In terms of storage space, it does not use variable-length integer encoding; in terms of lookup and modification efficiency, there is still considerable read and write amplification.