MongoDB learning notes: BSON structure analysis
2022-07-27 20:22:00 【Pengzhenyi】
What is BSON?
MongoDB is a popular document database, and it uses the BSON format to support its document model.
BSON stands for Binary JSON. It resembles JSON but is stored in a binary format. Compared with JSON, it has the following advantages:
- Faster access. BSON stores the type of each value, so unlike plaintext storage there is no need to convert a string into another type. Take the integer 12345678: JSON has to convert the string into an integer, whereas BSON stores an integer type tag and then stores the integer value directly in 4 bytes;
- Lower storage space. Taking the integer 12345678 again, JSON needs 8 bytes to store it as plaintext, whereas BSON stores every Int32 value in 4 bytes, and Long and Double values in 8 bytes. The saving depends on the case: for small integers, BSON actually takes more space;
- More data types. Compared with JSON, BSON adds BinData, Timestamp, ObjectId, Decimal128, and other types.
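The storage trade-off in the list above can be checked with a few lines of C++. This is only a sketch: `jsonIntBytes` and `bsonInt32ValueBytes` are hypothetical helper names, and the comparison covers just the value bytes, ignoring the key and BSON's 1-byte type tag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bytes needed to write n as JSON text (decimal digits, plus a sign if negative).
std::size_t jsonIntBytes(int32_t n) {
    return std::to_string(n).size();
}

// Bytes BSON uses for the value part of any Int32, regardless of magnitude.
std::size_t bsonInt32ValueBytes() {
    return sizeof(int32_t);
}
```

With these helpers, `jsonIntBytes(12345678)` is 8 while `bsonInt32ValueBytes()` is 4, but `jsonIntBytes(7)` is only 1, which is the small-integer case where BSON is larger.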
The official MongoDB documentation gives a more authoritative and intuitive comparison, summarized below:
| | JSON | BSON |
|---|---|---|
| Encoding | UTF-8 String | Binary |
| Data types | String, Boolean, Number, Array | String, Boolean, Number (Integer, Float, Long, Decimal128...), Array, Date, Raw Binary |
| Readability | Human and Machine | Machine Only |
This article analyzes the BSON storage format in depth and walks through the storage and parsing process at the code level, to give readers a deeper understanding of BSON.
BSON storage format
The simplest BSON document can be broken down, from front to back, into the following parts:
- The total length of the document, which occupies 4 bytes;
- The value type (see the definition in the code), which occupies 1 byte;
- The key as a string (a key can only be a string). Its length is not fixed, it ends with '\0', and it occupies len(Key)+1 bytes;
- The value in binary form. For example, Int32 occupies 4 bytes, and Long and Double occupy 8 bytes. The common types are analyzed one by one below;
- A trailing '\0' that marks the end of the document when traversing it. It is commonly called EOO (End Of Object) and occupies 1 byte.
The following sections cover the commonly used Int32, Double, String, embedded document, and Array types, and analyze their hexadecimal representation.
Int type
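As a minimal sketch of the layout described above, the following C++ encodes a document with a single Int32 field by hand. The helper names `putInt32LE` and `encodeInt32Doc` are hypothetical, and the type code 0x10 for Int32 is taken from the BSON specification rather than from this article.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit integer in little-endian byte order.
void putInt32LE(std::vector<uint8_t>& out, int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>((u >> (8 * i)) & 0xFF));
}

// Encode a document that holds exactly one Int32 field (hypothetical helper).
std::vector<uint8_t> encodeInt32Doc(const std::string& key, int32_t value) {
    std::vector<uint8_t> out;
    // length(4) + type(1) + key + '\0'(1) + value(4) + EOO(1)
    int32_t total = 4 + 1 + static_cast<int32_t>(key.size()) + 1 + 4 + 1;
    putInt32LE(out, total);                         // total document length
    out.push_back(0x10);                            // value type: Int32
    out.insert(out.end(), key.begin(), key.end());  // key bytes
    out.push_back(0x00);                            // key terminator
    putInt32LE(out, value);                         // 4-byte value
    out.push_back(0x00);                            // EOO
    return out;
}
```

Calling `encodeInt32Doc("a", 1)` produces the 12 bytes `0c 00 00 00 10 61 00 01 00 00 00 00`: the length 12, the type tag, the key "a" with its terminator, the value 1, and the EOO byte.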
Double type
The Double type occupies 8 bytes and is stored in binary according to the IEEE 754 standard.
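To see what those 8 bytes look like, here is a small sketch that dumps the bytes of a double in the little-endian order BSON uses. `doubleToBsonBytes` is a hypothetical name, and the code assumes the platform's `double` is a 64-bit IEEE 754 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Return the 8 bytes stored for a double: IEEE 754 bits, least significant byte first.
std::vector<uint8_t> doubleToBsonBytes(double d) {
    static_assert(sizeof(double) == 8, "expects 64-bit IEEE 754 doubles");
    uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);  // copy the bit pattern without aliasing issues
    std::vector<uint8_t> out;
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<uint8_t>((bits >> (8 * i)) & 0xFF));
    return out;
}
```

For example, `doubleToBsonBytes(1.0)` yields `00 00 00 00 00 00 f0 3f`, since 1.0 has the bit pattern 0x3FF0000000000000.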
String type, and multiple key-value pairs
A String value starts with an additional 4-byte length field and ends with '\0'. The length counts the string bytes plus the trailing '\0'.
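The following sketch encodes a single String element (type byte, key, then the length-prefixed value). The helper names are hypothetical, and the type code 0x02 for String comes from the BSON specification. In a document with several key-value pairs, such elements are simply concatenated between the document length and the EOO byte.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit integer in little-endian byte order.
void putInt32LE(std::vector<uint8_t>& out, int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>((u >> (8 * i)) & 0xFF));
}

// Append the bytes of s followed by a '\0' terminator.
void putCString(std::vector<uint8_t>& out, const std::string& s) {
    out.insert(out.end(), s.begin(), s.end());
    out.push_back(0x00);
}

// Encode one String element: type, key, 4-byte length, string bytes, '\0'.
std::vector<uint8_t> encodeStringElement(const std::string& key, const std::string& value) {
    std::vector<uint8_t> out;
    out.push_back(0x02);                                      // value type: String
    putCString(out, key);                                     // key with terminator
    putInt32LE(out, static_cast<int32_t>(value.size()) + 1);  // length includes the '\0'
    putCString(out, value);                                   // string bytes with terminator
    return out;
}
```

`encodeStringElement("s", "hi")` gives `02 73 00 03 00 00 00 68 69 00`: the stored length is 3 because it covers "hi" plus the terminator.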
Nested documents
A nested document is laid out exactly like an ordinary document, so it also starts with its own 4-byte length field. For example, the nested document {"b" : NumberInt(1)} is stored in 12 bytes.
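A rough sketch of this nesting, building {"a" : {"b" : NumberInt(1)}} by hand. The helpers `wrapDocument`, `element`, and `nestedExample` are hypothetical, and the type codes 0x03 (embedded document) and 0x10 (Int32) come from the BSON specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit integer in little-endian byte order.
void putInt32LE(std::vector<uint8_t>& out, int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>((u >> (8 * i)) & 0xFF));
}

// Wrap encoded elements into a document: 4-byte length + elements + EOO.
std::vector<uint8_t> wrapDocument(const std::vector<uint8_t>& elements) {
    std::vector<uint8_t> out;
    putInt32LE(out, static_cast<int32_t>(elements.size() + 5));  // 4 length bytes + EOO
    out.insert(out.end(), elements.begin(), elements.end());
    out.push_back(0x00);
    return out;
}

// Encode one element: type byte, '\0'-terminated key, then the raw value bytes.
std::vector<uint8_t> element(uint8_t type, const std::string& key,
                             const std::vector<uint8_t>& value) {
    std::vector<uint8_t> out;
    out.push_back(type);
    out.insert(out.end(), key.begin(), key.end());
    out.push_back(0x00);
    out.insert(out.end(), value.begin(), value.end());
    return out;
}

// Build {"a": {"b": NumberInt(1)}}.
std::vector<uint8_t> nestedExample() {
    std::vector<uint8_t> one;
    putInt32LE(one, 1);
    std::vector<uint8_t> inner = wrapDocument(element(0x10, "b", one));  // 12 bytes
    return wrapDocument(element(0x03, "a", inner));                      // 20 bytes
}
```

The inner document is 12 bytes, and the outer one is 4 + 1 + 2 + 12 + 1 = 20 bytes, with the inner length field starting right after the key "a".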
Array type
An array value also starts with a 4-byte length field. Each element is stored under a key that is its index, starting from '0' and increasing.
For example, for an array field "a", "a.0" refers to the first element, whose value is Double(1), and "a.3" refers to the fourth element, whose value is "4".
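The index-as-key layout can be sketched as follows. To keep the code short, this hypothetical `encodeInt32Array` handles only Int32 elements (an assumption made for illustration; the array described above mixes Double and String values). It produces the value bytes of the array, which a parent document would store under type code 0x04 from the BSON specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit integer in little-endian byte order.
void putInt32LE(std::vector<uint8_t>& out, int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>((u >> (8 * i)) & 0xFF));
}

// Encode an array of Int32 values as a document whose keys are "0", "1", ...
std::vector<uint8_t> encodeInt32Array(const std::vector<int32_t>& values) {
    std::vector<uint8_t> body;
    for (std::size_t i = 0; i < values.size(); ++i) {
        body.push_back(0x10);                              // element type: Int32
        std::string key = std::to_string(i);               // index used as the key
        body.insert(body.end(), key.begin(), key.end());
        body.push_back(0x00);                              // key terminator
        putInt32LE(body, values[i]);                       // element value
    }
    std::vector<uint8_t> out;
    putInt32LE(out, static_cast<int32_t>(body.size() + 5));  // 4 length bytes + EOO
    out.insert(out.end(), body.begin(), body.end());
    out.push_back(0x00);                                     // EOO
    return out;
}
```

For `{1, 2}` the result is 19 bytes long, and the keys at offsets 5 and 12 are the characters '0' and '1'.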
Parsing and building BSON
Parsing process
When parsing a BSON document, first read the leading 4 bytes in little-endian order and convert them to an Int32 length, which gives the end position of the document.
Then, following the format described in the previous section, read the value type, the key, and the value. Repeating this through an iterator yields every key-value pair in the BSON document.
For this process, see the definitions of BSONObj and BSONObjIterator in the MongoDB source code.
Some key code is excerpted below:
    // Construct an iterator from the given binary BSON data
    explicit BSONObjIterator(const BSONObj& jso) {
        int sz = jso.objsize();  // read the leading 4 bytes (little-endian) as the int32 length
        if (MONGO_unlikely(sz == 0)) {
            _pos = _theend = 0;
            return;
        }
        _pos = jso.objdata() + 4;         // where the first element actually starts
        _theend = jso.objdata() + sz - 1; // the end position (the EOO byte)
    }

    // Whether the iterator still has elements before the end
    bool more() {
        return _pos < _theend;
    }

    // Return the BSONElement the iterator points at (think of it as one
    // key-value pair), then advance the iterator
    BSONElement next() {
        verify(_pos <= _theend);
        BSONElement e(_pos);  // wrap the key-value data at the current position
        _pos += e.size();     // advance the iterator
        return e;
    }

If you are new to BSON, you might expect that accessing a field is accelerated by a hash table, a skip list, or a similar structure, giving O(1) or O(log N) lookups. In fact, given how the format works, you have to iterate from front to back with the iterator, so the time complexity is O(N).
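The linear scan can be illustrated with a standalone sketch that does not depend on MongoDB's classes. `findInt32` and `valueSize` are hypothetical helpers; they understand only a handful of type codes from the BSON specification (Double 0x01, String 0x02, embedded document 0x03, Array 0x04, Int32 0x10, Int64 0x12) and give up on anything else.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Read a little-endian int32 starting at p.
int32_t readInt32LE(const uint8_t* p) {
    uint32_t u = static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
                 (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
    return static_cast<int32_t>(u);
}

// Size of a value for the few types this sketch knows; -1 if unknown or out of bounds.
std::ptrdiff_t valueSize(uint8_t type, const uint8_t* value, std::ptrdiff_t remaining) {
    std::ptrdiff_t size = -1;
    switch (type) {
        case 0x01: size = 8; break;                                        // Double
        case 0x10: size = 4; break;                                        // Int32
        case 0x12: size = 8; break;                                        // Int64
        case 0x02: if (remaining >= 4) size = 4 + static_cast<std::ptrdiff_t>(readInt32LE(value)); break;  // String
        case 0x03:                                                         // Embedded document
        case 0x04: if (remaining >= 4) size = readInt32LE(value); break;   // Array
        default: break;
    }
    return (size >= 0 && size <= remaining) ? size : -1;
}

// Walk the elements front to back until an Int32 field named key is found: O(N).
bool findInt32(const std::vector<uint8_t>& doc, const std::string& key, int32_t& out) {
    if (doc.size() < 5) return false;
    int32_t total = readInt32LE(doc.data());
    if (total < 5 || static_cast<std::size_t>(total) > doc.size()) return false;
    const uint8_t* pos = doc.data() + 4;          // first element
    const uint8_t* end = doc.data() + total - 1;  // the EOO byte
    while (pos < end) {
        uint8_t type = *pos++;
        const uint8_t* nul = std::find(pos, end, static_cast<uint8_t>(0));
        if (nul == end) return false;             // key is not terminated
        std::string name(reinterpret_cast<const char*>(pos), static_cast<std::size_t>(nul - pos));
        const uint8_t* value = nul + 1;
        if (type == 0x10 && name == key) {
            if (end - value < 4) return false;
            out = readInt32LE(value);
            return true;
        }
        std::ptrdiff_t size = valueSize(type, value, end - value);
        if (size < 0) return false;               // unknown type or malformed input
        pos = value + size;                       // skip to the next element
    }
    return false;
}
```

Every element before the target has to be decoded far enough to learn its size, which is why a field near the end of a large document costs more to reach than one near the start.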
Building process
Building a BSON document can be seen as the reverse of parsing. First reserve 4 bytes at the head, then keep appending the value type, key, and value of each field in binary form, then append the '\0' EOO marker at the end of the document, and finally write the computed length (which includes the 4 length bytes themselves) into the 4 bytes reserved at the head.
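The reserve-then-backfill idea can be sketched in a few lines. `TinyBsonBuilder` is a hypothetical class that supports only Int32 fields (type code 0x10 per the BSON specification); it is not MongoDB's builder, just an illustration of the steps described above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

class TinyBsonBuilder {
public:
    // Reserve 4 bytes at the head for the length, to be filled in by done().
    TinyBsonBuilder() : buf_(4, 0) {}

    // Append one Int32 field: type byte, '\0'-terminated key, 4-byte value.
    TinyBsonBuilder& appendInt32(const std::string& key, int32_t value) {
        buf_.push_back(0x10);
        buf_.insert(buf_.end(), key.begin(), key.end());
        buf_.push_back(0x00);
        uint32_t u = static_cast<uint32_t>(value);
        for (int i = 0; i < 4; ++i)
            buf_.push_back(static_cast<uint8_t>((u >> (8 * i)) & 0xFF));
        return *this;
    }

    // Return a finished copy: add EOO, then write the total length into the head.
    std::vector<uint8_t> done() const {
        std::vector<uint8_t> out = buf_;
        out.push_back(0x00);                                // EOO marker
        uint32_t size = static_cast<uint32_t>(out.size());  // includes the 4 length bytes
        for (int i = 0; i < 4; ++i)
            out[i] = static_cast<uint8_t>((size >> (8 * i)) & 0xFF);
        return out;
    }

private:
    std::vector<uint8_t> buf_;
};
```

For example, `TinyBsonBuilder().appendInt32("a", 1).appendInt32("b", 2).done()` returns 19 bytes whose first byte is 0x13. Because `done()` works on a copy, calling it twice does not append a second EOO byte.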
For this process, see the definition of BSONObjBuilder in the MongoDB source code.
Some key code is excerpted below:
    // Construct a BSONObjBuilder
    BSONObjBuilder(int initsize = 512)
        : _b(_buf), _buf(initsize), _offset(0), _s(this), _tracker(0), _doneCalled(false) {
        // Skip over space for the object length. The length is filled in by _done.
        _b.skip(sizeof(int));  // reserve 4 bytes at the head; _done() fills in the length
        // Reserve space for the EOO byte. This means _done() can't fail.
        _b.reserveBytes(1);    // reserve 1 byte at the tail for EOO
    }

    // Append a key-value pair whose value type is int32
    BSONObjBuilder& append(StringData fieldName, int n) {
        _b.appendNum((char)NumberInt);  // first append the value type, 1 byte
        _b.appendStr(fieldName);        // append the string key, terminated by '\0'
        _b.appendNum(n);                // append the value as a 4-byte binary integer
        return *this;
    }

    // Once all key-value data has been appended, _done() finishes the document
    char* _done() {
        if (_doneCalled)
            return _b.buf() + _offset;
        _doneCalled = true;
        // TODO remove this or find some way to prevent it from failing. Since this is intended
        // for use with BSON() literal queries, it is less likely to result in oversized BSON.
        _s.endField();
        _b.claimReservedBytes(1);  // Prevents adding EOO from failing.
        _b.appendNum((char)EOO);
        // Determine the start position and size of the final data.
        // _offset is used when nested builders share one buffer; otherwise it is 0.
        char* data = _b.buf() + _offset;
        int size = _b.len() - _offset;
        DataView(data).write(tagLittleEndian(size));  // write size into the leading 4 bytes, little-endian
        if (_tracker)
            _tracker->got(size);
        return data;  // return the final data
    }

If you are new to BSON, you might think that modifying a single field only updates that small piece of data in place, at little cost. That is not the case. As the description above shows, the key-value pairs are packed compactly one after another, so adding, deleting, or modifying a field means generating a new BSON document.
Besides generating BSON documents in a streaming fashion with BSONObjBuilder, the MongoDB code also provides a DOM interface for modifying, adding, or deleting a field, but after the modification it still generates a new BSON document.
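A small sketch of why even adding one field produces a new document: the length header and the position of the EOO byte both change, so the bytes are copied into a fresh buffer. `appendInt32Field` is a hypothetical helper that assumes its input is a well-formed document and supports only Int32 values; it is unrelated to MongoDB's DOM interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Return a new document equal to doc plus one extra Int32 field at the end.
// Assumes doc is well formed (at least 5 bytes, ending with the EOO byte).
std::vector<uint8_t> appendInt32Field(const std::vector<uint8_t>& doc,
                                      const std::string& key, int32_t value) {
    std::vector<uint8_t> out(4, 0);                         // new length goes here
    out.insert(out.end(), doc.begin() + 4, doc.end() - 1);  // old elements, without length and EOO
    out.push_back(0x10);                                    // value type: Int32
    out.insert(out.end(), key.begin(), key.end());
    out.push_back(0x00);                                    // key terminator
    uint32_t u = static_cast<uint32_t>(value);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>((u >> (8 * i)) & 0xFF));
    out.push_back(0x00);                                    // new EOO
    uint32_t size = static_cast<uint32_t>(out.size());
    for (int i = 0; i < 4; ++i)                             // rewrite the length header
        out[i] = static_cast<uint8_t>((size >> (8 * i)) & 0xFF);
    return out;
}
```

Starting from the 12-byte document for {"a" : NumberInt(1)}, appending "b" with value 2 yields a separate 19-byte document, and the original bytes are left untouched.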
Summary
As an extended storage format based on JSON, BSON offers notable improvements in speed, storage space, and data types, and it plays a key role in MongoDB's document model.
This article compared BSON with JSON and their respective strengths and weaknesses, analyzed how BSON data is organized through some typical examples, and walked through the BSON reading and writing process at the code level, along with some caveats.
BSON's data structure is clear and concise, but I think it falls short in some respects. In terms of storage space, it does not use variable-length integer encoding. In terms of lookup and modification efficiency, it still incurs considerable read and write amplification.