InfluxDB series (IV): TSM engine (storage principles)

2022-07-27 19:49:00 【Lin Musen^~^】

3. Principles

File directory


# wal  Directory structure 
-- wal
   -- mydb
      -- autogen
         -- 1
            -- _00001.wal
         -- 2
            -- _00035.wal
      -- 2hours
         -- 1
            -- _00001.wal
 
# data  Directory structure 
-- data
   -- mydb
      -- autogen
         -- 1
            -- 000000001-000000003.tsm
         -- 2
            -- 000000001-000000001.tsm
      -- 2hours
         -- 1
            -- 000000002-000000002.tsm

LSM Tree

The core idea of an LSM tree is to give up some read performance in exchange for maximum write throughput. The idea itself is simple: keep the newest data in memory first, wait until enough has accumulated, then merge-sort the in-memory data and append it to the end of the on-disk sequence.

For disks, the way to get the most out of the hardware is to read or write a fixed-size chunk of data in one operation and to keep the number of random seeks as low as possible.
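The idea can be illustrated with a toy store. This is only a sketch: the class name TinyLSM, the entry-count threshold, and the use of in-memory lists in place of disk files are all assumptions made for illustration, not InfluxDB code.

```python
import bisect


class TinyLSM:
    """Toy LSM store (hypothetical, for illustration only)."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # assumed flush threshold, counted in entries
        self.memtable = {}                    # newest data, held in memory
        self.runs = []                        # sorted runs; each list stands in for a disk file

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Sort the in-memory data once and append it as one sequential run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Reads pay the price: check memory, then every run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Writes only touch the dictionary and occasionally append one sorted run, while a read may have to look at several runs. That is the trade described above.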

The evolution of the storage engine

LevelDB

A large number of shards means a large number of open file descriptors, which exhausts system resources.

BoltDB

Random writes occur in some environments, which degrades write performance.

Time series data

A SeriesKey can be thought of as identifying one data source.

InfluxDB uses an in-memory Map to store timeline data. The Map can be expressed as <Key, List>, where Key is seriesKey+fieldKey. Each Key in the Map corresponds to one List, and the List holds the data of that timeline. This is a very natural design with nothing particularly difficult about it. Based on this data structure, writing time series data into memory can be described in three steps:

After time series data enters the system, build the seriesKey from measurement + tags.
Combine this seriesKey with the target fieldKey to form the Key, then look up the corresponding timeline list in the Map by that Key. If there is none, create a new List.
Once the list is found, append the combined Timestamp|Value entry to it.
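The three steps above can be sketched as follows. The names build_series_key and MemoryStore, the "," and "#" separators, and the sorting of tags by key are assumptions made for this example; the actual key encoding in InfluxDB may differ.

```python
from collections import defaultdict


def build_series_key(measurement, tags):
    # Assumed layout: measurement followed by tag pairs sorted by tag key.
    parts = [measurement] + [f"{k}={v}" for k, v in sorted(tags.items())]
    return ",".join(parts)


class MemoryStore:
    """Hypothetical stand-in for the in-memory Map<Key, List>."""

    def __init__(self):
        self.data = defaultdict(list)  # Key -> list of (timestamp, value)

    def write(self, measurement, tags, field_key, timestamp, value):
        series_key = build_series_key(measurement, tags)  # step 1
        key = f"{series_key}#{field_key}"                 # step 2 ("#" is an assumed separator)
        # defaultdict creates the list if the Key is new (step 2), then we append (step 3).
        self.data[key].append((timestamp, value))
        return key
```

Sorting the tags means that the same set of tags always produces the same Key regardless of the order in which they arrive.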

TSM Storage engine

The TSM storage engine consists of several parts: cache, wal, tsm file, and compactor.

cache and wal

When data is inserted, it is written to the cache and to the wal at the same time. The cache can be seen as an in-memory copy of the data in the wal files. When InfluxDB starts, it reads through all wal files and rebuilds the cache, so even if the system crashes, no data is lost.

The data in the cache does not grow without bound. A maxSize parameter controls how much memory the cache may use before its data is written to a tsm file. If it is not configured, the default limit is 25MB. Whenever the data in the cache reaches this limit, a snapshot of the current cache is taken, the current cache is cleared, and a new wal file is created for writing. The remaining wal files are eventually deleted, and the data in the snapshot is sorted and written to a new tsm file.

The current cache design has a problem. While a snapshot is being written to a new tsm file, the current cache may reach the threshold again because of heavy writes, before the previous snapshot has been completely written to disk. InfluxDB's approach in this case is to make subsequent writes fail. Users have to handle this themselves and resume writing once the system has recovered.
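The interaction between cache, wal, and snapshot can be sketched roughly as below. Everything here is a simplification under stated assumptions: the Engine class and CacheFullError are hypothetical names, max_size counts entries rather than bytes, Python lists stand in for wal and tsm files, and the snapshot is taken lazily on the next write after the limit is hit.

```python
class CacheFullError(Exception):
    """Raised when the cache is full and the previous snapshot is still pending."""


class Engine:
    def __init__(self, max_size=4):
        self.max_size = max_size   # assumed unit: number of entries, not bytes
        self.wal_segments = [[]]   # each inner list stands in for one wal file
        self.cache = {}            # key -> list of (timestamp, value)
        self.cache_size = 0
        self.snapshot = None       # snapshot waiting to be written to a tsm file
        self.tsm_files = []        # each item stands in for one tsm file

    def write(self, key, timestamp, value):
        if self.cache_size >= self.max_size:
            if self.snapshot is not None:
                # Previous snapshot not on disk yet: the write fails.
                raise CacheFullError("previous snapshot not flushed yet")
            self.take_snapshot()
        self.wal_segments[-1].append((key, timestamp, value))      # write to wal
        self.cache.setdefault(key, []).append((timestamp, value))  # and to cache
        self.cache_size += 1

    def take_snapshot(self):
        self.snapshot = self.cache
        self.cache, self.cache_size = {}, 0
        self.wal_segments.append([])  # new wal file for subsequent writes

    def flush_snapshot(self):
        # Sort the snapshot, write it out as a tsm file, then drop the old wal files.
        if self.snapshot is None:
            return
        tsm = [(k, sorted(v)) for k, v in sorted(self.snapshot.items())]
        self.tsm_files.append(tsm)
        self.snapshot = None
        self.wal_segments = self.wal_segments[-1:]

    def recover(self):
        # Rebuild the cache by replaying every wal file, as on startup.
        self.cache, self.cache_size, self.snapshot = {}, 0, None
        for segment in self.wal_segments:
            for key, ts, value in segment:
                self.cache.setdefault(key, []).append((ts, value))
                self.cache_size += 1
```

A caller that receives CacheFullError would wait for flush_snapshot to finish and then retry, which matches the "handle it yourself" behaviour described above.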

TSM file

The main format of a single tsm file is as follows.

It is divided into four main parts: Header, Blocks, Index, and Footer.

Part of the Index is cached in memory. The data structure of each part is described in detail below.

compactor

The compactor component runs continuously in the background and checks every 1 second whether there is data that needs to be compacted and merged.

It performs two main operations. The first is to take a snapshot when the data in the cache reaches the threshold and then write it out to a tsm file.

The second is to merge the existing tsm files: several small tsm files are merged into one so that each file gets as close as possible to the maximum size of a single file, which reduces the number of files. Some data deletions are also carried out at this point.
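The merge step can be sketched as a k-way merge of sorted runs. The function name compact, the (key, timestamp, value) tuple layout, and the deleted_keys parameter are assumptions for this example; real tsm files are binary and block-based.

```python
import heapq


def compact(tsm_files, deleted_keys=frozenset()):
    """Merge several runs sorted by (key, timestamp) into one sorted run.

    Entries whose key is in deleted_keys are dropped during the merge,
    mirroring the idea that deletions are applied at compaction time.
    """
    merged = heapq.merge(*tsm_files, key=lambda entry: (entry[0], entry[1]))
    return [entry for entry in merged if entry[0] not in deleted_keys]
```

Because every input is already sorted, the merge reads each file sequentially and writes one sequential output, which fits the disk-friendly access pattern described in the LSM Tree section.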

Read and write flow

Reading process

First, find the SeriesIndex Block that corresponds to the Key. Because the Keys are ordered, this can be done with binary search.
After finding the SeriesIndex Block, use the [MinTime, MaxTime] index entries to locate the list of Series Data Blocks that may contain data in the queried time range.
Load those Series Data Blocks into memory, decompress them, and then use binary search to find the data.
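The read path above can be sketched as follows. This is not the real tsm format: make_block and read_range are hypothetical helpers, the index is modelled as a sorted key list plus a parallel list of (min_time, max_time, block) entries, and zlib over JSON stands in for the actual block compression.

```python
import bisect
import json
import zlib


def make_block(points):
    """Compress a time-sorted list of [timestamp, value] pairs (stand-in for a Series Data Block)."""
    return zlib.compress(json.dumps(points).encode())


def read_range(index_keys, index_entries, key, start, end):
    # Step 1: binary search over the ordered keys.
    i = bisect.bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return []
    result = []
    # Step 2: keep only blocks whose [min_time, max_time] overlaps the query range.
    for min_time, max_time, block in index_entries[i]:
        if max_time < start or min_time > end:
            continue
        # Step 3: decompress the block, then binary search on its timestamps.
        points = json.loads(zlib.decompress(block))
        times = [p[0] for p in points]
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_right(times, end)
        result.extend((ts, value) for ts, value in points[lo:hi])
    return result
```

Blocks that fall entirely outside the time range are never decompressed, which is the point of keeping MinTime and MaxTime in the index.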

Writing process

The httpd service parses out all the Points to be inserted, along with the database, retention policy, and so on, and then calls the WritePoints method of PointsWriter to insert the data.
The WritePoints function determines which Shard each Point belongs to based on its timestamp, and then calls the writeToShard function to write the Points to the different Shards in batches.
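The routing step can be sketched in a few lines. This is a Python stand-in, not the Go implementation: write_points and write_to_shard only mirror the names WritePoints and writeToShard, and the fixed SHARD_DURATION and the dictionary-shaped points are assumptions for the example.

```python
from collections import defaultdict

# Assumed shard width in seconds; in practice this depends on the retention policy.
SHARD_DURATION = 3600


def shard_id_for(timestamp):
    # Every timestamp inside the same time window maps to the same shard.
    return timestamp // SHARD_DURATION


def write_points(points, write_to_shard):
    """Group points by shard, then hand each batch to write_to_shard(shard_id, batch)."""
    batches = defaultdict(list)
    for point in points:
        batches[shard_id_for(point["time"])].append(point)
    for shard_id, batch in sorted(batches.items()):
        write_to_shard(shard_id, batch)
```

Grouping first means each shard receives one batched call instead of one call per point.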

Hardware improvements
[Figure not available]