Prometheus TSDB analysis
2022-07-28 08:58:00 【Brother Xing plays with the clouds】
Summary
Prometheus is a well-known open source monitoring project. Monitoring tasks are assigned to Prometheus servers, which scrape metrics from their targets and store the samples in a local TSDB. Its powerful PromQL language queries both real-time and historical time series data and supports rich query combinations. In Prometheus 1.0, the TSDB (the V2 storage engine) was based on LevelDB and used the same compression algorithm as Facebook's Gorilla, compressing 16-byte data points to an average of 1.37 bytes. Prometheus 2.0 introduced the new V3 storage engine, which provides higher write and query performance. This article analyzes the design of that storage engine.
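The Gorilla-style compression mentioned above gets most of its ratio from delta-of-delta encoding of timestamps (plus XOR encoding of values). A minimal sketch of the timestamp side, with illustrative function names that are not part of any Prometheus API:

```python
def encode_timestamps(timestamps):
    """Delta-of-delta encode a sorted list of timestamps.

    Regularly scraped series have a near-constant interval, so most
    delta-of-deltas are 0; the real bitstream encodes those in a
    single bit each, which is where the compression comes from.
    Here we just return the integer stream for clarity.
    """
    out = [timestamps[0]]              # first timestamp stored raw
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        out.append(delta - prev_delta) # delta of the delta
        prev_delta = delta
    return out

def decode_timestamps(encoded):
    """Invert encode_timestamps."""
    ts = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts
```

For a series scraped every 15 seconds, `encode_timestamps([100, 115, 130, 145])` yields `[100, 15, 0, 0]`: after the first delta, everything is zero.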
Design
Prometheus stores time series data in 2-hour blocks. Each block is a directory containing: one or more chunk files (the time series samples), a metadata file, and an index file (which maps metric names and labels to the locations of the series data in the chunk files). The most recently written data is kept in an in-memory block and flushed to disk after 2 hours. To prevent data loss on a crash, a WAL (write-ahead log) persists the raw samples as they are appended. When a time series is deleted, the deletion is recorded in a separate tombstones file rather than being removed from the chunk files immediately. The 2-hour blocks are compacted in the background into larger blocks; once data has been merged into a higher-level block, the lower-level block directories are deleted. This is the same idea as the LSM tree used by LevelDB, RocksDB, and others. The design closely mirrors Gorilla, so Prometheus is effectively a caching TSDB. These local-storage characteristics mean it is unsuitable for long-term storage: it can only keep and query a short window of time series data, and it is not highly available (if the node goes down, historical data becomes unreadable). Because of these limitations, Prometheus provides an API for integrating with long-term storage, saving data to a remote TSDB. That API uses a custom protocol buffer over HTTP, is not yet stable, and may switch to gRPC.
Disk file structure
In-memory block
Before an in-memory block has been flushed, its directory mainly contains WAL files.
./data/01BKGV7JBM69T2G1BGBGM6KB12
./data/01BKGV7JBM69T2G1BGBGM6KB12/meta.json
./data/01BKGV7JBM69T2G1BGBGM6KB12/wal/000002
./data/01BKGV7JBM69T2G1BGBGM6KB12/wal/000001
Persistent block
Once a block is persisted, the WAL files under its directory are deleted and the time series data lives in the chunk files. The index file maps each time series to its location in the chunk files.
./data/01BKGV7JC0RY8A6MACW02A2PJD
./data/01BKGV7JC0RY8A6MACW02A2PJD/meta.json
./data/01BKGV7JC0RY8A6MACW02A2PJD/index
./data/01BKGV7JC0RY8A6MACW02A2PJD/chunks
./data/01BKGV7JC0RY8A6MACW02A2PJD/chunks/000001
./data/01BKGV7JC0RY8A6MACW02A2PJD/tombstones
mmap
mmap is used to read the large files produced by compaction (without holding too many file handles): it establishes a mapping between the process's virtual address space and file offsets, and data is only actually brought into physical memory when a query reads the corresponding locations. Reads go directly through the kernel's page cache, avoiding the extra copy into user-space buffers that a read() call would make. After a query, the mapped memory is reclaimed automatically by Linux according to memory pressure, and until it is reclaimed it can serve subsequent query hits. mmap thus automatically manages the memory cache needed for queries, with the advantages of simple management and efficiency. This also shows that Prometheus is not an entirely in-memory TSDB: unlike Gorilla, querying historical data requires reading disk files.
Compaction
Compaction mainly merges blocks, deletes expired data, and restructures chunk data. Merging multiple blocks into a larger one effectively reduces the number of blocks, so a query covering a long time range does not have to merge results from many blocks. To keep deletion efficient, deleting time series data only records the deleted ranges; a block's entire directory is removed only once all of its data is due for deletion. For this reason the size of merged blocks must also be limited, to avoid retaining too much deleted-but-unreclaimed space. For long retention periods, a good approach is to cap a block's maximum duration as a percentage (e.g. 10%) of the retention time.
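The core of the merge step, for a single series, is a k-way merge of sorted samples that drops tombstoned ranges along the way. A sketch under those assumptions (the names and data shapes are illustrative, not Prometheus's API):

```python
import heapq

def compact(blocks, tombstones=()):
    """Merge several blocks' samples for one series into one block.

    blocks:     iterables of (timestamp, value) pairs, each sorted
                by timestamp, one per source block.
    tombstones: (min_t, max_t) closed ranges to drop, mimicking how
                deletions recorded in tombstone files are applied
                lazily when blocks are merged.
    """
    merged = heapq.merge(*blocks)  # k-way merge on timestamp
    return [
        (t, v) for t, v in merged
        if not any(lo <= t <= hi for lo, hi in tombstones)
    ]
```

Because each input block is already sorted, the merge is linear in the total number of samples and never needs all blocks in memory at once.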
Inverted Index
An inverted index provides fast lookup of data items based on a subset of their contents. In short, it can find all series whose labels contain app="nginx" without scanning every time series and checking whether it carries that label. To this end, each time series key is assigned a unique ID through which it can be retrieved in constant time; this ID is the forward index. For example: if the series with IDs 9, 10, and 29 contain the label app="nginx", then the inverted index (postings list) for that label is [9, 10, 29], enabling fast queries for the series carrying it.
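The example above can be sketched directly. A query on multiple labels intersects the postings lists; the class and method names here are mine, not the Prometheus index API:

```python
from collections import defaultdict

class InvertedIndex:
    """Map (label name, label value) pairs to series IDs."""

    def __init__(self):
        self.postings = defaultdict(list)  # ("app", "nginx") -> [ids]

    def add(self, series_id, labels):
        """Register a series under each of its label pairs."""
        for pair in labels.items():
            self.postings[pair].append(series_id)

    def lookup(self, **labels):
        """Series IDs matching ALL given labels: intersect postings."""
        lists = [set(self.postings[p]) for p in labels.items()]
        return sorted(set.intersection(*lists)) if lists else []
```

The real index keeps each postings list sorted on disk so intersections can be computed by merging rather than by building sets, but the result is the same.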
Performance
In the article Writing a Time Series Database from Scratch, the author's benchmark reaches 20 million writes per second on a MacBook Pro. Compared with the target in the Gorilla paper of 700 million writes per minute (over 10 million per second), this is higher single-machine performance.