Prometheus TSDB analysis
2022-07-28 08:58:00 【Brother Xing plays with the clouds】
Summary
Prometheus is a well-known open source monitoring project. Monitoring tasks are assigned to Prometheus servers, which scrape metrics from their targets and store the samples in a local TSDB. Its powerful PromQL language queries both real-time and historical time series data and supports rich query combinations. In Prometheus 1.0, the TSDB (the V2 storage engine) was based on LevelDB and used the same compression algorithm as Facebook's Gorilla, compressing 16-byte data points to an average of 1.37 bytes. Prometheus 2.0 introduced the new V3 storage engine, which provides higher write and query performance. This article analyzes the design of that storage engine.
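The Gorilla-style compression mentioned above gets most of its ratio from delta-of-delta encoding of timestamps (plus XOR encoding of values). A minimal sketch of the timestamp side, with illustrative function names that are not part of any Prometheus API:

```python
def encode_timestamps(timestamps):
    """Delta-of-delta encode a sorted list of timestamps.

    Regularly scraped series have a near-constant interval, so most
    delta-of-deltas are 0; the real bitstream encodes those in a
    single bit each, which is where the compression comes from.
    Here we just return the integer stream for clarity.
    """
    out = [timestamps[0]]              # first timestamp stored raw
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        out.append(delta - prev_delta) # delta of the delta
        prev_delta = delta
    return out

def decode_timestamps(encoded):
    """Invert encode_timestamps."""
    ts = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts
```

For a series scraped every 15 seconds, `encode_timestamps([100, 115, 130, 145])` yields `[100, 15, 0, 0]`: after the first delta, everything is zero.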
Design
Prometheus stores time series data in 2-hour blocks. Each block is a directory containing: one or more chunk files (the time series samples), a metadata file, and an index file (which maps metric names and labels to the locations of the series data in the chunk files). The most recently written data is kept in an in-memory block and flushed to disk after 2 hours. To prevent data loss on a crash, a WAL (write-ahead log) persists the raw samples as they are appended. When a time series is deleted, the deletion is recorded in a separate tombstones file rather than being removed from the chunk files immediately. The 2-hour blocks are compacted in the background into larger blocks; once data has been merged into a higher-level block, the lower-level block directories are deleted. This is the same idea as the LSM tree used by LevelDB, RocksDB, and others. The design closely mirrors Gorilla, so Prometheus is effectively a caching TSDB. These local-storage characteristics mean it is unsuitable for long-term storage: it can only keep and query a short window of time series data, and it is not highly available (if the node goes down, historical data becomes unreadable). Because of these limitations, Prometheus provides an API for integrating with long-term storage, saving data to a remote TSDB. That API uses a custom protocol buffer over HTTP, is not yet stable, and may switch to gRPC.
Disk file structure
In-memory block
Before an in-memory block has been flushed, its directory mainly contains WAL files.
./data/01BKGV7JBM69T2G1BGBGM6KB12
./data/01BKGV7JBM69T2G1BGBGM6KB12/meta.json
./data/01BKGV7JBM69T2G1BGBGM6KB12/wal/000002
./data/01BKGV7JBM69T2G1BGBGM6KB12/wal/000001
Persistent block
Once a block is persisted, the WAL files under its directory are deleted and the time series data lives in the chunk files. The index file maps each time series to its location in the chunk files.
./data/01BKGV7JC0RY8A6MACW02A2PJD
./data/01BKGV7JC0RY8A6MACW02A2PJD/meta.json
./data/01BKGV7JC0RY8A6MACW02A2PJD/index
./data/01BKGV7JC0RY8A6MACW02A2PJD/chunks
./data/01BKGV7JC0RY8A6MACW02A2PJD/chunks/000001
./data/01BKGV7JC0RY8A6MACW02A2PJD/tombstones
mmap
mmap is used to read the large files produced by compaction (without holding too many file handles): it establishes a mapping between the process's virtual address space and file offsets, and data is only actually brought into physical memory when a query reads the corresponding locations. Reads go directly through the kernel's page cache, avoiding the extra copy into user-space buffers that a read() call would make. After a query, the mapped memory is reclaimed automatically by Linux according to memory pressure, and until it is reclaimed it can serve subsequent query hits. mmap thus automatically manages the memory cache needed for queries, with the advantages of simple management and efficiency. This also shows that Prometheus is not an entirely in-memory TSDB: unlike Gorilla, querying historical data requires reading disk files.
Compaction
Compaction mainly merges blocks, deletes expired data, and restructures chunk data. Merging multiple blocks into a larger one effectively reduces the number of blocks, so a query covering a long time range does not have to merge results from many blocks. To keep deletion efficient, deleting time series data only records the deleted ranges; a block's entire directory is removed only once all of its data is due for deletion. For this reason the size of merged blocks must also be limited, to avoid retaining too much deleted-but-unreclaimed space. For long retention periods, a good approach is to cap a block's maximum duration as a percentage (e.g. 10%) of the retention time.
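The core of the merge step, for a single series, is a k-way merge of sorted samples that drops tombstoned ranges along the way. A sketch under those assumptions (the names and data shapes are illustrative, not Prometheus's API):

```python
import heapq

def compact(blocks, tombstones=()):
    """Merge several blocks' samples for one series into one block.

    blocks:     iterables of (timestamp, value) pairs, each sorted
                by timestamp, one per source block.
    tombstones: (min_t, max_t) closed ranges to drop, mimicking how
                deletions recorded in tombstone files are applied
                lazily when blocks are merged.
    """
    merged = heapq.merge(*blocks)  # k-way merge on timestamp
    return [
        (t, v) for t, v in merged
        if not any(lo <= t <= hi for lo, hi in tombstones)
    ]
```

Because each input block is already sorted, the merge is linear in the total number of samples and never needs all blocks in memory at once.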
Inverted Index
An inverted index provides fast lookup of data items based on a subset of their contents. In short, it can find all series whose labels contain app="nginx" without scanning every time series and checking whether it carries that label. To this end, each time series key is assigned a unique ID through which it can be retrieved in constant time; this ID is the forward index. For example: if the series with IDs 9, 10, and 29 contain the label app="nginx", then the inverted index (postings list) for that label is [9, 10, 29], enabling fast queries for the series carrying it.
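The example above can be sketched directly. A query on multiple labels intersects the postings lists; the class and method names here are mine, not the Prometheus index API:

```python
from collections import defaultdict

class InvertedIndex:
    """Map (label name, label value) pairs to series IDs."""

    def __init__(self):
        self.postings = defaultdict(list)  # ("app", "nginx") -> [ids]

    def add(self, series_id, labels):
        """Register a series under each of its label pairs."""
        for pair in labels.items():
            self.postings[pair].append(series_id)

    def lookup(self, **labels):
        """Series IDs matching ALL given labels: intersect postings."""
        lists = [set(self.postings[p]) for p in labels.items()]
        return sorted(set.intersection(*lists)) if lists else []
```

The real index keeps each postings list sorted on disk so intersections can be computed by merging rather than by building sets, but the result is the same.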
Performance
In the article Writing a Time Series Database from Scratch, the author's benchmark reaches 20 million writes per second on a MacBook Pro. Compared with the target in the Gorilla paper of 700 million writes per minute (over 10 million per second), this is higher single-machine performance.