[Prometheus] Notes on optimizing Prometheus federation

2022-06-21 05:50:00 【Meepoljd】

Preface
Text
- Regroup labels
- Filtering out unneeded metrics

Preface

In our production environment, my Prometheus deployment uses federation. The monitored servers sit in multiple physical locations and there are a lot of them, so after weighing the options I settled on federation. At the time, though, the servers were spread out, and no single collection node had to monitor many of them, probably fewer than 500. As a result I never ran into performance problems, and I assumed there were no pitfalls.

Recently, I needed to centralize node-exporter metric collection for the data middle platform cluster, which has about 2,600 machines. For some special reasons, I ended up with 1 federation node and 2 collection nodes, with the front end reading its data directly from the federation node.

The ideal is beautiful, but reality is often cruel. All of my scrape endpoints use my default scrape settings: scrape_interval is 15s and scrape_timeout is 10s. Yet each pull by the federation node takes more than 20s, sometimes even 30s, so some metrics are inevitably missed. I wanted to optimize this.
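For reference, the scrape settings described above would look roughly like this in prometheus.yml. This is a minimal sketch of the global block only, not the full configuration that was actually deployed:

```yaml
global:
  scrape_interval: 15s  # how often each target is scraped
  scrape_timeout: 10s   # a scrape that runs longer than this is abandoned
```

With these values, a federation pull that needs 20s to 30s overruns both the timeout and the interval, so it cannot complete within one scrape cycle.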

Text

Regroup labels

My first idea was to take the labels that were being attached on the collection nodes and attach them on the federation node instead. Would that shorten each pull?

So I started by moving the cluster label to the federation node and testing it. It seems to have had some effect, but clearly not enough to meet the requirement.
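One way to attach the label on the federation side is a target label on the federation job. The sketch below is an assumption about how that could look, not the configuration actually used here: the target addresses (collector-1:9090, collector-2:9090) and the label value (data-platform) are hypothetical.

```yaml
scrape_configs:
  - job_name: 'federate'
    honor_labels: true        # on conflict, keep the labels already on the federated series
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~".+"}'  # pull every series (no filtering yet)
    static_configs:
      - targets:              # hypothetical addresses of the two collection nodes
          - 'collector-1:9090'
          - 'collector-2:9090'
        labels:
          cluster: 'data-platform'  # hypothetical value, added to series scraped from these targets
```

Because honor_labels is true, a cluster label that is already present in the scraped data takes precedence over this one, so the label would have to be dropped on the collection nodes for the federation-side value to apply.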

Filtering out unneeded metrics

After reading up on this, I found that the main factor in pull time is the number of metrics pulled each time. Relabeling can lighten the load somewhat, but it does not actually reduce the number of metrics. So why not try reducing the number of metrics instead?
So I changed the federation node's prometheus.yml configuration file:

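scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        # Collect only node-related metrics. Prometheus regexes are fully
        # anchored, so ".*" is needed to match the rest of the metric name.
        - '{__name__=~"node_.*"}'
    # static_configs (the collection nodes' addresses) omitted here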

Cool! This time the pull was shortened to about 20 seconds, which from my point of view is already very good. Besides this approach, the collected metrics can also be split up and managed separately, for example by using different jobs.
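Splitting the federation across several jobs could look like the sketch below. The job names, the way the metrics are grouped, and the target addresses are all hypothetical; the name prefixes (node_cpu_, node_memory_, node_filesystem_, node_network_) are ones that node-exporter exposes.

```yaml
scrape_configs:
  # CPU and memory metrics in one pull
  - job_name: 'federate-node-cpu-mem'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"node_(cpu|memory)_.*"}'
    static_configs:
      - targets: ['collector-1:9090', 'collector-2:9090']  # hypothetical addresses

  # Filesystem and network metrics in a separate pull
  - job_name: 'federate-node-fs-net'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"node_(filesystem|network)_.*"}'
    static_configs:
      - targets: ['collector-1:9090', 'collector-2:9090']  # hypothetical addresses
```

Each job is scraped on its own, so a single pull carries fewer series. Whether this actually shortens the pull time enough would need to be measured in the real environment.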