Bosun query
2022-06-24 15:20:00 【Wang Lei -ai Foundation】
Background
Bosun is an open-source monitoring and alerting system from Stack Exchange; the closest comparable tool is Prometheus's Alertmanager. Bosun is designed as an alerting system that works with a variety of TSDBs, but it also provides a DSL for querying and evaluating metrics, which makes it a TSDB-agnostic metric query language as well (supported backends currently include OpenTSDB, Prometheus, InfluxDB, Elasticsearch and others). To understand how Bosun generates an alert, or just to use its query capability together with a monitoring front end such as Grafana, you have to understand this language.
Bosun is not a very popular project: it has about 3.1k stars, and there is little material about it, most of which is a literal translation of the official documentation. This article introduces how to query with Bosun (mainly against an OpenTSDB backend) and shares some query tips.
Concept
First, some of the types used in Bosun queries:
- Scalar: just a number.
- NumberSet: basically the same as a Scalar, but with a group of tags attached; the empty group {} is also a valid tag group.
- SeriesSet: the most common format for raw metrics. Unlike a NumberSet, its value is not a single number but a set of timestamp-value pairs, for example value 3.14 at time 100 and value 3.28 at time 200.
- Results: not a concept introduced in the documentation, but the type you meet most often in practice. It is what a query usually returns: a collection of SeriesSets or NumberSets with different tags. The documentation calls each distinct combination of tags a group.
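As a rough mental model, these types can be written down as plain Python data structures. This is only an illustrative sketch: the class and field names are assumptions made for this article and are not Bosun's internal type definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Illustrative names only; these are not Bosun's own type definitions.
Scalar = float              # just a number
Group = Dict[str, str]      # a tag set such as {"host": "a"}; {} is the empty group
Series = Dict[int, float]   # timestamp -> value


@dataclass
class Result:
    """One group together with its value: a single number or a series."""
    group: Group
    value: Union[float, Series]


@dataclass
class Results:
    """What a query returns: one Result per distinct tag combination."""
    items: List[Result] = field(default_factory=list)

    def groups(self) -> List[Group]:
        return [r.group for r in self.items]


# A SeriesSet-like Results with two groups.
series_set = Results([
    Result({"host": "a"}, {100: 3.14, 200: 3.28}),
    Result({"host": "b"}, {100: 1.0, 200: 2.0}),
])

# A NumberSet-like Results holding one number under the empty group {}.
number_set = Results([Result({}, 42.0)])
```

The only point of the sketch is that a Results value is always a list of groups, even when it contains a single number under the empty group.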
Query
For OpenTSDB, Bosun provides several query functions.
q
q(query string, startDuration string, endDuration string) seriesSet
This is the most commonly used query function, and most alerts are built on it. It is very simple: query is an OpenTSDB query string, and startDuration and endDuration are the start and end of the query window, expressed as durations before the current time. For example, q("sum:rate{counter}:sys.cpu.user", "5m", "1m") returns the sum:rate of the sys.cpu.user metric from 5 minutes ago to 1 minute ago. Because metric collection is delayed, it is generally recommended to set endDuration to at least 1m.
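To make the meaning of the two duration arguments concrete, here is a minimal Python sketch that converts them into an absolute time window. The helper names (parse_duration, q_window) and the evaluation time are hypothetical, and only a few duration units are handled; this is not how Bosun itself parses durations.

```python
import re

# Seconds per unit; only a few common units are handled in this sketch.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text: str) -> int:
    """Convert a duration such as "5m" to seconds; "" means now (0 seconds ago)."""
    if text == "":
        return 0
    match = re.fullmatch(r"(\d+)([smhdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def q_window(start_duration: str, end_duration: str, now: int) -> tuple:
    """Return the absolute (start, end) epoch seconds covered by q(..., start, end)."""
    start = now - parse_duration(start_duration)
    end = now - parse_duration(end_duration)
    if start >= end:
        raise ValueError("startDuration must be further in the past than endDuration")
    return start, end


# Hypothetical evaluation time, chosen only for illustration.
now = 1620197220
window = q_window("5m", "1m", now)
print(window)
```

With these arguments the window is four minutes long and ends one minute before the evaluation time.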
# Example
q("sum:rate{counter}:${service}.rpc.calledby.success.throughput", "5m", "1m")
group result computations
{ }
{
"1620196950": 2.016666666666667,
"1620196980": 2.3666666666666667,
"1620197010": 1.0999999999999999,
"1620197040": 1.8333333333333335,
"1620197070": 2.7333333333333343,
"1620197100": 2.5,
"1620197130": 1.7000000000000002,
"1620197160": 0.9666666666666666
}
bandQuery/overQuery
bandQuery(query string, duration string, period string, eduration string, num scalar) seriesSet
band(query string, duration string, period string, num scalar) seriesSet
- bandQuery runs the query num times. Each run covers a window of length duration, consecutive windows are period apart, and the most recent window ends eduration ago.
- band is a special form of bandQuery with eduration = period. For example, band("avg:os.cpu", "1h", "1d", 3) is equivalent to running the three queries q("avg:os.cpu", "25h", "1d"), q("avg:os.cpu", "49h", "2d") and q("avg:os.cpu", "73h", "3d"). Because eduration = period, the most recent window is (period + duration, period).
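The expansion of band and bandQuery into individual q windows can be sketched in Python as follows. The window arithmetic is inferred from the description and example in this article; the function names are hypothetical and the duration parser handles only a few units.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a duration such as "1h" to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def format_duration(seconds: int) -> str:
    """Format seconds using the largest unit that divides them evenly."""
    for unit in ("d", "h", "m"):
        if seconds and seconds % UNIT_SECONDS[unit] == 0:
            return f"{seconds // UNIT_SECONDS[unit]}{unit}"
    return f"{seconds}s"


def band_query_windows(duration: str, period: str, eduration: str, num: int) -> list:
    """Return the (startDuration, endDuration) pair of each equivalent q call."""
    d, p, e = (parse_duration(x) for x in (duration, period, eduration))
    windows = []
    for i in range(num):
        end = e + i * p          # each window ends one more period in the past
        windows.append((format_duration(end + d), format_duration(end)))
    return windows


def band_windows(duration: str, period: str, num: int) -> list:
    """band is bandQuery with eduration set to period."""
    return band_query_windows(duration, period, period, num)


windows = band_windows("1h", "1d", 3)
print(windows)
```

For band("avg:os.cpu", "1h", "1d", 3) this yields the same three windows as the q calls listed above.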
overQuery(query string, duration string, period string, eduration string, num scalar) seriesSet
over(query string, duration string, period string, num scalar) seriesSet
shiftBand(query string, duration string, period string, num scalar) seriesSet
- overQuery is the general form of over and shiftBand. Unlike bandQuery, each window's result is tagged with its query offset in a "shift" tag.
- over and shiftBand are special forms of overQuery: they fix overQuery's eduration to either period or the current time (that is, left empty).
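The shifting can be sketched in Python as below. The behaviour is inferred from the overQuery example output in this article (timestamps moved forward by the offset, and a shift tag formatted like "1h1m0s"), not from Bosun's source code, and the function names are hypothetical.

```python
def go_duration(seconds: int) -> str:
    """Format whole seconds in the style seen in the shift tags, e.g. 3660 -> "1h1m0s"."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}h{minutes}m{secs}s"
    if minutes:
        return f"{minutes}m{secs}s"
    return f"{secs}s"


def over_query_shifts(period_seconds: int, eduration_seconds: int, num: int) -> list:
    """Offset of each window: eduration for the newest, plus one period per older window."""
    return [eduration_seconds + i * period_seconds for i in range(num)]


def shift_series(series: dict, shift_seconds: int) -> tuple:
    """Move every timestamp forward by the offset and tag the result with it."""
    group = {"shift": go_duration(shift_seconds)}
    shifted = {ts + shift_seconds: value for ts, value in series.items()}
    return group, shifted


# Two points from the older window of the bandQuery example in this article.
older = {1620195120: 69.96666666666665, 1620195150: 5.816666666666666}
shifts = over_query_shifts(period_seconds=3600, eduration_seconds=60, num=2)
group, shifted = shift_series(older, shifts[1])
print(group, shifted)
```

Because every window is moved onto the same time axis, the series from different periods can be overlaid or compared point by point.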
# Example: the remaining functions are just special forms of bandQuery and overQuery, so only these two are shown here
> bandQuery("sum:rate{counter}:${service}.rpc.calledby.success.throughput", "5m", "60m", "1m", 2)
group result computations
{ }
{
"1620195120": 69.96666666666665,
"1620195150": 5.816666666666666,
"1620195180": 5.766666666666667,
"1620195210": 4.3,
"1620195240": 5.7666666666666675,
"1620195270": 3.7666666666666675,
"1620195300": 4.4,
"1620195330": 4.933333333333334,
"1620195360": 4.033333333333334,
"1620195390": 1.7000000000000002,
"1620198720": 69.93333333333334,
"1620198750": 11.7,
"1620198780": 1.2999999999999998,
"1620198810": 1.8500000000000008,
"1620198840": 2.766666666666667,
"1620198870": 4.633333333333333,
"1620198900": 4.833333333333334,
"1620198930": 2.366666666666667,
"1620198960": 2.366666666666667,
"1620198990": 2.2666666666666666
}
> overQuery("sum:rate{counter}:${service}.rpc.calledby.success.throughput", "5m", "60m", "1m", 2)
group result computations
{ shift=1m0s }
{
"1620198780": 69.93333333333334,
"1620198810": 11.7,
"1620198840": 1.2999999999999998,
"1620198870": 1.8500000000000008,
"1620198900": 2.766666666666667,
"1620198930": 4.633333333333333,
"1620198960": 4.833333333333334,
"1620198990": 2.366666666666667,
"1620199020": 2.366666666666667,
"1620199050": 2.2666666666666666
}
{ shift=1h1m0s }
{
"1620198780": 69.96666666666665,
"1620198810": 5.816666666666666,
"1620198840": 5.766666666666667,
"1620198870": 4.3,
"1620198900": 5.7666666666666675,
"1620198930": 3.7666666666666675,
"1620198960": 4.4,
"1620198990": 4.933333333333334,
"1620199020": 4.033333333333334,
"1620199050": 1.7000000000000002
}
bandQuery and overQuery are very useful for comparing a metric over the same time window in each cycle (for example, the same time every day). Interestingly, bandQuery does not produce unjoined groups; this is explained further in the tips below.
window
window(query string, duration string, period string, num scalar, funcName string) seriesSet
Compared with bandQuery and overQuery, window is more useful for queries meant for display. window applies the reduction function funcName to the result of each query, and the reduced values together with their timestamps form a new time series. For example, to get the number of requests in each of the past 6 hours, you can use:
> window("sum:rate{counter}:${service}.rpc.calledby.success.throughput", "60m", "60m", 6, "sum")
group result computations
{ }
{
"1620175620": 356260.0166666666,
"1620179220": 370473.99999999965,
"1620182820": 391460.0166666665,
"1620186420": 405893.36666666664,
"1620190020": 364280.9166666666,
"1620193620": 380179.3833333336
}
Combined with Grafana, this can be drawn as a curve or a histogram.
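The reduction that window performs can be sketched in Python as follows. Two details are assumptions of this sketch rather than documented Bosun behaviour: the most recent window ends at the evaluation time, and each reduced value is stamped with the end of its window. The function and variable names are hypothetical.

```python
from statistics import mean

# Reducers available in this sketch, keyed by the funcName argument.
REDUCERS = {"sum": sum, "avg": mean, "min": min, "max": max}


def window(points: dict, now: int, duration: int, period: int, num: int, func_name: str) -> dict:
    """Reduce each of num windows to one value; returns {window_end: value}, oldest first."""
    reduce = REDUCERS[func_name]
    result = {}
    for i in reversed(range(num)):
        end = now - i * period
        start = end - duration
        values = [v for ts, v in points.items() if start < ts <= end]
        if values:                      # skip windows that contain no data
            result[end] = reduce(values)
    return result


# Toy data: one point every 30 minutes over two hours.
points = {1800: 1, 3600: 2, 5400: 3, 7200: 4}
hourly_sum = window(points, now=7200, duration=3600, period=3600, num=2, func_name="sum")
hourly_max = window(points, now=7200, duration=3600, period=3600, num=2, func_name="max")
print(hourly_sum, hourly_max)
```

Each window collapses to a single point, which is why the output is convenient for bar charts with one bar per period.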
count/change
count returns the length of the Results returned by the query (the number of groups), and change returns the change of a metric over the window: change("avg:rate:net.bytes", "60m", "") = avg(q("avg:rate:net.bytes", "60m", "")) * 60 * 60
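The change identity above is plain arithmetic: the average per-second rate multiplied by the window length in seconds. A minimal Python sketch, with a hypothetical function name and made-up sample rates:

```python
def change(rate_values: list, duration_seconds: int) -> float:
    """Average per-second rate multiplied by the window length in seconds."""
    if not rate_values:
        raise ValueError("no data points in the window")
    return sum(rate_values) / len(rate_values) * duration_seconds


# Made-up per-second rates sampled during a 60-minute window.
rates = [2.0, 4.0, 6.0]
total = change(rates, 60 * 60)
print(total)
```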
Calculation
How Bosun computes is probably the most confusing part. To understand it, keep a few core points in mind, together with the concepts from the first section:
- Most queries return Results, that is, a set of SeriesSets or NumberSets. For example, a query such as avg:rate:net.bytes{host=*} automatically produces a SeriesSet with multiple groups. (If you only want to filter without grouping, write avg:rate:net.bytes{}{host=1.2.3.4} instead.)
- Most functions in the Bosun documentation are described for a SeriesSet with a single group. When a function is applied to a query result, it is applied to each group separately. For example, in avg(q("avg:rate:net.bytes{host=*}", "60m", "")) the query returns {host=a}, {host=b} and so on, and avg is applied to each of these groups on its own.
- When different Results are combined with an operator such as +, the operator is applied to each combination of groups. Not every combination can be computed, though: only groups that are equal, or where one is a subset of the other, are combined. A group that takes part in no combination becomes an unjoined group. This is a bit abstract; the examples below should help. Try to guess each result before looking at it to check your understanding.
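The joining rule can also be sketched as a small runnable Python model. It is a simplified model of the rule as described in this article, not Bosun's implementation: the function names are hypothetical, and it does not reproduce order-dependent details of the real behaviour. The sample data is the data of Example 1.

```python
import math
from itertools import product


def is_subset(small: dict, big: dict) -> bool:
    """True when every tag in small appears in big with the same value."""
    return all(big.get(key) == value for key, value in small.items())


def join(left: list, right: list, op) -> list:
    """Combine two Results, each given as a list of (group, series) pairs."""
    joined = []
    used_left, used_right = set(), set()
    for (i, (g1, s1)), (j, (g2, s2)) in product(enumerate(left), enumerate(right)):
        if is_subset(g1, g2) or is_subset(g2, g1):
            group = g1 if len(g1) >= len(g2) else g2   # keep the more specific group
            series = {ts: op(s1[ts], s2[ts]) for ts in s1 if ts in s2}
            joined.append((group, series))
            used_left.add(i)
            used_right.add(j)
    # Groups that matched nothing become unjoined groups filled with NaN.
    for i, (group, series) in enumerate(left):
        if i not in used_left:
            joined.append((group, {ts: math.nan for ts in series}))
    for j, (group, series) in enumerate(right):
        if j not in used_right:
            joined.append((group, {ts: math.nan for ts in series}))
    return joined


ab = [
    ({"X": "a1", "Y": "b1"}, {100: 1, 200: 2}),
    ({"X": "a2", "Y": "b2"}, {100: 2, 200: 3}),
]
xyz = [
    ({"X": "a1"}, {100: 2, 200: 1}),
    ({"X": "a1", "Y": "b2"}, {100: 3, 200: 5}),
    ({"X": "a2", "Y": "b2"}, {100: 3, 200: 2}),
]
result = join(ab, xyz, lambda a, b: a + b)
for group, series in result:
    print(group, series)
```

In this model the empty group {} is a subset of every group, so a Results value with the empty group joins with everything.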
# How two Results are combined
for g1 in Result1:
    for g2 in Result2:
        if g1 == g2 or g1 is a subset of g2 or g2 is a subset of g1:
            compute
for g1 in Result1:
    if g1 took part in no computation:
        generate an unjoined group
for g2 in Result2:
    if g2 took part in no computation:
        generate an unjoined group
Example 1
$a = series("X=a1,Y=b1", 100, 1, 200, 2)
$b = series("X=a2,Y=b2", 100, 2, 200, 3)
$x = series("X=a1", 100, 2, 200, 1)
$y = series("X=a1,Y=b2", 100, 3, 200, 5)
$z = series("X=a2,Y=b2", 100, 3, 200, 2)
# {X=a1,Y=b1} {X=a2,Y=b2}
$ab = merge($a, $b)
# {X=a1} {X=a1,Y=b2} {X=a2,Y=b2}
$xyz = merge($x, $y, $z)
# The combinations that can be computed here are ({X=a1,Y=b1}, {X=a1}) and ({X=a2,Y=b2}, {X=a2,Y=b2}); {X=a1,Y=b2} takes part in no computation, so it becomes an unjoined group
$ab+$xyz
-----------------------------------
group result computations
{ X=a1, Y=b1 }
{
"100": 3,
"200": 3
}
{ X=a2, Y=b2 }
{
"100": 5,
"200": 5
}
{ X=a1, Y=b2 }
{
"100": "NaN",
"200": "NaN"
}
merge(series("X=a1,Y=b1", 100, 1, 200, 2), series("X=a2,Y=b2", 100, 2, 200, 3)) + merge(series("X=a1", 100, 2, 200, 1), series("X=a1,Y=b2", 100, 3, 200, 5), series("X=a2,Y=b2", 100, 3, 200, 2)) unjoined group (NaN)
Example 2
$a = series("Y=b2", 100, 1, 200, 1)
$b = series("X=a1,Y=b1", 100, 3, 200, 5)
$c = series("X=a2,Y=b2", 100, 3, 200, 2)
$x = series("X=a2", 100, 2, 200, 1)
$y = series("X=a1,Y=b2", 100, 3, 200, 5)
$z = series("X=a2,Y=b2", 100, 3, 200, 2)
# {X=a1,Y=b1} {X=a2,Y=b2} {Y=b2}
$abc = merge($b, $c, $a)
# {X=a2,Y=b2} {X=a1,Y=b2} {X=a2}; {X=a2} is placed last because an error is reported if the first combination cannot be computed
$xyz = merge($z, $y, $x)
$abc + $xyz
-----------------------------------
group result computations
{ X=a2, Y=b2 }
{
"100": 6,
"200": 4
}
{ X=a2, Y=b2 }
{
"100": 4,
"200": 3
}
{ X=a1, Y=b2 }
{
"100": 4,
"200": 6
}
{ X=a2, Y=b2 }
{
"100": 5,
"200": 3
}
{ X=a1, Y=b1 }
merge(series("X=a2,Y=b2", 100, 3, 200, 2), series("X=a1,Y=b2", 100, 3, 200, 5), series("X=a2", 100, 2, 200, 1)) + merge(series("X=a1,Y=b1", 100, 3, 200, 5), series("X=a2,Y=b2", 100, 3, 200, 2), series("Y=b2", 100, 1, 200, 1))
More examples
$aa=series("tagA=a", 0, 2, 60, 2)
$ab=series("tagA=a,tagB=b", 0, 2, 60, 1)
$ac=series("tagA=a,tagC=c", 0, 2, 60, 3)
$bb=series("tagB=b", 0, 2, 60, 2)
$cc=series("tagC=c", 0, 2, 60, 2)
# {tagA=a} {tagB=b} {tagC=c}
$abc=merge($aa,$bb,$cc)
# {tagA=a} {tagA=a,tagB=b}
$aab = merge($aa, $ab)
# {tagA=a} {tagA=a,tagC=c}
$aac = merge($aa, $ac)
# The combinations that can be computed here are ({tagA=a}, {tagA=a}), ({tagA=a}, {tagA=a,tagC=c}) and ({tagA=a,tagB=b}, {tagA=a})
# $aab+$aac
# {tagA=a} {tagB=b}
$aabb = merge($aa, $bb)
# {tagA=a} {tagC=c}
$aacc = merge($aa, $cc)
# $aacc+$aabb
# {tagA=a} {tagC=c} + {tagA=a} {tagA=a,tagC=c}
# $aacc+$aac
# {tagA=a} {tagC=c} + {tagA=a} {tagB=b} {tagC=c}
# $aacc+$abc
Tips
Avoiding unjoined groups
A common practice is to rely on group-related functions and techniques. For example, avoid generating groups at query time by using a filter instead, as in avg(q("sum:rate:metrics.notexist{}{status=500}", "1m", "0m")), or process the tags after the query with functions such as addtags and remove so that the groups become compatible. There is another, more ingenious approach that makes Bosun ignore unjoined groups: query with bandQuery.
For example, to calculate the request error rate:
$key_err = "sum:rate{counter}:${service}.rpc.calledby.error.throughput{method=*}"
$key_succ = "sum:rate{counter}:${service}.rpc.calledby.success.throughput{method=*}"
$err_now = avg(q($key_err, "5m", "1m"))
$succ_now = avg(q($key_succ, "5m", "1m"))
$rate_now = $err_now / ($err_now +$succ_now)
$rate_now
This query produces a large number of unjoined groups, because the rpc.calledby.error.throughput metric has far fewer tag combinations than the success metric, yet we still want the result to carry the method grouping tag. The same calculation with band looks like this:
$key_err = "sum:rate{counter}:${service}.rpc.calledby.error.throughput{method=*}"
$key_succ = "sum:rate{counter}:${service}.rpc.calledby.success.throughput{method=*}"
$err_now = avg(band($key_err, "4m", "1m", 1))
$succ_now = avg(band($key_succ, "4m", "1m", 1))
$rate_now = $err_now / ($err_now +$succ_now)
$rate_now
A band query does not produce unjoined groups: in a computation between Results, the step that generates unjoined groups is skipped, so unjoined results are simply ignored.
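The effect of ignoring unjoined groups on the error-rate calculation can be illustrated with a small Python sketch. It is a simplified model keyed by a single method tag, with a hypothetical function name and made-up rates; it is not Bosun code.

```python
import math


def error_rate(err: dict, succ: dict, ignore_unjoined: bool) -> dict:
    """err and succ map method -> average rate; returns method -> error rate."""
    rates = {}
    for method in err.keys() & succ.keys():
        total = err[method] + succ[method]
        rates[method] = err[method] / total if total else math.nan
    if not ignore_unjoined:
        # Methods present on only one side show up as unjoined (NaN) entries.
        for method in err.keys() ^ succ.keys():
            rates[method] = math.nan
    return rates


# Made-up rates: only one method has reported any errors.
err = {"get": 1.0}
succ = {"get": 9.0, "put": 5.0, "delete": 2.0}
with_unjoined = error_rate(err, succ, ignore_unjoined=False)
without_unjoined = error_rate(err, succ, ignore_unjoined=True)
print(with_unjoined, without_unjoined)
```

With the unjoined entries dropped, the result contains only the methods that actually have an error rate.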
Grafana Bosun plugin
The Grafana Bosun plugin has two built-in variables:
- $ds: the suggested downsampling interval. This variable is very useful: in a query such as q("avg:$ds-avg:os.disk.fs.space_free{disk=*,host=backup}", "$start", ""), it keeps the query efficient when the user selects a large time range.
- $start: the start time selected by the user.
Using the t function
There are several functions that operate on groups; this section introduces t. It joins the results of multiple groups into a single group so that they can be fed to other calculation functions. For example, to calculate the weighted latency of an API over 60 minutes:
$latency=avg(q("avg:${service}.calledby.success.latency.us.pct99{handle_method=*}", "60m", ""))
$count=sum(q("sum:rate{counter,,,diff}:${service}.calledby.success.throughput{handle_method=*}", "60m", ""))
$total=sum(q("sum:rate{counter,,,diff}:${service}.calledby.success.throughput{}", "60m", ""))
sum(t($latency*($count/$total), ""))
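The arithmetic behind this weighted latency can be checked with a short Python sketch. The method names and numbers are made up, the function name is hypothetical, and the total is derived here from the per-method counts, whereas the Bosun expression obtains it from a separate query.

```python
def weighted_latency(latency: dict, count: dict) -> float:
    """Sum of each method's latency weighted by its share of all requests."""
    total = sum(count.values())
    if total == 0:
        raise ValueError("no requests in the window")
    per_method = {
        method: latency[method] * (count[method] / total)
        for method in latency
        if method in count
    }
    # Collapsing all groups into one and summing is the role of sum(t(..., "")).
    return sum(per_method.values())


# Made-up per-method p99 latency (microseconds) and request counts.
latency = {"GetUser": 100.0, "ListUsers": 200.0}
count = {"GetUser": 30.0, "ListUsers": 10.0}
result = weighted_latency(latency, count)
print(result)
```

A method that serves three quarters of the traffic contributes three quarters of its latency to the final figure.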