Druid query
2022-06-13 03:28:00 【TRX1024】
Summary
Druid queries are submitted as HTTP REST requests whose body is a JSON description of the query. Broker, Historical, and Realtime nodes can all handle query requests and expose the same query interface, but requests are generally sent to a Broker node, which forwards the query to the Historical or Realtime nodes that serve the queried data source.
In addition, there are many open-source client libraries in other languages for querying Druid. For details see: http://druid.io/docs/latest/development/libraries.html
Druid ships with its own JSON-over-HTTP query interface. The data source used in the examples below is lxw1234.
Execute the query (the address given here is that of a Broker node):
curl -X POST 'http://node2:8092/druid/v2/?pretty' -H 'content-type: application/json' -d @query.json
The official documentation for Druid queries is at: http://druid.io/docs/latest/querying/querying.html
Query classification:
There are three basic types of queries: aggregation queries (Aggregation Queries), metadata queries (Metadata Queries), and search queries (Search Queries).
- Aggregation queries (Aggregation Queries)
  - Timeseries
  - TopN
  - GroupBy
- Metadata queries (Metadata Queries)
  - TimeBoundary
  - SegmentMetadata
  - DatasourceMetadata
- Search queries (Search Queries)
  - Search
1、 Aggregation queries (Aggregation Queries)
An aggregation query rolls metric data up over one or more dimensions according to the specified rules.
There are three kinds:
- Timeseries
- TopN
- GroupBy
1.1 Timeseries
A Timeseries query aggregates over the specified time range, bucketed by the specified time interval; the query can also specify filter conditions, the metric columns to aggregate, and so on.
A timeseries query contains the following fields:
| Field | Description | Required |
|---|---|---|
| queryType | Query type; must be "timeseries" | yes |
| dataSource | The data source to query | yes |
| intervals | The time range of the query, in ISO-8601 format | yes |
| granularity | The time granularity (bucket size) for aggregating results | yes |
| aggregations | The aggregation type, input field, and output name | yes |
| postAggregations | Post-aggregations | no |
| filter | Filter conditions | no |
| descending | Whether to sort descending | no |
| context | Additional query parameters | no |
A timeseries query outputs, for each time bucket, the aggregated statistics matching the specified conditions: filter specifies the filter conditions, and aggregations and postAggregations specify how to aggregate. A timeseries query cannot output dimension values. granularity supports all, none, second, minute, hour, day, week, month, year, and other granularities.
A simple Timeseries query configuration file looks like this:
{
"queryType": "timeseries",
"dataSource": "lxw1234",
"intervals": [ "2015-11-15/2015-11-18" ],
"granularity": "day",
"aggregations": [
    { "type": "longSum", "fieldName": "count", "name": "total_count" }
]
}
Query results (shown as an image in the original post).
Zero-filling:
By default, when a Timeseries query summarizes by day and some day has no data (for example, it was all filtered out), that day still appears in the result with an aggregated value of 0. Taking the data above, suppose 2015-11-15 has no qualifying data; the result would then contain:
{
"timestamp" : "2015-11-15T00:00:00.000Z",
"result" : {
"total_count" : 0
}
}
If you do not want such rows in the result, you can remove them with the context option. context is used to pass additional query parameters, configured as follows:
"context" : {
"skipEmptyBuckets": "true"
}
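As a sketch, the timeseries query above (with the skipEmptyBuckets context option added) can be built and POSTed to a Broker from Python. The Broker URL follows the curl example earlier and is an assumption about your deployment; post_query is a hypothetical helper name.

```python
import json
import urllib.request

# Broker address from the curl example above (an assumption about your deployment)
BROKER_URL = "http://node2:8092/druid/v2/?pretty"

# The timeseries query from the example, with skipEmptyBuckets added
timeseries_query = {
    "queryType": "timeseries",
    "dataSource": "lxw1234",
    "intervals": ["2015-11-15/2015-11-18"],
    "granularity": "day",
    "aggregations": [
        {"type": "longSum", "fieldName": "count", "name": "total_count"}
    ],
    # Drop zero-filled buckets from the result, as described above
    "context": {"skipEmptyBuckets": "true"},
}

def post_query(query, url=BROKER_URL):
    """POST a query JSON to the Broker and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# rows = post_query(timeseries_query)  # requires a running Broker
```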
1.2 TopN (TopN queries)
A TopN query is a GroupBy over a single dimension, sorted by an aggregated metric, returning the top N rows.
In Druid, a TopN query is more efficient than the equivalent GroupBy plus ordering.
The principle is divide and conquer: taking a Top 10 as an example, each data node computes its own top 10 and sends it to the Broker, and the Broker merges the per-node top 10 lists into the final top 10.
A TopN query contains the following fields:
| Field | Description | Required |
|---|---|---|
| queryType | Query type; must be "topN" | yes |
| dataSource | The data source to query | yes |
| intervals | The time range of the query, in ISO-8601 format | yes |
| granularity | The time granularity (bucket size) for aggregating results | yes |
| dimension | The dimension for the TopN; a TopN query can have only one dimension | yes |
| threshold | The N in TopN | yes |
| metric | The metric used for summarizing and sorting | yes |
| aggregations | The aggregation type, input field, and output name | yes |
| postAggregations | Post-aggregations | no |
| filter | Filter conditions | no |
| context | Additional query parameters | no |
A simple TopN query configuration file:
{
"queryType": "topN",
"dataSource": "lxw1234",
"granularity": "day",
"dimension": "cookieid",
"metric": "total_count",
"threshold" : 3,
"aggregations": [
    { "type": "longSum", "fieldName": "count", "name": "total_count" }
],
"intervals": ["2015-11-17/2015-11-18"]
}
This query finds, for each day, the Top 3 cookieids by pv. Query results (shown as an image in the original post).
Note: the metric field is specific to TopN queries.
metric can be configured as follows:
"metric": "<metric_name>" // shorthand: sort descending by the metric's value (the default)
"metric" : {
  "type" : "numeric", // sort descending by the metric's numeric value (the default)
  "metric" : "<metric_name>"
}
"metric" : {
  "type" : "inverted", // invert the ordering, i.e. sort ascending
  "metric" : "<metric_name>"
}
"metric" : {
  "type" : "lexicographic", // sort dimension values in dictionary order
  "metric" : "<metric_name>"
}
"metric" : {
  "type" : "alphaNumeric", // sort strings that contain numbers in numeric-aware order
  "metric" : "<metric_name>"
}
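The metric spec forms above can be captured in a small helper; topn_metric is a hypothetical name, and the query mirrors the TopN example earlier, simply using the explicit spec form instead of the shorthand string.

```python
def topn_metric(name, sort="numeric"):
    """Build a TopN "metric" spec.

    sort is one of "numeric" (descending by metric value, the default),
    "inverted" (ascending), "lexicographic" (dictionary order), or
    "alphaNumeric" (numeric-aware string order), per the forms above.
    """
    return {"type": sort, "metric": name}

# The TopN example from above, using the explicit metric spec form
topn_query = {
    "queryType": "topN",
    "dataSource": "lxw1234",
    "granularity": "day",
    "dimension": "cookieid",
    "threshold": 3,
    "metric": topn_metric("total_count"),
    "aggregations": [
        {"type": "longSum", "fieldName": "count", "name": "total_count"}
    ],
    "intervals": ["2015-11-17/2015-11-18"],
}
```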
1.3 GroupBy
A GroupBy query aggregates metrics over multiple dimensions. Druid recommends using Timeseries or TopN instead of GroupBy whenever they can express the query, because GroupBy performs worse.
// TODO
Reference resources :http://lxw1234.com/archives/2015/11/561.htm
2、 Metadata queries (Metadata Queries)
2.1 Time boundary queries (Time Boundary Queries)
A time boundary query returns the earliest and latest data timestamps of a data source.
{
"queryType" : "timeBoundary",
"dataSource": "lxw1234"
}
Query results:
[ {
"timestamp" : "2015-11-15T00:00:00.000+08:00",
"result" : {
"minTime" : "2015-11-15T00:00:00.000+08:00",
"maxTime" : "2015-11-18T23:59:59.000+08:00"
}
} ]
There is also a bound option, which specifies whether to return only the maximum or only the minimum timestamp; if it is not specified, both are returned:
{
"queryType" : "timeBoundary",
"dataSource": "lxw1234",
"bound": "maxTime"
}
Now only the maximum timestamp is returned:
[ {
"timestamp" : "2015-11-18T23:59:59.000+08:00",
"result" : {
"maxTime" : "2015-11-18T23:59:59.000+08:00"
}
} ]
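A minimal sketch of building both variants in Python (time_boundary_query is a hypothetical helper name):

```python
def time_boundary_query(datasource, bound=None):
    """Build a timeBoundary query.

    bound may be "minTime" or "maxTime" to return only that endpoint;
    when omitted, both endpoints are returned.
    """
    query = {"queryType": "timeBoundary", "dataSource": datasource}
    if bound is not None:
        query["bound"] = bound
    return query

both = time_boundary_query("lxw1234")                 # returns minTime and maxTime
max_only = time_boundary_query("lxw1234", "maxTime")  # returns maxTime only
```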
2.2 Segment metadata queries (Segment Metadata Queries)
A segment metadata query returns, for each Segment:
- the name of each column;
- the cardinality (Cardinality) of each column; null for non-STRING columns;
- the estimated size of each column in bytes;
- the time span of the Segment;
- the type of each column;
- the estimated total size of the Segment;
- the Segment ID.
Query configuration:
{
"queryType":"segmentMetadata",
"dataSource":"lxw1234",
"intervals":["2015-11-15/2015-11-19"]
}
Query results (here there is only one Segment):
{
"id" : "lxw1234_2015-11-17T00:00:00.000+08:00_2015-11-18T00:00:00.000+08:00_2015-11-18T16:53:02.158+08:00_1",
"intervals" : [ "2015-11-17T00:00:00.000+08:00/2015-11-18T00:00:00.000+08:00" ],
"columns" : {
"__time" : {
"type" : "LONG",
"size" : 46837800,
"cardinality" : null,
"errorMessage" : null
},
"cookieid" : {
"type" : "STRING",
"size" : 106261532,
"cardinality" : 1134359,
"errorMessage" : null
},
"count" : {
"type" : "LONG",
"size" : 37470240,
"cardinality" : null,
"errorMessage" : null
},
"ip" : {
"type" : "STRING",
"size" : 63478131,
"cardinality" : 735562,
"errorMessage" : null
}
},
"size" : 272782823
}
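As a sketch of working with such a response, the per-column size estimates from the result above can be tallied in Python; note that the per-column estimates need not add up to the Segment's total estimated size.

```python
# Column metadata copied from the segmentMetadata result above
segment = {
    "columns": {
        "__time":   {"type": "LONG",   "size": 46837800,  "cardinality": None},
        "cookieid": {"type": "STRING", "size": 106261532, "cardinality": 1134359},
        "count":    {"type": "LONG",   "size": 37470240,  "cardinality": None},
        "ip":       {"type": "STRING", "size": 63478131,  "cardinality": 735562},
    },
    "size": 272782823,
}

# Sum of per-column estimated sizes (bytes)
columns_total = sum(col["size"] for col in segment["columns"].values())

# Cardinality is reported only for STRING columns
string_columns = [name for name, col in segment["columns"].items()
                  if col["type"] == "STRING"]
```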
2.3 Data source metadata queries (Data Source Metadata Queries)
This query returns only the timestamp of the most recent event ingested into the data source.
For example, with the query configuration file:
{
"queryType" : "dataSourceMetadata",
"dataSource": "lxw1234"
}
The result is :
[ {
"timestamp" : "2015-11-18T23:59:59.000+08:00",
"result" : {
"maxIngestedEventTime" : "2015-11-18T23:59:59.000+08:00"
}
} ]
3、 Select query (Select Queries)
A select query is similar to a SQL SELECT: it returns the raw rows stored in Druid. It supports viewing specified dimensions and metrics for a given time range and filter, supports specifying the sort order via the descending field, and supports paginated fetching, but it does not support aggregations or postAggregations.
An example query JSON:
{
"queryType": "select",
"dataSource": "app_auto_prem_qd_pp3",
"granularity": "all",
"intervals": "1917-08-25T08:35:20+00:00/2017-08-25T08:35:20+00:00",
"dimensions": [
"status",
"is_new_car"
],
    "pagingSpec": {
        "pagingIdentifiers": {},
        "threshold": 3
    },
"context" : {
"skipEmptyBuckets" : "true"
}
}
This is equivalent to the SQL statement:
select status,is_new_car from app_auto_prem_qd_pp3 limit 3
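A sketch of paginating a select query in Python: each response carries pagingIdentifiers, which are fed back into the next request's pagingSpec. select_query is a hypothetical helper; the threshold of 3 matches the LIMIT in the SQL above.

```python
def select_query(paging_identifiers=None, threshold=3):
    """Build one page of the select query above."""
    return {
        "queryType": "select",
        "dataSource": "app_auto_prem_qd_pp3",
        "granularity": "all",
        "intervals": "1917-08-25T08:35:20+00:00/2017-08-25T08:35:20+00:00",
        "dimensions": ["status", "is_new_car"],
        "pagingSpec": {
            # pagingIdentifiers from the previous response (empty for the first page)
            "pagingIdentifiers": paging_identifiers or {},
            "threshold": threshold,
        },
        "context": {"skipEmptyBuckets": "true"},
    }

first_page = select_query()
# next_page = select_query(resp[0]["result"]["pagingIdentifiers"])  # from a real response
```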