Druid query
2022-06-13 03:28:00 【TRX1024】
Summary
Druid queries are submitted as HTTP REST requests whose body is a JSON description of the query. Broker, Historical, and Realtime nodes can all handle query requests and expose the same query interface, but requests are generally sent to a Broker node, which forwards the query to the Historical or Realtime nodes that serve the queried data source.
In addition, there are many open-source client libraries in other languages for querying Druid. For details see: http://druid.io/docs/latest/development/libraries.html
Druid ships with its own JSON-over-HTTP query interface. The data source used in the examples below is lxw1234.
Execute the query (the address given here is that of a Broker node):
curl -X POST 'http://node2:8092/druid/v2/?pretty' -H 'content-type: application/json' -d @query.json
The official documentation for Druid queries is at: http://druid.io/docs/latest/querying/querying.html
Query classification:
There are three basic types of queries: aggregation queries (Aggregation Queries), metadata queries (Metadata Queries), and search queries (Search Queries).
- Aggregation queries (Aggregation Queries)
  - Timeseries
  - TopN
  - GroupBy
- Metadata queries (Metadata Queries)
  - TimeBoundary
  - SegmentMetadata
  - DatasourceMetadata
- Search queries (Search Queries)
  - Search
1、 Aggregation queries (Aggregation Queries)
An aggregation query rolls metric data up over one or more dimensions according to the specified rules.
There are three kinds:
- Timeseries
- TopN
- GroupBy
1.1 Timeseries
A Timeseries query aggregates over the specified time range, bucketed by the specified time interval; the query can also specify filter conditions, the metric columns to aggregate, and so on.
A timeseries query contains the following fields:
| Field | Description | Required |
|---|---|---|
| queryType | Query type; must be "timeseries" | yes |
| dataSource | The data source to query | yes |
| intervals | The time range of the query, in ISO-8601 format | yes |
| granularity | The time granularity (bucket size) for aggregating results | yes |
| aggregations | The aggregation type, input field, and output name | yes |
| postAggregations | Post-aggregations | no |
| filter | Filter conditions | no |
| descending | Whether to sort descending | no |
| context | Additional query parameters | no |
A timeseries query outputs, for each time bucket, the aggregated statistics matching the specified conditions: filter specifies the filter conditions, and aggregations and postAggregations specify how to aggregate. A timeseries query cannot output dimension values. granularity supports all, none, second, minute, hour, day, week, month, year, and other granularities.
A simple Timeseries query configuration file looks like this:
{
"queryType": "timeseries",
"dataSource": "lxw1234",
"intervals": [ "2015-11-15/2015-11-18" ],
"granularity": "day",
"aggregations": [
    { "type": "longSum", "fieldName": "count", "name": "total_count" }
]
}
Query results (shown as an image in the original post).
Zero-filling:
By default, when a Timeseries query summarizes by day and some day has no data (for example, it was all filtered out), that day still appears in the result with an aggregated value of 0. Taking the data above, suppose 2015-11-15 has no qualifying data; the result would then contain:
{
"timestamp" : "2015-11-15T00:00:00.000Z",
"result" : {
"total_count" : 0
}
}
If you do not want such rows in the result, you can remove them with the context option. context is used to pass additional query parameters, configured as follows:
"context" : {
"skipEmptyBuckets": "true"
}
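As a sketch, the timeseries query above (with the skipEmptyBuckets context option added) can be built and POSTed to a Broker from Python. The Broker URL follows the curl example earlier and is an assumption about your deployment; post_query is a hypothetical helper name.

```python
import json
import urllib.request

# Broker address from the curl example above (an assumption about your deployment)
BROKER_URL = "http://node2:8092/druid/v2/?pretty"

# The timeseries query from the example, with skipEmptyBuckets added
timeseries_query = {
    "queryType": "timeseries",
    "dataSource": "lxw1234",
    "intervals": ["2015-11-15/2015-11-18"],
    "granularity": "day",
    "aggregations": [
        {"type": "longSum", "fieldName": "count", "name": "total_count"}
    ],
    # Drop zero-filled buckets from the result, as described above
    "context": {"skipEmptyBuckets": "true"},
}

def post_query(query, url=BROKER_URL):
    """POST a query JSON to the Broker and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# rows = post_query(timeseries_query)  # requires a running Broker
```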
1.2 TopN (TopN queries)
A TopN query is a GroupBy over a single dimension, sorted by an aggregated metric, returning the top N rows.
In Druid, a TopN query is more efficient than the equivalent GroupBy plus ordering.
The principle is divide and conquer: taking a Top 10 as an example, each data node computes its own top 10 and sends it to the Broker, and the Broker merges the per-node top 10 lists into the final top 10.
A TopN query contains the following fields:
| Field | Description | Required |
|---|---|---|
| queryType | Query type; must be "topN" | yes |
| dataSource | The data source to query | yes |
| intervals | The time range of the query, in ISO-8601 format | yes |
| granularity | The time granularity (bucket size) for aggregating results | yes |
| dimension | The dimension for the TopN; a TopN query can have only one dimension | yes |
| threshold | The N in TopN | yes |
| metric | The metric used for summarizing and sorting | yes |
| aggregations | The aggregation type, input field, and output name | yes |
| postAggregations | Post-aggregations | no |
| filter | Filter conditions | no |
| context | Additional query parameters | no |
A simple TopN query configuration file:
{
"queryType": "topN",
"dataSource": "lxw1234",
"granularity": "day",
"dimension": "cookieid",
"metric": "total_count",
"threshold" : 3,
"aggregations": [
    { "type": "longSum", "fieldName": "count", "name": "total_count" }
],
"intervals": ["2015-11-17/2015-11-18"]
}
This query finds, for each day, the Top 3 cookieids by pv. Query results (shown as an image in the original post).
Note: the metric field is specific to TopN queries.
metric can be configured as follows:
"metric": "<metric_name>" // shorthand: sort descending by the metric's value (the default)
"metric" : {
  "type" : "numeric", // sort descending by the metric's numeric value (the default)
  "metric" : "<metric_name>"
}
"metric" : {
  "type" : "inverted", // invert the ordering, i.e. sort ascending
  "metric" : "<metric_name>"
}
"metric" : {
  "type" : "lexicographic", // sort dimension values in dictionary order
  "metric" : "<metric_name>"
}
"metric" : {
  "type" : "alphaNumeric", // sort strings that contain numbers in numeric-aware order
  "metric" : "<metric_name>"
}
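The metric spec forms above can be captured in a small helper; topn_metric is a hypothetical name, and the query mirrors the TopN example earlier, simply using the explicit spec form instead of the shorthand string.

```python
def topn_metric(name, sort="numeric"):
    """Build a TopN "metric" spec.

    sort is one of "numeric" (descending by metric value, the default),
    "inverted" (ascending), "lexicographic" (dictionary order), or
    "alphaNumeric" (numeric-aware string order), per the forms above.
    """
    return {"type": sort, "metric": name}

# The TopN example from above, using the explicit metric spec form
topn_query = {
    "queryType": "topN",
    "dataSource": "lxw1234",
    "granularity": "day",
    "dimension": "cookieid",
    "threshold": 3,
    "metric": topn_metric("total_count"),
    "aggregations": [
        {"type": "longSum", "fieldName": "count", "name": "total_count"}
    ],
    "intervals": ["2015-11-17/2015-11-18"],
}
```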
1.3 GroupBy
A GroupBy query aggregates metrics over multiple dimensions. Druid recommends using Timeseries or TopN instead of GroupBy whenever they can express the query, because GroupBy performs worse.
// TODO
Reference resources :http://lxw1234.com/archives/2015/11/561.htm
2、 Metadata queries (Metadata Queries)
2.1 Time boundary queries (Time Boundary Queries)
A time boundary query returns the earliest and latest data timestamps of a data source.
{
"queryType" : "timeBoundary",
"dataSource": "lxw1234"
}
Query results:
[ {
"timestamp" : "2015-11-15T00:00:00.000+08:00",
"result" : {
"minTime" : "2015-11-15T00:00:00.000+08:00",
"maxTime" : "2015-11-18T23:59:59.000+08:00"
}
} ]
There is also a bound option, which specifies whether to return only the maximum or only the minimum timestamp; if it is not specified, both are returned:
{
"queryType" : "timeBoundary",
"dataSource": "lxw1234",
"bound": "maxTime"
}
Now only the maximum timestamp is returned:
[ {
"timestamp" : "2015-11-18T23:59:59.000+08:00",
"result" : {
"maxTime" : "2015-11-18T23:59:59.000+08:00"
}
} ]
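A minimal sketch of building both variants in Python (time_boundary_query is a hypothetical helper name):

```python
def time_boundary_query(datasource, bound=None):
    """Build a timeBoundary query.

    bound may be "minTime" or "maxTime" to return only that endpoint;
    when omitted, both endpoints are returned.
    """
    query = {"queryType": "timeBoundary", "dataSource": datasource}
    if bound is not None:
        query["bound"] = bound
    return query

both = time_boundary_query("lxw1234")                 # returns minTime and maxTime
max_only = time_boundary_query("lxw1234", "maxTime")  # returns maxTime only
```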
2.2 Segment metadata queries (Segment Metadata Queries)
A segment metadata query returns, for each Segment:
- the name of each column;
- the cardinality (Cardinality) of each column; null for non-STRING columns;
- the estimated size of each column in bytes;
- the time span of the Segment;
- the type of each column;
- the estimated total size of the Segment;
- the Segment ID.
Query configuration:
{
"queryType":"segmentMetadata",
"dataSource":"lxw1234",
"intervals":["2015-11-15/2015-11-19"]
}
Query results (here there is only one Segment):
{
"id" : "lxw1234_2015-11-17T00:00:00.000+08:00_2015-11-18T00:00:00.000+08:00_2015-11-18T16:53:02.158+08:00_1",
"intervals" : [ "2015-11-17T00:00:00.000+08:00/2015-11-18T00:00:00.000+08:00" ],
"columns" : {
"__time" : {
"type" : "LONG",
"size" : 46837800,
"cardinality" : null,
"errorMessage" : null
},
"cookieid" : {
"type" : "STRING",
"size" : 106261532,
"cardinality" : 1134359,
"errorMessage" : null
},
"count" : {
"type" : "LONG",
"size" : 37470240,
"cardinality" : null,
"errorMessage" : null
},
"ip" : {
"type" : "STRING",
"size" : 63478131,
"cardinality" : 735562,
"errorMessage" : null
}
},
"size" : 272782823
}
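As a sketch of working with such a response, the per-column size estimates from the result above can be tallied in Python; note that the per-column estimates need not add up to the Segment's total estimated size.

```python
# Column metadata copied from the segmentMetadata result above
segment = {
    "columns": {
        "__time":   {"type": "LONG",   "size": 46837800,  "cardinality": None},
        "cookieid": {"type": "STRING", "size": 106261532, "cardinality": 1134359},
        "count":    {"type": "LONG",   "size": 37470240,  "cardinality": None},
        "ip":       {"type": "STRING", "size": 63478131,  "cardinality": 735562},
    },
    "size": 272782823,
}

# Sum of per-column estimated sizes (bytes)
columns_total = sum(col["size"] for col in segment["columns"].values())

# Cardinality is reported only for STRING columns
string_columns = [name for name, col in segment["columns"].items()
                  if col["type"] == "STRING"]
```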
2.3 Data source metadata queries (Data Source Metadata Queries)
This query returns only the timestamp of the most recent event ingested into the data source.
For example, with the query configuration file:
{
"queryType" : "dataSourceMetadata",
"dataSource": "lxw1234"
}
The result is :
[ {
"timestamp" : "2015-11-18T23:59:59.000+08:00",
"result" : {
"maxIngestedEventTime" : "2015-11-18T23:59:59.000+08:00"
}
} ]
3、 Select query (Select Queries)
A select query is similar to a SQL SELECT: it returns the raw rows stored in Druid. It supports viewing specified dimensions and metrics for a given time range and filter, supports specifying the sort order via the descending field, and supports paginated fetching, but it does not support aggregations or postAggregations.
An example query JSON:
{
"queryType": "select",
"dataSource": "app_auto_prem_qd_pp3",
"granularity": "all",
"intervals": "1917-08-25T08:35:20+00:00/2017-08-25T08:35:20+00:00",
"dimensions": [
"status",
"is_new_car"
],
    "pagingSpec": {
        "pagingIdentifiers": {},
        "threshold": 3
    },
"context" : {
"skipEmptyBuckets" : "true"
}
}
This is equivalent to the SQL statement:
select status,is_new_car from app_auto_prem_qd_pp3 limit 3
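A sketch of paginating a select query in Python: each response carries pagingIdentifiers, which are fed back into the next request's pagingSpec. select_query is a hypothetical helper; the threshold of 3 matches the LIMIT in the SQL above.

```python
def select_query(paging_identifiers=None, threshold=3):
    """Build one page of the select query above."""
    return {
        "queryType": "select",
        "dataSource": "app_auto_prem_qd_pp3",
        "granularity": "all",
        "intervals": "1917-08-25T08:35:20+00:00/2017-08-25T08:35:20+00:00",
        "dimensions": ["status", "is_new_car"],
        "pagingSpec": {
            # pagingIdentifiers from the previous response (empty for the first page)
            "pagingIdentifiers": paging_identifiers or {},
            "threshold": threshold,
        },
        "context": {"skipEmptyBuckets": "true"},
    }

first_page = select_query()
# next_page = select_query(resp[0]["result"]["pagingIdentifiers"])  # from a real response
```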