当前位置:网站首页>Elasticsearch Part 6: aggregate statistical query
Elasticsearch Part 6: aggregate statistical query
2020-11-06 20:10:00 【itread01】
There has been no record before Elasticsearch Aggregate queries or other complex queries . Take notes in this article , To facilitate testing , The index data is still the test index library generated in Article 5 db_student_test , The nickname is student_test
The first part Basic aggregation
1、 Maximum max、 minimum value min、 Average avg 、 Sum sum
Scene : Inquiry language 、 Mathematics 、 English The maximum of these three subjects 、 minimum value 、 Average
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"max_chinese" : { "max" : { "field" : "chinese" } },
"min_chinese" : { "min" : { "field" : "chinese" } },
"avg_chinese" : { "avg" : { "field" : "chinese" } },
"max_math": { "max" : { "field" : "math" } },
"min_math": { "min" : { "field" : "math" } },
"avg_math": { "avg" : { "field" : "math" } },
"max_english": { "max" : { "field" : "english" } },
"min_english": { "min" : { "field" : "english" } },
"avg_english": { "avg" : { "field" : "english" } }
}
}
The query result is :
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"avg_english": {
"value": 57.78366490546798
},
"max_chinese": {
"value": 98
},
"min_chinese": {
"value": 25
},
"min_math": {
"value": 15
},
"max_english": {
"value": 98
},
"avg_chinese": {
"value": 59.353859695794505
},
"avg_math": {
"value": 56.92907568735187
},
"min_english": {
"value": 21
},
"max_math": {
"value": 99
}
}
}
You can also query the total score of Chinese subjects , It is equivalent to sql Of sum Logic , Although it doesn't make any sense here :
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"sum_chinese" : { "sum" : { "field" : "chinese" } }
}
}
2、 Count the number , It is equivalent to sql Of count Logic
Scene : Query the total number of all students , It's easy here count One The column is OK , For example, the field of Mathematics
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"age_count": {
"value_count": {
"field": "math"
}
}
}
}
The return result is :
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"age_count": {
"value": 50084828
}
}
}
The total number between classes is :50084828 It's consistent with the amount of data we generated in Chapter 5
3、distinct polymerization , It is equivalent to sql Of count ( distinct )
Scene : How many kinds of value does the statistics language achievement have
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"type_count" : {
"cardinality" : {
"field" : "chinese"
}
}
}
}
The return result is :
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"type_count": {
"value": 74
}
}
}
In terms of results , Only 74 A different score , Match with the rules of the fifth randomly generated data
4、 Statistical aggregation
Scene : Query language scores Total number 、 Maximum 、 minimum value 、 Average 、 Sum up, etc
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"chinese_stats": {
"stats": {
"field": "chinese"
}
}
}
}
The return result is :
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"chinese_stats": {
"count": 50084828,
"min": 25,
"max": 98,
"avg": 59.353859695794505,
"sum": 2972727854
}
}
}
5、 Enhanced statistical aggregation , The query results are based on the above , Plus the variance and other statistical data
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"chinese_stats": {
"extended_stats": {
"field": "chinese"
}
}
}
}
6、 Quantile aggregation Statistics
The default quantile is 1% 5% 25% 50% 75% 95% 99% 《= The concept of
The concept of quantile :25% The quantile of is 54, Less than or equal to 54 Of the total sample 25% , That is 54 This number will be at the bottom of 1/4 We're going to split it up .
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"chinese_percents": {
"percentiles": {
"field": "chinese"
}
}
}
}
You can also customize the quantile :
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"chinese_percents": {
"percentiles": {
"field": "chinese",
"percents" : [10,20,30,40,50,60,70,80,90]
}
}
}
}
7、 Range aggregation Statistics
Scene : The language scores are less than 40 branch 、 Smaller than 50 branch 、 Smaller than 60 The percentage of points
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs": {
"gge_perc_rank": {
"percentile_ranks": {
"field": "chinese",
"values": [40,50,60]
}
}
}
}
The above results are less than 40, Smaller than 50, Smaller than 60 % of , The information we get is : 21.29% 36.09% 51.12% As you can see, it's a sequence that's close to arithmetic , It can be seen that the randomness of the test data is still very good .
The second part Other polymerization methods
1、Term polymerization
Scene : Want to know the students' Chinese achievement , The number of points on all fractions
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"genres" : {
"terms" : {
"field" : "chinese"
}
}
}
}
This query will put the fields Chinese Aggregate , for example 87 To synthesize a group of ,88 To synthesize a group of , wait ;
But the default here is to sort by group size , And it doesn't show all the groups , Groups that are too small may be ignored , The results are as follows :
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"genres": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 42560269,
"buckets": [
{
"key": 61,
"doc_count": 752863
},
{
"key": 68,
"doc_count": 752835
},
{
"key": 55,
"doc_count": 752749
},
{
"key": 59,
"doc_count": 752444
},
{
"key": 76,
"doc_count": 752405
},
{
"key": 74,
"doc_count": 752309
},
{
"key": 56,
"doc_count": 752283
},
{
"key": 49,
"doc_count": 752273
},
{
"key": 52,
"doc_count": 752201
},
{
"key": 50,
"doc_count": 752197
}
]
}
}
}
If you want to customize the filter criteria ,Term Aggregation can also be queried according to the following settings :
post http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"genres" : {
"terms" : {
"field" : "chinese",
"size" : 100, // There may be 100 It's an unused score , We're going to show it all
"order" : { "_count" : "asc" }, // Sort according to the number of groups
"min_doc_count": 752200 // Filter conditions : The minimum number of groups is 752200
}
}
}
}
The query result is :
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"genres": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 52,
"doc_count": 752201
},
{
"key": 49,
"doc_count": 752273
},
{
"key": 56,
"doc_count": 752283
},
{
"key": 74,
"doc_count": 752309
},
{
"key": 76,
"doc_count": 752405
},
{
"key": 59,
"doc_count": 752444
},
{
"key": 55,
"doc_count": 752749
},
{
"key": 68,
"doc_count": 752835
},
{
"key": 61,
"doc_count": 752863
}
]
}
}
}
2、Filter polymerization
Filter Aggregation is conditional filtering first , In the process of polymerization
Scene : Query the average score of students in South China University of technology ( First screen schools , And then we aggregate the scores )
{
"aggs" : {
"scut_math_avg" : {
"filter" : { "term": { "school": " South China University of Technology " } },
"aggs" : {
"avg_price" : { "avg" : { "field" : "math" } }
}
}
}
}
The query result is :
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"scut_math_avg": {
"doc_count": 1854993,
"avg_price": {
"value": 56.93080027795253
}
}
}
}
3、Filters Multiple aggregation
Scene : Check schools , Chinese 、 Mathematics 、 What is the average score of English , Multiple polymerization can be used , It may be a little slow , as follows
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"messages" : {
"filters" : {
"filters" : {
"school_1" : { "term" : { "school" : " South China University of Technology " }},
"school_2" : { "term" : { "school" : " Sun Yat sen University " }},
"school_3" : { "match" : { "school" : " Jinan University " }}
}
},
"aggs" : {
"avg_chinese" : { "avg" : { "field" : "chinese" } },
"avg_math" : { "avg" : { "field" : "math" } }
}
}
}
}
So the result is :
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"messages": {
"buckets": {
"school_1": {
"doc_count": 1854993,
"avg_chinese": {
"value": 59.353236912484306
},
"avg_math": {
"value": 56.93080027795253
}
},
"school_2": {
"doc_count": 1855016,
"avg_chinese": {
"value": 59.349129064115886
},
"avg_math": {
"value": 56.93540918245449
}
},
"school_3": {
"doc_count": 44519876,
"avg_chinese": {
"value": 59.35397212247402
},
"avg_math": {
"value": 56.92948502372289
}
}
}
}
}
}
4、Range Range aggregation
Scene : Want to query the number of Chinese scores in each segment , You can query
POST http://localhost:9200/student_test1/_search?size=0
{ "aggs" : { "chinese_ranges" : { "range" : { "field" : "chinese", "ranges" : [ { "to" : 60 }, { "from" : 60, "to" : 75 }, { "from" : 75, "to" : 85 }, { "from" : 85 } ] } } } }
The query result is :
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"chinese_ranges": {
"buckets": [
{
"key": "*-60.0",
"to": 60,
"doc_count": 25096839
},
{
"key": "60.0-75.0",
"from": 60,
"to": 75,
"doc_count": 11278543
},
{
"key": "75.0-85.0",
"from": 75,
"to": 85,
"doc_count": 7424634
},
{
"key": "85.0-*",
"from": 85,
"doc_count": 6284812
}
]
}
}
}
The group names of the returned results are *-60.0 60.0-75.0 75.0-85.0 85.0-*
If we don't want a group name like this , You can customize the group name , for example :
POST http://localhost:9200/student_test1/_search?size=0
{
"aggs" : {
"chinese_ranges" : {
"range" : {
"field" : "chinese",
"keyed" : true,
"ranges" : [
{ "key" : " fail, ", "to" : 60 },
{ "key" : " pass ", "from" : 60, "to" : 75 },
{ "key" : " good ", "from" : 75, "to" : 85 },
{ "key" : " Excellent ", "from" : 85 }
]
}
}
}
}
The query result will be :
{
"took": 1675,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"chinese_ranges": {
"buckets": {
" fail, ": {
"to": 60,
"doc_count": 25096839
},
" pass ": {
"from": 60,
"to": 75,
"doc_count": 11278543
},
" good ": {
"from": 75,
"to": 85,
"doc_count": 7424634
},
" Excellent ": {
"from": 85,
"doc_count": 6284812
}
}
}
}
}
There are all kinds of other 、 Complex aggregate queries , All of them can check information online , It even supports some calculation methods of recommendation system , For example, the concept of matrix and so on .
You can also refer to https://blog.csdn.net/alex_xfboy/article/details/8610
版权声明
本文为[itread01]所创,转载请带上原文链接,感谢
边栏推荐
- JNI-Thread中start方法的呼叫與run方法的回撥分析
- Introduction to Google software testing
- For a while, a dynamic thread pool was created, and the source code was put into GitHub
- 这个项目可以让你在几分钟快速了解某个编程语言
- DC-1靶機
- 一篇文章带你了解CSS 分页实例
- StickEngine-架构11-消息队列(MessageQueue)
- Discussion on the development practice of aspnetcore, a cross platform framework
- The data of pandas was scrambled and the training machine and testing machine set were selected
- From zero learning artificial intelligence, open the road of career planning!
猜你喜欢

A course on word embedding

Brief introduction of TF flags

MeterSphere开发者手册
![[C / C + + 1] clion configuration and running C language](/img/5b/ba96ff4447b150f50560e5d47cb8d1.jpg)
[C / C + + 1] clion configuration and running C language

一篇文章带你了解CSS3图片边框

How to understand Python iterators and generators?

【:: 是什么语法?】

NLP model Bert: from introduction to mastery (2)

Pattern matching: The gestalt approach一种序列的文本相似度方法

Wow, elasticsearch multi field weight sorting can play like this
随机推荐
From zero learning artificial intelligence, open the road of career planning!
How to customize sorting for pandas dataframe
Pattern matching: The gestalt approach一种序列的文本相似度方法
Windows 10 tensorflow (2) regression analysis of principles, deep learning framework (gradient descent method to solve regression parameters)
Basic principle and application of iptables
python100例項
有了这个神器,快速告别垃圾短信邮件
Using NLP and ml to extract and construct web data
一篇文章带你了解CSS3图片边框
Free patent download tutorial (HowNet, Espacenet)
仅用六种字符来完成Hello World,你能做到吗?
Advanced Vue component pattern (3)
Even liver three all night, jvm77 high frequency interview questions detailed analysis, this?
文件过多时ls命令为什么会卡住?
Five vuex plug-ins for your next vuejs project
只有1个字节的文件实际占用多少磁盘空间
Lane change detection
It's time for your financial report to change to a more advanced style -- financial analysis cockpit
Python download module to accelerate the implementation of recording
Pollard's Rho algorithm