当前位置：网站首页>Elasticsearch Part 6: aggregate statistical query

Elasticsearch Part 6: aggregate statistical query

2020-11-06 20:10:00 【itread01】

There has been no record before Elasticsearch Aggregate queries or other complex queries . Take notes in this article , To facilitate testing , The index data is still the test index library generated in Article 5 db_student_test , The nickname is student_test

The first part Basic aggregation

1、 Maximum max、 minimum value min、 Average avg 、 Sum sum

Scene ： Inquiry language 、 Mathematics 、 English The maximum of these three subjects 、 minimum value 、 Average

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "max_chinese" : { "max" : { "field" : "chinese" } },
        "min_chinese" : { "min" : { "field" : "chinese" } },
        "avg_chinese" : { "avg" : { "field" : "chinese" } },
        "max_math": { "max" : { "field" : "math" } },
        "min_math": { "min" : { "field" : "math" } },
        "avg_math": { "avg" : { "field" : "math" } },
        "max_english": { "max" : { "field" : "english" } },
        "min_english": { "min" : { "field" : "english" } },
        "avg_english": { "avg" : { "field" : "english" } }
    }
}

The query result is ：

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "avg_english": {
            "value": 57.78366490546798
        },
        "max_chinese": {
            "value": 98
        },
        "min_chinese": {
            "value": 25
        },
        "min_math": {
            "value": 15
        },
        "max_english": {
            "value": 98
        },
        "avg_chinese": {
            "value": 59.353859695794505
        },
        "avg_math": {
            "value": 56.92907568735187
        },
        "min_english": {
            "value": 21
        },
        "max_math": {
            "value": 99
        }
    }
}

You can also query the total score of Chinese subjects , It is equivalent to sql Of sum Logic , Although it doesn't make any sense here ：

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "sum_chinese" : { "sum" : { "field" : "chinese" } }
    }
}

2、 Count the number , It is equivalent to sql Of count Logic

Scene ： Query the total number of all students , It's easy here count One The column is OK , For example, the field of Mathematics

POST  http://localhost:9200/student_test1/_search?size=0
{
  "aggs": {
    "age_count": {
      "value_count": {
        "field": "math"
      }
    }
  }
}

The return result is ：

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "age_count": {
            "value": 50084828
        }
    }
}

The total number between classes is ：50084828 It's consistent with the amount of data we generated in Chapter 5

3、distinct polymerization , It is equivalent to sql Of count ( distinct )

Scene ： How many kinds of value does the statistics language achievement have

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "type_count" : {
            "cardinality" : {
                "field" : "chinese"
            }
        }
    }
}

The return result is ：

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "type_count": {
            "value": 74
        }
    }
}

In terms of results , Only 74 A different score , Match with the rules of the fifth randomly generated data

4、 Statistical aggregation

Scene ： Query language scores Total number 、 Maximum 、 minimum value 、 Average 、 Sum up, etc

POST  http://localhost:9200/student_test1/_search?size=0
{
  "aggs": {
    "chinese_stats": {
      "stats": {
        "field": "chinese"
      }
    }
  }
}

The return result is ：

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "chinese_stats": {
            "count": 50084828,
            "min": 25,
            "max": 98,
            "avg": 59.353859695794505,
            "sum": 2972727854
        }
    }
}

5、 Enhanced statistical aggregation , The query results are based on the above , Plus the variance and other statistical data

POST  http://localhost:9200/student_test1/_search?size=0 
{
  "aggs": {
    "chinese_stats": {
      "extended_stats": {
        "field": "chinese"
      }
    }
  }
}

6、 Quantile aggregation Statistics

The default quantile is 1% 5% 25% 50% 75% 95% 99% 《= The concept of

The concept of quantile ：25% The quantile of is 54, Less than or equal to 54 Of the total sample 25% , That is 54 This number will be at the bottom of 1/4 We're going to split it up .

POST  http://localhost:9200/student_test1/_search?size=0 
{
  "aggs": {
    "chinese_percents": {
      "percentiles": {
        "field": "chinese"
      }
    }
  }
}

You can also customize the quantile ：

POST  http://localhost:9200/student_test1/_search?size=0 
{
  "aggs": {
    "chinese_percents": {
      "percentiles": {
        "field": "chinese",
        "percents" : [10,20,30,40,50,60,70,80,90] 
      }
    }
  }
}

7、 Range aggregation Statistics

Scene ： The language scores are less than 40 branch 、 Smaller than 50 branch 、 Smaller than 60 The percentage of points

POST  http://localhost:9200/student_test1/_search?size=0 
{
  "aggs": {
    "gge_perc_rank": {
      "percentile_ranks": {
        "field": "chinese",
        "values": [40,50,60]
      }
    }
  }
}

The above results are less than 40, Smaller than 50, Smaller than 60 % of , The information we get is ： 21.29% 36.09% 51.12% As you can see, it's a sequence that's close to arithmetic , It can be seen that the randomness of the test data is still very good .

The second part Other polymerization methods

1、Term polymerization

Scene ： Want to know the students' Chinese achievement , The number of points on all fractions

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "genres" : {
            "terms" : { 
                "field" : "chinese"
            }
        }
    }
}

This query will put the fields Chinese Aggregate , for example 87 To synthesize a group of ,88 To synthesize a group of , wait ;

But the default here is to sort by group size , And it doesn't show all the groups , Groups that are too small may be ignored , The results are as follows ：

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "genres": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 42560269,
            "buckets": [
                {
                    "key": 61,
                    "doc_count": 752863
                },
                {
                    "key": 68,
                    "doc_count": 752835
                },
                {
                    "key": 55,
                    "doc_count": 752749
                },
                {
                    "key": 59,
                    "doc_count": 752444
                },
                {
                    "key": 76,
                    "doc_count": 752405
                },
                {
                    "key": 74,
                    "doc_count": 752309
                },
                {
                    "key": 56,
                    "doc_count": 752283
                },
                {
                    "key": 49,
                    "doc_count": 752273
                },
                {
                    "key": 52,
                    "doc_count": 752201
                },
                {
                    "key": 50,
                    "doc_count": 752197
                }
            ]
        }
    }
}

If you want to customize the filter criteria ,Term Aggregation can also be queried according to the following settings ：

post  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "genres" : {
            "terms" : { 
                "field" : "chinese",
                 "size" : 100,                     //  There may be 100 It's an unused score , We're going to show it all 
                 "order" : { "_count" : "asc" },   //  Sort according to the number of groups 
                  "min_doc_count": 752200          // Filter conditions ： The minimum number of groups is 752200
            }
        }
    }
}

The query result is ：

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "genres": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 52,
                    "doc_count": 752201
                },
                {
                    "key": 49,
                    "doc_count": 752273
                },
                {
                    "key": 56,
                    "doc_count": 752283
                },
                {
                    "key": 74,
                    "doc_count": 752309
                },
                {
                    "key": 76,
                    "doc_count": 752405
                },
                {
                    "key": 59,
                    "doc_count": 752444
                },
                {
                    "key": 55,
                    "doc_count": 752749
                },
                {
                    "key": 68,
                    "doc_count": 752835
                },
                {
                    "key": 61,
                    "doc_count": 752863
                }
            ]
        }
    }
}

2、Filter polymerization

Filter Aggregation is conditional filtering first , In the process of polymerization

Scene ： Query the average score of students in South China University of technology （ First screen schools , And then we aggregate the scores ）

{
    "aggs" : {
        "scut_math_avg" : {
            "filter" : { "term": { "school": " South China University of Technology " } },
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "math" } }
            }
        }
    }
}

The query result is ：

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "scut_math_avg": {
            "doc_count": 1854993,
            "avg_price": {
                "value": 56.93080027795253
            }
        }
    }
}

3、Filters Multiple aggregation

Scene ： Check schools , Chinese 、 Mathematics 、 What is the average score of English , Multiple polymerization can be used , It may be a little slow , as follows

POST  http://localhost:9200/student_test1/_search?size=0
{
  "aggs" : {
    "messages" : {
      "filters" : {
        "filters" : {
          "school_1" :   { "term" : { "school" : " South China University of Technology " }},
          "school_2" : { "term" : { "school" : " Sun Yat sen University " }},
          "school_3" : { "match" : { "school" : " Jinan University " }}
        }
      },
      "aggs" : {
           "avg_chinese" : { "avg" : { "field" : "chinese" } },
           "avg_math" : { "avg" : { "field" : "math" } }
      }
    }
  }
}

So the result is ：

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "messages": {
            "buckets": {
                "school_1": {
                    "doc_count": 1854993,
                    "avg_chinese": {
                        "value": 59.353236912484306
                    },
                    "avg_math": {
                        "value": 56.93080027795253
                    }
                },
                "school_2": {
                    "doc_count": 1855016,
                    "avg_chinese": {
                        "value": 59.349129064115886
                    },
                    "avg_math": {
                        "value": 56.93540918245449
                    }
                },
                "school_3": {
                    "doc_count": 44519876,
                    "avg_chinese": {
                        "value": 59.35397212247402
                    },
                    "avg_math": {
                        "value": 56.92948502372289
                    }
                }
            }
        }
    }
}

4、Range Range aggregation

Scene ： Want to query the number of Chinese scores in each segment , You can query

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "chinese_ranges" : {
            "range" : {
                "field" : "chinese",
                "ranges" : [
                    { "to" : 60 },
                    { "from" : 60, "to" : 75 },
                    { "from" : 75, "to" : 85 },
                    { "from" : 85 }
                ]
            }
        }
    }
}

The query result is ：

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "chinese_ranges": {
            "buckets": [
                {
                    "key": "*-60.0",
                    "to": 60,
                    "doc_count": 25096839
                },
                {
                    "key": "60.0-75.0",
                    "from": 60,
                    "to": 75,
                    "doc_count": 11278543
                },
                {
                    "key": "75.0-85.0",
                    "from": 75,
                    "to": 85,
                    "doc_count": 7424634
                },
                {
                    "key": "85.0-*",
                    "from": 85,
                    "doc_count": 6284812
                }
            ]
        }
    }
}

The group names of the returned results are *-60.0 60.0-75.0 75.0-85.0 85.0-*
If we don't want a group name like this , You can customize the group name , for example ：

POST  http://localhost:9200/student_test1/_search?size=0
{
    "aggs" : {
        "chinese_ranges" : {
            "range" : {
                "field" : "chinese",
                "keyed" : true,
                "ranges" : [
                    { "key" : " fail, ", "to" : 60 },
                    { "key" : " pass ", "from" : 60, "to" : 75 },
                    { "key" : " good ", "from" : 75, "to" : 85 },
                    { "key" : " Excellent ", "from" : 85 }
                ]
            }
        }
    }
}

The query result will be ：

{
    "took": 1675,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "chinese_ranges": {
            "buckets": {
                " fail, ": {
                    "to": 60,
                    "doc_count": 25096839
                },
                " pass ": {
                    "from": 60,
                    "to": 75,
                    "doc_count": 11278543
                },
                " good ": {
                    "from": 75,
                    "to": 85,
                    "doc_count": 7424634
                },
                " Excellent ": {
                    "from": 85,
                    "doc_count": 6284812
                }
            }
        }
    }
}

There are all kinds of other 、 Complex aggregate queries , All of them can check information online , It even supports some calculation methods of recommendation system , For example, the concept of matrix and so on .

You can also refer to https://blog.csdn.net/alex_xfboy/article/details/8610

版权声明
本文为[itread01]所创，转载请带上原文链接，感谢

当前位置：网站首页>Elasticsearch Part 6: aggregate statistical query

Elasticsearch Part 6: aggregate statistical query

The first part Basic aggregation

The second part Other polymerization methods

边栏推荐

猜你喜欢

随机推荐