Summary of SQL aggregate query methods

2022-07-01 19:59:00 【51CTO】

Why does SQL support aggregate queries?

This may seem like a naive question, but let's think it through step by step. Data is stored at row granularity, and the simplest SQL statement is select * from test, which returns the full detail of the two-dimensional table. That alone is not enough, though. SQL needs to provide aggregate functions for two reasons:

1. Detail rows carry little statistical meaning on their own. For example, I want to know today's total turnover and do not much care how much a single table of guests spent.

2. You could load the data into memory first and aggregate there, but with very large data volumes that easily exhausts memory. A single table might hold 10 TB for one day, and even if 10 TB could be read into memory, the aggregation might still be unacceptably slow.

Aggregation also has some logical complexity of its own, and SQL provides aggregate functions and grouped aggregation so that aggregated data with business value can be computed easily and quickly. This underpins the analytical value of SQL, which is why most analysis software adopts SQL directly as its user-facing expression language.

Aggregate functions

Common aggregate functions are:

COUNT: count rows.
SUM: sum.
AVG: average.
MAX: maximum.
MIN: minimum.

COUNT

COUNT calculates how many rows there are. Let's see how many values the id column has:

    SELECT COUNT(id) FROM test

But we find that COUNT appears to give the same result whichever column we pick, so why single out id? There is no need to name a specific column, so it can also be written as:

    SELECT COUNT(*) FROM test

There is a subtle difference between the two, however. SQL has a special value, NULL. If COUNT is given a specific column, rows where that column is NULL are skipped. COUNT(*) names no column, so it counts every row, even one whose columns are all NULL. COUNT(*) is therefore always greater than or equal to COUNT(c1).
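
The difference between the two forms of COUNT can be checked with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for whatever engine you use; the cost column and the three rows are made up for illustration.

```python
import sqlite3

# Hypothetical rows, used only to illustrate how COUNT treats NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, cost REAL)")
conn.executemany(
    "INSERT INTO test (id, cost) VALUES (?, ?)",
    [(1, 10.0), (2, None), (None, None)],  # the last row is NULL in every column
)

count_all, count_id, count_cost = conn.execute(
    "SELECT COUNT(*), COUNT(id), COUNT(cost) FROM test"
).fetchone()
conn.close()

print(count_all, count_id, count_cost)  # prints: 3 2 1
```

COUNT(*) counts all three rows, while each COUNT(column) skips the rows where that column is NULL.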

Of course, any aggregate function can be combined with a WHERE filter, for example:

    SELECT COUNT(*) FROM test
    WHERE is_gray = 1

SUM

SUM adds up all values, so it must be applied to a numeric field, not to strings.

    SELECT SUM(cost) FROM test

SUM effectively treats NULL as 0, which amounts to ignoring it. (Strictly, NULL values are skipped; if every input is NULL, SUM returns NULL rather than 0.)
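
A minimal sketch of how SUM handles NULL, again assuming Python's sqlite3 module with an in-memory database and made-up rows. It also shows the edge case where every input is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, cost INTEGER)")
# Hypothetical data: the row with id 2 has no cost.
conn.executemany(
    "INSERT INTO test (id, cost) VALUES (?, ?)",
    [(1, 10), (2, None), (3, 30)],
)

# The NULL cost is skipped, so the total is 10 + 30.
total = conn.execute("SELECT SUM(cost) FROM test").fetchone()[0]

# Only the NULL row matches here, so there is nothing to add up.
total_only_null = conn.execute(
    "SELECT SUM(cost) FROM test WHERE id = 2"
).fetchone()[0]
conn.close()

print(total, total_only_null)  # prints: 40 None
```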

AVG

AVG computes the average of all values, so it must likewise be applied to a numeric field, not to strings.

    SELECT AVG(cost) FROM test

AVG ignores NULL in the most thorough way: a NULL takes part in neither the numerator nor the denominator, just as if the row did not exist.
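
To see that a NULL is left out of both the numerator and the denominator, here is a small sketch under the same assumptions as before (Python's sqlite3 module, in-memory database, made-up rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, cost INTEGER)")
# Hypothetical data: three rows, one of them without a cost.
conn.executemany(
    "INSERT INTO test (id, cost) VALUES (?, ?)",
    [(1, 10), (2, None), (3, 30)],
)

avg_cost, sum_cost, n_rows, n_non_null = conn.execute(
    "SELECT AVG(cost), SUM(cost), COUNT(*), COUNT(cost) FROM test"
).fetchone()
conn.close()

# The average is 40 / 2, not 40 / 3: the NULL row is not counted at all.
print(avg_cost, sum_cost / n_non_null, sum_cost / n_rows)
```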

MAX, MIN

MAX and MIN find the maximum and minimum values respectively. Unlike the functions above, they can also be applied to strings, which are compared by character order (under a typical collation, 'a' is the smallest and 'z' the largest of the lowercase letters). Even though this can be computed, it rarely has practical meaning and is hard to reason about, so taking the extremum of a string is not recommended.

    SELECT MAX(cost) FROM test
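
For completeness, here is what MAX and MIN do on a string column. This is a sketch assuming Python's sqlite3 module and SQLite's default collation; the name column and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT)")
# Hypothetical lowercase strings.
conn.executemany(
    "INSERT INTO test (name) VALUES (?)",
    [("banana",), ("apple",), ("cherry",)],
)

smallest, largest = conn.execute(
    "SELECT MIN(name), MAX(name) FROM test"
).fetchone()
conn.close()

print(smallest, largest)  # prints: apple cherry
```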

Multiple aggregate fields

Although MAX and MIN are classed as aggregate functions, strictly speaking they do not combine rows; they just pick out the row that satisfies the condition. Compare the results of the following two queries:

    SELECT MAX(cost), id FROM test -- id: 100
    SELECT SUM(cost), id FROM test -- id: 1

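The first query can return the id of the row that holds the maximum value. In the second query id is meaningless, because there is no single row it belongs to, so only the id of the first row is returned. This behavior is engine-specific: SQLite documents it for a lone MAX or MIN, while PostgreSQL, and MySQL with ONLY_FULL_GROUP_BY enabled, reject a non-aggregated column in such a query.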
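
The following sketch reproduces the comparison in SQLite, through Python's sqlite3 module. It relies on SQLite's documented handling of bare columns next to a single MAX or MIN, so it should not be read as portable SQL; the rows are made up so that id 100 holds the largest cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, cost INTEGER)")
# Hypothetical rows: the row with id 100 has the highest cost.
conn.executemany(
    "INSERT INTO test (id, cost) VALUES (?, ?)",
    [(1, 5), (100, 99), (2, 7)],
)

# With a single MAX, SQLite takes id from the row that holds the maximum.
max_cost, id_of_max = conn.execute("SELECT MAX(cost), id FROM test").fetchone()

# With SUM there is no such row, so the id that comes back is arbitrary.
total, some_id = conn.execute("SELECT SUM(cost), id FROM test").fetchone()
conn.close()

print(max_cost, id_of_max)  # prints: 99 100
print(total)                # prints: 111
```

The id returned next to SUM is deliberately not printed, since nothing guarantees which row it comes from.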

Of course, if MAX and MIN are computed together, id again comes back as the value from the first row, because the result now corresponds to more than one row:

    SELECT MAX(cost), MIN(cost), id FROM test -- id: 1

Given these behaviors, it is best not to mix aggregated and non-aggregated fields: once one field in a query is aggregated, all fields should be aggregated.

Custom fields in many BI engines today carry this restriction, because mixing aggregated and non-aggregated fields in custom in-memory computation involves many edge cases. SQL may support it, but business-specific custom functions may not.

Group aggregation

Grouped aggregation means GROUP BY, which can be thought of as an advanced form of conditional statement.

For example, to query the total GDP of each country:

    SELECT SUM(GDP) FROM amazing_table
    GROUP BY country

The results are grouped by country, and the aggregate function now aggregates within each group.

In fact, if we only want the GDP of China and the United States, we can also do it without grouping by splitting the work into two SQL statements:

    SELECT SUM(GDP) FROM amazing_table
    WHERE country = 'China'

    SELECT SUM(GDP) FROM amazing_table
    WHERE country = 'United States'

So GROUP BY can also be understood as finding every distinct value of a field and collecting the results into one table, one row per value, with no need to split the work into separate WHERE queries.
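
This reading of GROUP BY can be checked directly: one grouped query should give the same totals as one WHERE query per country. The sketch below assumes Python's sqlite3 module and an in-memory database; the region column and the GDP figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amazing_table (country TEXT, region TEXT, GDP INTEGER)")
# Hypothetical figures, two rows per country.
conn.executemany(
    "INSERT INTO amazing_table (country, region, GDP) VALUES (?, ?, ?)",
    [
        ("China", "east", 3),
        ("China", "west", 2),
        ("United States", "east", 4),
        ("United States", "west", 3),
    ],
)

# One grouped query: one result row per distinct country.
grouped = dict(
    conn.execute("SELECT country, SUM(GDP) FROM amazing_table GROUP BY country")
)

# The same totals, computed with one WHERE query per country.
one_by_one = {
    country: conn.execute(
        "SELECT SUM(GDP) FROM amazing_table WHERE country = ?", (country,)
    ).fetchone()[0]
    for country in grouped
}
conn.close()

print(grouped == one_by_one)  # prints: True
```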

Multi field grouping aggregation

GROUP BY can take multiple dimensions, which is equivalent to dragging multiple dimensions onto the rows or columns of a pivot-table query.

That is the BI query tool perspective. Without that context, the following progressive description may help:

Group and aggregate by multiple fields.
The fields combine into a unique key: GROUP BY a,b means that a and b together describe one group.
With GROUP BY a,b,c, the first column of the result may show many rows with the same a and the second column rows with the same b, but each combination appears only once: within one a,b pair, a given c never repeats.

Here is an example:

    SELECT SUM(GDP) FROM amazing_table
    GROUP BY province, city, area
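
A runnable sketch of multi-field grouping, assuming Python's sqlite3 module and an in-memory database. The province, city and area values and the GDP numbers are placeholders, and the grouping columns are added to the SELECT list so that each group's key is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE amazing_table (province TEXT, city TEXT, area TEXT, GDP INTEGER)"
)
# Hypothetical rows; the first two share the same (province, city, area).
conn.executemany(
    "INSERT INTO amazing_table (province, city, area, GDP) VALUES (?, ?, ?, ?)",
    [
        ("P1", "C1", "A1", 1),
        ("P1", "C1", "A1", 2),
        ("P1", "C1", "A2", 3),
        ("P1", "C2", "A1", 4),
        ("P2", "C1", "A1", 5),
    ],
)

result = conn.execute(
    "SELECT province, city, area, SUM(GDP) FROM amazing_table "
    "GROUP BY province, city, area "
    "ORDER BY province, city, area"
).fetchall()
conn.close()

# Each (province, city, area) combination appears exactly once in the result.
keys = [row[:3] for row in result]
for row in result:
    print(row)
```

P1 repeats in the first column and C1 in the second, but no combination of the three fields repeats.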

GROUP BY + WHERE

WHERE filters by row. So in GROUP BY + WHERE the filter is not applied per group; it screens the whole table.

But because the filtering is by row, the result is exactly the same whether it is applied inside or outside the groups, so we can hardly perceive the difference:

    SELECT SUM(GDP) FROM amazing_table
    WHERE industry = 'internet'
    GROUP BY province, city, area
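
Here is a small sketch of WHERE combined with GROUP BY, assuming Python's sqlite3 module and an in-memory database. To keep it short it groups by province only, and the rows are made up. Note that WHERE is written before GROUP BY and removes rows before any group is formed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amazing_table (province TEXT, industry TEXT, GDP INTEGER)")
# Hypothetical rows; P3 has no 'internet' row at all.
conn.executemany(
    "INSERT INTO amazing_table (province, industry, GDP) VALUES (?, ?, ?)",
    [
        ("P1", "internet", 10),
        ("P1", "retail", 5),
        ("P2", "internet", 7),
        ("P3", "retail", 4),
    ],
)

filtered = conn.execute(
    "SELECT province, SUM(GDP) FROM amazing_table "
    "WHERE industry = 'internet' "
    "GROUP BY province "
    "ORDER BY province"
).fetchall()
conn.close()

print(filtered)  # prints: [('P1', 10), ('P2', 7)]
```

P3 disappears from the result because all of its rows were filtered out before grouping, and P1's retail row does not count toward its sum.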

However, ignoring this difference leads straight into a wall when filtering on aggregates.

For example, suppose we want the total score of students whose average score is greater than 60. Without a subquery, this cannot be done by putting an aggregate function in the WHERE clause of an ordinary query. The following is an example of an invalid query:

    SELECT SUM(score) FROM amazing_table
    WHERE AVG(score) > 60

Don't expect the SQL above to execute successfully: aggregate functions cannot be used in WHERE.
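
The rejection is easy to observe. The sketch below assumes Python's sqlite3 module, where the failure surfaces as sqlite3.OperationalError; other engines report it with their own error types and messages. The class_name column is included only to give the hypothetical table a plausible shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amazing_table (class_name TEXT, score INTEGER)")

try:
    conn.execute("SELECT SUM(score) FROM amazing_table WHERE AVG(score) > 60")
    rejected = False
except sqlite3.OperationalError as exc:
    # SQLite refuses to compile a query with an aggregate function in WHERE.
    rejected = True
    print("query rejected:", exc)
finally:
    conn.close()
```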

GROUP BY + HAVING

HAVING filters by group, so aggregate functions can be used in HAVING:

    SELECT SUM(score) FROM amazing_table
    GROUP BY class_name
    HAVING AVG(score) > 60

The example above runs normally. It shows the total score per class, keeping only the classes whose average score is greater than 60.
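
A runnable version of the HAVING query, as a sketch assuming Python's sqlite3 module and an in-memory database. The class names and scores are made up, and class_name is added to the SELECT list so the surviving groups can be identified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amazing_table (class_name TEXT, score INTEGER)")
# Hypothetical scores: class A averages 75, class B 55, class C 90.
conn.executemany(
    "INSERT INTO amazing_table (class_name, score) VALUES (?, ?)",
    [("A", 70), ("A", 80), ("B", 50), ("B", 60), ("C", 90)],
)

passing = conn.execute(
    "SELECT class_name, SUM(score) FROM amazing_table "
    "GROUP BY class_name "
    "HAVING AVG(score) > 60 "
    "ORDER BY class_name"
).fetchall()
conn.close()

print(passing)  # prints: [('A', 150), ('C', 90)]
```

Class B is dropped as a whole group because its average is 55, even though one of its rows has a score of 60.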

So why can HAVING use aggregate conditions? Because HAVING filters groups, it can discard the groups that fail the condition after grouped aggregation, which makes sense. WHERE, by contrast, works at row granularity and is applied before any aggregation happens, so there is no aggregate value yet for it to test; and without grouping, aggregation reduces the whole table to a single row, so filtering that row would be meaningless.

One thing to note: the derived table that GROUP BY produces cannot be filtered using an index. WHERE can therefore be sped up by indexing the filtered fields, whereas HAVING gets no benefit from indexed fields.

Copyright notice
This article was created by [51CTO]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/182/202207011833105830.html