Interview Blitz 63: How do you remove duplicates in MySQL?

2022-07-06 17:48:00 InfoQ

In MySQL, the two most common ways to remove duplicates are distinct and group by. What is the difference between them? Let's take a look.

1. Create test data

-- Create the test table
drop table if exists pageview;
create table pageview(
 id bigint primary key auto_increment comment 'auto-increment primary key',
 aid bigint not null comment 'article ID',
 uid bigint not null comment '(visiting) user ID',
 createtime datetime default now() comment 'creation time'
) default charset='utf8mb4';
-- Insert the test data
insert into pageview(aid,uid) values(1,1);
insert into pageview(aid,uid) values(1,1);
insert into pageview(aid,uid) values(2,1);
insert into pageview(aid,uid) values(2,2);

The table now holds four rows, two of which are exact duplicates on (aid, uid).

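The original screenshot of the table is not preserved. Assuming a freshly created table and the default auto-increment step of 1, a query like the following lists the inserted rows (the createtime values depend on when the inserts ran):

```sql
-- List the test rows in insertion order.
select id, aid, uid, createtime from pageview order by id;
-- Expected (id, aid, uid): (1,1,1), (2,1,1), (3,2,1), (4,2,2)
```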

2. Using distinct

The basic syntax of distinct is as follows:

SELECT DISTINCT column_name,column_name FROM table_name;

2.1 Single-column deduplication

We first use distinct to deduplicate on a single column, aid (article ID).

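The screenshot for this step is missing. A minimal sketch of such a query, assuming the pageview table from section 1:

```sql
-- Return each article ID once, however many rows reference it.
select distinct aid from pageview;
-- With the test data this yields two rows: 1 and 2.
```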

2.2 Multi-column deduplication

Besides single-column deduplication, distinct also supports deduplication over multiple (two or more) columns. Here we deduplicate on the combination of aid (article ID) and uid (user ID).

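A sketch of the combined deduplication, again assuming the pageview test table (the original screenshot is unavailable):

```sql
-- A row counts as a duplicate only if both aid and uid match another row.
select distinct aid, uid from pageview;
-- With the test data this yields three rows: (1,1), (2,1), (2,2).
```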

2.3 Aggregate function + deduplication

distinct can be combined with an aggregate function, for example to count how many different aid values remain after deduplication.

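One way to write this count, assuming the pageview test table (the original screenshot is unavailable):

```sql
-- Count how many different article IDs appear in the table.
select count(distinct aid) from pageview;
-- With the test data this returns 2.
```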

3. Using group by

The basic syntax of group by is as follows:

SELECT column_name,column_name FROM table_name 
WHERE column_name operator value 
GROUP BY column_name

3.1 Single-column deduplication

Here we deduplicate by aid (article ID) using group by.

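Compared with distinct, group by can return additional columns next to the grouped one (as aggregates, or as bare columns when the ONLY_FULL_GROUP_BY SQL mode is disabled), whereas distinct always deduplicates on exactly the columns it selects.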
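
A hedged sketch of both forms, assuming the pageview test table with ids 1 to 4; the alias last_id is illustrative and not from the original article:

```sql
-- Deduplicate by aid using grouping.
select aid from pageview group by aid;

-- An extra column can ride along as an aggregate,
-- here the highest row id seen for each article.
select aid, max(id) as last_id from pageview group by aid;
-- With the test data: (1, 2) and (2, 4).
```

Selecting a bare, non-grouped column such as id instead is rejected when ONLY_FULL_GROUP_BY is enabled, which is the default from MySQL 5.7.5 onward.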

3.2 Multi-column deduplication

Here we deduplicate on the combination of aid (article ID) and uid (user ID) using group by.

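A sketch of the multi-column grouping, assuming the pageview test table (the original screenshot is unavailable):

```sql
-- Each distinct (aid, uid) pair forms one group and one output row.
select aid, uid from pageview group by aid, uid;
-- With the test data this yields three rows: (1,1), (2,1), (2,2).
```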

3.3 Aggregate function + group by

Here we count the total number of rows for each aid.

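A sketch of the per-article count, assuming the pageview test table; the alias views is illustrative:

```sql
-- One output row per article, with the number of rows in its group.
select aid, count(*) as views from pageview group by aid;
-- With the test data: (1, 2) and (2, 2).
```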
The query semantics of count combined with distinct and count combined with group by are completely different: distinct + count returns the total number of values left after deduplication, while group by + count returns the number of rows in each group.

4. Differences between distinct and group by

The official documentation, when describing distinct, states that in most cases a DISTINCT clause can be considered a special case of GROUP BY.

Official documentation:
https://dev.mysql.com/doc/refman/8.0/en/distinct-optimization.html

However, there are still some subtle differences between the two, such as the following.

Difference 1: Different query result sets

When you deduplicate with distinct, the result set contains only the deduplicated columns, because distinct applies to the entire select list. Trying to query an additional column that is not part of the deduplication, for example by placing distinct in front of only one of several selected columns, makes the SQL statement fail with an error. With group by, by contrast, the query can return one or more additional fields next to the grouping column (as aggregates, or as bare columns if the SQL mode allows it).

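As a hedged illustration of why the result sets differ (the original screenshots are unavailable, so these statements are assumptions about what they showed, written against the pageview test table):

```sql
-- DISTINCT directly follows SELECT and covers every selected column.
-- Adding id makes every row unique, so all four test rows come back.
select distinct aid, id from pageview;

-- Attaching DISTINCT to a later column is not valid syntax.
-- Left commented out because MySQL rejects it with a syntax error:
-- select id, distinct aid from pageview;
```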

Difference 2: Different business scenarios

To count the total number of values after deduplication, use distinct. To get per-group statistics, or to filter on top of those per-group statistics, you have to use group by. For example, distinct counts how many different values a column holds after deduplication.

To find the articles whose row count after grouping is greater than 2, use group by with a having condition.

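A sketch of the filtered grouping, assuming the pageview test table; the alias views is illustrative:

```sql
-- Keep only the groups whose row count exceeds the threshold.
select aid, count(*) as views
from pageview
group by aid
having count(*) > 2;
-- With the test data each article has exactly 2 rows, so nothing is returned;
-- lowering the threshold to > 1 returns both articles.
```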

Difference 3: Different performance

If the deduplicated field has an index, both group by and distinct can use it, and in that case their performance is the same. When the deduplicated field has no index, distinct performs better than group by, because before MySQL 8.0 group by implicitly sorted its results by default, which triggers a filesort and slows the query down.
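
One way to check this on your own server is to compare execution plans. The sketch below assumes the pageview test table; the index name idx_pageview_aid is hypothetical, and plan output varies by MySQL version and data, so no expected output is shown:

```sql
-- Without an index on aid: compare the Extra column of both plans
-- and look for "Using filesort" on the group by query (pre-8.0).
explain select distinct aid from pageview;
explain select aid from pageview group by aid;

-- Before MySQL 8.0, ORDER BY NULL suppresses the implicit group by sort.
explain select aid from pageview group by aid order by null;

-- With an index on aid, both queries can read the values from the index.
create index idx_pageview_aid on pageview(aid);
explain select distinct aid from pageview;
explain select aid from pageview group by aid;
```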

Summary

In most scenarios distinct is a special case of group by, but there are subtle differences between the two: they differ in the result set they return, in the business scenarios they suit, and in performance.

Reference & Acknowledgement
zhuanlan.zhihu.com/p/384840662

Judge right and wrong for yourself, leave praise and blame to others, and accept gain and loss as fate.
Official account: Java Analysis of the real interview questions
Interview collection :
https://gitee.com/mydb/interview