Interview Blitz 63: How do you remove duplicates in MySQL?
2022-07-06 19:28:00 【Wang Lei】
In MySQL, the two most common ways to remove duplicates are distinct and group by. What is the difference between them? Let's take a look.
1. Create test data
-- Create test table
drop table if exists pageview;
create table pageview(
    id bigint primary key auto_increment comment 'auto-increment primary key',
    aid bigint not null comment 'article ID',
    uid bigint not null comment '(visiting) user ID',
    createtime datetime default now() comment 'creation time'
) default charset='utf8mb4';
-- Add test data
insert into pageview(aid,uid) values(1,1);
insert into pageview(aid,uid) values(1,1);
insert into pageview(aid,uid) values(2,1);
insert into pageview(aid,uid) values(2,2);
The table now contains four rows, with (aid, uid) values (1,1), (1,1), (2,1) and (2,2).
2. Using distinct
The basic syntax of distinct is as follows:
SELECT DISTINCT column_name,column_name FROM table_name;
2.1 Single-column deduplication
We first use distinct to deduplicate on a single column, aid (article ID).
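The original screenshot of this query is not available, so the following is only a minimal sketch of single-column deduplication against the pageview table created above:

```sql
-- Return each article ID once, no matter how many rows reference it.
select distinct aid from pageview;
```

With the four test rows inserted earlier, this returns two rows (aid 1 and aid 2).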
2.2 Multi-column deduplication
Besides single-column deduplication, distinct also supports deduplication on multiple columns (two or more). Here we deduplicate on the combination of aid (article ID) and uid (user ID).
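A sketch of what the multi-column form likely looks like, using the same pageview table (the original screenshot is missing, so the exact statement is an assumption):

```sql
-- A row is kept once per unique (aid, uid) combination.
select distinct aid, uid from pageview;
```

With the test data this returns three rows: (1,1), (2,1) and (2,2). The duplicated (1,1) pair collapses into one row.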
2.3 Aggregate function + deduplication
Use distinct together with an aggregate function to count the number of distinct aid values.
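A minimal sketch of combining distinct with the count aggregate on the pageview table (the statement is reconstructed from the description, not copied from the original):

```sql
-- count(distinct ...) counts each distinct aid value exactly once.
select count(distinct aid) from pageview;
```

With the test data the result is a single row containing 2, because only article IDs 1 and 2 exist.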
3. Using group by
The basic syntax of group by is as follows:
SELECT column_name,column_name FROM table_name
WHERE column_name operator value
GROUP BY column_name
3.1 Single-column deduplication
Deduplicate by aid (article ID). Compared with distinct, group by can display more columns, whereas distinct can display only the deduplicated columns.
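The following sketch shows single-column deduplication with group by, plus one way to display an extra column. The use of any_value() is an assumption added here: the original screenshot may have selected uid directly, which only works when the ONLY_FULL_GROUP_BY SQL mode is disabled.

```sql
-- Plain single-column deduplication: one row per aid.
select aid from pageview group by aid;

-- Showing an extra column next to the grouped one. With ONLY_FULL_GROUP_BY
-- enabled (part of the default sql_mode since MySQL 5.7.5), a non-grouped
-- column must be wrapped in an aggregate or in any_value().
select aid, any_value(uid) as uid from pageview group by aid;
```

any_value() returns an arbitrary uid from each group, so the extra column is informational rather than deterministic.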
3.2 Multi-column deduplication
Deduplicate on the combination of aid (article ID) and uid (user ID).
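A sketch of the multi-column group by form on the pageview table (reconstructed from the description):

```sql
-- Every selected column is listed in group by, so each unique
-- (aid, uid) pair produces exactly one row.
select aid, uid from pageview group by aid, uid;
```

With the test data this returns the three pairs (1,1), (2,1) and (2,2).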
3.3 Aggregate function + group by
Count the total number of rows for each aid. Combined with count, group by and distinct have completely different query semantics: distinct + count counts the total number of values after deduplication, whereas group by + count counts the number of rows in each group after grouping.
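The per-group count described above can be sketched as follows; the alias cnt is chosen here for illustration only:

```sql
-- One output row per aid, carrying the number of rows in that group.
select aid, count(*) as cnt from pageview group by aid;
```

With the test data this returns two rows, (aid 1, cnt 2) and (aid 2, cnt 2), whereas count(distinct aid) on the same table returns the single value 2.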
4. Differences between distinct and group by
The official documentation says of distinct that, in most cases, a distinct clause can be considered a special case of group by.
Official documentation: https://dev.mysql.com/doc/refman/8.0/en/distinct-optimization.html
However, there are still some subtle differences between the two, such as the following.
Difference 1: The query result sets differ
When you deduplicate with distinct, the result set contains only the deduplicated columns. If you try to query an additional column that is not part of the deduplication, the SQL statement reports an error. With group by, in contrast, the query can return one or more additional columns.
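The three cases in this paragraph can be sketched as below. The failing statement is an assumed reconstruction of the missing screenshot, and any_value() and the alias cnt are additions made here so that the group by query stays valid under ONLY_FULL_GROUP_BY.

```sql
-- Only the deduplicated column appears in the result.
select distinct aid from pageview;

-- Assumed form of the failing query: distinct must directly follow select,
-- so putting another column in front of it is a syntax error.
-- select uid, distinct aid from pageview;

-- group by can return further columns next to the grouped one.
select aid, any_value(uid) as uid, count(*) as cnt
from pageview
group by aid;
```

Note that select distinct aid, uid does not fail; it deduplicates on the (aid, uid) pair instead of on aid alone.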
Difference 2: Different business scenarios
To count the total number of values after deduplication, use distinct. To report per-group details, or to add filter conditions on top of those per-group details, you have to use group by. For example, use distinct to count the distinct values of a column, and use group by to find the articles whose row count after grouping is greater than 2.
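Both scenarios can be sketched on the pageview table as follows; the statements are reconstructions, and the alias cnt is illustrative:

```sql
-- Scenario 1: total number of distinct articles, returned as one number.
select count(distinct aid) from pageview;

-- Scenario 2: articles with more than 2 rows. The filter applies to the
-- per-group count, so it goes in having rather than where.
select aid, count(*) as cnt
from pageview
group by aid
having count(*) > 2;
```

With the four test rows, each aid has exactly two rows, so the second query returns an empty result until more rows are inserted.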
Difference 3: Different performance
If the deduplicated column has an index, both group by and distinct can use it, and in that case their performance is the same. When the deduplicated column has no index, distinct performs better than group by, because before MySQL 8.0 group by implicitly sorts its result by default, which triggers a filesort and reduces query performance. MySQL 8.0 removed this implicit sorting.
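One way to observe this difference is to compare execution plans. The sketch below assumes the pageview table as created above, where uid has no index; the exact plan output depends on the MySQL version and is not reproduced here.

```sql
-- Compare the Extra column of the two plans. On versions before 8.0,
-- the group by plan on an unindexed column typically shows "Using filesort".
explain select uid from pageview group by uid;
explain select distinct uid from pageview;

-- On versions before 8.0, order by null suppresses the implicit sort.
select uid from pageview group by uid order by null;
```

Adding an index on the grouped column is the other common remedy, since both statements can then use the index.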
Summary
In most scenarios distinct is a special case of group by, but there are subtle differences between the two: they differ in the result set they return, in the business scenarios they suit, and in performance.
References & Acknowledgements
zhuanlan.zhihu.com/p/384840662