Interview Blitz 63: How do you remove duplicates in MySQL?
2022-07-06 19:28:00 【Wang Lei】
In MySQL, the two most common ways to remove duplicates are distinct and group by. What is the difference between them? Let's take a look.
1. Create test data
-- Create test table
drop table if exists pageview;
create table pageview(
id bigint primary key auto_increment comment 'auto-increment primary key',
aid bigint not null comment 'article ID',
uid bigint not null comment 'visiting user ID',
createtime datetime default now() comment 'creation time'
) default charset='utf8mb4';
-- Add test data
insert into pageview(aid,uid) values(1,1);
insert into pageview(aid,uid) values(1,1);
insert into pageview(aid,uid) values(2,1);
insert into pageview(aid,uid) values(2,2);
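After these inserts the table holds four rows: two views of article 1 by user 1, one view of article 2 by user 1, and one view of article 2 by user 2.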
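To check the test data, a query along these lines can be used. The id values in the comment assume a freshly created table and default auto-increment settings; createtime is left out because it depends on when the inserts were run.

```sql
-- List the test rows in insertion order
SELECT id, aid, uid FROM pageview ORDER BY id;
-- Assuming ids start at 1 and increase by 1, this returns:
-- (1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 2, 2)
```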
2. Using distinct
The basic syntax of distinct is as follows:
SELECT DISTINCT column_name,column_name FROM table_name;
2.1 Single-column deduplication
First we use distinct to deduplicate a single column, removing duplicates by aid (article ID).
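A minimal sketch of the single-column query, run against the pageview test table created above:

```sql
-- Deduplicate by article ID
SELECT DISTINCT aid FROM pageview;
-- With the test data this returns two rows: aid = 1 and aid = 2
```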
2.2 Multi-column deduplication
Besides single-column deduplication, distinct also supports deduplication across multiple columns (two or more). Here we deduplicate on the combination of aid (article ID) and uid (user ID).
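A sketch of the multi-column form against the same test table. A row is dropped only when both columns match an earlier row:

```sql
-- Deduplicate on the (aid, uid) combination
SELECT DISTINCT aid, uid FROM pageview;
-- With the test data this returns three rows: (1, 1), (2, 1), (2, 2)
-- The two identical (1, 1) rows collapse into one
```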
2.3 Aggregate function + deduplication
distinct can be combined with an aggregate function, for example to count how many distinct aid values there are.
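A sketch of the combined form; the alias article_count is only an illustrative name:

```sql
-- Count the number of distinct article IDs
SELECT COUNT(DISTINCT aid) AS article_count FROM pageview;
-- With the test data this returns a single row with the value 2
```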
3. Using group by
The basic syntax of group by is as follows:
SELECT column_name,column_name FROM table_name
WHERE column_name operator value
GROUP BY column_name
3.1 Single-column deduplication
Here we deduplicate by aid (article ID) using group by.
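A sketch of the group by equivalent of the single-column distinct query:

```sql
-- Deduplicate by article ID using grouping
SELECT aid FROM pageview GROUP BY aid;
-- With the test data this returns two rows: aid = 1 and aid = 2
```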
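Compared with distinct, group by can return more columns alongside the grouped one, whereas distinct deduplicates on exactly the columns it selects. Note that with the ONLY_FULL_GROUP_BY SQL mode enabled (the default since MySQL 5.7.5), any extra column that is not in the group by clause must be wrapped in an aggregate function or be functionally dependent on the grouped columns.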
3.2 Multi-column deduplication
Here we deduplicate on the combination of aid (article ID) and uid (user ID) using group by.
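A sketch of the multi-column grouping against the test table:

```sql
-- Deduplicate on the (aid, uid) combination using grouping
SELECT aid, uid FROM pageview GROUP BY aid, uid;
-- With the test data this returns three rows: (1, 1), (2, 1), (2, 2)
```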
3.3 Aggregate function + group by
group by can also be combined with an aggregate function, for example to count the number of rows for each aid.
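A sketch of the per-group count; the alias views is only an illustrative name:

```sql
-- Count the page views recorded for each article
SELECT aid, COUNT(*) AS views FROM pageview GROUP BY aid;
-- With the test data this returns two rows: (1, 2) and (2, 2)
```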
Note that distinct and group by have completely different query semantics when combined with count: distinct + count gives the total number of distinct values, whereas group by + count gives the number of rows in each group.
4. Differences between distinct and group by
The official documentation, when describing distinct optimization, notes that in most cases a distinct clause can be considered a special case of group by.
Official documentation: https://dev.mysql.com/doc/refman/8.0/en/distinct-optimization.html
However, there are still some subtle differences between the two, such as the following.
Difference 1: Different query result sets
When you deduplicate with distinct, the result set contains only the deduplicated columns.
If you try to select an additional field that is not part of the deduplication, the SQL statement fails, because distinct always applies to the entire select list.
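The original screenshot is not available, so the statement below is only one plausible illustration of the failure: distinct must follow select directly and cannot be attached to a single column in the middle of the list.

```sql
-- Valid: DISTINCT covers the whole select list, so (aid, uid) pairs are deduplicated
SELECT DISTINCT aid, uid FROM pageview;

-- Not valid: DISTINCT cannot be applied to one column after another column
-- SELECT aid, DISTINCT uid FROM pageview;   -- rejected with a syntax error
```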
With group by, on the other hand, the query can return one or more additional fields alongside the grouped column.
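A sketch of returning extra fields with group by. It uses aggregate functions so that it also runs with ONLY_FULL_GROUP_BY enabled; the aliases max_uid and views are illustrative names.

```sql
-- Group by article and return additional per-group information
SELECT aid, MAX(uid) AS max_uid, COUNT(*) AS views
FROM pageview
GROUP BY aid;
-- With the test data this returns two rows: (1, 1, 2) and (2, 2, 2)

-- Selecting a bare non-grouped column, such as
--   SELECT aid, uid FROM pageview GROUP BY aid;
-- is accepted only when ONLY_FULL_GROUP_BY is disabled, and the uid returned
-- for each group is then not deterministic.
```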
Difference 2: Different business scenarios
To count the total number of values after deduplication, use distinct. To get per-group details, or to filter on top of those per-group details, you have to use group by. Counting the distinct values of a column is the count(distinct ...) form from section 2.3.
To find the articles whose row count after grouping is greater than 2, you need group by.
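Filtering on a per-group count is done with a having clause. The sketch below assumes the threshold from the text (greater than 2); the alias views is illustrative.

```sql
-- Articles with more than 2 recorded page views
SELECT aid, COUNT(*) AS views
FROM pageview
GROUP BY aid
HAVING COUNT(*) > 2;
```

With only the four test rows above, each article has exactly two views, so this query returns no rows until more data is added or the threshold is lowered.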
Difference 3: Different performance
If the deduplicated field has an index, both group by and distinct can use it, and in that case their performance is the same. When the deduplicated field has no index, distinct performs better than group by, because before MySQL 8.0 group by implicitly sorts its result by default, which triggers a filesort and reduces query performance.
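One way to observe this is to compare the execution plans with EXPLAIN. The sketch below is an assumption-laden illustration: the exact plan output depends on the MySQL version and data, and the index name idx_aid is hypothetical.

```sql
-- Compare the plans; on versions before 8.0 without an index on aid,
-- check whether the Extra column of the GROUP BY plan mentions a filesort
EXPLAIN SELECT DISTINCT aid FROM pageview;
EXPLAIN SELECT aid FROM pageview GROUP BY aid;

-- Before MySQL 8.0, the implicit sort can be suppressed explicitly
SELECT aid FROM pageview GROUP BY aid ORDER BY NULL;

-- Adding an index on the deduplicated column lets both forms use it
CREATE INDEX idx_aid ON pageview (aid);
```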
Summary
In most scenarios distinct is a special case of group by, but there are subtle differences between the two: they differ in the result set they return, in the business scenarios they suit, and in performance.
References & Acknowledgements
zhuanlan.zhihu.com/p/384840662