Interview Blitz 63: How do you remove duplicates in MySQL?
2022-07-06 19:01:00 【JAVA Chinese community】
In MySQL, the two most common ways to remove duplicates are distinct and group by. What is the difference between them? Let's take a look.
1. Create test data
-- Create test table
drop table if exists pageview;
create table pageview(
id bigint primary key auto_increment comment 'auto-increment primary key',
aid bigint not null comment 'article ID',
uid bigint not null comment 'ID of the visiting user',
createtime datetime default now() comment 'creation time'
) default charset='utf8mb4';
-- Add test data
insert into pageview(aid,uid) values(1,1);
insert into pageview(aid,uid) values(1,1);
insert into pageview(aid,uid) values(2,1);
insert into pageview(aid,uid) values(2,2);
After these inserts the table holds four rows, whose (aid, uid) values are (1,1), (1,1), (2,1) and (2,2).
2. Using distinct
The basic syntax of distinct is as follows:
SELECT DISTINCT column_name,column_name FROM table_name;
2.1 Single-column deduplication
We first use distinct to deduplicate on a single column, removing duplicate values of aid (article ID).
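A minimal sketch of the single-column case, assuming the pageview test table created in section 1:

```sql
-- Return each article ID once, however many times it was viewed
select distinct aid from pageview;
```

With the four test rows, this yields two rows: aid 1 and aid 2.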
2.2 Multi-column deduplication
Besides single-column deduplication, distinct also supports deduplication across multiple (two or more) columns. Here we deduplicate on the combination of aid (article ID) and uid (user ID).
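The combined case can be sketched as follows, again assuming the pageview test table from section 1:

```sql
-- A row counts as a duplicate only when both aid and uid match
select distinct aid, uid from pageview;
```

With the test data, the two identical (1,1) rows collapse into one, leaving three combinations: (1,1), (2,1) and (2,2).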
2.3 Aggregate function + deduplication
distinct can be combined with an aggregate function, for example to count the number of distinct aid values.
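One way to write this count, assuming the pageview test table from section 1 (the alias article_count is only an illustrative name):

```sql
-- Count how many different article IDs appear in the table
select count(distinct aid) as article_count from pageview;
```

With the test data, this returns a single row with the value 2.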
3. Using group by
The basic syntax of group by is as follows:
SELECT column_name,column_name FROM table_name
WHERE column_name operator value
GROUP BY column_name
3.1 Single-column deduplication
Here we deduplicate by aid (article ID). Compared with distinct, group by can return additional columns alongside the grouping column (aggregates such as count(*), or other non-grouped columns when the ONLY_FULL_GROUP_BY SQL mode permits it), whereas distinct returns only the columns being deduplicated.
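A sketch of single-column deduplication with group by, assuming the pageview test table from section 1. The second query is an assumption about how one might return an extra column: since MySQL 5.7.5 the default ONLY_FULL_GROUP_BY mode rejects a bare non-grouped column, so it wraps uid in any_value().

```sql
-- One row per article ID
select aid from pageview group by aid;

-- any_value() (MySQL 5.7+) returns an arbitrary uid from each group,
-- which keeps the query valid under ONLY_FULL_GROUP_BY
select aid, any_value(uid) as uid from pageview group by aid;
```

Which uid is returned for each aid in the second query is not deterministic, so this form only suits cases where any representative value will do.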
3.2 Multi-column deduplication
Here we deduplicate on the combination of aid (article ID) and uid (user ID).
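The multi-column form can be sketched like this, assuming the pageview test table from section 1:

```sql
-- Group on both columns so that each (aid, uid) pair appears once
select aid, uid from pageview group by aid, uid;
```

With the test data, this returns the same three combinations as select distinct aid, uid: (1,1), (2,1) and (2,2).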
3.3 Aggregate functions + group by
Here we count the number of rows for each aid. Note that combining count with group by and combining it with distinct have entirely different semantics: distinct + count returns the total number of distinct values, whereas group by + count returns the number of rows in each group.
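The per-article count can be sketched as follows, assuming the pageview test table from section 1 (the alias views is only an illustrative name):

```sql
-- One result row per article, with the number of page views in that group
select aid, count(*) as views from pageview group by aid;
```

With the test data, this returns two rows (aid 1 with 2 views and aid 2 with 2 views), whereas count(distinct aid) collapses everything into the single value 2.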
4. Differences between distinct and group by
When describing distinct, the official documentation notes that in most cases distinct can be regarded as a special case of group by. Official documentation: dev.mysql.com/doc/refman/… However, there are still some subtle differences between the two, such as the following.
Difference 1: Different query result sets
When deduplicating with distinct, the result set contains only the deduplicated columns. distinct must directly follow select, so trying to select an extra column in front of it raises a syntax error, and any column listed after it simply becomes part of the deduplication key.
With group by, by contrast, the query can return one or more additional fields for each group.
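The contrast above can be illustrated with a short sketch, assuming the pageview test table from section 1. The aliases uids and views are illustrative names, and group_concat is just one possible way to expose per-group detail.

```sql
-- Adding id to a distinct query widens the deduplication key:
-- id is unique, so all four rows come back
select distinct aid, id from pageview;

-- Writing "select id, distinct aid from pageview" is a syntax error,
-- because distinct must come immediately after select

-- group by keeps one row per aid and can still expose per-group details
select aid,
       group_concat(distinct uid order by uid) as uids,
       count(*) as views
from pageview
group by aid;
```

With the test data, the last query returns aid 1 with uids '1' and aid 2 with uids '1,2', each with 2 views.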
Difference 2: Different business scenarios
To count the total number of distinct values, use distinct. To get per-group details, or to filter on those per-group details, you have to use group by. For example, counting the distinct values of a column is a job for distinct, whereas finding the articles whose row count after grouping is greater than 2 requires group by.
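The grouped filter described above can be sketched with a having clause, assuming the pageview test table from section 1 (the alias views is only an illustrative name):

```sql
-- Articles whose number of page views is greater than 2
select aid, count(*) as views
from pageview
group by aid
having count(*) > 2;
```

With only the four test rows inserted earlier, each article has exactly two views, so this query returns no rows until more data is added.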
Difference 3: Different performance
If the deduplicated column is indexed, both group by and distinct can use the index, and in that case their performance is the same. When the column has no index, distinct performs better than group by on versions before MySQL 8.0, because there group by implicitly sorts its result by default, which triggers a filesort and slows the query down.
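One way to check this on your own server is to compare the execution plans, as in the sketch below. It assumes the pageview test table from section 1, the index name idx_pageview_aid is hypothetical, and the actual plans depend on the MySQL version and data, so no expected output is shown.

```sql
-- Compare the plans of the two forms; before MySQL 8.0 the group by plan
-- may show "Using filesort" in the Extra column when aid has no index
explain select distinct aid from pageview;
explain select aid from pageview group by aid;

-- On MySQL 5.7 and earlier, order by null suppresses the implicit sort
explain select aid from pageview group by aid order by null;

-- Hypothetical index name; once aid is indexed, both forms can use the index
create index idx_pageview_aid on pageview(aid);
```

Rerunning the two explain statements after creating the index shows whether both queries now read from it.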
Summary
In most scenarios distinct is a special case of group by, but there are subtle differences between the two: they differ in the result sets they can return, the business scenarios they suit, and their performance.
References & Acknowledgements
zhuanlan.zhihu.com/p/384840662