当前位置:网站首页>Why, when you added a unique index or create duplicate data?
Why, when you added a unique index or create duplicate data?
2022-08-04 15:08:00 【InfoQ】
前言
Some time ago I stepped over a pit:在
mysql8
的一张
innodb
引擎的
表
中,加了
唯一索引
,但最后发现
数据
竟然还是
重复
了.
到底怎么回事呢?
In this paper, through a trample pit experience,About the only index,Some interesting knowledge.
1.还原问题现场
前段时间,In order to prevent commodity group repeated data,I added a special
防重表
.
Commodity group in the way of weight on the table.
具体表结构如下:
CREATE TABLE `product_group_unique` (
`id` bigint NOT NULL,
`category_id` bigint NOT NULL,
`unit_id` bigint NOT NULL,
`model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
`in_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
为了保证数据的
唯一性
,I give the commodity group against the heavy table,建了唯一索引:
alter table product_group_unique add unique index
ux_category_unit_model(category_id,unit_id,model_hash);
According to the classification number、Unit number and the commodity set of propertieshash值,Can only determine a commodity group.
To commodity group against the heavy table creation
唯一索引
之后,The next day to check the data,Found in this table actually produced a repeat data:
Article 2 and article 3 of the data in the table to repeat.
这是为什么呢?
2.The only index field containsnull
If you look closely at the data in the table,Will find it a special place:The commodity set of propertieshash值(model_hash字段)可能为
null
,The commodity groups allow don't configure any attribute.
在product_group_unique表中插入了一条model_hash字段等于100的重复数据:
执行结果:
从上图中看出,mysqlThe uniqueness of the constraint effect,Duplicate data be stopped.
接下来,We will insert the twomodel_hash为null的数据,One third of the data with the secondcategory_id、unit_id和model_hashThe field values are the same.
从图中看出,Actually perform a success.
换句话说,If the only index field,出现了null值,The uniqueness constraints will not take effect.
Finally inserted data is the same:
- 当model_hash字段不为空时,Won't produce repetitive data.
- 当model_hash字段为空时,Generates repetitive data.
我们需要特别注意:Create unique index field,Are not allowed tonull,否则mysqlUniqueness constraints may failure.
3.Logic delete table with unique index
We all know that the only index is very simple to use,但有时候,Add in the table it is not good.
不信,我们一起往下看.
通常情况下,要删除表的某条记录的话,如果用
delete
语句操作的话.
例如:
delete from product where id=123;
这种delete操作是
物理删除
,即该Record to be deleted after,后续通过sql语句基本查不出来.(不过通过其他技术手段可以找回,那是后话了)
Another is
逻辑删除
,主要是通过
update
语句操作的.
例如:
update product set delete_status=1,edit_time=now(3)
where id=123;
逻辑删除需要在表中额外增加一个删除状态字段,用于记录数据是否被删除.在所有的业务查询的地方,都需要过滤掉已经删除的数据.
通过这种方式删除数据之后,数据任然还在表中,只是从逻辑上过滤了删除状态的数据而已.
其实对于这种逻辑删除的表,是没法加唯一索引的.
为什么呢?
假设之前给商品表中的
name
和
model
加了唯一索引,如果用户把某条记录删除了,delete_status设置成1了.后来,该用户发现不对,又重新添加了一模一样的商品.
由于唯一索引的存在,该用户第二次添加商品会失败,即使该商品已经被删除了,也没法再添加了.
这个问题显然有点严重.
有人可能会说:把
name
、
model
和
delete_status
Three fields at the same time make it
唯一索引
不就行了?
答:这样做确实可以解决用户逻辑删除了某个商品,后来又重新添加相同的商品时,添加不了的问题.但如果第二次添加的商品,又被删除了.该用户第三次添加相同的商品,不也出现问题了?
由此可见,如果表中有逻辑删除功能,是不方便创建唯一索引的.
But if you really want to contain logic delete table,增加唯一索引,该怎么办呢?
3.1 删除状态+1
通过前面知道,如果表中有逻辑删除功能,是不方便创建唯一索引的.
其根本原因是,Record to be deleted after,delete_status会被设置成1,默认是0.The same record when the delete the second time,delete_status被设置成1,But due to create a unique index(把name、model和delete_statusThree fields at the same time make it unique index),数据库中已存在delete_status为1的记录,So this time will fail.
Why don't we change a think:不要纠结于delete_status为1,表示删除,当delete_status为1、2、3等等,只要大于1都表示删除.
这样的话,The biggest delete every delete access to the same record state,然后加1.
Such data operation process into:
- 添加记录a,delete_status=0.
- 删除记录a,delete_status=1.
- 添加记录a,delete_status=0.
- 删除记录a,delete_status=2.
- 添加记录a,delete_status=0.
- 删除记录a,delete_status=3.
由于记录a,每次删除时,delete_status都不一样,So can guarantee uniqueness.
该方案的优点是:Don't have to adjust the field,非常简单和直接.
缺点是:可能需要修改sql逻辑,Especially some querysql语句,有些使用delete_status=1Judge delete state,需要改成delete_status>=1.
3.2 增加时间戳字段
Lead to logic delete table,Is bad and the only index in logic delete where most fundamental.
Why don't we add a field,The function of the specialized processing logic delete?
答:可以增加
时间戳
字段.
把name、model、delete_status和timeStamp,Four fields at the same time make it unique index
在添加数据时,timeStampFields to default values
1
.
Then deleted once has the logic operation,Automatically to the write timestamp field.
Even the same records,Logic delete many times,Each generation of timestamp is different also,Also can guarantee the uniqueness of data.
The time stamp generally accurate to
秒
.
Except in the limit of concurrent scenarios,对同一条记录,Two different logic delete,Have the same timestamp.
At this moment the timestamp can be accurate to
毫秒
.
该方案的优点是:Can be without changing existing code logic on the basis of,The uniqueness of by adding new fields to achieve the data.
缺点是:在极限的情况下,Is likely to produce duplicate data.
3.3 增加id字段
其实,The problem can be solved by increasing timestamp field basic.But in under the condition of the limit,Is likely to produce duplicate data.
有没有办法解决这个问题呢?
答:增加
主键
字段:delete_id.
Thinking of the scheme to increase the timestamp field consistent,In the add data todelete_id设置默认值1,And then when logic delete,给delete_idAssignment to the current record the primary key of theid.
把name、model、delete_status和delete_id,Four fields at the same time make it unique index.
This is probably the most optimum scheme,The need to modify existing delete logic,Also can guarantee the uniqueness of data.
4. How to repeat history data and unique index?
So if you have any logic in the table to delete chat in front of the function,Not too good and unique index,But through in this paper, three kinds of scheme,Can add a unique index smooth.
But from the soul asking:If a particular form of,已存在
History repeated data
,How to add index?
最简单的做法是,增加一张
防重表
,Then the data initialization in.
Can write an article like thissql:
insert into product_unqiue(id,name,category_id,unit_id,model)
select max(id), select name,category_id,unit_id,model from product
group by name,category_id,unit_id,model;
Doing so may be can,But today's topic is directly in the original table to add unique index,Without the heavy table.
那么,This unique index how to?
Actually can be a reference section,增加
id
The thinking of field.
增加一个delete_id字段.
But toproductTable to create a unique index before,To do data processing.
Access to the same record the biggestid:
select max(id), select name,category_id,unit_id,model from product
group by name,category_id,unit_id,model;
然后将delete_id字段设置成1.
Then will the other of the same record ofdelete_id字段,Set to the current primary key.
So you can distinguish between history of duplicate data.
当所有的delete_idFields after setting the value,就能给name、model、delete_status和delete_id,Four fields added unique index.
完美.
5.Add the only index to larger field
接下来,We talk about an interesting topic:How to increase the only larger field index.
有时候,We need to add a unique index to several fields at the same time,比如给name、model、delete_status和delete_id等.
但如果model字段很大,This will lead to the unique index,May occupy more storage space.
We all know that the only index,也会走索引.
If the index of each node in big data,The retrieval efficiency is very low.
由此,It is necessary for the only index do limit length.
目前mysql innodbIn the storage engine indexes allow maximum length is3072 bytes,其中unqiue key最大长度是1000 bytes.
If the field is too big,超过了1000 bytes,Obviously can't add unique index.
此时,有没有解决办法呢?
5.1 增加hash字段
我们可以增加一个hash字段,Take big fieldhash值,Generate a new value of the short.This value can be through somehash算法生成,固定长度16位或者32位等.
我们只需要给name、hash、delete_status和delete_id字段,增加唯一索引.
This helps to avoid the only index too long problem.
But it also brings a new problem:
一般hash算法会产生hash冲突,The two different value,通过hashAlgorithm to generate the same value.
Of course, if there are other fields can distinguish,比如:name,And allow this repetitive data on business,不写入数据库,该方案也是可行的.
5.2 Not the only index
If it is no good to add unique index,Is not the only index,Through other technological means to guarantee uniqueness.
If the entry of the new data is less,比如只有job,Or data import,Can a single threaded order,So you can ensure that the data in the table don't repeat.
If the entry of the new data more,Finally all hairmq消息,在mqConsumers in single thread processing.
5.3 redis分布式锁
Because the field is too big,在mysqlThe bad and the only index,为什么不用
redis分布式锁
呢?
But if directly added toname、model、delete_status和delete_id字段,加
redis分布式锁
,Obviously do not have what meaning,效率也不会高.
我们可以结合5.1章节,用name、model、delete_status和delete_id字段,生成一个hash值,And then to the new value lock.
即使遇到hash冲突也没关系,在并发的情况下,毕竟是小概率事件.
6.批量插入数据
Some boys,可能认为,既然有redis分布式锁了,Can not the only index.
那是你没遇到,批量插入数据的场景.
If after the query operation,Found a collection:list的数据,Need bulk insert database.
如果使用redis分布式锁,需要这样操作:
for(Product product: list) {
try {
String hash = hash(product);
rLock.lock(hash);
//查询数据
//插入数据
} catch (InterruptedException e) {
log.error(e);
} finally {
rLock.unlock();
}
}
Need in a loop,For each of the data is locked.
This performance will surely not good.
Of course some friend opposed,说使用redis的
pipeline
Batch operation is not ok?
Also is a one-time give500条,或者1000The data is locked,Finally use the one-time release the lock?
Think a bit wonky,How much this locked so ah.
Very easy to cause the lock timeout,Such as business code are not performed,Lock the expiration time has come to.
For this batch operation,如果此时使用mysql的唯一索引,直接批量insert即可,一条sql语句就能搞定.
Database automatically judge,如果存在重复的数据,会报错.If there is no duplicate data,Allowed to insert the data.
边栏推荐
猜你喜欢
随机推荐
Next -21- 添加相册系列 - 1- 框架设置
动态数组底层是如何实现的
【剑指offer33】二叉搜索树的后序遍历序列
杭电校赛(逆袭指数)
Go 语言快速入门指南: 变量和常量
代码随想录笔记_动态规划_1049最后一块石头的重量II
【Web技术】1401- 图解 Canvas 入门
PTA 6-2 多项式求值
leetcode: 250. Count subtrees of equal value
Android Sqlite3基本命令
Next -18- 添加代码复制按钮
洛谷题解P4326 求圆的面积
FRED Application: Capillary Electrophoresis System
2022杭电多校3
Flutter 运动鞋商铺小demo
2022杭电多校4
Oracle 数据库用户创建、重启、导入导出
Roslyn 通过 nuget 统一管理信息
1403. 非递增顺序的最小子序列
卖家寄卖流程梳理