当前位置:网站首页>[first release in the whole network] (tips for big tables) sometimes it takes only 1 minute for 2 hours of SQL operation
[first release in the whole network] (tips for big tables) sometimes it takes only 1 minute for 2 hours of SQL operation
2022-07-05 10:55:00 【Heapdump performance community】
Hello everyone , I am a yes.
Last article About a 5 I and DBA Of battle After that , Several friends came to ask me SQL How did you dismantle it .
Let's briefly discuss this article , Actually dismantle SQL It is because it involves the deletion of large tables .
such as , Now you need to delete a total 5 In the table of billion data 2021 Annual data , Suppose this table is called yes.
I believe your brain is 1s This one will definitely pop out of the SQL :
delete from yes where create_date > "2020-12-31" and create_date < "2022-01-01";
If you directly execute this SQL What's going to happen ?
Long business
We need to pay attention to a premise : This watch has 5 Billion of data , So it's a huge watch , So the where Conditions can involve a lot of data , So we can check the data volume from the offline data warehouse or standby database , Then we found this SQL Will delete 3 About 100 million data .
So one time delete The finished plan is not good , Because it's going to involve The problem of long affairs .
Long transactions involve locking , The lock will only be released after the transaction is completed , A lot of data is locked due to long transactions , If there are frequent DML Want to manipulate these data , Then it will cause congestion .
The connection is blocked , Business threads naturally block , That is to say, your service thread is waiting for the response of the database , Then it may affect other services , Avalanches may occur , So he GG 了 .
Long transactions may cause master-slave delays , Think about the main database that has been implemented for a long time , Only after the execution of the slave Library , It will take a long time to replay from the Library , Data may be out of sync for a long time .
There's another situation , Businesses have a special downtime window , You think you can do whatever you want , Then start to execute long affairs , And then we did 5 Hours later , I don't know what's wrong , Transaction rolled back , So waste 5 Hours , We have to start over .
Sum up , We need to avoid long transactions .
In the face of possible problems SQL How can we dismantle it ?
Demolition SQL
Let's take the above one SQL For example :
delete from yes where create_date > "2020-12-31" and create_date < "2022-01-01";
See this one SQL, If you want to split , Presumably, many friends will find it very simple , According to the date, it's over ?
delete from yes where create_date > "2020-12-31" and create_date < "2021-02-01";
delete from yes where create_date >= "2021-02-01" and create_date < "2021-03-01";
......
Of course it can , congratulations , You have split it successfully , Yes, it's that simple .
however , If create_date What if there is no index ?
If there is no index , The above table is scanned ?
The impact is not big , Without index, we will create index conditions for him , This condition is the primary key .
Let's go straight to one select min(id)... and select max(id).... Get the minimum and maximum primary key values of this table , Suppose the answer is 233333333 and 666666666.
Then we can start the operation :
delete from yes where (id >= 233333333 and id < 233433333) and create_date > "2020-12-31" and create_date < "2022-01-01";
delete from yes where (id >= 233433333 and id <233533333) and create_date > "2020-12-31" and create_date < "2022-01-01";
......
delete from yes where (id >= 666566666 and id <=666666666) and create_date > "2020-12-31" and create_date < "2022-01-01";
Of course, you can also be more precise , Get through date filtering maxId, This has little effect ( Not satisfying the conditions SQL Fast execution , It won't take much time ).
thus SQL It meets the batch operation , And it can be indexed .
If any statement fails to execute , Only a small part of the data will be rolled back , Let's check again , The impact is not big .
And split SQL Then you can Improve execution efficiency in parallel .
Of course, my previous article said , There may be lock contention in parallel , Cause individual statements to wait for timeout . But it doesn't matter , As long as the machine is in good condition , Fast execution , Waiting caused by lock competition does not necessarily timeout , If individual SQL If the timeout , Just re execute .
Sometimes you have to change your mind
Sometimes we need to change our thinking about deleting large tables , Turn deletion into insertion .
Suppose there is still one 5 Billion tables , At this point, you need to delete the inside 4.8 Billion of data , Then don't think about deleting at this time , Think about inserting .
It's simple , Delete 4.8 Billion of data , It's better to take what you want 2000W Insert into a new table , Just use the new table directly for our later business .
The two data volumes are compared , The difference in time efficiency is self-evident ?
The specific operation is also simple :
Create a new table , be known as yes_temp take yes Tabular 2000W data select into To yes_temp in take yes surface rename become yes_233 take yes_temp surface rename become yes
Civet for Prince , It's done !
There was a record table before, and that's how we operate , Just select into The data of the past month is added to the new table , Old data used to be ignored , then rename once , It's very fast ,1 It will be done in minutes .
This kind of similar operation has tools , such as pt-online-schema-change etc. , But I didn't use it , If you are interested, you can go and have a look , It's the same thing , A few more triggers , I won't go into more details here .
Last
We still have to learn more about the operation and principle of database in development , Because many database operations need you to do it yourself , Small companies don't DBA Don't say anything , We don't know about big companies DBA To what extent will you care , You still have to rely on yourself .
I looked through my previous articles , As if MySQL Most of them are related , If you are interested, you can take a look at this collection :MySQL Collection
However, there should be many articles that have not been added to this collection , I'll tidy up when I'm free .
Welcome to follow my personal public number :【yes Training strategy 】
See you here , If you are interested in what I write , Have any questions , Welcome to leave a message below , I will give you an answer for the first time , thank you !
I am a yes, From a little to a hundred million, I'll see you in the next chapter ~
边栏推荐
猜你喜欢
matlab cov函数详解
2022 Pengcheng cup Web
2022年化工自动化控制仪表考试试题及在线模拟考试
2022 mobile crane driver examination question bank and simulation examination
csdn软件测试入门的测试基本流程
Based on shengteng AI Aibi intelligence, we launched a digital solution for bank outlets to achieve full digital coverage of information from headquarters to outlets
9、 Disk management
Go language learning notes - first acquaintance with go language
Go-3-the first go program
Honing · fusion | know that the official website of Chuangyu mobile terminal is newly launched, and start the journey of digital security!
随机推荐
数组、、、
MFC宠物商店信息管理系统
Taro advanced
Node の MongoDB Driver
Web3 Foundation grant program empowers developers to review four successful projects
LDAP overview
Operation of simulated examination platform of special operation certificate examination question bank for safety production management personnel of hazardous chemical production units in 2022
In the year of "mutual entanglement" of mobile phone manufacturers, the "machine sea tactics" failed, and the "slow pace" playing method rose
[vite] 1371 - develop vite plug-ins by hand
Nine degrees 1480: maximum ascending subsequence sum (dynamic programming idea for the maximum value)
PWA (Progressive Web App)
2022年流动式起重机司机考试题库及模拟考试
go语言学习笔记-初识Go语言
What are the top ten securities companies? Is it safe to open an account online?
Function///
Crawler (9) - scrape framework (1) | scrape asynchronous web crawler framework
When using gbase 8C database, an error is reported: 80000502, cluster:%s is busy. What's going on?
Pull up loading principle
Cross page communication
Go语言-1-开发环境配置