Case analysis of data inconsistency caused by a pt-osc table change
2022-07-06 11:22:00 【wx5caecf2ed0645】
We usually solve our own problems, but sometimes I also help people in the community with troubleshooting. This case is a failure analysis done to help one company's DBA. Because it is typical, I am sharing it here, but only as an account of what happened; I will not comment much on why it occurred or how to avoid it!
pt-online-schema-change performs ALTER operations on large tables online while trying to avoid affecting production traffic. It is one of the best MySQL administration tools and helps us a great deal in daily work.
Environment
pt-osc version: percona-toolkit-2.2.14
MySQL version: percona-server-5.5
Database architecture: dual-master replication (this time the pt-osc table change was executed on the master that was not serving online traffic)
Problem description
One day a friend in the community asked for help: using pt-online-schema-change to add a column had caused an unexpected deadlock, and the data might be affected. They were baffled and hoped I could help analyze it. Because this was a production environment, it was impossible to test and reproduce, so only the engine log from the time of the deadlock was provided (viewed with SHOW ENGINE INNODB STATUS).
Let's look at the storage engine log from that time. For convenience, only the transaction-related parts are excerpted here and other log information is skipped. The logs are as follows:
TRANSACTION1
*** (1) TRANSACTION:
TRANSACTION 107BF2CDD, ACTIVE 1 sec setting auto-inc lock
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1248, 1 row lock(s), undo log entries 2
MySQL thread id 6, OS thread handle 0x7fd210190700, query id 1080843123 Reading event from the relay log
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `redcliff`.`_rider_new` trx id 107BF2CDD lock mode AUTO-INC waiting
Here we can read two pieces of information:
1. The transaction is reading events from the relay log
2. Transaction 1 (transaction id 107BF2CDD) is waiting for the AUTO-INC lock on the _rider_new table
TRANSACTION2
*** (2) TRANSACTION:
TRANSACTION 107BF2CDC, ACTIVE 1 sec fetching rows
mysql tables in use 2, locked 2
253 lock struct(s), heap size 31160, 10864 row lock(s), undo log entries 10616
MySQL thread id 22433333, OS thread handle 0x7fc781b16700, query id 1080843120 127.0.0.1 dwbdba_mgr Sending data
INSERT LOW_PRIORITY IGNORE INTO `redcliff`.`_rider_new`
************************************( Omitted )
`frozen_provision`, `bloc…. LOCK IN SHARE MODE /*pt-online-schema-change 18153 copy nibble*/
*** (2) HOLDS THE LOCK(S):
TABLE LOCK table `redcliff`.`_rider_new` trx id 107BF2CDC lock mode AUTO-INC
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 636 page no 4599 n bits 112 index `PRIMARY` of table
`redcliff`.`rider` trx id 107BF2CDC lock mode S waiting
*** WE ROLL BACK TRANSACTION (1)
We can read the following information:
1. Transaction 2 (transaction id 107BF2CDC) holds the AUTO-INC (auto-increment) lock on the _rider_new table
2. Transaction 2 is waiting for an S lock on the rider table
3. pt-osc reads the rows with LOCK IN SHARE MODE, which is a current read. It must also ensure that other concurrent transactions cannot modify the rows being read, so that the old and new data are 100% consistent, and therefore it takes S locks
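As a rough illustration of reading these sections programmatically, the sketch below pulls the held and awaited locks out of the transaction 2 excerpt quoted above. The function name `parse_locks` and both regular expressions are assumptions written for this excerpt only; they are not part of MySQL or pt-osc and may not cover other InnoDB status layouts.

```python
import re

# Excerpt of the deadlock section quoted above (transaction 2 only).
LOG = """\
*** (2) TRANSACTION:
TRANSACTION 107BF2CDC, ACTIVE 1 sec fetching rows
*** (2) HOLDS THE LOCK(S):
TABLE LOCK table `redcliff`.`_rider_new` trx id 107BF2CDC lock mode AUTO-INC
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 636 page no 4599 n bits 112 index `PRIMARY` of table
`redcliff`.`rider` trx id 107BF2CDC lock mode S waiting
"""

# Hypothetical patterns, tuned to the excerpt above.
SECTION = re.compile(
    r"^\*\*\* \((\d+)\) (HOLDS THE LOCK\(S\)|WAITING FOR THIS LOCK TO BE GRANTED):$"
)
LOCK = re.compile(r"`(\w+)`\.`(\w+)` trx id (\w+) lock mode (\S+)")


def parse_locks(log):
    """Return (txn_no, 'holds' | 'waits', table, lock_mode) tuples."""
    results = []
    current = None  # (txn_no, kind) of the section being read, if any
    for line in log.splitlines():
        section = SECTION.match(line)
        if section:
            kind = "holds" if section.group(2).startswith("HOLDS") else "waits"
            current = (int(section.group(1)), kind)
            continue
        if line.startswith("***"):
            current = None  # some other section header
            continue
        lock = LOCK.search(line)
        if lock and current:
            results.append((current[0], current[1], lock.group(2), lock.group(4)))
    return results


for entry in parse_locks(LOG):
    print(entry)
```

The record lock is printed across two log lines, so the sketch matches on the line that carries the table name and lock mode rather than on the `RECORD LOCKS` line.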
From the information read above, we analyze as follows:
Transaction 1
1. "Reading event from the relay log" shows it is applying a modification to the rider table
(analysis of the relay log afterwards confirmed that it was a change to the rider table),
so it holds a record X lock on the rider table
2. It is waiting for the AUTO-INC lock on the _rider_new table
(Note: when pt-osc changes a table, it creates three triggers on that table, for insert, delete, and update. So the rider table already has three triggers on it, and an UPDATE or INSERT on rider fires a trigger that is converted into a REPLACE on the _rider_new table. A REPLACE on a table with an auto-increment id generates a new auto-increment value, so it needs the table's AUTO-INC lock.)
Transaction 2
1. INSERT LOW_PRIORITY IGNORE INTO `redcliff`.`_rider_new` (`id`, `city_id`,
This statement writes data into the _rider_new table in batches and already holds the AUTO-INC lock on _rider_new. From the analysis above, this transaction still has to wait for a shared (S) lock on the rider table!
Stripping away the details:
Transaction 1:
holds: a record X lock on the rider table
waits for: the AUTO-INC lock on the _rider_new table
Transaction 2:
holds: the AUTO-INC lock on the _rider_new table
waits for: an S lock on the rider table
A perfect deadlock.
In the end, transaction 1 was rolled back (that is, the replicated update was rolled back, leaving the master and slave data inconsistent).
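The hold/wait summary above can be checked mechanically with a small wait-for graph. This is a minimal sketch, not how InnoDB's deadlock detector is implemented: the names `find_cycle`, `txn1`, `txn2`, and the resource labels are hypothetical, and the row lock is modelled as a single resource because the requested S lock conflicts with the held X lock on the same row.

```python
def find_cycle(holds, waits):
    """Return the transactions forming a wait cycle, or None.

    holds: transaction -> set of resources it currently holds
    waits: transaction -> the single resource it is waiting for
    """
    # Map each resource to the transaction that holds it.
    owner = {res: txn for txn, resources in holds.items() for res in resources}
    for start in waits:
        path = [start]
        current = start
        while current in waits:
            blocker = owner.get(waits[current])
            if blocker is None:
                break  # nobody holds the resource: no deadlock on this path
            if blocker == start:
                return path  # followed the waits back to where we began
            if blocker in path:
                break  # a cycle that excludes `start`; another start finds it
            path.append(blocker)
            current = blocker
    return None


# txn1 = replication SQL thread, txn2 = pt-osc chunk copy (labels are illustrative).
holds = {
    "txn1": {"rider row"},             # record X lock on rider
    "txn2": {"_rider_new AUTO-INC"},   # table-level AUTO-INC lock
}
waits = {
    "txn1": "_rider_new AUTO-INC",     # needed by the trigger's REPLACE
    "txn2": "rider row",               # S lock requested by LOCK IN SHARE MODE
}

print(find_cycle(holds, waits))
```

Each transaction waits on a resource the other holds, so following the waits from `txn1` leads back to `txn1`; the database has to roll one of them back, and in this case it chose transaction 1.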
My point of view
From the analysis above, we conclude that in some cases pt-osc can cause data inconsistency through a deadlock rollback. Given how the tool works, this cannot be avoided entirely, only mitigated (for example, by setting the --chunk-size parameter smaller, or by not using pt-osc on online systems with very high TPS). While MySQL's online DDL is still immature, I believe pt-online-schema-change remains the mainstream tool that MySQL DBAs use for online table changes. So I hope this write-up helps everyone step into fewer pits, get home early, and sleep well.