当前位置:网站首页>Case analysis of data inconsistency caused by Pt OSC table change
Case analysis of data inconsistency caused by Pt OSC table change
2022-07-06 11:22:00 【wx5caecf2ed0645】
We usually solve our own problems , Sometimes I help people in the circle , Do some troubleshooting , This case is to help a company DBA Failure analysis conducted , Because it's typical , Let's share , But it's just sharing what happened , Do not make too much evaluation on the occurrence of this case and how to avoid it !
pt-online-schema-change: It is online for big tables alter operation , And try to avoid affecting online business , This is the best mysql One of management work , In normal work , Help us win more .
Environmental statement
pt-osc edition :percona-toolkit-2.2.14
mysql edition : percona-server-5.5
Database architecture : Double master replication ( This time pt-osc The table change is performed on the main database that is not online )
Problem description
One day, I received help from friends in the circle , Feedback use pt-online-schema-change Adding a field but causing an unexpected deadlock , And there may be a problem with the data , Brother, I can't think of riding. Sister hopes I can help analyze . However, due to the online environment, it is impossible to test and reproduce , Therefore, only the engine log at the time of deadlock is given ( perform SHOW ENGINE innodb STATUS see ).
Let's take a look at the logs of the storage engine at that time , Only transaction related logs are intercepted here for convenience , Other log information is skipped , The specific logs are as follows :
TRANSACTION1
*** (1) TRANSACTION:
TRANSACTION 107BF2CDD, ACTIVE 1 sec setting auto-inc lock
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1248, 1 row lock(s), undo log entries 2
MySQL thread id 6, OS thread handle 0x7fd210190700, query id 1080843123 Reading event from the relay log
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `redcliff`.`_rider_new` trx id 107BF2CDD lock mode AUTO-INC waiting
Here we can read two messages :
1: The transaction is from relaylog Read log 2: Business 1( Business id by 107BF2CDD) Is waiting for _rider_new surface AUTO-INC lock
TRANSACTION2
*** (2) TRANSACTION:
TRANSACTION 107BF2CDC, ACTIVE 1 sec fetching rows
mysql tables in use 2, locked 2
253 lock struct(s), heap size 31160, 10864 row lock(s), undo log entries 10616
MySQL thread id 22433333, OS thread handle 0x7fc781b16700, query id 1080843120 127.0.0.1 dwbdba_mgr Sending data
INSERT LOW_PRIORITY IGNORE INTO `redcliff`.`_rider_new`
************************************( Omitted )
`frozen_provision`, `bloc…. LOCK IN SHARE MODE /*pt-online-schema-change 18153 copy nibble*/
*** (2) HOLDS THE LOCK(S):
TABLE LOCK table `redcliff`.`_rider_new` trx id 107BF2CDC lock mode AUTO-INC
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 636 page no 4599 n bits 112 index `PRIMARY` of table
`redcliff`.`rider` trx id 107BF2CDC lock mode S waiting
*** WE ROLL BACK TRANSACTION (1)
We can read the following information :
1、 Business 2( Business id by 107BF2CDC) Hold the watch _rider_new Of auto-inc Self increasing lock
2、 Business 2 wait for rider surface S lock
3、pt-osc Tool pass LOCK IN SHARE MODE To read the current read, you also need to ensure that other concurrent transactions cannot modify the currently read records , Ensure the new and old data 100% Agreement , Therefore, add S lock
Through the information read above, we analyze as follows :
Business 1
1、Reading event from the relay log To perform the rider The modification of table
( Here, through analysis afterwards relaylog Confirm that it is right rider Table changes )
Therefore, it is held rider On the table record x lock
2、 wait for _rider_new On the table auto-inc lock
( notes :pt-osc When the tool modifies the table, it will create three triggers for adding, deleting and modifying the table . so rider There are already three triggers on the table , And right rider Tabular update,insert After the action trigger is triggered, it will be converted to _rider_new On the table replace operation , There is self increase id On your watch replace Operation will generate new self increment id value )
Business 2
1、INSERT LOW_PRIORITY IGNORE INTO `redcliff`.`_rider_new` (`id`, `city_id`,
This statement needs to go to _rider_new The table writes data in batches , Here already hold _rider_new On the table auto-inc lock, From the above analysis, we can see that transactions need to wait rider Shared read lock on the table !
By cutting out the superfluous
Business one :
hold :rider On the table record x lock
wait for :rider_new On the table auto-inc lock
Business two :
hold :_rider_new On the table auto-inc lock
wait for :rider On the table S lock
Perfect deadlock
Finally, rollback the transaction 1( That is, the copy update operation is rolled back , The master and slave data are inconsistent )
My point of view
In the above analysis , We come to the conclusion that ,pt-osc Tools in some cases , Data inconsistency may be caused by deadlock rollback , According to the principle , We can't avoid , Only try to alleviate ( for example : --chunk-size Parameters Set smaller , Or in TPS Great online does not use pt-osc), stay mysql online ddl The development is not perfect , Believe in the present mysql DBA The mainstream of table modification tools used online is still pt-online-schema-change , So I hope that through this sharing, we can reduce the number of pits , Go home early and sleep well .
边栏推荐
- Database advanced learning notes -- SQL statement
- Error connecting to MySQL database: 2059 - authentication plugin 'caching_ sha2_ The solution of 'password'
- [download app for free]ineukernel OCR image data recognition and acquisition principle and product application
- Why can't I use the @test annotation after introducing JUnit
- [recommended by bloggers] asp Net WebService background data API JSON (with source code)
- QT creator uses Valgrind code analysis tool
- Rhcsa certification exam exercise (configured on the first host)
- How to configure flymcu (STM32 serial port download software) is shown in super detail
- Software testing and quality learning notes 3 -- white box testing
- 一键提取pdf中的表格
猜你喜欢
机器学习--人口普查数据分析
Error connecting to MySQL database: 2059 - authentication plugin 'caching_ sha2_ The solution of 'password'
[recommended by bloggers] C # generate a good-looking QR code (with source code)
Basic use of redis
AcWing 242. A simple integer problem (tree array + difference)
02-项目实战之后台员工信息管理
图像识别问题 — pytesseract.TesseractNotFoundError: tesseract is not installed or it‘s not in your path
QT creator create button
Introduction and use of automatic machine learning framework (flaml, H2O)
Windows cannot start the MySQL service (located on the local computer) error 1067 the process terminated unexpectedly
随机推荐
The virtual machine Ping is connected to the host, and the host Ping is not connected to the virtual machine
Some notes of MySQL
Install mongdb tutorial and redis tutorial under Windows
Deoldify项目问题——OMP:Error#15:Initializing libiomp5md.dll,but found libiomp5md.dll already initialized.
neo4j安装教程
Idea import / export settings file
Remember the interview algorithm of a company: find the number of times a number appears in an ordered array
01项目需求分析 (点餐系统)
AI benchmark V5 ranking
Cookie setting three-day secret free login (run tutorial)
学习问题1:127.0.0.1拒绝了我们的访问
[Thesis Writing] how to write function description of jsp online examination system
[recommended by bloggers] C WinForm regularly sends email (with source code)
AcWing 1298. Solution to Cao Chong's pig raising problem
自动机器学习框架介绍与使用(flaml、h2o)
L2-001 紧急救援 (25 分)
01 project demand analysis (ordering system)
SSM integrated notes easy to understand version
Navicat 導出錶生成PDM文件
Ansible实战系列三 _ task常用命令