
Case analysis of data inconsistency caused by a pt-osc table change

2022-07-06 11:22:00 wx5caecf2ed0645


We usually solve our own problems, but sometimes I also help people in the community with troubleshooting. This case is a failure analysis I did for a DBA at another company. Because it is typical, I am sharing it here. I only describe what happened and do not comment much on why it occurred or how to avoid it.


pt-online-schema-change performs ALTER operations on large tables online while trying to avoid any impact on production traffic. It is one of the best MySQL administration tools and saves us a lot of effort in day-to-day work.

Environment


pt-osc version: percona-toolkit-2.2.14

MySQL version: percona-server-5.5

Database architecture: dual-master replication (this pt-osc table change was run on the master that was not serving online traffic)

Problem description

One day a friend in the community asked me for help. He had used pt-online-schema-change to add a column, which caused an unexpected deadlock and possibly a data problem. He could not work out why and hoped I could help analyze it. Because it was a production environment, the problem could not be tested or reproduced, so all I was given was the engine log from the time of the deadlock (obtained with SHOW ENGINE INNODB STATUS).

Let's look at the storage engine log from that time. For convenience, only the transaction-related parts are excerpted here and the rest is omitted. The log is as follows:

TRANSACTION1


*** (1) TRANSACTION: 

TRANSACTION 107BF2CDD, ACTIVE 1 sec setting auto-inc lock

mysql tables in use 2, locked 2

LOCK WAIT 4 lock struct(s), heap size 1248, 1 row lock(s), undo log entries 2

MySQL thread id 6, OS thread handle 0x7fd210190700, query id 1080843123 Reading event from the relay log 

*** (1) WAITING FOR THIS LOCK TO BE GRANTED: 

TABLE LOCK table `redcliff`.`_rider_new` trx id 107BF2CDD lock mode AUTO-INC waiting 

From this we can read two pieces of information:

1: The transaction is reading events from the relay log

2: Transaction 1 (transaction id 107BF2CDD) is waiting for the AUTO-INC lock on table _rider_new

TRANSACTION2


*** (2) TRANSACTION:

TRANSACTION 107BF2CDC, ACTIVE 1 sec fetching rows

mysql tables in use 2, locked 2

253 lock struct(s), heap size 31160, 10864 row lock(s), undo log entries 10616

MySQL thread id 22433333, OS thread handle 0x7fc781b16700, query id 1080843120 127.0.0.1 dwbdba_mgr Sending data

INSERT LOW_PRIORITY IGNORE INTO `redcliff`.`_rider_new` 

************************************( Omitted )

    `frozen_provision`, `bloc…. LOCK IN SHARE MODE  /*pt-online-schema-change 18153 copy nibble*/

*** (2) HOLDS THE LOCK(S): 

TABLE LOCK table `redcliff`.`_rider_new` trx id 107BF2CDC lock mode AUTO-INC 

*** (2) WAITING FOR THIS LOCK TO BE GRANTED: 

RECORD LOCKS space id 636 page no 4599 n bits 112 index `PRIMARY` of table 

`redcliff`.`rider` trx id 107BF2CDC lock mode S waiting 

*** WE ROLL BACK TRANSACTION (1) 

From this we can read the following:

1. Transaction 2 (transaction id 107BF2CDC) holds the AUTO-INC lock on table _rider_new

2. Transaction 2 is waiting for an S lock on table rider

3. pt-osc copies rows with LOCK IN SHARE MODE. This is a current (locking) read, and it also has to ensure that other concurrent transactions cannot modify the records being read, so that the old and new data stay 100% consistent. That is why it takes S locks.
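The readings above can also be extracted mechanically. The following is a minimal sketch, assuming the deadlock section has the same layout as the excerpt quoted above; the function name parse_deadlock and the abridged DEADLOCK_LOG string are illustrative and not part of any MySQL tooling.

```python
import re

# Abridged copy of the deadlock excerpt quoted above (statement text omitted).
DEADLOCK_LOG = """\
*** (1) TRANSACTION:
TRANSACTION 107BF2CDD, ACTIVE 1 sec setting auto-inc lock
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `redcliff`.`_rider_new` trx id 107BF2CDD lock mode AUTO-INC waiting
*** (2) TRANSACTION:
TRANSACTION 107BF2CDC, ACTIVE 1 sec fetching rows
*** (2) HOLDS THE LOCK(S):
TABLE LOCK table `redcliff`.`_rider_new` trx id 107BF2CDC lock mode AUTO-INC
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 636 page no 4599 n bits 112 index `PRIMARY` of table
`redcliff`.`rider` trx id 107BF2CDC lock mode S waiting
*** WE ROLL BACK TRANSACTION (1)
"""

# "*** (n) HOLDS THE LOCK(S):" or "*** (n) WAITING FOR THIS LOCK TO BE GRANTED:"
SECTION_RE = re.compile(
    r"^\*\*\* \((\d+)\) (HOLDS THE LOCK\(S\)|WAITING FOR THIS LOCK TO BE GRANTED):"
)
# "`schema`.`table` trx id <id> lock mode <mode>"
LOCK_RE = re.compile(r"`(\w+)`\.`(\w+)` trx id (\w+) lock mode (\S+)")


def parse_deadlock(text):
    """Return {txn_number: {"holds": [...], "waits": [...]}} for a deadlock excerpt."""
    result = {}
    current = None  # (txn_number, "holds" or "waits") while inside a lock section
    for raw in text.splitlines():
        line = raw.strip()
        header = SECTION_RE.match(line)
        if header:
            kind = "holds" if header.group(2).startswith("HOLDS") else "waits"
            current = (int(header.group(1)), kind)
            continue
        if line.startswith("***"):
            current = None  # any other "***" header ends the lock section
            continue
        lock = LOCK_RE.search(line)
        if lock and current:
            txn, kind = current
            schema, table, trx_id, mode = lock.groups()
            entry = {"table": f"{schema}.{table}", "trx_id": trx_id, "mode": mode}
            result.setdefault(txn, {"holds": [], "waits": []})[kind].append(entry)
    return result


locks = parse_deadlock(DEADLOCK_LOG)
for txn, info in sorted(locks.items()):
    print(txn, "holds", info["holds"], "waits", info["waits"])
```

Note that the log only lists the locks InnoDB chose to print. The record lock held by transaction 1 does not appear in it and has to be inferred from the relay log.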

Based on the information above, the analysis is as follows:


Transaction 1

1. "Reading event from the relay log": it is applying a modification to the rider table

(analysis of the relay log afterwards confirmed that it was a change to the rider table),

so it holds a record X lock on the rider table.

2. It is waiting for the AUTO-INC lock on the _rider_new table.

(Note: when pt-osc changes a table, it creates three triggers on that table, for insert, delete and update. The rider table therefore already has three triggers, and an UPDATE or INSERT on rider fires a trigger that turns the change into a REPLACE on _rider_new. On a table with an auto-increment id, a REPLACE generates a new auto-increment id value, which is why it requests the AUTO-INC lock.)

Transaction 2

1. INSERT LOW_PRIORITY IGNORE INTO `redcliff`.`_rider_new` (`id`, `city_id`,

This statement writes data into the _rider_new table in batches. It already holds the AUTO-INC lock on _rider_new, and as the analysis above shows, it is waiting for a shared (S) read lock on the rider table.



Stripping away the details, the picture is:



Transaction 1:

            holds: record X lock on the rider table

            waits for: AUTO-INC lock on the _rider_new table

Transaction 2:

            holds: AUTO-INC lock on the _rider_new table

            waits for: S lock on the rider table

Perfect deadlock
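The hold/wait summary above forms a cycle in the wait-for graph. The following is a small sketch of that check, assuming each transaction waits on at most one resource. The labels trx1, trx2 and the resource names are illustrative, and this is not how InnoDB implements its own deadlock detection.

```python
# Resource -> transaction that holds it, taken from the summary above.
holders = {
    "rider row": "trx1",            # record X lock from the replicated change
    "_rider_new AUTO-INC": "trx2",  # held by the pt-osc chunk copy
}

# Transaction -> resource it is waiting for.
waiting_for = {
    "trx1": "_rider_new AUTO-INC",  # trigger-generated REPLACE
    "trx2": "rider row",            # S lock requested by LOCK IN SHARE MODE
}


def find_cycle(holders, waiting_for, start):
    """Follow waiter -> resource -> holder edges from start.

    Returns the list of transactions forming a cycle, or None if the
    chain ends at a transaction that is not waiting for anything.
    """
    path = []
    txn = start
    while txn in waiting_for:
        if txn in path:
            return path[path.index(txn):]
        path.append(txn)
        txn = holders.get(waiting_for[txn])
    return None


print(find_cycle(holders, waiting_for, "trx1"))  # ['trx1', 'trx2']
```

If either wait edge is removed, the chain ends and the function returns None, which is the situation the mitigations in the next section try to make more likely.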




In the end, transaction 1 is rolled back (that is, the replicated update is rolled back, leaving the master and slave data inconsistent).



My point of view

From the analysis above we conclude that in some cases pt-osc can cause data inconsistency through a deadlock rollback. Given how the tool works, this cannot be avoided entirely and can only be mitigated (for example, by setting --chunk-size to a smaller value, or by not using pt-osc on instances with very high TPS). While MySQL online DDL is still immature, I believe pt-online-schema-change remains the mainstream tool MySQL DBAs use for table changes in production. I hope this write-up helps you step into fewer pits, go home early and sleep well.
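As a rough illustration of the --chunk-size mitigation, the sketch below only builds a pt-online-schema-change command line with a reduced chunk size. The helper name build_pt_osc_command, the ALTER clause and the value 200 are hypothetical, and connection options such as host and user are left out.

```python
import shlex


def build_pt_osc_command(database, table, alter, chunk_size=1000, execute=False):
    """Build (but do not run) a pt-online-schema-change argument list."""
    return [
        "pt-online-schema-change",
        "--alter", alter,
        "--chunk-size", str(chunk_size),
        # Default to a dry run so the sketch never changes data by accident.
        "--execute" if execute else "--dry-run",
        f"D={database},t={table}",
    ]


# Hypothetical ALTER clause and chunk size; the tool's default chunk size is 1000.
cmd = build_pt_osc_command(
    "redcliff", "rider", "ADD COLUMN example_col INT", chunk_size=200
)
print(shlex.join(cmd))
```

Smaller chunks mean each copy statement locks fewer rows and holds the AUTO-INC lock for a shorter time, which narrows the window for the deadlock described above but does not close it.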



Copyright notice
This article was created by [wx5caecf2ed0645]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/02/202202131626548502.html