PT OSC deadlock analysis

2022-07-06 12:12:00 wx5caecf2ed0645


1、 Before a release one day, I reviewed the list of SQL statements going online and found one that adds a column to a large table (about 30 million rows). The table is used by the business 24 hours a day, and traffic is lighter only in the evening. To reduce the impact on the business, we decided to use the pt-online-schema-change tool to change the table structure.

       
pt-online-schema-change performs ALTER operations on large tables online while trying to avoid affecting live traffic. It is one of the best MySQL management tools and is widely used in operations work.

(1) Environment:

  1. percona-toolkit: 3.0.4
  2. mysql: 5.6.27-log
  3. isolation: READ COMMITTED


(2) Problem description:

After pt-online-schema-change finished changing the table structure, a developer came over the next day and asked us to check the production database around 11 p.m. the previous night. According to the application error logs, the application had reported three or four deadlocks within roughly half an hour around that time, and the business side was starting to complain.

(3) Error log

    After hearing this from the developer, we went straight to the MySQL error log (it is recommended to record every deadlock in the error log). The deadlock information is as follows:

       
*** (1) TRANSACTION:
TRANSACTION 164993957148, ACTIVE 3 sec setting auto-inc lock
mysql tables in use 2, locked 2
LOCK WAIT 6 lock struct(s), heap size 1184, 3 row lock(s)
MySQL thread id 189071372, OS thread handle 0x7f9e5e82d700, query id 310630851540 10.10.168.169 prodect_user update
REPLACE INTO `pppppp_ccc`.`_t_good_price_new` (`gid`,`price`,`shop_id`,`updatetime`,...) VALUES (`NEW.gid`,`NEW.price`,`NEW.shop_id`,`NEW.updatetime`,...)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `pppppp_ccc`.`_t_good_price_new` trx id 164993957148 lock mode AUTO-INC waiting

*** (2) TRANSACTION:
TRANSACTION 164993956744, ACTIVE 3 sec fetching rows
mysql tables in use 2, locked 2
53 lock struct(s), heap size 6544, 4024 row lock(s), undo log entries 3481
MySQL thread id 189078502, OS thread handle 0x7f9e539e7700, query id 310630850764 10-10-8-8 10.10.168.168 pt_user Sending data
INSERT LOW_PRIORITY IGNORE INTO `pppppp_ccc`.`_t_good_price_new` (`gid`,`price`,`shop_id`,`updatetime`,...) SELECT `gid`,`price`,`shop_id`,`updatetime`,... FROM `pppppp_ccc`.`t_good_price` FORCE INDEX(`PRIMARY`) WHERE ((`gid` >= '32081571')) AND ((`gid` <= '32087240')) LOCK IN SHARE MODE /*pt-online-schema-change 19391 copy nibble*/
*** (2) HOLDS THE LOCK(S):
TABLE LOCK table `pppppp_ccc`.`_t_good_price_new` trx id 164993956744 lock mode AUTO-INC
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2905 page no 349991 n bits 224 index `PRIMARY` of table `pppppp_ccc`.`t_good_price` trx id 164993956744 lock mode S locks rec but not gap waiting
Record lock, heap no 109 PHYSICAL RECORD: n_fields 22; compact format; info bits 0

*** WE ROLL BACK TRANSACTION (1)
2017-10-23 23:06:27 7f9e539e7700InnoDB: transactions deadlock detected, dumping detailed information.
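
The recommendation above, recording every deadlock in the error log, maps to a standard InnoDB system variable. The statements below are a minimal sketch, assuming MySQL 5.6 or later, the mysql command-line client, and an account allowed to set global variables. A value set this way does not survive a restart unless it is also added to the server configuration file.

```sql
-- Check whether every deadlock is currently written to the error log.
SHOW GLOBAL VARIABLES LIKE 'innodb_print_all_deadlocks';

-- Write every detected deadlock to the error log (dynamic, no restart needed).
SET GLOBAL innodb_print_all_deadlocks = ON;

-- Without that setting, only the most recent deadlock is kept, in the
-- LATEST DETECTED DEADLOCK section of the InnoDB status output.
SHOW ENGINE INNODB STATUS\G
```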

2、 Analysis

    The deadlock information tells us two things:

  1. Transaction 1 is waiting for the AUTO-INC table lock on "_t_good_price_new";

  2. Transaction 2 holds the AUTO-INC table lock on "_t_good_price_new" and is waiting for a record lock on "t_good_price".

    MySQL's deadlock log is not very detailed. Judging from these two SQL statements alone, no deadlock should be possible, so we have to work out which SQL statements make up each transaction and in what order they run.

    The REPLACE INTO statement in transaction 1 was clearly generated by a trigger that pt-osc created: when a row in the original table is updated, the trigger fires and synchronizes the row to the new table with REPLACE. Can this lead to a deadlock? The answer is yes.

       
  Transaction 1:
  1. It updates t_good_price by condition and holds exclusive record locks on the matching rows of t_good_price;
  2. The update fires the trigger, which writes to _t_good_price_new with REPLACE and therefore needs the implicit AUTO-INC lock on _t_good_price_new.

  Transaction 2:
  1. As described in the previous article on the lock order of INSERT INTO ... SELECT ... FROM, it first takes the table-level AUTO-INC lock on _t_good_price_new;
  2. With the new table locked, it then requests record locks on the original table t_good_price for the primary key range in the WHERE clause.

  Putting this together: transaction 1 updates the original table t_good_price first and places exclusive locks on the updated rows. Before its trigger has fired, transaction 2 starts and takes the AUTO-INC lock on the new table. When transaction 2 then requests record-level shared locks on the original table, it finds that some rows are locked exclusively and has to wait. When the trigger of transaction 1 finally fires, it needs the AUTO-INC lock on the new table, which closes the loop and produces the deadlock.

  Transaction 1:
  1. holds: record X lock on t_good_price
  2. waits for: AUTO-INC lock on _t_good_price_new

  Transaction 2:
  1. holds: AUTO-INC lock on _t_good_price_new
  2. waits for: record S lock on t_good_price

    The precondition is that the rows updated by transaction 1 are exactly the rows on which transaction 2 needs an S lock. When that happens, the deadlock is complete.
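
The table-level AUTO-INC lock in this cycle depends on the server's auto-increment lock mode. As a hedged aside drawn from the MySQL documentation rather than from the incident notes: with innodb_autoinc_lock_mode = 1 (the default in MySQL 5.6), a bulk insert such as INSERT ... SELECT holds the table-level AUTO-INC lock until the statement ends, and a simple insert such as the trigger's REPLACE ... VALUES has to wait for that lock while another transaction holds it. The original post does not state which mode the server used, so the default is an assumption here. The value on a given server can be checked like this:

```sql
-- 0 = traditional, 1 = consecutive, 2 = interleaved.
SHOW GLOBAL VARIABLES LIKE 'innodb_autoinc_lock_mode';

-- The variable cannot be changed at runtime; it has to be set in the
-- configuration file and needs a restart. Mode 2 avoids table-level AUTO-INC
-- locks but is not safe with statement-based replication, so check the
-- binlog format before considering it.
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
```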

    In our case, many product prices had to be revised at that time, also as a batch operation, which is how the situation above arose. If you are interested, you can reproduce it with the test below.

3、 Reproduction

 (1) Preparation:

(1) Create the table

create table test_deadlock (id int auto_increment primary key, name varchar(10));

(2) Insert data

insert into test_deadlock select null,'Zhang San';
insert into test_deadlock select null,'Li Si';
insert into test_deadlock select null,'Wang Wu';
insert into test_deadlock select null,'Chen Liu';
insert into test_deadlock select null,'Tian Qi';

(3) Create the new table to simulate pt-osc

create table _test_deadlock_new like test_deadlock;
alter table _test_deadlock_new add column age int default null;

(4) Create the trigger. Only the update trigger is created here to simulate pt-osc. Because the table has few rows and an update finishes almost instantly, the trigger sleeps for 5 s when it fires to widen the window for the test.

delimiter //
create trigger pt_osc_test_test_deadlock_upd after update on test.test_deadlock
for each row
begin
  declare x int;
  -- sleep 5 s so that session 2 has time to start before the replace runs
  set x = sleep(5);
  delete ignore from test._test_deadlock_new
    where !(OLD.`id` <=> NEW.`id`) and test._test_deadlock_new.id <=> OLD.`id`;
  replace into test._test_deadlock_new (`id`, `name`) values (NEW.`id`, NEW.`name`);
end//
delimiter ;

(5) The data in the table

mysql> select * from test_deadlock;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | Zhang San |
|  2 | Li Si     |
|  3 | Wang Wu   |
|  4 | Chen Liu  |
|  5 | Tian Qi   |
+----+-----------+

(2) Steps:

session 1:

begin;
update test_deadlock set name = 'Chen Pi' where id = 4;

session 2 (note: this must be run within the 5 s window):

begin;
insert into _test_deadlock_new(id,name) select id,name from test_deadlock where id > 2 and id < 5 lock in share mode;

    As soon as the 5 s are up, session 1 fails with the following error:

mysql> update test_deadlock set name = 'Chen Pi' where id = 4;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

    The deadlock information is as follows:

       
*** (1) TRANSACTION:
TRANSACTION 1880306, ACTIVE 1 sec fetching rows
mysql tables in use 2, locked 2
LOCK WAIT 5 lock struct(s), heap size 1184, 2 row lock(s), undo log entries 1
MySQL thread id 3342, OS thread handle 0x7f209e5e0700, query id 3056466 localhost root Sending data
insert into _test_deadlock_new(id,name) select id,name from test_deadlock where id > 2 and id <5 lock in share mode
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1642 page no 3 n bits 72 index `PRIMARY` of table `test`.`test_deadlock` trx id 1880306 lock mode S locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000004; asc ;;
1: len 6; hex 0000001cb0f1; asc ;;
2: len 7; hex 120000271805ca; asc ' ;;
3: len 6; hex e99988e79aae; asc ;;

*** (2) TRANSACTION:
TRANSACTION 1880305, ACTIVE 5 sec setting auto-inc lock, thread declared inside InnoDB 5000
mysql tables in use 2, locked 2
4 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 2
MySQL thread id 3341, OS thread handle 0x7f209f59e700, query id 3056468 localhost root update
replace into test._test_deadlock_new (`id`, `name`) values (NEW.`id`, NEW.`name`)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1642 page no 3 n bits 72 index `PRIMARY` of table `test`.`test_deadlock` trx id 1880305 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000004; asc ;;
1: len 6; hex 0000001cb0f1; asc ;;
2: len 7; hex 120000271805ca; asc ' ;;
3: len 6; hex e99988e79aae; asc ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `test`.`_test_deadlock_new` trx id 1880305 lock mode AUTO-INC waiting
*** WE ROLL BACK TRANSACTION (2)
------------
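
While session 2 is still blocked, before the 5 s sleep ends, the wait can also be watched from a third session. This is a minimal sketch, assuming MySQL 5.6 or 5.7, where the information_schema.innodb_trx and information_schema.innodb_lock_waits tables are available (innodb_lock_waits was removed in MySQL 8.0 in favor of performance_schema tables).

```sql
-- Session 3: list open InnoDB transactions and the statement each is running.
SELECT trx_id, trx_state, trx_mysql_thread_id, trx_query
FROM information_schema.innodb_trx;

-- Show which transaction is waiting for which.
SELECT requesting_trx_id, blocking_trx_id
FROM information_schema.innodb_lock_waits;

-- After the deadlock has been detected, the report is in the
-- LATEST DETECTED DEADLOCK section of this output.
SHOW ENGINE INNODB STATUS\G
```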

    Those who are interested can study this further. The takeaway is that running pt-osc on a busy server carries some risk. In our case, the rows being copied into the new table happened to be the same rows the business was updating. The probability is low, but it does happen. Tuning --chunk-size and other parameters can reduce the chance of hitting it.


Copyright notice
This article was written by [wx5caecf2ed0645]. Please include a link to the original article when reposting. Thanks.
https://yzsam.com/2022/02/202202131557025886.html