Troubleshooting the TiDB DM alarm DM_sync_process_exists_with_error
2022-07-02 11:18:00 【On the way of data communication】
I. Background
A DM synchronization task raised the DM_sync_process_exists_with_error alarm and then recovered automatically within a minute. This post looks into the cause.
II. Errors observed in the logs
1. DM log
[2022/06/28 14:31:13.364 +00:00] [ERROR] [db.go:201] ["execute statements failed after retry"] [task=task-name] [unit="binlog replication"] [queries="[sql]"] [arguments="[[]]"] [error="[code=10006:class=database:scope=not-set:level=high], Message: execute statement failed: commit, RawCause: invalid connection"]
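The `error` field of this line packs several pieces of information (code, class, scope, level, message, raw cause) into one string. As a rough sketch, and not an official DM tool, the field can be split with a regular expression so that alarms can be grouped by code or raw cause. The regex and the `parse_dm_error` helper are assumptions made for illustration, fitted only to the line shown above.

```python
import re

# Sample line copied from the DM log above (queries/arguments fields omitted).
LINE = (
    '[2022/06/28 14:31:13.364 +00:00] [ERROR] [db.go:201] '
    '["execute statements failed after retry"] [task=task-name] '
    '[unit="binlog replication"] '
    '[error="[code=10006:class=database:scope=not-set:level=high], '
    'Message: execute statement failed: commit, RawCause: invalid connection"]'
)

# Hypothetical pattern fitted to the error field format seen in this one line.
ERROR_RE = re.compile(
    r'\[error="\[code=(?P<code>\d+):class=(?P<cls>[^:]+):scope=(?P<scope>[^:]+)'
    r':level=(?P<level>[^\]]+)\], Message: (?P<message>.*?), '
    r'RawCause: (?P<raw_cause>.*?)"\]'
)


def parse_dm_error(line):
    """Return the parts of the error field as a dict, or None if absent."""
    match = ERROR_RE.search(line)
    return match.groupdict() if match else None


info = parse_dm_error(LINE)
print(info["code"], info["cls"], info["level"], "->", info["raw_cause"])
```

Here the raw cause, `invalid connection`, is the part worth following further downstream.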
2. Upstream MySQL log
2022-06-28T14:31:19.413211Z 28801 [Note] Aborted connection 28801 to db: 'unconnected' user: '***' host: 'ip' (Got an error reading communication packets)
2022-06-28T14:31:22.154980Z 28802 [Note] Aborted connection 28802 to db: 'unconnected' user: '***' host: 'ip' (Got an error reading communication packets)
2022-06-28T14:31:32.158508Z 28804 [Note] Start binlog_dump to master_thread_id(28804) slave_server(429505412), pos(mysql-bin-changelog.103037, 36247149)
2022-06-28T14:31:32.158739Z 28803 [Note] Start binlog_dump to master_thread_id(28803) slave_server(429505202), pos(mysql-bin-changelog.103037, 40373779)
3. Downstream TiDB log
[2022/06/28 14:31:12.419 +00:00] [WARN] [client_batch.go:638] ["wait response is cancelled"] [to=dm_worker_ip:20160] [cause="context canceled"]
[2022/06/28 14:31:12.419 +00:00] [WARN] [client_batch.go:638] ["wait response is cancelled"] [to=dm_worker_ip:20160] [cause="context canceled"]
[2022/06/28 14:31:12.419 +00:00] [WARN] [client_batch.go:638] ["wait response is cancelled"] [to=dm_worker_ip:20160] [cause="context canceled"]
[2022/06/28 14:31:12.419 +00:00] [WARN] [client_batch.go:638] ["wait response is cancelled"] [to=dm_worker_ip:20160] [cause="context canceled"]
[2022/06/28 14:31:12.419 +00:00] [WARN] [client_batch.go:638] ["wait response is cancelled"] [to=dm_worker_ip:20160] [cause="context canceled"]
4. Downstream TiKV log
[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2641161, leader may Some(id: 2641164 store_id: 5)\" not_leader { region_id: 2641161 leader { id: 2641164 store_id: 5 } }"]
[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2641165, leader may Some(id: 2641167 store_id: 4)\" not_leader { region_id: 2641165 leader { id: 2641167 store_id: 4 } }"]
[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2709997, leader may Some(id: 2709999 store_id: 4)\" not_leader { region_id: 2709997 leader { id: 2709999 store_id: 4 } }"]
[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2839445, leader may Some(id: 2839447 store_id: 4)\" not_leader { region_id: 2839445 leader { id: 2839447 store_id: 4 } }"]
[2022/06/28 14:31:20.400 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2957169, leader may Some(id: 2957170 store_id: 1)\" not_leader { region_id: 2957169 leader { id: 2957170 store_id: 1 } }"]
[2022/06/28 14:31:20.400 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2957169, leader may Some(id: 2957170 store_id: 1)\" not_leader { region_id: 2957169 leader { id: 2957170 store_id: 1 } }"]
[2022/06/28 14:31:20.400 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2957169, leader may Some(id: 2957170 store_id: 1)\" not_leader { region_id: 2957169 leader { id: 2957170 store_id: 1 } }"]
[2022/06/28 14:31:05.617 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Key is locked (will clean up) primary_lock: 748000000F000 lock_version: 434222311815512066 key: 748000009725552F000 lock_ttl: 3003 txn_size: 1"]
[2022/06/28 14:31:05.634 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000092F000 lock_version: 434222311815512092 key: 748000000000 lock_ttl: 3018 txn_size: 5"]
[2022/06/28 14:31:15.389 +00:00] [ERROR] [kv.rs:931] ["KvService response batch commands fail"]
[2022/06/28 14:31:15.432 +00:00] [ERROR] [kv.rs:931] ["KvService response batch commands fail"]
5. PD log
[2022/06/28 14:30:55.329 +00:00] [INFO] [operator_controller.go:424] ["add operator"] [region-id=2641161] [operator="\"transfer-hot-read-leader {transfer leader: store 1 to 5} (kind:hot-region,leader, region:2641161(25913,5), createAt:2022-06-28 14:30:55.329497692 +0000 UTC m=+8421773.911777457, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 1 to store 5])\""] ["additional info"=]
[2022/06/28 14:30:55.329 +00:00] [INFO] [operator_controller.go:620] ["send schedule command"] [region-id=2641161] [step="transfer leader from store 1 to store 5"] [source=create]
[2022/06/28 14:30:55.342 +00:00] [INFO] [cluster.go:567] ["leader changed"] [region-id=2641161] [from=1] [to=5]
[2022/06/28 14:30:55.342 +00:00] [INFO] [operator_controller.go:537] ["operator finish"] [region-id=2641161] [takes=12.961676ms] [operator="\"transfer-hot-read-leader {transfer leader: store 1 to 5} (kind:hot-region,leader, region:2641161(25913,5), createAt:2022-06-28 14:30:55.329497692 +0000 UTC m=+8421773.911777457, startAt:2022-06-28 14:30:55.329597613 +0000 UTC m=+8421773.911877386, currentStep:1, steps:[transfer leader from store 1 to store 5]) finished\""] ["additional info"=]
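Region 2641161 shows up both in the TiKV `not_leader` errors and in the PD leader transfer. A minimal sketch of how that cross-check could be automated is given below. The log excerpts are abbreviated copies of the lines above (the `...` stands for the omitted part of the message), and the regular expressions and helper names are assumptions for illustration, not part of any TiDB tooling.

```python
import re

# Abbreviated excerpts of the TiKV and PD log lines shown above.
TIKV_LOG = """\
[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="... not_leader { region_id: 2641161 leader { id: 2641164 store_id: 5 } }"]
[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="... not_leader { region_id: 2641165 leader { id: 2641167 store_id: 4 } }"]
"""
PD_LOG = """\
[2022/06/28 14:30:55.342 +00:00] [INFO] [cluster.go:567] ["leader changed"] [region-id=2641161] [from=1] [to=5]
"""

NOT_LEADER_RE = re.compile(
    r"not_leader \{ region_id: (\d+) leader \{ id: \d+ store_id: (\d+)"
)
LEADER_CHANGED_RE = re.compile(
    r'\["leader changed"\] \[region-id=(\d+)\] \[from=(\d+)\] \[to=(\d+)\]'
)


def not_leader_regions(text):
    """Map region id -> store id that TiKV reported as the current leader."""
    return {int(r): int(s) for r, s in NOT_LEADER_RE.findall(text)}


def pd_leader_changes(text):
    """Map region id -> (from_store, to_store) for PD 'leader changed' events."""
    return {int(r): (int(f), int(t)) for r, f, t in LEADER_CHANGED_RE.findall(text)}


def explained_by_pd(tikv_text, pd_text):
    """Regions whose not_leader hint matches the store PD moved the leader to."""
    hints = not_leader_regions(tikv_text)
    moves = pd_leader_changes(pd_text)
    return sorted(
        region
        for region, store in hints.items()
        if region in moves and moves[region][1] == store
    )


matched = explained_by_pd(TIKV_LOG, PD_LOG)
print(matched)
```

Regions that appear in the TiKV errors but not in the result (2641165 in this excerpt) would need their own PD log entries to be checked before drawing the same conclusion for them.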
6. Monitoring: cluster_tidb --> KV Errors

III. Conclusion
The alarm was triggered by the invalid connection error in dm-worker. That error occurred because TiDB reported "wait response is cancelled", and TiDB in turn hit this because of locks and backoff in TiKV. As for why the locks and backoff appeared: the PD log shows that a hot-read-leader transfer was scheduled, which is the key cause of the backoff, while the cause of the locks has to be traced in the business SQL.
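One way to sanity-check a chain like this is to put one representative event per component on a single timeline. The sketch below does that with timestamps copied from the logs above; the event descriptions are shortened by hand, and the `parse_ts` helper is an assumption for illustration. It assumes the MySQL error log timestamps are in UTC, as the trailing `Z` suggests.

```python
from datetime import datetime, timezone

# (component, raw timestamp, shortened description), taken from the logs above.
EVENTS = [
    ("dm-worker", "2022/06/28 14:31:13.364 +00:00", "invalid connection"),
    ("mysql", "2022-06-28T14:31:19.413211Z", "Aborted connection 28801"),
    ("tidb", "2022/06/28 14:31:12.419 +00:00", "wait response is cancelled"),
    ("tikv", "2022/06/28 14:31:12.585 +00:00", "not leader for region 2641161"),
    ("tikv", "2022/06/28 14:31:05.617 +00:00", "Key is locked"),
    ("pd", "2022/06/28 14:30:55.329 +00:00", "add operator transfer-hot-read-leader"),
]


def parse_ts(raw):
    """Parse either the MySQL error-log format or the TiDB-style log format."""
    if "T" in raw:  # MySQL: 2022-06-28T14:31:19.413211Z (treated as UTC)
        parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ")
        return parsed.replace(tzinfo=timezone.utc)
    # TiDB/TiKV/PD/DM: 2022/06/28 14:31:13.364 +00:00
    return datetime.strptime(raw, "%Y/%m/%d %H:%M:%S.%f %z")


timeline = sorted(EVENTS, key=lambda event: parse_ts(event[1]))
for component, raw, description in timeline:
    print(f"{parse_ts(raw).isoformat()}  {component:<10} {description}")
```

A sorted view only shows ordering, not causation, so it is a starting point for reading the individual logs rather than a replacement for it.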
Official documentation: the TiDB document on troubleshooting lock conflicts