TiDB DM alarm DM_sync_process_exists_with_error troubleshooting
2022-07-02 11:18:00 【On the way of data communication】
I. Background
A DM synchronization task raised the alarm DM_sync_process_exists_with_error and then recovered automatically within a minute. This post traces the cause.
II. Errors observed in the logs
1. DM log
[2022/06/28 14:31:13.364 +00:00] [ERROR] [db.go:201] ["execute statements failed after retry"] [task=task-name] [unit="binlog replication"] [queries="[sql]"] [arguments="[[]]"] [error="[code=10006:class=database:scope=not-set:level=high], Message: execute statement failed: commit, RawCause: invalid connection"]
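The DM log line above packs the useful facts (error code, class, severity, raw cause) into one bracketed string. As a rough sketch for pulling them out when scanning many such lines, the regular expressions below are written against this single sample only; the function name `parse_dm_error` and the field names are hypothetical, not part of DM.

```python
import re

# Sample line copied from the DM log above.
LINE = r'''[2022/06/28 14:31:13.364 +00:00] [ERROR] [db.go:201] ["execute statements failed after retry"] [task=task-name] [unit="binlog replication"] [queries="[sql]"] [arguments="[[]]"] [error="[code=10006:class=database:scope=not-set:level=high], Message: execute statement failed: commit, RawCause: invalid connection"]'''

# Leading fields: [time] [LEVEL] [source] ["message"]
HEADER = re.compile(
    r'^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] \["(?P<message>[^"]*)"\]'
)
# Error metadata inside the error field: [code=...:class=...:scope=...:level=...]
ERROR_META = re.compile(
    r'\[code=(?P<code>\d+):class=(?P<cls>[^:]+):scope=(?P<scope>[^:]+):level=(?P<severity>[^\]]+)\]'
)
# Text after "RawCause: " up to the closing quote of the error field.
RAW_CAUSE = re.compile(r'RawCause: (?P<raw>[^"]+)"')


def parse_dm_error(line):
    """Return a dict of fields from a DM error line, or None if it does not match."""
    header = HEADER.match(line)
    meta = ERROR_META.search(line)
    if not (header and meta):
        return None
    result = header.groupdict()
    result.update(meta.groupdict())
    raw = RAW_CAUSE.search(line)
    result["raw_cause"] = raw.group("raw") if raw else None
    return result


if __name__ == "__main__":
    print(parse_dm_error(LINE))
```

Here the raw cause, invalid connection, is the part worth following into the downstream logs.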
2. Upstream MySQL log
2022-06-28T14:31:19.413211Z 28801 [Note] Aborted connection 28801 to db: 'unconnected' user: '***' host: 'ip' (Got an error reading communication packets)
2022-06-28T14:31:22.154980Z 28802 [Note] Aborted connection 28802 to db: 'unconnected' user: '***' host: 'ip' (Got an error reading communication packets)
2022-06-28T14:31:32.158508Z 28804 [Note] Start binlog_dump to master_thread_id(28804) slave_server(429505412), pos(mysql-bin-changelog.103037, 36247149)
2022-06-28T14:31:32.158739Z 28803 [Note] Start binlog_dump to master_thread_id(28803) slave_server(429505202), pos(mysql-bin-changelog.103037, 40373779)
3. Downstream TiDB log
[2022/06/28 14:31:12.419 +00:00] [WARN] [client_batch.go:638] ["wait response is cancelled"] [to=dm_worker_ip:20160] [cause="context canceled"]
[2022/06/28 14:31:12.419 +00:00] [WARN] [client_batch.go:638] ["wait response is cancelled"] [to=dm_worker_ip:20160] [cause="context canceled"]
[2022/06/28 14:31:12.419 +00:00] [WARN] [client_batch.go:638] ["wait response is cancelled"] [to=dm_worker_ip:20160] [cause="context canceled"]
[2022/06/28 14:31:12.419 +00:00] [WARN] [client_batch.go:638] ["wait response is cancelled"] [to=dm_worker_ip:20160] [cause="context canceled"]
[2022/06/28 14:31:12.419 +00:00] [WARN] [client_batch.go:638] ["wait response is cancelled"] [to=dm_worker_ip:20160] [cause="context canceled"]
4. Downstream TiKV log
[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2641161, leader may Some(id: 2641164 store_id: 5)\" not_leader { region_id: 2641161 leader { id: 2641164 store_id: 5 } }"]
[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2641165, leader may Some(id: 2641167 store_id: 4)\" not_leader { region_id: 2641165 leader { id: 2641167 store_id: 4 } }"]
[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2709997, leader may Some(id: 2709999 store_id: 4)\" not_leader { region_id: 2709997 leader { id: 2709999 store_id: 4 } }"]
[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2839445, leader may Some(id: 2839447 store_id: 4)\" not_leader { region_id: 2839445 leader { id: 2839447 store_id: 4 } }"]
[2022/06/28 14:31:20.400 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2957169, leader may Some(id: 2957170 store_id: 1)\" not_leader { region_id: 2957169 leader { id: 2957170 store_id: 1 } }"]
[2022/06/28 14:31:20.400 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2957169, leader may Some(id: 2957170 store_id: 1)\" not_leader { region_id: 2957169 leader { id: 2957170 store_id: 1 } }"]
[2022/06/28 14:31:20.400 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2957169, leader may Some(id: 2957170 store_id: 1)\" not_leader { region_id: 2957169 leader { id: 2957170 store_id: 1 } }"]
[2022/06/28 14:31:05.617 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Key is locked (will clean up) primary_lock: 748000000F000 lock_version: 434222311815512066 key: 748000009725552F000 lock_ttl: 3003 txn_size: 1"]
[2022/06/28 14:31:05.634 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000092F000 lock_version: 434222311815512092 key: 748000000000 lock_ttl: 3018 txn_size: 5"]
[2022/06/28 14:31:15.389 +00:00] [ERROR] [kv.rs:931] ["KvService response batch commands fail"]
[2022/06/28 14:31:15.432 +00:00] [ERROR] [kv.rs:931] ["KvService response batch commands fail"]
5. PD log
[2022/06/28 14:30:55.329 +00:00] [INFO] [operator_controller.go:424] ["add operator"] [region-id=2641161] [operator="\"transfer-hot-read-leader {transfer leader: store 1 to 5} (kind:hot-region,leader, region:2641161(25913,5), createAt:2022-06-28 14:30:55.329497692 +0000 UTC m=+8421773.911777457, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, steps:[transfer leader from store 1 to store 5])\""] ["additional info"=]
[2022/06/28 14:30:55.329 +00:00] [INFO] [operator_controller.go:620] ["send schedule command"] [region-id=2641161] [step="transfer leader from store 1 to store 5"] [source=create]
[2022/06/28 14:30:55.342 +00:00] [INFO] [cluster.go:567] ["leader changed"] [region-id=2641161] [from=1] [to=5]
[2022/06/28 14:30:55.342 +00:00] [INFO] [operator_controller.go:537] ["operator finish"] [region-id=2641161] [takes=12.961676ms] [operator="\"transfer-hot-read-leader {transfer leader: store 1 to 5} (kind:hot-region,leader, region:2641161(25913,5), createAt:2022-06-28 14:30:55.329497692 +0000 UTC m=+8421773.911777457, startAt:2022-06-28 14:30:55.329597613 +0000 UTC m=+8421773.911877386, currentStep:1, steps:[transfer leader from store 1 to store 5]) finished\""] ["additional info"=]
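One way to check that the TiKV not_leader responses line up with PD's leader transfers is to match region IDs and target stores across the two logs. The sketch below does this for lines copied from the excerpts above; the regular expressions are derived from those sample lines only, and the function name `explain_not_leader` is hypothetical.

```python
import re

# PD: ["leader changed"] [region-id=N] [from=A] [to=B]
PD_LEADER_CHANGED = re.compile(
    r'\["leader changed"\] \[region-id=(?P<region>\d+)\] \[from=(?P<src>\d+)\] \[to=(?P<dst>\d+)\]'
)
# TiKV: not_leader { region_id: N leader { id: P store_id: S } }
TIKV_NOT_LEADER = re.compile(
    r'not_leader \{ region_id: (?P<region>\d+) leader \{ id: (?P<peer>\d+) store_id: (?P<store>\d+) \} \}'
)

# Lines copied from the PD and TiKV logs above.
PD_LINES = [
    r'''[2022/06/28 14:30:55.342 +00:00] [INFO] [cluster.go:567] ["leader changed"] [region-id=2641161] [from=1] [to=5]''',
]
TIKV_LINES = [
    r'''[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2641161, leader may Some(id: 2641164 store_id: 5)\" not_leader { region_id: 2641161 leader { id: 2641164 store_id: 5 } }"]''',
    r'''[2022/06/28 14:31:12.585 +00:00] [WARN] [endpoint.rs:537] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 2641165, leader may Some(id: 2641167 store_id: 4)\" not_leader { region_id: 2641165 leader { id: 2641167 store_id: 4 } }"]''',
]


def explain_not_leader(pd_lines, tikv_lines):
    """Return sorted (region, store) pairs where a TiKV not_leader hint
    names the same store that PD reports as the new leader for that region."""
    transfers = {}
    for line in pd_lines:
        m = PD_LEADER_CHANGED.search(line)
        if m:
            transfers[int(m["region"])] = int(m["dst"])
    matched = set()
    for line in tikv_lines:
        m = TIKV_NOT_LEADER.search(line)
        if m and transfers.get(int(m["region"])) == int(m["store"]):
            matched.add((int(m["region"]), int(m["store"])))
    return sorted(matched)


if __name__ == "__main__":
    print(explain_not_leader(PD_LINES, TIKV_LINES))
```

With only the excerpts shown here, region 2641161 is the one region that appears in both logs; the other regions would need the corresponding PD lines, which are not quoted in this post.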
6. Monitoring: cluster_tidb --> KV Errors
III. Conclusion
The alarm was triggered by the invalid connection error in dm-worker. That error follows from the "wait response is cancelled" warnings in TiDB, and those in turn are caused by locks and backoff in TiKV. As for why the locks and backoff occurred: the PD log shows that hot-read-leader scheduling (transfer-hot-read-leader) moved region leaders, which is the key cause of the backoff. The cause of the locks has to be found in the business SQL.
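To make the chain easier to follow, the events can be placed on a single timeline. The sketch below sorts one representative entry per component by timestamp; the timestamps and messages are taken from the logs above, while the `EVENTS` list, the shortened summaries, and the helper names are illustrative choices. It assumes the MySQL timestamps (trailing Z) are UTC, like the +00:00 timestamps of the other components. Ordering in time does not by itself prove causation; it only shows that the sequence is consistent with the explanation.

```python
from datetime import datetime, timezone

# (component, timestamp as printed in the logs, shortened summary)
EVENTS = [
    ("dm-worker", "2022/06/28 14:31:13.364 +00:00", "execute statements failed after retry: invalid connection"),
    ("mysql", "2022-06-28T14:31:19.413211Z", "Aborted connection 28801"),
    ("tidb", "2022/06/28 14:31:12.419 +00:00", "wait response is cancelled"),
    ("tikv", "2022/06/28 14:31:12.585 +00:00", "not_leader for region 2641161"),
    ("tikv", "2022/06/28 14:31:05.617 +00:00", "Key is locked"),
    ("pd", "2022/06/28 14:30:55.329 +00:00", "add operator transfer-hot-read-leader for region 2641161"),
]


def parse_ts(text):
    """Parse either the MySQL style (ISO with trailing Z) or the TiDB/TiKV/PD/DM style."""
    if text.endswith("Z"):
        naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
        return naive.replace(tzinfo=timezone.utc)
    return datetime.strptime(text, "%Y/%m/%d %H:%M:%S.%f %z")


def timeline(events):
    """Return the events ordered by their parsed timestamps."""
    return sorted(events, key=lambda e: parse_ts(e[1]))


if __name__ == "__main__":
    for component, ts, summary in timeline(EVENTS):
        stamp = parse_ts(ts).strftime("%H:%M:%S.%f")[:-3]
        print(f"{stamp}  {component:<9}  {summary}")
```

Sorted this way, the PD scheduling comes first, the TiKV lock and not_leader responses and the TiDB cancellations follow, and the dm-worker error and the upstream MySQL aborted connections come last.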
Official documentation: the TiDB lock conflict document