Troubleshooting a MySQL failure caused by damaged disk sectors
2022-07-01 06:23:00 【Great masses】
Description of the incident
One night, a flood of "host unreachable" alerts suddenly arrived by SMS. I immediately logged in to the affected servers to investigate, but found nothing abnormal, and the business tests passed.
Since almost every host had raised an unreachable alert, I suspected that Zabbix itself was malfunctioning and checked it next.
Troubleshooting process
1. Check the Zabbix server log: MySQL connection failures show up right away
The log first shows a large number of lost database connections:
55187:20220629:035108.704 [Z3005] query failed: [2013] Lost connection to MySQL server at 'reading initial communication packet', system error: 104 [begin;]
55235:20220629:035108.704 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='testserver' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null]
55235:20220629:035108.704 slow query: 11.197553 sec, "select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='testserver' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null"
54930:20220629:035108.704 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid,acknowledgeid from escalations where triggerid is not null and nextcheck<=1656445826 order by actionid,triggerid,itemid,escalationid]
54930:20220629:035108.705 slow query: 45.073163 sec, "select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid,acknowledgeid from escalations where triggerid is not null and nextcheck<=1656445826 order by actionid,triggerid,itemid,escalationid"
55220:20220629:035108.705 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='testserver2' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null]
54918:20220629:035108.705 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select refresh_unsupported,discovery_groupid,snmptrap_logging,severity_name_0,severity_name_1,severity_name_2,severity_name_3,severity_name_4,severity_name_5,hk_events_mode,hk_events_trigger,hk_events_internal,hk_events_discovery,hk_events_autoreg,hk_services_mode,hk_services,hk_audit_mode,hk_audit,hk_sessions_mode,hk_sessions,hk_history_mode,hk_history_global,hk_history,hk_trends_mode,hk_trends_global,hk_trends,default_inventory_mode from config order by configid]
54922:20220629:035108.705 [Z3005] query failed: [2013] Lost connection to MySQL server during query [delete from history where itemid=44780 and clock<1655830340]
54922:20220629:035108.705 slow query: 101.087109 sec, "delete from history where itemid=44780 and clock<1655830340"
54922:20220629:035108.705 database is down: retrying in 10 seconds
55187:20220629:035108.706 [Z3001] connection to database 'zabbix' failed: [2003] Can't connect to MySQL server on '192.168.2.99' (111)
These are followed by a large number of MySQL reconnection entries:
55174:20220629:041906.024 database connection re-established
55228:20220629:041906.024 database connection re-established
54925:20220629:041906.024 database connection re-established
55195:20220629:041906.024 database connection re-established
55291:20220629:041906.110 database connection re-established
55318:20220629:041906.137 database connection re-established
55248:20220629:041906.317 database connection re-established
55367:20220629:041906.898 database connection re-established
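The length of the outage can be read off these two groups of log lines. The sketch below is a minimal illustration, assuming every line follows the `PID:YYYYMMDD:HHMMSS.mmm message` layout seen above. The names `parse_line`, `outage_window` and `SAMPLE` are made up for this example and are not part of Zabbix.

```python
import re
from datetime import datetime

# Assumed layout of a Zabbix server log line: "PID:YYYYMMDD:HHMMSS.mmm message"
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")


def parse_line(line):
    """Return (pid, timestamp, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    pid, day, clock, message = m.groups()
    ts = datetime.strptime(day + clock, "%Y%m%d%H%M%S.%f")
    return int(pid), ts, message


def outage_window(lines):
    """Return (first_failure, first_reconnect) timestamps found in the log."""
    first_failure = None
    first_reconnect = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, ts, message = parsed
        if "Lost connection to MySQL server" in message or "database is down" in message:
            if first_failure is None or ts < first_failure:
                first_failure = ts
        elif "database connection re-established" in message:
            if first_reconnect is None or ts < first_reconnect:
                first_reconnect = ts
    return first_failure, first_reconnect


# Three lines copied from the excerpts above (hypothetical variable name).
SAMPLE = [
    "55187:20220629:035108.704 [Z3005] query failed: [2013] Lost connection to MySQL server at 'reading initial communication packet', system error: 104 [begin;]",
    "54922:20220629:035108.705 database is down: retrying in 10 seconds",
    "55174:20220629:041906.024 database connection re-established",
]

start, end = outage_window(SAMPLE)
downtime_seconds = (end - start).total_seconds()
```

On these sample lines the gap between the first lost connection and the first reconnection is just under 28 minutes.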
2. Check the MySQL error log: the entries below point to a disk problem
2022-06-28T18:24:16.532664Z 23 [Warning] InnoDB: Retry attempts for reading partial data failed.
2022-06-28T18:24:16.532718Z 23 [ERROR] InnoDB: Tried to read 16384 bytes at offset 5098094592, but was only able to read 0
2022-06-28T18:24:16.532751Z 23 [ERROR] InnoDB: Operating system error number 5 in a file operation.
2022-06-28T18:24:16.532769Z 23 [ERROR] InnoDB: Error number 5 means 'Input/output error'
2022-06-28T18:24:16.532784Z 23 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2022-06-28T18:24:16.532796Z 23 [ERROR] InnoDB: File (unknown): 'read' returned OS error 105. Cannot continue operation
2022-06-28T18:24:16.532806Z 23 [ERROR] InnoDB: Cannot continue operation.
2022-06-28T18:24:19.274054Z 0 [Note] InnoDB: FTS optimize thread exiting.
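The numbers in these entries can be interpreted with a little arithmetic. The sketch below assumes the server uses the default 16 KB InnoDB page size, which matches the 16384-byte read in the log; the helper name `page_number` is hypothetical.

```python
import errno

PAGE_SIZE = 16384  # assumed InnoDB page size; matches the read size in the log


def page_number(offset, page_size=PAGE_SIZE):
    """Return (page_no, remainder) for a byte offset inside a tablespace file."""
    return divmod(offset, page_size)


# Offset taken from the "Tried to read 16384 bytes at offset ..." entry.
page_no, remainder = page_number(5098094592)

# Symbolic name of OS error number 5 on this platform.
errno_name = errno.errorcode[5]
```

The offset divides evenly by the page size, so the failed read was for one whole page (page 311163) located roughly 4.75 GiB into the file. Error number 5 is `EIO`, the generic I/O error the kernel returns when the underlying device cannot deliver the data.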
3. Check the system messages log: MySQL is restarting frequently and the disk has damaged sectors
Jun 29 04:24:29 localhost systemd: mysqld.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Jun 29 04:24:29 localhost systemd: Unit mysqld.service entered failed state.
Jun 29 04:24:29 localhost systemd: mysqld.service failed.
Jun 29 04:24:29 localhost systemd: mysqld.service holdoff time over, scheduling restart.
Jun 29 04:24:29 localhost systemd: Cannot add dependency job for unit sshd.socket, ignoring: Unit not found.
Jun 29 04:24:29 localhost systemd: Stopped MySQL Server.
Jun 29 04:24:29 localhost systemd: Starting MySQL Server...
Jun 29 04:24:32 localhost systemd: Started MySQL Server.
Jun 29 04:24:39 localhost kernel: sd 0:0:1:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 29 04:24:39 localhost kernel: sd 0:0:1:0: [sdb] Sense Key : Medium Error [current]
Jun 29 04:24:39 localhost kernel: sd 0:0:1:0: [sdb] Add. Sense: No additional sense information
Jun 29 04:24:39 localhost kernel: sd 0:0:1:0: [sdb] CDB: Read(10) 28 00 40 aa da 20 00 00 08 00
Jun 29 04:24:39 localhost kernel: blk_update_request: I/O error, dev sdb, sector 1084938784
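The CDB bytes in the kernel message can be cross-checked against the reported sector. The sketch below decodes a SCSI READ(10) command, in which bytes 2 to 5 hold the big-endian logical block address and bytes 7 to 8 hold the transfer length in blocks. The function name `decode_read10` is made up for this illustration.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


# CDB copied from the "[sdb] CDB: Read(10) ..." line above.
lba, blocks = decode_read10("28 00 40 aa da 20 00 00 08 00")
```

The decoded address is 1084938784, the same sector that `blk_update_request` reports, and the request covered 8 blocks. That the two numbers agree suggests the disk exposes 512-byte logical blocks, in which case the failed read was a single 4 KB request.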
4. Check the disk with badblocks
Run the following command in a Linux terminal:
badblocks -s -v /dev/sdb
The scan reports a large number of bad blocks.
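With only `-s` (show progress) and `-v` (verbose), badblocks runs its default read-only test. The idea behind such a test can be sketched in a few lines: read the target block by block and record every offset where the read fails. This is only an illustration of the principle, not how badblocks is implemented; the function name `scan_readable` and the 4096-byte block size are assumptions. On a regular file, reads may be served from the page cache, so a real check should be run against the device itself.

```python
import os


def scan_readable(path, block_size=4096):
    """Read path block by block; return byte offsets of blocks that fail to read."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # total size of the file or device
        offset = 0
        while offset < size:
            try:
                os.pread(fd, min(block_size, size - offset), offset)
            except OSError:
                # The kernel reports unreadable sectors as an I/O error.
                bad_offsets.append(offset)
            offset += block_size
    finally:
        os.close(fd)
    return bad_offsets
```

Called as `scan_readable("/dev/sdb")` with sufficient privileges, it would return the byte offsets of the blocks that could not be read; an empty list means every block was readable.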

At this point, the root cause has been identified.
Resolution
I tried to copy the data off the failing disk, but some of it could no longer be read. In the end I restored the latest snapshot to new storage, at the cost of losing some history data. Fortunately, this was not a production environment.
I then tested the disk that had been swapped out. An attempt at a logical repair failed, so the damage was judged to be physical.
Lesson learned: always keep copies of important business data.
Reference
blog.51cto.com/u_13236892/5278888