Troubleshooting record: a MySQL failure caused by damaged disk sectors
2022-07-01 06:23:00 【Great masses】
Description of the incident
One night my phone suddenly received a flood of SMS alerts reporting hosts as unreachable. I logged in to the affected servers straight away, but found nothing abnormal, and the business services tested normal.
Because almost every host had raised an "unreachable" alert, I suspected the problem lay with Zabbix itself and started checking it.
Investigation
1. Check the Zabbix server log: MySQL connection drops show up immediately
The log first shows a large number of lost database connection messages:
55187:20220629:035108.704 [Z3005] query failed: [2013] Lost connection to MySQL server at 'reading initial communication packet', system error: 104 [begin;]
55235:20220629:035108.704 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='testserver' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null]
55235:20220629:035108.704 slow query: 11.197553 sec, "select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='testserver' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null"
54930:20220629:035108.704 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid,acknowledgeid from escalations where triggerid is not null and nextcheck<=1656445826 order by actionid,triggerid,itemid,escalationid]
54930:20220629:035108.705 slow query: 45.073163 sec, "select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid,acknowledgeid from escalations where triggerid is not null and nextcheck<=1656445826 order by actionid,triggerid,itemid,escalationid"
55220:20220629:035108.705 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='testserver2' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null]
54918:20220629:035108.705 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select refresh_unsupported,discovery_groupid,snmptrap_logging,severity_name_0,severity_name_1,severity_name_2,severity_name_3,severity_name_4,severity_name_5,hk_events_mode,hk_events_trigger,hk_events_internal,hk_events_discovery,hk_events_autoreg,hk_services_mode,hk_services,hk_audit_mode,hk_audit,hk_sessions_mode,hk_sessions,hk_history_mode,hk_history_global,hk_history,hk_trends_mode,hk_trends_global,hk_trends,default_inventory_mode from config order by configid]
54922:20220629:035108.705 [Z3005] query failed: [2013] Lost connection to MySQL server during query [delete from history where itemid=44780 and clock<1655830340]
54922:20220629:035108.705 slow query: 101.087109 sec, "delete from history where itemid=44780 and clock<1655830340"
54922:20220629:035108.705 database is down: retrying in 10 seconds
55187:20220629:035108.706 [Z3001] connection to database 'zabbix' failed: [2003] Can't connect to MySQL server on '192.168.2.99' (111)
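To get an overview of a log like this, a short script can tally the MySQL client error codes and list the slow queries. In this excerpt, 2013 means the connection was lost during a query and 2003 means the server could not be reached at all. The 2003 error carries errno 111 (ECONNREFUSED), which is consistent with mysqld not running at that moment. The sketch below is a minimal illustration. It assumes the "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>" line layout seen in the excerpt, and the function name summarize is hypothetical, not part of Zabbix.

```python
import re
from collections import Counter

# Assumed Zabbix log layout (from the excerpt): "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>"
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
# "[Z3005] query failed: [2013] ..." -> captures the MySQL client error code (2013)
ERROR_RE = re.compile(r"\[Z\d+\] .*?\[(\d+)\]")
# "slow query: 11.197553 sec, ..." -> captures the duration in seconds
SLOW_RE = re.compile(r"slow query: ([\d.]+) sec")


def summarize(lines):
    """Return (Counter of MySQL error codes, slow-query durations sorted longest first)."""
    errors = Counter()
    slow = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that do not follow the assumed layout
        message = m.group(4)
        err = ERROR_RE.search(message)
        if err:
            errors[err.group(1)] += 1
        dur = SLOW_RE.search(message)
        if dur:
            slow.append(float(dur.group(1)))
    return errors, sorted(slow, reverse=True)
```

Pass it an open file, for example "with open(path) as f: errors, slow = summarize(f)". Here path is whatever LogFile points to in zabbix_server.conf.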
These are followed by a large number of MySQL reconnection messages:
55174:20220629:041906.024 database connection re-established
55228:20220629:041906.024 database connection re-established
54925:20220629:041906.024 database connection re-established
55195:20220629:041906.024 database connection re-established
55291:20220629:041906.110 database connection re-established
55318:20220629:041906.137 database connection re-established
55248:20220629:041906.317 database connection re-established
55367:20220629:041906.898 database connection re-established
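The gap between the first "database is down" line and the next "database connection re-established" line gives a rough outage window as Zabbix saw it. For the excerpts shown here, that runs from 03:51:08 to 04:19:06, just under 28 minutes. The sketch below is a minimal version that assumes the same line layout as above. The helpers parse_ts and outage_seconds are hypothetical.

```python
from datetime import datetime


def parse_ts(line):
    """Datetime from a Zabbix log prefix "<pid>:<YYYYMMDD>:<HHMMSS.mmm>"."""
    prefix = line.split(" ", 1)[0]
    _pid, day, clock = prefix.split(":")
    return datetime.strptime(day + clock, "%Y%m%d%H%M%S.%f")


def outage_seconds(lines):
    """Seconds from the first 'database is down' line to the next 're-established' line.

    Returns None if no complete down/up pair is found.
    """
    down_at = None
    for line in lines:
        if down_at is None and "database is down" in line:
            down_at = parse_ts(line)
        elif down_at is not None and "database connection re-established" in line:
            return (parse_ts(line) - down_at).total_seconds()
    return None
```

Only the first down/up pair is measured. If the log contains several outages, call it on successive slices of the log.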
2. Check the MySQL error log: the errors point to a disk problem
2022-06-28T18:24:16.532664Z 23 [Warning] InnoDB: Retry attempts for reading partial data failed.
2022-06-28T18:24:16.532718Z 23 [ERROR] InnoDB: Tried to read 16384 bytes at offset 5098094592, but was only able to read 0
2022-06-28T18:24:16.532751Z 23 [ERROR] InnoDB: Operating system error number 5 in a file operation.
2022-06-28T18:24:16.532769Z 23 [ERROR] InnoDB: Error number 5 means 'Input/output error'
2022-06-28T18:24:16.532784Z 23 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2022-06-28T18:24:16.532796Z 23 [ERROR] InnoDB: File (unknown): 'read' returned OS error 105. Cannot continue operation
2022-06-28T18:24:16.532806Z 23 [ERROR] InnoDB: Cannot continue operation.
2022-06-28T18:24:19.274054Z 0 [Note] InnoDB: FTS optimize thread exiting.
3. Check the system messages log: mysqld restarts frequently and the disk has damaged sectors
Jun 29 04:24:29 localhost systemd: mysqld.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Jun 29 04:24:29 localhost systemd: Unit mysqld.service entered failed state.
Jun 29 04:24:29 localhost systemd: mysqld.service failed.
Jun 29 04:24:29 localhost systemd: mysqld.service holdoff time over, scheduling restart.
Jun 29 04:24:29 localhost systemd: Cannot add dependency job for unit sshd.socket, ignoring: Unit not found.
Jun 29 04:24:29 localhost systemd: Stopped MySQL Server.
Jun 29 04:24:29 localhost systemd: Starting MySQL Server...
Jun 29 04:24:32 localhost systemd: Started MySQL Server.
Jun 29 04:24:39 localhost kernel: sd 0:0:1:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 29 04:24:39 localhost kernel: sd 0:0:1:0: [sdb] Sense Key : Medium Error [current]
Jun 29 04:24:39 localhost kernel: sd 0:0:1:0: [sdb] Add. Sense: No additional sense information
Jun 29 04:24:39 localhost kernel: sd 0:0:1:0: [sdb] CDB: Read(10) 28 00 40 aa da 20 00 00 08 00
Jun 29 04:24:39 localhost kernel: blk_update_request: I/O error, dev sdb, sector 1084938784
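The kernel prints the raw SCSI command that failed. In a READ(10) CDB, byte 0 is the opcode (0x28), bytes 2-5 hold the logical block address (big-endian), and bytes 7-8 hold the transfer length. Decoding "28 00 40 aa da 20 00 00 08 00" gives LBA 0x40AADA20 = 1084938784 and a length of 8 blocks. That LBA matches "sector 1084938784" in the blk_update_request line, which is consistent with 512-byte logical blocks. The sketch below is a minimal decoder. The helper decode_read10 is hypothetical.

```python
def decode_read10(cdb_text):
    """Decode a SCSI READ(10) CDB printed by the kernel as hex bytes.

    Returns (lba, transfer_length_in_logical_blocks).
    """
    cdb = bytes.fromhex(cdb_text)          # "28 00 40 aa ..." -> 10 raw bytes
    if len(cdb) != 10 or cdb[0] != 0x28:  # 0x28 is the READ(10) opcode
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, length
```

A match between the decoded LBA and the block-layer sector confirms that both lines describe the same failing read.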
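4. Check the disk with badblocks
Run the following command in a Linux terminal. Without -n or -w, badblocks performs a read-only scan; -s shows progress and -v gives verbose output: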
badblocks -s -v /dev/sdb
The scan reports a large number of bad blocks.
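badblocks numbers blocks in units of its -b block size, which defaults to 1024 bytes. The kernel log reports 512-byte sectors, so the two sets of numbers do not line up directly. The sketch below is a minimal converter that assumes 512-byte kernel sectors and a badblocks block size that is a multiple of 512. The helper names are hypothetical. For example, sector 1084938784 from the kernel log falls in badblocks block 542469392 at the default block size.

```python
KERNEL_SECTOR_BYTES = 512     # the block layer reports sectors in 512-byte units
BADBLOCKS_BLOCK_BYTES = 1024  # badblocks' default -b block size


def sector_to_badblocks(sector, block_size=BADBLOCKS_BLOCK_BYTES):
    """badblocks block number that contains the given 512-byte kernel sector."""
    return sector * KERNEL_SECTOR_BYTES // block_size


def badblocks_to_sectors(block, block_size=BADBLOCKS_BLOCK_BYTES):
    """Inclusive range of 512-byte kernel sectors covered by a badblocks block number."""
    first = block * block_size // KERNEL_SECTOR_BYTES
    return first, first + block_size // KERNEL_SECTOR_BYTES - 1
```

To get the list into a file for this kind of cross-check, add "-o bad_blocks.txt" to the badblocks command. If badblocks was run with a different -b value, pass the same value as block_size.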

At this point the root cause of the problem was confirmed.
Resolution
I tried to copy the data off to a replacement disk, but some of it could no longer be read. In the end I restored the latest snapshot to new storage, at the cost of losing the history data. Fortunately, this was not a production business environment.
Testing of the removed disk showed that attempts at a logical repair failed, so it was judged to be physically damaged.
Lesson learned: keep copies of important business data.