4 TB production database becomes inaccessible after a disk "rejecting I/O to offline device" failure
2022-06-22 12:19:00 【weixin_41561…】
1. Project background
An important project runs on an Oracle database deployed as one primary and one standby with Active Data Guard (ADG). The total data volume is about 4 TB, and the system had run normally for nearly 5 years. Recently, severe disk performance degradation on the standby caused replication lag: about 1.4 TB of archived logs had not been applied (roughly 350 GB of logs are generated per day, so the standby was 4 days behind). All applications were therefore switched to the primary database and the standby was taken down for maintenance. Only 2 days after the switch, however, the primary went down on Friday night, which is how this article came about. It aims to offer some ideas and methods to anyone who runs into the same problem, although of course I hope you never need it.
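The backlog figures above are consistent: 350 GB/day over 4 days is 1,400 GB, about 1.4 TB. The minimal Python sketch below reproduces that arithmetic and adds one point the text implies: the backlog only shrinks if the standby applies logs faster than the primary generates them. The 700 GB/day and 300 GB/day apply rates in the example are hypothetical values, not measurements from this incident.

```python
import math

DAILY_ARCHIVE_GB = 350  # archive logs generated per day (figure from the article)
LAG_DAYS = 4            # days the standby was behind (figure from the article)


def backlog_gb(daily_gb: float, days: float) -> float:
    """Unapplied archive-log volume in GB."""
    return daily_gb * days


def catch_up_days(backlog: float, apply_gb_per_day: float, gen_gb_per_day: float) -> float:
    """Days to clear the backlog; infinite if the standby applies no faster than logs arrive."""
    net = apply_gb_per_day - gen_gb_per_day
    return backlog / net if net > 0 else math.inf


backlog = backlog_gb(DAILY_ARCHIVE_GB, LAG_DAYS)
print(backlog)                                        # 1400
print(catch_up_days(backlog, 700, DAILY_ARCHIVE_GB))  # 4.0 (hypothetical apply rate)
print(catch_up_days(backlog, 300, DAILY_ARCHIVE_GB))  # inf: the lag keeps growing
```

With degraded disks the effective apply rate drops, and once it falls below the generation rate the lag can only grow.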
2. System environment
2 database servers: System x3950 X6, each with 16 hard disks. The first 4 disks are 300 GB each and the remaining 12 are 4 TB each; every 4 disks form one RAID10 group.
Operating system: Red Hat Enterprise Linux 6.5
Database version: Oracle 11.2.0.4
OGG version: Oracle GoldenGate 12.2.0.1
Database architecture: one primary and one standby, built with ADG
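As a rough capacity check, RAID10 mirrors every disk, so each 4-disk group exposes half of its raw capacity. The sketch below assumes that the four 300 GB disks back sda and the 4 TB disks back sdb, sdc and sdd (the article lists the groups in that order but does not spell out the mapping), and uses decimal GB as printed on the drives. `Raid10Group` is a hypothetical helper for illustration.

```python
from dataclasses import dataclass


@dataclass
class Raid10Group:
    device: str        # block device the RAID controller exposes for this group
    disks: int         # number of member disks
    disk_size_gb: int  # size of each member disk in decimal GB

    def usable_gb(self) -> int:
        # RAID10 stripes across mirrored pairs, so half of the raw capacity is usable.
        if self.disks < 4 or self.disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least 4")
        return self.disks * self.disk_size_gb // 2


# Assumed mapping: the 300 GB disks back sda, the 4 TB disks back sdb..sdd.
GROUPS = [
    Raid10Group("sda", 4, 300),
    Raid10Group("sdb", 4, 4000),
    Raid10Group("sdc", 4, 4000),
    Raid10Group("sdd", 4, 4000),
]

for g in GROUPS:
    print(f"{g.device}: {g.usable_gb()} GB usable")
```

Under these assumptions each server offers 600 GB on sda and 8,000 GB on each of sdb, sdc and sdd, which leaves comfortable room for a 4 TB database plus archived logs and local backups.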
3. Fault description
1. Running ls on several directories where the disk partitions are mounted shows no information and reports an Input/output error.
2. The related errors in /var/log/messages are as follows:
Jan 31 03:47:03 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device
3. The database cannot be accessed; 45 of its data files are on the sdc2 and sdc3 partitions.
4. The database backup is on the sdd1 partition, so the database cannot be restored from backup in a timely manner.
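The kernel messages above all refer to the same SCSI address (host:channel:target:lun `1:2:0:0`). When the log is long, a small Python sketch like the one below can count these messages per address. The regular expression is written against the exact line format shown here and is an assumption; other syslog formats may need a different pattern.

```python
import re
from collections import Counter
from typing import Iterable

# Matches kernel lines like the ones above, e.g.
# "Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device"
OFFLINE_RE = re.compile(
    r"kernel: sd (?P<hctl>\d+:\d+:\d+:\d+): rejecting I/O to offline device"
)


def count_offline_rejects(lines: Iterable[str]) -> Counter:
    """Count 'rejecting I/O to offline device' messages per SCSI host:channel:target:lun."""
    counts: Counter = Counter()
    for line in lines:
        match = OFFLINE_RE.search(line)
        if match:
            counts[match.group("hctl")] += 1
    return counts


sample = [
    "Jan 31 03:47:03 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device",
    "Jan 31 03:49:47 test-db-2 kernel: sd 1:2:0:0: rejecting I/O to offline device",
]
print(count_offline_rejects(sample))  # Counter({'1:2:0:0': 2})
```

To scan the real log, pass an open file object instead, e.g. `count_offline_rejects(open("/var/log/messages", errors="replace"))`. On typical Linux systems, `ls -l /sys/block/sdc/device` shows which H:C:T:L address a device name corresponds to, which helps tie the messages to a specific RAID group.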
4. Troubleshooting
1. Determine why the database cannot be opened normally.
2. Check whether any hard disk on the database server is raising an alarm.
5. Problem analysis and solution
1. Disk I/O errors made the 45 data files on the affected partitions unusable, so the database could not be opened directly. After taking those 45 data files offline and opening the database, business access reported large numbers of missing data files, so the database still could not serve external requests normally.
2. The server has 16 hard disks, and every 4 disks form one RAID10 group, corresponding to sda, sdb, sdc and sdd. We found a yellow warning light on one of the disks behind sdd. Normally a single yellow-lit disk should not affect the array, so there may be a logical error here. One option was to unmount these partitions and check them with fsck for bad blocks (unfortunately, fsck could not check these partitions). The next step was to restart the server (power it off first, then power it on).
Reasons for the restart:
1. The database could not provide normal external access.
2. The database backup was inaccessible.
3. Only one disk showed a yellow light, so the restart was a chance to watch for other error messages and to see whether the file system could be repaired automatically.
While the database server was shutting down, it spent nearly 1 hour without completing a normal shutdown and stayed stuck on the shutdown screen.
Finally, we forced the server off with its power button, pulled out the disk with the yellow light, and restarted the server.
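The expectation that one yellow-lit disk should not take a group offline follows from how RAID10 works: data survives as long as every mirror pair keeps at least one healthy member. The sketch below models that rule. The pairing `(0, 1), (2, 3)` is a hypothetical layout; the actual pairing is decided by the RAID controller.

```python
from typing import Iterable, Sequence, Tuple


def raid10_survives(pairs: Sequence[Tuple[int, int]], failed: Iterable[int]) -> bool:
    """A RAID10 group stays online while every mirror pair keeps at least one member."""
    failed_set = set(failed)
    return all(not (a in failed_set and b in failed_set) for a, b in pairs)


# Hypothetical pairing for one 4-disk group; the real layout depends on the controller.
PAIRS = [(0, 1), (2, 3)]

print(raid10_survives(PAIRS, [1]))     # True: one disk lost, its mirror still serves data
print(raid10_survives(PAIRS, [1, 2]))  # True: one disk lost from each pair
print(raid10_survives(PAIRS, [0, 1]))  # False: a whole mirror pair is gone
```

Because the check is per pair, two failed disks can be harmless or fatal depending on where they land, which is why a single warning light alone does not explain a whole partition going offline.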


After the database server started normally, sdc2 and sdc3 were mounted and accessible, but sdd still could not be mounted. We then tried to open the database and bring all offline data files back online.
First, recover data file 30 and then bring it online:
SQL> alter database datafile 30 online;
alter database datafile 30 online
*
ERROR at line 1:
ORA-01113: file 30 needs media recovery
ORA-01110: data file 30: '/test/test2/teststg01.dbf'
SQL> recover datafile 30;
Media recovery complete.
SQL> select file#,status,name from v$datafile;
30 OFFLINE /test/test2/teststg01.dbf
31 RECOVER /test/test2/teststg02.dbf
32 RECOVER /test/test2/teststg03.dbf
33 RECOVER /test/test2/bigdata01.dbf
…
SQL> alter database datafile 30 online;
Database altered.
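With dozens of files involved, it helps to pull the file numbers that still need work straight from the `v$datafile` listing. A minimal sketch, assuming the output is captured as plain text in the `FILE# STATUS NAME` layout shown above and treating every status other than ONLINE and SYSTEM as needing attention. Filtering in SQL itself, for example with `where status not in ('ONLINE','SYSTEM')`, would work just as well.

```python
from typing import Iterable, List

# Assumption: statuses other than these (e.g. OFFLINE, RECOVER, SYSOFF) need attention.
HEALTHY = {"ONLINE", "SYSTEM"}


def files_needing_attention(rows: Iterable[str]) -> List[int]:
    """Return file numbers whose STATUS is not healthy.

    Each row is expected to look like 'FILE# STATUS NAME', as printed by
    select file#,status,name from v$datafile;
    """
    result = []
    for row in rows:
        parts = row.split(None, 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip headers, separator lines and elided rows such as '…'
        if parts[1].upper() not in HEALTHY:
            result.append(int(parts[0]))
    return result


output = """\
30 OFFLINE /test/test2/teststg01.dbf
31 RECOVER /test/test2/teststg02.dbf
32 RECOVER /test/test2/teststg03.dbf
33 RECOVER /test/test2/bigdata01.dbf
…
"""
print(files_needing_attention(output.splitlines()))  # [30, 31, 32, 33]
```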
Then recover the remaining files and bring them online in one batch:
SQL> recover datafile 31,32,33,34,35,36,37,38,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,63;
SQL> alter database datafile 31,32,33,34,35,36,37,38,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,63 online;
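The batch statements above can also be generated from a list of file numbers, which avoids typos in long comma-separated lists. The helper `batch_statements` below is hypothetical: it only builds the statement text, which would still be run in SQL*Plus, and recovery has to complete successfully before the files are brought online.

```python
from typing import Iterable, Tuple


def batch_statements(file_numbers: Iterable[int]) -> Tuple[str, str]:
    """Build one RECOVER and one ONLINE statement covering the given data file numbers."""
    nums = sorted(set(file_numbers))  # de-duplicate and keep a stable order
    if not nums:
        raise ValueError("no data file numbers given")
    file_list = ",".join(str(n) for n in nums)
    return (
        f"recover datafile {file_list};",
        f"alter database datafile {file_list} online;",
    )


recover_sql, online_sql = batch_statements([31, 32, 33])
print(recover_sql)  # recover datafile 31,32,33;
print(online_sql)   # alter database datafile 31,32,33 online;
```

Feeding it the file numbers collected from `v$datafile` keeps the RECOVER and ONLINE lists identical, so no file is recovered but left offline.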
All data files are now online. Next, take a dump (export) backup and copy it to other servers.
6. Summary and reflection
All business data has now been exported, and the data from before the database failure is intact. The next step is to restore the business as soon as possible; there is still a lot of work to do, and I will post an update later. This failure exposed many problems, however, so it is worth summarizing and reflecting on them to avoid similar situations.
When the problem first appeared on the standby database, the following should have been handled better:
1. Although we reported it to the company promptly, it did not get enough attention from the relevant leaders (the primary was still running, and the question was where the money would come from).
2. We should have gone to the machine room for an inspection immediately, since both sets of equipment were purchased at the same time (this was skipped because entering the machine room involves complicated procedures and the division of labor was unclear).
3. Backups on the primary database should have been copied to other servers promptly.
4. Prepare additional standby servers, ideally with the same configuration. If that is not possible, a lower configuration still works for synchronizing the most important business data in real time.