
[Server data recovery] A RAID data recovery case on a StorageWorks server

2022-07-07 15:00:00 North Asia data recovery

Server data recovery environment:
A StorageWorks server of a certain brand;
8 SAS hard disks configured as RAID 5, with one hot spare disk.

Server failure:
While the server was running, two hard disks went offline one after the other. The server went down and the LUNs stopped working normally. The server administrator contacted our data recovery center for data recovery.
The server data recovery engineers at our center ran physical checks and bad-sector scans on all disks in the server and found no problems.

Server data recovery process:
1. Image all hard disks of the failed server, so that the original data cannot suffer secondary damage during the recovery work.
Part of the imaged data is shown in the figure below:

 
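The imaging step can be sketched as follows. This is a minimal illustration rather than the tool used in this case: the function name `image_disk`, the 512-byte sector size, and the policy of zero-filling unreadable sectors are all assumptions made for the example.

```python
import hashlib

SECTOR = 512  # assumed sector size


def image_disk(src, dst, size, chunk=64 * 1024):
    """Copy `size` bytes from src to dst (both seekable binary file objects).

    A chunk that fails to read is retried sector by sector; sectors that
    still fail are zero-filled and recorded. Returns (sha256 hex digest of
    the image, list of bad sector numbers).
    """
    digest = hashlib.sha256()
    bad_sectors = []
    pos = 0
    while pos < size:
        want = min(chunk, size - pos)
        try:
            src.seek(pos)
            data = src.read(want)
        except OSError:
            # Fall back to single-sector reads to isolate the bad area.
            data = b""
            for off in range(pos, pos + want, SECTOR):
                n = min(SECTOR, pos + want - off)
                try:
                    src.seek(off)
                    piece = src.read(n)
                except OSError:
                    piece = b""
                    bad_sectors.append(off // SECTOR)
                data += piece.ljust(n, b"\x00")
        data = data.ljust(want, b"\x00")  # pad short reads
        dst.write(data)
        digest.update(data)
        pos += want
    return digest.hexdigest(), bad_sectors
```

Recording a hash for each image makes it possible to confirm later that the working copy has not changed, and all analysis is then done on the images instead of the original disks.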

2. Analyze the cause of the server failure:
Preliminary analysis shows that 6 LUNs were built on the RAID group, all allocated to an HP-Unix minicomputer, with LVM logical volumes created on top of them. The important data are the Oracle database and the OA server. At the time of the failure, some disks in the server were performing unstably, and the controller in this type of server kicks any disk it judges to be bad out of the RAID group. Once the number of dropped disks exceeds what the RAID level tolerates (one disk for RAID 5), the RAID group becomes unavailable and the server goes down.

3. Analyze the structure of the server's RAID group:
The server's LUNs are all built on the RAID group, so recovering the data requires analyzing the underlying RAID group information first and then reconstructing the original RAID group from it. The server data recovery engineers analyzed all the hard disks and found that the data on disk No. 4 differed from that on the other disks, so it was judged to be the hot spare. They then analyzed the remaining data disks, examined how the Oracle database pages were distributed across each disk, and from that distribution derived the key RAID group parameters: stripe size, disk order, and data direction.
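Once the stripe size, disk order, and data direction are known, a virtual RAID 5 volume is essentially an address mapping. The sketch below assumes a left-symmetric layout purely for illustration; the article does not say which layout or stripe size was found, and the names `locate` and `read_volume` are hypothetical.

```python
def locate(logical_offset, n_disks, stripe_unit):
    """Map a byte offset in the virtual volume to (disk index, byte offset
    on that disk), ASSUMING a left-symmetric RAID 5 layout."""
    unit = logical_offset // stripe_unit      # index of the data stripe unit
    within = logical_offset % stripe_unit     # offset inside that unit
    row = unit // (n_disks - 1)               # stripe row (one parity unit per row)
    k = unit % (n_disks - 1)                  # k-th data unit within the row
    parity_disk = (n_disks - 1) - (row % n_disks)
    disk = (parity_disk + 1 + k) % n_disks    # data starts right after parity and wraps
    return disk, row * stripe_unit + within


def read_volume(disks, offset, length, stripe_unit):
    """Read `length` bytes at `offset` from the virtual volume.

    `disks` is a list of seekable binary file objects (disk images) given
    in the RAID member order determined by the analysis.
    """
    out = bytearray()
    while length > 0:
        disk, disk_off = locate(offset, len(disks), stripe_unit)
        take = min(length, stripe_unit - offset % stripe_unit)
        disks[disk].seek(disk_off)
        out += disks[disk].read(take)
        offset += take
        length -= take
    return bytes(out)
```

A wrong guess for any of the three parameters shows up quickly: structured data such as database pages stops being contiguous when read through the mapping.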

4. Analyze the offline disks of the server's RAID group:
Using the RAID information obtained above, the original RAID group was virtually reassembled with the RAID virtual reassembly program developed in-house by North Asia. Because two disks in the RAID group had gone offline, the order in which they dropped had to be determined. Careful analysis of each disk's data showed that, within the same stripe, the data on one hard disk was clearly different from that on the other disks, so this disk was provisionally judged to be the first to go offline. The stripe was then checked with the RAID verification program developed in-house by North Asia, which confirmed which disk dropped first.
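The stripe check described above rests on the RAID 5 property that the XOR of all members in a stripe row is zero. The following sketch is not North Asia's verification program; it only illustrates the idea with hypothetical helper names, treating each disk image as an in-memory `bytes` object for brevity.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def row_units(disks, row, stripe_unit):
    """Return the stripe unit of every member disk for one stripe row."""
    start = row * stripe_unit
    return [d[start:start + stripe_unit] for d in disks]


def inconsistent_rows(disks, stripe_unit, n_rows):
    """Rows whose members do not XOR to zero, i.e. rows where at least
    one member holds stale or damaged data."""
    return [r for r in range(n_rows)
            if any(xor_blocks(row_units(disks, r, stripe_unit)))]


def rebuild_unit(disks, exclude, row, stripe_unit):
    """Recompute the unit of disk `exclude` in `row` from the other members."""
    units = row_units(disks, row, stripe_unit)
    return xor_blocks(units[:exclude] + units[exclude + 1:])
```

Parity alone only shows where the members disagree. Deciding which of the two offline disks is stale still requires rebuilding each candidate from the others and checking which result yields sensible content, which matches the content comparison described in this step.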

5. Analyze the LUN information in the RAID group:
Because the LUNs are built on the RAID group, the latest state of the RAID group must first be virtualized using the information above. The next step is to analyze how the LUNs are allocated within the RAID group and the data block map of each LUN. Since there are 6 LUNs, the data block distribution map of each LUN was extracted, a program was written to parse the maps of all LUNs, and the data of every LUN was then exported according to its map.
The exported data is shown in the figure below:

 
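Exporting a LUN from its block map amounts to copying extents from the virtual RAID volume to their positions inside the LUN image. The on-disk map format of this storage is not described in the article, so the `Extent` record and the `export_lun` function below are hypothetical stand-ins for whatever the parsed map actually contains.

```python
from typing import BinaryIO, Iterable, NamedTuple


class Extent(NamedTuple):
    """One entry of a parsed LUN block map (hypothetical format)."""
    lun_offset: int   # where the extent starts inside the LUN
    raid_offset: int  # where the same bytes live in the virtual RAID volume
    length: int       # extent length in bytes


def export_lun(raid: BinaryIO, extents: Iterable[Extent], out: BinaryIO,
               chunk: int = 1 << 20) -> int:
    """Copy every mapped extent to its place in the LUN image.

    Returns the number of bytes copied. Raises OSError on a short read.
    """
    copied = 0
    for ext in sorted(extents, key=lambda e: e.lun_offset):
        src, dst, remaining = ext.raid_offset, ext.lun_offset, ext.length
        while remaining:
            n = min(chunk, remaining)
            raid.seek(src)
            data = raid.read(n)
            if len(data) != n:
                raise OSError(f"short read at RAID offset {src}")
            out.seek(dst)
            out.write(data)
            src, dst = src + n, dst + n
            remaining -= n
            copied += n
    return copied
```

With 6 LUNs, the same routine would simply be run once per LUN with that LUN's extent list.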

6. Repair the server's LVM logical volumes and VxFS file system:
The server data recovery engineers analyzed all the exported LUNs and found that every LUN contained HP-Unix LVM logical volume information. Parsing the LVM information in each LUN revealed three LVM sets in total: a 45 GB LVM with one LV holding the OA server data; a 190 GB LVM with one LV holding temporary backup data; and the remaining 4 LUNs, which together form an LVM of about 2.1 TB with a single LV holding the Oracle database files.
The engineers wrote a program to interpret the LVM and tried to extract the LV from each LVM set, but the program reported errors. To find the cause, the development engineer debugged the failing part of the program, while the file system engineer examined the recovered LUNs to check whether the storage crash had corrupted the LVM logical volume information. The check confirmed that the storage crash had indeed corrupted the LVM information.
The damaged areas were repaired manually, the program was modified accordingly, and the LVM logical volumes were interpreted again.
An HP-Unix environment was then set up, the interpreted LV volumes were mapped to it, and a mount of the file system was attempted, which failed with an error. Running “fsck -F vxfs” to repair the VxFS file system did not help: the file system still could not be mounted after the repair. The engineers suspected that some underlying VxFS metadata was corrupted and needed manual repair.
The LV was analyzed carefully, and the integrity of the file system was checked against the underlying VxFS structure. The analysis found that the underlying VxFS file system was indeed damaged: the file system was performing I/O when the original storage crashed, so some file system metadata files were not updated and were left corrupted. The data recovery engineers repaired these damaged metadata files manually so that the VxFS file system could be parsed normally. The repaired LV volume was mounted on the HP-Unix minicomputer again, and this time the file system mounted successfully without errors.

7. Check the Oracle database files and start the database:
After the file system was mounted on the HP-Unix machine, all user data was backed up to the designated disk space. The user data totaled about 1.2 TB.
Screenshots of part of the file directory are shown below:

 

Oracle's database file checking tool was used to check each database file for integrity, and no errors were found. A check with the Oracle database detection tool developed in-house by North Asia then found that some database files and log files were inconsistent. The database recovery engineer repaired and re-verified these files until all of them passed verification.
The recovered Oracle database was attached to the HP-Unix server in the original production environment, and an attempt was made to start it. The Oracle database started successfully.


 

8. Start the Oracle database and the OA server, and install the OA client on a local computer. The latest and historical data records were verified through the OA client, and staff from different departments were arranged to verify the data remotely. The data was finally confirmed to be correct and complete, and the data recovery was successful.

Copyright notice
This article was created by [North Asia data recovery]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/188/202207071253322025.html