Software RAID: re-adding disks after replacement

We recently dealt with this on a client's server we manage, hosted in another ISP's data centre. Below is a quick, typical way to get a broken (degraded) software RAID array back to a healthy, clean state:

First, check the RAID status:

# cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4] [raid1] [raid10] [raid0]
md0 : active raid1 sdb1[1] sda1[0]
4198976 blocks [3/2] [UU_]

md1 : active raid1 sdb2[1] sda2[0]
2104448 blocks [3/2] [UU_]

md2 : active raid5 sdc3[2] sdb3[1] sda3[0]
2917660800 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

This shows that md0 and md1 are degraded: [3/2] [UU_] means only two of the three member devices are present, while md2 with [3/3] [UUU] is healthy.

Check the details of each degraded device to get some additional information:

# /sbin/mdadm --detail /dev/md0

/dev/md0:
Version : 0.90
Creation Time : Tue Jun 8 07:47:09 2010
Raid Level : raid1
Array Size : 4198976 (4.00 GiB 4.30 GB)
Used Dev Size : 4198976 (4.00 GiB 4.30 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Aug 21 06:24:07 2013
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

UUID : ...
Events : 0.3000

Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 0 0 2 removed

Devices already shown as "removed" no longer need to be removed manually with mdadm. If a member were still listed as faulty, it would first have to be failed and then removed, as sketched below. Repeat the check for /dev/md1 and any other failed or degraded RAID devices.
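A minimal sketch of that manual step, using the same placeholder names as in the commands further down (replace mdX and sdYY with the actual array and partition):

# /sbin/mdadm --manage /dev/mdX --fail /dev/sdYY
# /sbin/mdadm --manage /dev/mdX --remove /dev/sdYY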

Replace the failed drive (hot swap, re-cable, etc.; you may have to reboot the machine for the OS to recognise the new disk, even in a hot-swap scenario).
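To confirm the kernel has actually picked up the replacement before partitioning it, standard commands such as the following can help (no assumptions about the exact device name):

# dmesg | tail
# cat /proc/partitions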

After replacing the bad disk, clone the partition table from a healthy disk (here /dev/sda) to the new disk (/dev/sdY):

# /sbin/sfdisk -d /dev/sda | /sbin/sfdisk /dev/sdY
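As a quick sanity check, the partition layouts of the source disk and the new disk should now match (/dev/sdY remains a placeholder for the replacement disk):

# /sbin/sfdisk -l /dev/sda
# /sbin/sfdisk -l /dev/sdY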

Then, re-add the new partitions to the affected arrays:

# /sbin/mdadm --manage /dev/mdX --add /dev/sdYY

Repeat for all failed devices; the software RAID subsystem will automatically schedule the respective resyncs.
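The rebuild progress can then be followed with, for example:

# cat /proc/mdstat
# /sbin/mdadm --detail /dev/mdX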

Please also see: http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array