Hard Disks: Bad Block HowTo

Hardware fails; that is a fact. Hard drives are rather reliable nowadays, but every now and then a drive will fail or at least have hiccups. Monitoring disks with smartctl/smartd is good practice. Below we discuss how some lesser issues can be handled without rebooting the system. It remains up to the sysadmin’s discretion to judge whether the disk errors encountered are a one-time incident or a sign of a failing disk.

Let’s have a look at typical smartctl -a DEVICE output:

# smartctl -a /dev/sda

197 Current_Pending_Sector  .... 2

OK, so we have an oops here. Time to find out what is going on:

# smartctl --test=short /dev/sda

This will take a very short time, a couple of minutes at most, e.g.:

Please wait 2 minutes for test to complete.
Test will complete after Sat Feb  2 16:25:10 2013

Now, with a current pending sector count > 0 we will most likely have an ouch in the self-test log (shown by smartctl -l selftest /dev/sda, or as part of smartctl -a) after the test completes:

Num  ..    Status                  Remaining  ..  LBA_of_first_error
# 2  ..    Completed: read failure 90%        ..  1825221261

LBAs count sectors of 512 bytes each, starting at 0. We now need to find out which partition 1825221261 falls into:

# fdisk -lu /dev/sda

will display some information about the device in question:

   Device Boot      Start         End      Blocks   Id  System
/dev/sda3        31641600  1953523711   960941056   83  Linux

1825221261 lies between the start (31641600) and end (1953523711) sectors of /dev/sda3, so the bad sector is on that partition. Next we need the file system block that contains this LBA, so we first have to get the block size:

# tune2fs -l /dev/sda3 | grep Block

Block count:              240235264
Block size:               4096
Blocks per group:         32768

OK, 4096 bytes. So, the actual block number will be:

(LBA - partition start sector) * (sector size / file system block size)

In our case, this is:

(1825221261 - 31641600) * (512 / 4096) = 224197457.625

We only need the integer part. The fraction (0.625 = 5/8) just tells us that the bad sector is the 6th of the eight sectors that make up this file system block.
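The same arithmetic can be done with shell integer math, which avoids the fraction altogether. This is only a sketch: the function name lba_to_fs_block and its argument order are made up for illustration, and the sector size is assumed to be 512 bytes unless given.

```shell
#!/bin/sh
# Hypothetical helper: map a disk LBA to a file system block in a partition.
# Usage: lba_to_fs_block LBA PART_START FS_BLOCK_SIZE [SECTOR_SIZE]
lba_to_fs_block() {
    lba=$1 start=$2 bsize=$3 ssize=${4:-512}   # sector size: assumed 512
    per_block=$((bsize / ssize))               # sectors per file system block
    offset=$((lba - start))                    # sector offset inside the partition
    # Integer division drops the fraction; the remainder is the position of
    # the sector inside the block, counted from 0.
    echo "$((offset / per_block)) $((offset % per_block))"
}

# Values from the example above: prints "224197457 5"
lba_to_fs_block 1825221261 31641600 4096
```

The first number is the block to hand to debugfs and dd. The second is the zero-based sector offset inside that block, so 5 means the 6th sector.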

It is good practice to find out which inode/file has been affected by using debugfs (operations can take a while with this tool):

# debugfs

debugfs:  open /dev/sda3
debugfs:  icheck BLOCK (224197457 in our case)
Block   Inode number
224197457       56025154
debugfs:  ncheck 56025154
Inode   Pathname
56025154        /some/path/to/file

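If this file isn’t anything crucial, we can start correcting things. Overwriting the block with zeros destroys its contents, but it makes the drive rewrite or reallocate the bad sector: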

# dd if=/dev/zero of=/dev/sda3 bs=4096 count=1 seek=BLOCK
  (224197457 here)
# sync

smartctl -a will now show an updated current pending sector count, and you can re-run a short smartctl test.
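Since a wrong seek value overwrites the wrong block, it may be worth rehearsing the dd call on a scratch file before touching a real device. The sketch below assumes a temporary image file in place of /dev/sda3 and a made-up block number of 5. It zeroes that one block and reads it back with skip= to confirm that only this block changed.

```shell
#!/bin/sh
# Rehearsal on a scratch image file, NOT on a real disk.
img=$(mktemp)     # hypothetical stand-in for /dev/sda3
bs=4096
block=5           # hypothetical block number, kept small for the demo

# Fill 8 blocks with 'x' bytes so that a zeroed block stands out.
dd if=/dev/zero bs=$bs count=8 2>/dev/null | tr '\0' 'x' > "$img"

# Same shape as the real command. conv=notrunc stops dd from truncating a
# regular file after the written block; a block device is not truncated.
dd if=/dev/zero of="$img" bs=$bs count=1 seek=$block conv=notrunc 2>/dev/null

# Read blocks back with skip= and count the bytes that are not zero.
nonzero=$(( $(dd if="$img" bs=$bs count=1 skip=$block 2>/dev/null | tr -d '\0' | wc -c) ))
neighbour=$(( $(dd if="$img" bs=$bs count=1 skip=$((block + 1)) 2>/dev/null | tr -d '\0' | wc -c) ))
size=$(( $(wc -c < "$img") ))

echo "block $block non-zero bytes: $nonzero"
echo "block $((block + 1)) non-zero bytes: $neighbour"
echo "image size: $size"
rm -f "$img"
```

The same read-back (dd with if= set to the partition, of=/dev/null and skip=BLOCK) can be pointed at the real partition afterwards; it should then complete without a read error.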

Source: http://www.vanderzee.org/bad_blocks_howto