Christmas Carol (an almost true story)
On the Nth day of Christmas, my server gave to me:
a bad block on hda3.
2 toasted drives;
3 boxes down;
4 volume backups;
RAID-5 failure;
6 friends a-leaving;
7 megs of spam;
8 failed recovers;
9 hours restoring;
10 calls from jarich;
11 clients off-line;
12 hours downtime;
Notes
Yup, on the morning of Boxing Day, two of the drives in my RAID-5 developed bad blocks within two hours of each other, resulting in complete failure of my main storage device. After some failed attempts at resurrecting the array, I was forced to re-partition away the bad areas, reconstruct the RAID, re-format, and restore from tape. Luckily my last backup was made a mere four hours before the failure, so the entire process occurred with no data loss. The drive that holds mail was unaffected, so I didn't lose any of the tremendous amount of spam that I seem to receive each day.
I'm looking into returning the limping drives to the manufacturer for replacement.
Remember:
- Make backups frequently.
- Test your backups frequently.
- Keep off-site backups.
- Murphy's law applies to RAIDs, too.
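For the "test your backups" point, one low-tech check is to restore a backup into a scratch directory and compare it, file by file, against the live copy. What follows is only a rough sketch in Python, not what I actually ran on Boxing Day: the function names are made up for illustration, and it assumes both trees are ordinary directories you can read.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(original, restored):
    """Return (missing, extra, changed) lists of relative paths, each sorted.

    missing: in original but not in restored
    extra:   in restored but not in original
    changed: in both, but with different contents
    """
    a, b = tree_digests(original), tree_digests(restored)
    missing = sorted(set(a) - set(b))
    extra = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return missing, extra, changed
```

Calling something like `compare_trees("/home", "/mnt/restore-test/home")` (both paths hypothetical) returns three lists. Files that were legitimately edited since the backup was taken will also show up as changed, so the result needs a human eye rather than a blind pass/fail.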