It's 3am...

It's 3am...
And I've learnt a new scenario to add to the disaster recovery plan. What if it rains? What if it rains a lot?

We just discovered that when we have huge amounts of rain, combined with big fat hailstones, something goes wrong in the roof adjoining the office and water gets in. Quite a lot of water. This could just be because the top of the gutters is flush with the bottom of the roof, so if they overflow, water can get into the roof cavity. This happens to be about the only area that doesn't have any eaves, so rather than the water running away or down the exterior walls, it gets onto the ceiling -- which in this area is lower than the rest of the house as well.

Other possibilities include broken tiles, bad valleys, or the solar hot-water service somehow making it easier to flood.

In any case, at 1am we performed an emergency removal of all gear from the area, which was mainly used for storage. Some books got damp, some old ISA network cards were sodden, and a couple of Pentium 90s that should have been tossed years ago were drenched. Otherwise, no significant damage that we're aware of.

However, water in the roof cavity brought another danger: electrical problems. Something, somewhere, tripped the safety switch (earth leakage detector), disabling all appliances.

Our server dutifully switched to UPS, and then ate the battery and died while we were trying to locate the fault. That's very bad -- it's supposed to prepare itself for an emergency shutdown, and proceed with it if power is not restored in a timely fashion, or if the battery goes low.
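For the record, the behaviour I expected can be sketched in a few lines. This is only a rough illustration of the policy described above, not what our UPS software actually runs: the `UpsStatus` fields, the five-minute grace period, and the 30% low-battery threshold are all made-up values for the example.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """Hypothetical snapshot of UPS state; field names are assumptions."""
    on_battery: bool
    seconds_on_battery: float
    battery_percent: float


def should_shut_down(status: UpsStatus,
                     grace_seconds: float = 300.0,
                     low_battery_percent: float = 30.0) -> bool:
    """Decide whether the server should begin an orderly shutdown.

    Shut down if mains power has not returned within the grace period,
    or if the battery has run low, whichever happens first.
    The default thresholds are illustrative, not recommendations.
    """
    if not status.on_battery:
        # Mains power is present, so there is nothing to do.
        return False
    if status.battery_percent <= low_battery_percent:
        # Battery is low: shut down now, regardless of elapsed time.
        return True
    # Otherwise, shut down only once the grace period has expired.
    return status.seconds_on_battery >= grace_seconds


if __name__ == "__main__":
    # One minute into an outage with plenty of battery left: keep running.
    print(should_shut_down(UpsStatus(True, 60.0, 80.0)))
    # Battery nearly flat: shut down even though the outage is recent.
    print(should_shut_down(UpsStatus(True, 60.0, 20.0)))
```

The point of checking the battery level before the timer is that a tired battery can drain well before any fixed grace period is up, which is more or less what bit us here.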

We discovered that isolating the front-half of the property allowed us to restore power to the office. We also discovered that a light fitting in the watery area was shorting, producing a distinctive smell of ozone and a very disturbing noise.

We cut the lights at the circuit-breaker, tested the light to see if it was hot (temperature or current), testing a few times in different places (using the back of one's hand), and proceeded to remove the light. We're going to wait until everything dries out before we try to turn it back on.

We powered our server back on; it grumbled but started. One of the drives complained hugely during RAID re-sync. I dropped a job into RT to have that drive removed from the array and the fresh reserve drive (already sitting in the machine) added in.

Sure enough, 70% into the reconstruction, the flakey drive reported a bad sector and was dropped from the array. I checked our backups were good, and initiated reconstruction with the fresh drive installed for this purpose.

Throughout this, there was much cursing of IDE drive manufacturers who produce such poor quality drives. It's a real shame when a '3 year warranty' means I'm almost guaranteed to be returning the drive to them after 18 months.

I really should be moving to SCSI. Sure, they're much more expensive, but the reliability is worth it. I've got much better things to do with my time than monkey around with poor hardware.

Drive reconstruction ends in about an hour. Hopefully we won't see any more issues.
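If you'd rather not stare at the console while waiting, a small script can pull the progress figures out for you. The sketch below assumes Linux software RAID (md), where `/proc/mdstat` reports rebuild progress on a line of the form shown in `SAMPLE`; I haven't said what RAID setup this server uses, so treat that as an assumption, and note that the sample line and the `parse_progress` helper are invented for illustration.

```python
import re
from typing import NamedTuple, Optional


class RebuildProgress(NamedTuple):
    action: str          # "recovery" or "resync"
    percent: float       # percentage complete
    finish_min: float    # kernel's estimate of minutes remaining
    speed_kib_s: int     # current rebuild speed in KiB/s


# Matches the progress line the Linux md driver writes to /proc/mdstat.
_PROGRESS = re.compile(
    r"(recovery|resync)\s*=\s*([\d.]+)%\s*\(\d+/\d+\)"
    r"\s*finish=([\d.]+)min\s*speed=(\d+)K/sec"
)


def parse_progress(mdstat_text: str) -> Optional[RebuildProgress]:
    """Return the first rebuild progress found, or None if no rebuild is running."""
    match = _PROGRESS.search(mdstat_text)
    if match is None:
        return None
    action, percent, finish, speed = match.groups()
    return RebuildProgress(action, float(percent), float(finish), int(speed))


# Made-up sample text in the md format, used instead of reading the real file.
SAMPLE = (
    "md0 : active raid1 hdc1[2] hda1[0]\n"
    "      292945152 blocks [2/1] [U_]\n"
    "      [==============>......]  recovery = 70.0% (205061606/292945152)"
    " finish=43.8min speed=33440K/sec\n"
)

if __name__ == "__main__":
    progress = parse_progress(SAMPLE)
    if progress is not None:
        print(f"{progress.action}: {progress.percent}% done, "
              f"about {progress.finish_min} minutes to go")
```

On a real md system you would pass in the contents of `/proc/mdstat` rather than the sample string. A result of `None` simply means no rebuild line was found.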

This site is ad-free, and all text, style, and code may be re-used under a Creative Commons Attribution 3.0 license. If you like what I do, please consider supporting me on Patreon, or donating via Bitcoin (1P9iGHMiQwRrnZuA6USp5PNSuJrEcH411f).
