I’ve had so many hilarious things going on lately, I don’t even know where to begin.
Last weekend, IBM Director started emailing us that a couple of servers were overheating. In a matter of minutes, the heat problems were spreading across different racks, and the IT crew all started driving into the office simultaneously. Our two redundant AC units for the datacenter both failed within minutes of each other.
We shut down non-essential servers as fast as we could, but the room temps were still climbing over 120°F. Infrared thermometers were showing the rack equipment over 150°F. We ended up shutting down everything but our really, really mission-critical stuff, and we were thinking about failing that over to our alternate datacenter. Hours later, the ACs came back online.
No lost data. Hooah! Gotta love happy endings. Still, the whole event was a heck of a hard way to learn some ugly lessons, and I’m looking forward to the lessons-learned meeting on Monday.
Mostly, I’m just relieved that we didn’t have to disassemble server racks, carry huge servers out into the office area, plug them in, wire them up, and run a distributed datacenter out of a bunch of cubicles. That would have been so embarrassing. I can live with long hours in the datacenter trying to rebuild machines with fried drives, and I can deal with failing things over to other datacenters, but I don’t like users seeing the datacenter in disarray.