Insulating yourself from disaster

Late last week, a friend of mine’s website exploded.

More accurately, his webhost’s electrical room exploded and caught fire and took down NINE THOUSAND servers with it.

Because my friend knows these things happen, he was well prepared for it.

He had backups in place. He also had an alternate server ready, and in more or less the time it took to switch a “Go here” sign from server A (the exploding place) to server B, his site was mostly back up.

Meanwhile, all 9000 servers are still not back in service after 4 days.

Some of this is their fault, and some is not.

The fire department wouldn’t let them run their backup generators, which is ultimately what forced 9000 servers (and no doubt hundreds of thousands of web sites) offline. As they later found, though, physical damage to the facility would have made that a moot point.

When the fire department had finished their work, the generators were started, but failed. More delays followed as a new generator had to be brought in and installed.

Now the generator is up, but each of the 9000 servers has to be started manually and then checked to make sure it is working. And some of those are going to be problematic because of destroyed cabling on a lower floor.

Imagine if you had your online store on those servers. Maybe you’re Brad Fallon and your $750k a month wedding favors store has been down for a week. A week to Brad is roughly $175,000 in sales, and remember, he has a pile of employees and warehouse space to pay for. Can you afford that kind of hit?
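If you want to run that number for your own store, here is a rough sketch in Python. It assumes sales are spread evenly over a 30-day month, which is a simplification (the function name and the 30-day default are mine, purely for illustration).

```python
def downtime_cost(monthly_revenue, days_down, days_in_month=30):
    """Estimate sales lost while a store is offline.

    Assumes revenue arrives evenly across the month, which real
    stores rarely see, so treat the result as a ballpark figure.
    """
    daily_revenue = monthly_revenue / days_in_month
    return daily_revenue * days_down


# The example from the text: $750k a month, down for a week.
week_cost = downtime_cost(750_000, 7)
print(f"One week offline: ${week_cost:,.0f}")  # One week offline: $175,000
```

Plug in your own monthly revenue and a realistic outage length, then compare the result to what a standby server would cost you per month.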

Redundancy is an expense AND an investment

The real issue here is a lack of redundancy on the part of The Planet, the web host in question. My understanding is that they have 5 different data centers, yet it appears that they are simply 5 standalone data centers that do not replicate each other.

Ironically, their blog talked about redundancy and recovery from catastrophic events just a few short (pardon the pun) weeks ago.

While the expense of redundancy at that level (estimates are that they have 50,000 servers) is substantial, exactly how much do you think it will cost them because they didn’t have that redundancy in place?

It should be noted that they do offer a backup server service, where you get the main server and a backup server (presumably in another location). Wonder how many of those will get sold this month?

So how many customers will this cost them? 9000? 900?

I doubt we’ll ever know, but let’s put on the speculation hat and take a look at the math.

If it cost them 10% of their customer base at that location (900 dedicated server accounts), and every one of those accounts was at the lowest price and service level, that’s a loss of $80,100 PER MONTH, or $89 per account.
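The speculation above is easy to rerun with different guesses. This is only a sketch: the $89 monthly price is simply what $80,100 divided by 900 accounts works out to, and the function name is made up for the example.

```python
def monthly_churn_loss(servers, churn_percent, monthly_price):
    """Return (accounts lost, monthly revenue lost) for a given churn guess.

    churn_percent is a whole number (10 means 10%) so the math stays
    in integers and avoids floating-point rounding surprises.
    """
    lost_accounts = servers * churn_percent // 100
    return lost_accounts, lost_accounts * monthly_price


# 9000 servers, 10% of customers leave, $89 per month (implied by the text).
lost, loss = monthly_churn_loss(9000, 10, 89)
print(f"{lost} accounts lost, ${loss:,} per month, ${loss * 12:,} per year")
```

Change the 10 to 1 or 25 and you get a feel for how wide the range of outcomes is, even before counting customers on pricier plans.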

What about future customers who will find out about the explosion and the lack of redundancy and decide to go elsewhere? Pretty hard to measure that. After only 4 days (probably far sooner), there are Google AdWords ads on the net for the keywords “Houston webhost” (and probably others) that suggest “moving off of The Planet”. Their competition isn’t going to let this go away.

And what does this have to do with you?

Are you as well prepared as my friend was?

  • What happens if your web host provider explodes tonight?
  • Do you have backups?
  • Do you have a plan in place – or at least the knowledge – to move your site elsewhere and restore backups to that location in a timely manner?
  • Do you know what kind of redundancy your web host offers? Some have duplicate systems in other locations, some do not. Some do backups for you. Some do not. All serious web hosts have power protection in the form of battery backup systems and generators.

What doesn’t matter to you: The Planet’s problems. There are plenty of other good web hosts out there if you need one.

What does matter to you: How would an event like this affect your business? How many customers do you lose because your systems are unavailable? How many new customers give up after you’ve spent X dollars to get them to your site in a mindset that is ready to buy?

With that in mind, spend some time thinking about what makes sense regarding an investment in redundancy. Then take action.

What do I do for my sites and the ones I manage?

Weekly – I have automated server-level backups taken. They are downloaded to a server here in my office in Montana (the web server is in Michigan). They are then copied to a high-end RAID-5 network-attached storage (NAS) drive in my office. All of this happens automatically.

Daily – SQL databases on my web sites (and my clients’ sites) are backed up to a different web server, and in the wee hours, they are downloaded to a server here in my Montana office. They are then copied to the NAS drive mentioned earlier. All of this happens automatically. In addition, all web programming and images on my main development machines are copied to the NAS drive on a nightly basis. Automatically. Finally, those same items are copied to a laptop (yes, nightly and automatically) so that if my office and the server in Michigan decide to explode, I have the laptop as a last resort.
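To give a flavor of the "copy it to more than one place" step, here is a minimal Python sketch. It is not the actual tooling I described above; the function names and folder layout are hypothetical. It copies one backup file to several destinations and checks each copy against a checksum, because a backup you can’t read is not a backup.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_backup(source, destinations):
    """Copy one backup file into each destination folder and verify it.

    Raises OSError if any copy does not match the original's checksum.
    Returns the list of verified copies.
    """
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        if sha256_of(target) != expected:
            raise OSError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies


# Hypothetical usage, with made-up paths for a NAS share and a laptop share:
# replicate_backup("site-db.sql.gz", ["/mnt/nas/backups", "/mnt/laptop/backups"])
```

Run something like this from a scheduled task (cron on Linux, Task Scheduler on Windows) so that nobody has to remember to do it.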

What’s next? I am in the process of establishing an account at a second web hosting center. The automated backups will be pushed to that server, so in the event of a disaster I can quickly switch to the new site without hours of restore-from-backups time. Even though the backups are automated and kept current, the time to restore them can be critical for some accounts.
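Knowing when to flip that switch can be automated too. The sketch below is one simple way to do it, under my own assumptions: the URLs are placeholders, and `is_up` and `pick_server` are names invented for this example. It checks a list of servers in order and reports the first one that answers. Actually repointing visitors (usually a DNS change) is a separate step that depends on your provider.

```python
import http.client
import urllib.request


def is_up(url, timeout=5):
    """Return True if the URL answers with a 2xx or 3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS failures, HTTP errors.
        return False


def pick_server(candidates, probe=is_up):
    """Return the first candidate that passes the probe, or None."""
    for url in candidates:
        if probe(url):
            return url
    return None


# Hypothetical usage with placeholder addresses:
# active = pick_server(["https://primary.example.com", "https://standby.example.com"])
```

The probe is passed in as a parameter so you can swap in a stricter check later, such as loading a page that touches the database rather than just the home page.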

These are the kinds of processes and situations you need to be thinking about.

Don’t depend on someone else to protect your business assets. Let them, but make sure you are covered by things within your control.

These events would likely bury a lesser-prepared competitor. Handled properly, they will make you shine even brighter as the expert in your market – even if your market isn’t tech-related.

Related info: Lessons learned: Think like a fire marshal.