When I design a backup & recovery strategy for a database, I don’t talk to the developers, database administrators, or systems administrators first. The first people I go to are the business managers, and I ask three questions:
- How much money would you lose if you lost the data altogether?
- How much money would you lose if you were down for X hours?
- How much time & money can you afford to devote to backups?
Each of these questions drives the strategy, and none of them is answered by the DBA. Jeff Atwood's post this week about StackOverflow's backup & recovery strategy has drawn a lot of comments suggesting alternate backup methods. Curiously, though, none of them seems to ask any of those three questions. Let's examine the questions one by one, and then look at StackOverflow's choices to see if they make sense.
How much is your data worth?
If you had to reconstruct the data from scratch, using other systems and records, how much would it cost you to do it? And would you actually pursue that objective, or would you consider folding up the business altogether, or maybe living without the data?
If your database holds financial transactions for your customers, with live incoming credit card transactions and debit card withdrawals, it’s obviously very valuable, and you’d be forced to spend a fortune to get back in business. If you’re owned by Microsoft and you lose all Sidekick customer contact data, you pour money and manpower into getting it back. Other companies like Magnolia and Journalspace, on the other hand, have decided to pack their bags and call it a day.
The business has to work with IT to come up with a quick back-of-the-envelope calculation as to what complete data loss would cost, and that’s part of the formula that dictates what we spend on data protection. Sometimes this simple question leads businesses to realize, “That particular data doesn’t really matter – we could rebuild it all from other sources for next to nothing. Maybe we shouldn’t back it up at all.”
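To make that back-of-the-envelope calculation concrete, here is a minimal sketch that compares the expected yearly cost of losing the data against a yearly backup budget. It assumes a hypothetical annual probability of total data loss, which the business managers' answers don't give you directly; the function and parameter names are illustrative only.

```python
def backup_is_worth_it(rebuild_cost: float,
                       annual_backup_cost: float,
                       annual_loss_probability: float) -> bool:
    """Rough check: does the expected yearly cost of losing the data
    exceed what we'd spend on backups in a year?

    annual_loss_probability is a hypothetical estimate (0.0 to 1.0)
    of losing the data altogether in a given year.
    """
    if not 0.0 <= annual_loss_probability <= 1.0:
        raise ValueError("annual_loss_probability must be between 0 and 1")
    expected_annual_loss = rebuild_cost * annual_loss_probability
    return expected_annual_loss > annual_backup_cost


# Data that would cost $500,000 to rebuild, with an assumed 5% yearly chance
# of total loss, justifies a $10,000/year backup budget (~$25,000 expected loss).
print(backup_is_worth_it(500_000, 10_000, 0.05))  # True
# Data you could rebuild "for next to nothing" does not.
print(backup_is_worth_it(1_000, 10_000, 0.05))    # False
```

The precision of the numbers matters less than who supplies them: the rebuild cost and the budget come from the business, not the DBA.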
How much money does downtime cost?
While this database is down, can you still sell your products? If not, then it’s easy to calculate the cost of downtime – it’s your sales metrics. If you sell an average of $100,000 per hour, then an hour of downtime costs you $100,000.
And furthermore, if you can’t sell products, do your customers hold off on the purchase, or do they switch to another vendor? If Amazon.com’s databases go down, then their customers probably won’t wait around until the site comes back up. They’ll head straight over to another web site and spend money with a competitor. This has a hidden business danger, too – if your customers like that new site better, they might stick with it and bypass you for future orders.
However, if you can still sell products, keep your customers & employees happy, and business moves along unaffected except for a few bumps, then that might guide your backup & recovery strategy too. Or if your company isn’t making all that much per hour, then maybe you don’t want to dedicate a fortune to having your systems highly available.
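The downtime arithmetic can be sketched the same way. The `fraction_of_sales_lost` parameter is a hypothetical knob: 1.0 when customers can't buy anything and won't wait, lower when business moves along with only a few bumps. This sketch doesn't model the hidden cost of customers who switch to a competitor for good.

```python
def downtime_cost(hours_down: float,
                  revenue_per_hour: float,
                  fraction_of_sales_lost: float = 1.0) -> float:
    """Estimate revenue lost during an outage.

    fraction_of_sales_lost is a hypothetical estimate: 1.0 means every sale
    during the outage is gone; 0.5 means half of the customers wait or buy
    through another channel.
    """
    return hours_down * revenue_per_hour * fraction_of_sales_lost


# The example from the text: $100,000/hour in sales, one hour down, nothing sells.
print(downtime_cost(1, 100_000))        # 100000.0
# Four hours down, but only half of those sales are actually lost.
print(downtime_cost(4, 100_000, 0.5))   # 200000.0
```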
How many resources can you devote to backups?
Availability costs time and money.
The more available your system needs to be, the more time and money it costs. If you’re a global enterprise with a killer cash flow, then you can make more conservative decisions, back up more databases more often, and not be as concerned with the costs. If you’re a startup with three guys, and all your revenue goes towards paying salaries, then you want to watch those backup costs a little more closely.
In addition, backups cost more than just money. If you need up-to-the-minute recovery with constant transaction log backups, you have to put your database in the full recovery model, which can slow things down. If you want the fastest possible response times, and you're looking for every millisecond of edge over your competitors on each page load, the overhead of backups will show up on your radar.
So how does StackOverflow stack up?
Let’s ask the three questions:
- How much is their data worth? Their data consists of questions and answers from the programming community. Sure, they’re the #1 programming site in the world, but even the words of Jon Skeet are only worth so much.
- How much money does downtime cost? This might sound callous to users, but if StackOverflow were down for four hours, the vast majority of users would get over it. They might post a few questions elsewhere, but for the most part, they'd just sit around on Twitter complaining, refreshing their browsers while they waited for StackOverflow to come back up. They're addicted, and they'll tolerate downtime.
- How many resources can they devote to backups? StackOverflow is a small startup trying to make a living off ad revenue. Their primary target users are extremely tech-savvy people who are fully aware of tools like Firefox and Adblock Plus, which makes earning ad revenue even harder. In an ideal world, they'd have a SAN with sub-second snapshot backup & restore technology, but that costs a lot of money, and it's not realistic. Frankly, every bit of traffic in and out of their colo servers costs them money, and not an insignificant amount.
With these answers in mind, StackOverflow's decisions not to do transaction log backups, offsite log shipping, database mirroring, and so on make good business sense. We geeks in the crowd may not like it, and we might demand the latest and greatest in backup & recovery technology, but at the same time we want StackOverflow to remain free. As their volunteer DBA, I'd love to do 24×7 log shipping or database mirroring to a secondary server at another colo facility, but I wouldn't be willing to pay for it out of my own pocket.
To drive the resources point home, take a look at the database server shown in Jeff's Stack Overflow Rack Glamour Shots post this week. Count the hard drives. That's six SATA drives shared by the OS, page file, database files, log files, and full text catalogs, serving over one million pageviews per day. Many of you out there use a server like this as your development server, and you complain that it's slow. Guess what: this is both their production server and their development server. They're achieving some incredible stuff with a very limited hardware budget, and it's a testament to what you can do if you really, really focus on performance.