Production DBA

How to Test Your Backup Strategy: Five Simple Questions

By Kendra Little · February 7, 2012 · 13 comments

Backup strategies are like rashes: people tend to ignore them for as long as they can, then they ask a specialist if it’s going to be OK.

There’s a lot of good ways to back up databases. The challenge is finding best strategy for your data. To find out if your SQL Server backup strategy measures up, ask yourself the following questions:

1. How Do You Know When Your Backups Aren’t Successful?

People deal with lots of databases, and it’s hard to keep track of every last one. Different strategies are used to manage backups for all these databases — sometimes a service external to the database runs the backup. Do you know when that service fails, or doesn’t start? At other times, individual SQL Server Agent jobs handle backups for each database. If a new backup is added and a backup job isn’t created, will you know it wasn’t run?

Make sure you have an alert system for failures.
Supplement your backups with regular checks of last backup date. Make sure you’re checking the last date of log backups if you’re not using the simple recovery model.

2. How Much Do You Lose if Even Just One Backup File Goes Bad?

Backups take up a lot of space. There are a lot of backups, and people rarely use them. It’s human nature to go a little cheap, and keep them only in one place. That place may have been provisioned using RAID 0. Wherever it is, a backup file might get deleted or corrupted.

You need to look at all the different types of backups you run and consider things like this:

If you’re using just full and transaction log backups, the chain ends at the last good log backup.
If you’re doing full backups once a week and differentials on weeknights, those differentials are no good without the full.

For critical databases, I prefer to keep backups in more than one place— onsite and offsite. Think you can’t afford an offsite location? Here are the rates for Amazon S3.

3. How Fast Do You Need to Restore Your Data?

If you’re a manager, this is your ‘Recovery Time Objective’. If you’re a project manager, you just call this ‘RTO’. If you’re a database administrator, this is the number of minutes until you’re pumped full of adrenaline and sweating heavily when something goes wrong. Whenever I’m considering a backup technology, I want to know how long it will take me to restore my data if everything is gone except the backup files.

This means you need to know:

How long will it take to get to the backup files? If they’re on tape or in a magical Data Domain device, can you access them yourself?
How long will it take to copy those files to a good place to restore them from?
If you don’t have a server to restore to, how long will it take to bring one up and configure it?
How long will the restore take?

If my backup technology can’t meet my Recovery Time Objective, I need to start thinking about high availability options that can mitigate my risk. Once you’ve got high availability in place, you still want to keep a plan to restore from backups and test it periodically, but you’re less likely to need to use it.

4. How Much Data Can You Afford to Lose if Your Transaction Log File is Vaporized?

When you stop and think about it, isn’t ‘Recovery Point Objective’ a strange way of saying how much data it’s OK to lose? When you think about backup strategies, you need to know:

Can you do incremental restores? (And have you tested it?)
Can you restore to a single point in time? (Example time: right before the data was truncated. And have you tested it?)
If your database and transaction log files disappeared and you were left with only the backup files, how much data would be lost?

5. How Much Can Your Backup Impact Performance?

Whenever your backup is running, you’re burning precious resources: CPU, disk IO, memory, and (depending on your configuration), network. In many cases, it’s just fine to use this up during non peak times. However, for VLDBs which are backing up multiple terabytes and for critical OLTP databases which serve customers around the globe, backups need to be run as fast as possible. After all, backups are only one part of maintenance.

When your performance is critical, these are the questions you need to ask:

Are you backing up at the right frequency? If we’re talking about a slow transaction log backup, check how often it runs.
Are you using all the available magic? If you’re using SAN storage, have you explored all the options for backing up your databases from the SAN?
Are you using backup compression? Compression can burn more CPU, but reduce the amount of writes and overall duration of your backup.
Does your backup have enough throughput to the storage device? Whether you’re using iSCSI or Fiber Channel, throughput is often a big part of the solution.
Have you read the SQL CAT Team’s Case Study on backing up VLDB’s? It’s chock full of configuration tips to make your backups blaze through faster.

That was More than Five Questions

It’s true, I cheated. Got your own backup test? Tell us in the comments!

Free, 3× a week

Get my new posts by email

Three posts a week, plus a Monday roundup of the best database news from around the web.

13 comments

Robert L Davis

February 7, 2012 at 9:32 am

Another question I like to ask people on this front is will your application work if you have to restore the database to a brand new server? Or in other words, does you plan include the need to restore logins, server permissions, SQL jobs, SSIS packages, server configurations, etc.

Reply
1. Kendra Little
  
  February 7, 2012 at 12:52 pm
  
  Absolutely, great point! That can be a very time consuming process if you’re not prepared for it.
  
  The companies I work with who have this down the best have practices of rebuilding an environment periodically– they’re either running a disaster recovery test regularly, or they’re rebuilding a pre-production environment on a frequent basis. If you can’t do that, it’s best to prepare everything you can, but it’s funny how nobody remembers those critical legacy DTS packages until they’re not there anymore!
  
  Reply
sreekanth B

February 7, 2012 at 11:17 am

Another Gotcha – Do we really have Physical Backup Files in place when you need them? long back,I remember a situation where all our backups Jobs are running without any issues(hence, no failure alerts), But there’s an another job(created by a Jr DBA by mistake) which is cleaning up all the Backups on a nightly basis, even before NETBACKUP is picking up those files!! Very Very Bad…Fortunately, We noticed this when we were testing our restore strategy and fixed it accordingly! Testing, Testing and testing our Restore strategy on regular basis is MUST as Brent/Paul always says!…

Reply
Chris H

February 13, 2012 at 10:37 am

I personally never rely on any automated system letting me know if backups are occurring and successful for a long period of time. My strategy is to have my production dba also verify all backup jobs daily using a checklist. We also restore backups at least weekly to a staging environment which checks consistency. For redundancy, backups are written locally, to a remote server and to tape. The tape backup strategy then moves the DB backups off site into the weekly and monthly tapes. There are plenty of ways to have gaps in any backup strategy and the best way to find an issue is to have your hands and eyes on the process as much as possible.

Reply
vikash prasad

May 16, 2013 at 2:45 am

Hi mam ,
I am completed Bachelor in Computer Application through correspondence.
I want to make my career in business intelligence..Which certification is useful for me.
I don’t have any work experience..

Reply
1. Brent Ozar
  
  May 16, 2013 at 9:36 am
  
  Vikash – we’ve blogged about that here: https://www.brentozar.com/archive/2009/04/certifications-are-the-icing-on-the-cake/
  
  Reply
David C

September 26, 2013 at 7:45 am

Using a Frankencode of tsql and PowerShell, we have implemented a nightly multistage job that moves backup files into place, restores our OLTP database backup from that day, scans our log for a successful restore, and then runs a DBCC CHECKDB for integrity. I’ve built in alerting at each step, so we know what failed and where. We manage our tlogs in a separate log-shipping process and have alerting on those as well, should one fail or become corrupt. I’d like to explore more ‘checks and balances’ that might be useful in the validation process, but this at least gives us confidence that our backups are valid and able to be restored on a 24 hour cycle.

Reply
1. Brent Ozar
  
  September 26, 2013 at 7:51 am
  
  David – great, glad you found something that works for you. Have you thought about open sourcing that?
  
  Reply
Michael Riley

March 20, 2014 at 9:11 am

Your link to Case Study on backing up VLDB’s gets redirected to an index page.

http://sqlcat.com/sqlcat/b/whitepapers/archive/2009/08/13/a-technical-case-study-fast-and-reliable-backup-and-restore-of-a-vldb-over-the-network.aspx

Reply
1. Brent Ozar
  
  March 20, 2014 at 10:16 am
  
  Yeah, sadly, SQLCAT has disappeared and taken their links offline with them. (sigh) Really crappy web administration there.
  
  Reply
  1. David
    
    August 29, 2014 at 3:53 pm
    
    Ah, but someone is doing their backups correctly:
    
    https://web.archive.org/web/20130903001514/http://sqlcat.com/sqlcat/b/whitepapers/archive/2009/08/13/a-technical-case-study-fast-and-reliable-backup-and-restore-of-a-vldb-over-the-network.aspx
    
    And the whitepaper is still available at MS:
    http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/Technical Case Study -Backup VLDB Over Network_Final.docx
    
    Reply
Shepherd Moon

August 31, 2021 at 6:49 pm

Thanks for this reminder. I would also impress upon any DBAs reading this (yes, I learned the hard way) that backup testing is critical because it allows the DBA to (1) confirm that the backup file is actually there and (2) that it can be restored.

Regarding (1), it is easily possible for msdb to report a backup that was made, but that is no guarantee that the backup was not cleaned up or otherwise removed. Routine backup testing can allow DBAs to catch those unfortunate events before you promise to restore a backup that no longer exists.

And of course, same with (2) except a useless backup that exists as a file doesn’t really help you.

Reply
Bryan B

December 18, 2024 at 10:11 pm

Is it still recommended to format your volumes that contain your database and log files with 64K instead of default Windows 4K? I used to be a DBA, but now just a backup guy and I have been arguing with the DBAs for years on this one as I made this change (albeit it took a couple years) to 64K block size on NTFS and the performance increased substantially, just as MSFT said it would. That was a shock, but the math makes perfect sense. I am being told that MSFT no longer cares what the block size is but as a backup/storage dude, maybe they still should? I can’t find any documentation supporting the DBA Team’s argument with 4K NTFS block sizes, yet they complain a lot about performance. I’d love to hear your thoughts.

Reply