Who’s testing your restores? No, really – who’s testing your restores?

I recently posed the question, “Who’s taking your backups?” This is of utmost importance, because your business’s data is the business. Protecting it and backing it up need to be priorities for DBAs.

Now, for my next question: who is responsible for restoring those backups?

The saying goes that your data is only as good as your last backup, and your backup is only as good as your ability to restore it. It’s the truth! After you back up your database, there could still be problems. You could have corruption in your database. There could have been corruption on the disk your backup was stored on. You could be missing one transaction log file in the sequence.

In the event of an actual disaster, the spotlight will be on the person who needs to restore the data. This person should have experience restoring databases, and be able to remain cool, calm, and collected. Or at least not run into the server room crying.

This is a test. This is only a test.

It’s Me

Excellent! You have an important job to do. Do you have a list of which databases need to be restored? As new databases are introduced to your environment, do you make sure they are added to the list? If your users depend on three applications to do their jobs, and databases for only two of those are backed up, your disaster recovery scenario is not complete.

Do you have a schedule for doing it regularly, whether that is daily, weekly, or monthly? It is not enough to test the restore once. Change is constant. The amount of data stored will change. The data stored will change. The media it is stored on will change. Make sure you are testing regularly so any changes are accounted for.

Are you running a DBCC CHECKDB against the restored database to ensure integrity? You need to know that the integrity of all objects in the database has been preserved. Finding corruption after restoring a database under pressure is not fun.

Are you checking how long it takes to do a full restore, and does that meet your RTO guidelines? If the business expects data to be available in 30 minutes, but the restore takes one hour and fifteen minutes, you’ll need to come up with a plan to meet the objective.

It’s Someone Else

I have the same request as last time – walk over to that person’s desk, or call them. Ask them these questions. This is another process that should regularly be reviewed and tested, to ensure that the process is still optimal for your environment.

Who’s Testing What?

While backups are important because your business data is your business, your disaster recovery plan is only as reliable as your recovery process. If there is corrupt data in your database or it was on bad media, and you can’t restore it, you are in as much trouble as if you had no backup. If you don’t know who is responsible for this task, or it isn’t being done, consider this an opportunity.

You should be familiar with the backup strategy for each database. Is it in Full, Bulk Logged, or Simple recovery model? Are you taking full, differential, and log backups? You’ll need to be familiar with how to restore each type of backup, and in what order to do it.

Here is what you should do: find the latest backups of your database. Test restoring them to a secondary server. Make sure the restore completes. Run a DBCC CHECKDB on the database and make sure it comes back error-free. Now, and only now, you can be sure that your disaster recovery process is effective.

Time the restore process. This way, when asked, “If there’s a disaster, how long will we be out of business?”, you are prepared to answer.

Automate this process. This is not a one-time operation. Just as you need to regularly back up the data, you need to regularly ensure those backups are usable. Have a monthly or quarterly disaster recovery drill in place.

I need to know more about restores!

This is a great topic to learn more about! I recommend Kendra’s article “How to Test Your Backup Strategy: Five Simple Questions” and Brent’s post on “The 9 Letters That Get DBAs Fired”. Another great resource is Grant Fritchey’s article SQL Server Backup and Restore for the Accidental DBA.

Previous Post
Trade Offs: Code Quality
Next Post
Our PASS Summit 2012 Schedule

5 Comments. Leave new

  • Years ago I knew a DBA at a different company who never did test restores or backup verifications. One of their key servers barfed and when they tried to restore from tape, it turned out none of them had completed successfully for the last 8 months. Needless to say, he was shown the door. That little lesson has stuck with me for my entire career.

    Reply
  • Very timely post! My manager is working on a DR plan right now, and this will be helpful! Of course, it will be more work on *my* plate, but I sent it to him anyways…

    There’s a reason to document the heck out of how to do things. That way, you can be relaxing on a sunny beach, and when the office calls because the main site just got stomped on by Godzilla, you can tell the bosses that Joe can handle it and here’s where to find the instructions, then go back to your MaiTai…

    Jason A.

    Reply
  • Many years ago this happened to a fellow DBA. The tapes were at offsite storage and were compromised. So I do it with my current backups but randomly pull tapes and make sure that the backups written to tape will work as well.

    In my current position I have a server that I set up to do this for me. There are scripts that copy the latest backup from all my SQL Server instances; restores several each night (and records how long it took), runs DBCC CHECKDB and writes the output into a table.

    Then I put it into another database where I record other info such as the file locations and when the last backups/transaction dumps were run, etc.

    I get a daily report from SSRS telling me if there were DBCC errors or if backups/logs have not been backed up in the past 3 days.

    Took a while to build it but it is great for helping me find errors or new databases that have been added or even new servers that have been added!

    Reply
  • What’s the best CHECKDB parameters to use when testing Database restores on large Databases. Is it safe to assume PHYSICAL_ONLY and No Index is sufficient?

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.