Managing data is about managing risk, but no matter how we good we are at managing risks, they’re still risks.
We’ve seen several high-profile data failures recently:
- Sony’s database of Playstation users and financial data was hacked
- Amazon’s supposedly independent cloud services went down together, and some data was lost permanently
- The state of Texas left unencrypted personal data on 3.5 million Texans lying around on the web for a year
- Sony’s database of credit card data was hacked (yep, a different one)
- Lady Gaga’s Twitter account was hacked
Ouch. It’s time we start a series of DBA Nightmares to cover basic preparations that should be a part of every DBA’s career planning. Why career planning? Because if one of these happens to you and you’re not prepared, it’s a URLT moment – Update Resume, Leave Town. If, on the other hand, you’re well-prepared and react smoothly, this could be your moment to shine.
Today’s Nightmare: From-Scratch Server Restore
Let’s be honest: most of us have never rebuilt a product server from scratch under duress. Many of us bury our heads in the sand, hoping production will just keep on keepin’ on. We don’t test our backups, and even if we do, we don’t go to the extreme of attempting a complete from-scratch reinstall. When the system is down and the CIO’s standing behind us, tapping us on the shoulder, we learn some ugly lessons.
Right away, you need to choose one of two recovery plans: will you try to restore everything exactly as it was (including the system databases), or will you build a new server from scratch and just restore the user databases? Ideally, you’ve designed your recovery plan ahead of time, but in a nightmare scenario, you’re standing in the datacenter with empty pockets and no game plan.
If you decide to restore the system databases, you should try this ahead of time. Restoring the master database is different than typical user databases because you can’t use SQL Server Management Studio. You have to set the SQL Server to run in single-user mode, then use SQLCMD to restore the master database, then remove the -m parameter that you added to start SQL Server, and start it back up again. If you’re using a third-party product to do your database backups, it’ll require separate instructions like Tom LaRock’s instructions on restoring master with Quest LiteSpeed. After restoring master, you’ll need to restore the msdb database, but fortunately that one can be done through the SSMS GUI as long as the SQL Server Agent is shut down.
If you don’t restore the system databases, you may be able to get your server up and running faster – at the cost of some configuration data. For example, logins, Agent jobs, and linked servers are stored in the system databases. On a small development server with a handful of logins and only maintenance jobs, it might be easier to install a fresh instance of SQL Server on a newly installed server, then just restore the user databases. (This is one of the reasons I try to avoid excessive custom logins or Agent jobs where possible on development servers.) Knowing your recovery process and risk will help you design your SQL Server security and Agent job configuration better.
If you decide ahead of time that your recovery plan involves a fresh OS and SQL Server, there’s one thing you can do to make your recovery process either: automate login creation. Schedule a job to run weekly with Robert Davis’s login copy script and send the results to yourself via email. That way, at the very least, you’ll have the exact list of logins, passwords, and SIDs to avoid the orphaned login problem when you restore databases. Run the create-login script sent to you by Robert’s tool, then as you restore each user database, the logins will automatically be associated and users can resume work as normal.
To help plan your build-versus-restore decision ahead of time, it helps to think through all of the implications. These are just some of the questions you’ll need to think through when designing a disaster recovery plan:
- What service pack & cumulative update pack was the server running?
- Did we have any non-SQL applications installed?
- Were any server-level settings like trace flags configured?
For answers, try running my Blitz script ahead of time. I bet you’ll learn a lot about your servers – I know my clients do!