Fellow DBA Jason Crider asked me about my ideal dev, test and production SQL Server environments.
The Perfect Test/QA Server Has Nightly Restores from Production
In a perfect world, I like setting up a QA box that’s got enough space to handle my production database. The QA box’s storage can be much slower than production, like using large/cheap SATA drives instead of small/expensive SAS drives.
Every night at 9pm, the production full backups kick off, and they’re finished by 10pm. The backups are written to a file share. At 11pm, the QA box kicks off a restore job that grabs the latest full backups off the file share and restores them.
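Here’s roughly what that 11pm restore job can look like. This is a minimal sketch, not a production-ready job: the database name, share path, and file locations are placeholders, and a real script would loop through every database and find the newest backup file by date instead of hardcoding one.

```sql
-- Minimal sketch of the nightly QA restore (all names and paths are placeholders).
USE master;

-- Kick current users out so the restore can get exclusive access.
ALTER DATABASE SalesDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

RESTORE DATABASE SalesDB
    FROM DISK = N'\\backupshare\prod\SalesDB.bak'
    WITH REPLACE,                                          -- overwrite last night's copy
         MOVE N'SalesDB_Data' TO N'D:\Data\SalesDB.mdf',   -- QA drive layout differs
         MOVE N'SalesDB_Log'  TO N'L:\Logs\SalesDB.ldf',
         RECOVERY;

ALTER DATABASE SalesDB SET MULTI_USER;
```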
Now when the QA team tests deployment scripts in the QA environment, they can be reasonably sure that the scripts will behave the same way in production. Sometimes developers and QA staff assume that production is exactly like development: they’ll make small tweaks to the dev server and schema over time, and their deployment scripts will fail because the production server is missing those tweaks. If the QA server is restored from production every night, that possibility of failure is reduced.
Developers love this fresh-QA-box approach because it gives them fresh data every morning for testing. Want to find out how a new query would perform with real production data? Development environments are rarely refreshed from production because people might lose work, but an automated QA restore system means there’s always a non-production box with the freshest data possible.
The next perk is that now I’ve got an automated fire drill restore system going. Every night, I know for sure that my production backups worked. Unfortunately, this can also bite me in the rear: if my production backups fail for some reason, then my QA system is unusable until I fix it. Backup failures suddenly become much more time-sensitive, and I may not have enough time to rerun the production backups. To prevent problems, I recommend keeping at least two days of backups online so that you can restore from the previous (and successful) backup if the normal one fails. I wasn’t ever ambitious enough to implement an automated restore-the-second-backup script because my production backups so rarely failed.
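For what it’s worth, that fallback script wouldn’t have to be complicated. Here’s a hypothetical sketch of the idea, assuming date-stamped backup file names (the names, paths, and dates are all made up):

```sql
-- Hypothetical restore-the-second-backup logic (placeholder names and paths).
-- Verify last night's backup first; fall back to the prior night's file if it's bad.
DECLARE @latest      nvarchar(260),
        @previous    nvarchar(260),
        @restoreFrom nvarchar(260);

SET @latest   = N'\\backupshare\prod\SalesDB_20090407.bak';
SET @previous = N'\\backupshare\prod\SalesDB_20090406.bak';

BEGIN TRY
    RESTORE VERIFYONLY FROM DISK = @latest;  -- checks that the backup is readable
    SET @restoreFrom = @latest;
END TRY
BEGIN CATCH
    SET @restoreFrom = @previous;            -- latest backup failed verification
END CATCH;

RESTORE DATABASE SalesDB
    FROM DISK = @restoreFrom
    WITH REPLACE, RECOVERY;
```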
This approach doesn’t work if:
- You’ve got sensitive data in production (credit card data, medical data, and other things you don’t want developers to have free rein over)
- You can’t restore your production database fast enough (in which case you should be checking out backup compression tools like LiteSpeed)
- Your QA box isn’t big enough to handle production (in which case you don’t know that dev code will scale to production anyway)
- Your QA team needs 24/7 access to the QA box (in which case you might consider multiple QA boxes)
- You don’t do full backups nightly (if you’re doing diffs, you’ll just need a more complex auto-restore script; see the sketch after this list)
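For that last case, the extra complexity is mostly one more RESTORE statement: restore the latest full backup without recovering the database, then apply the newest differential on top. A minimal sketch with placeholder names:

```sql
-- Sketch of a full-plus-differential restore (placeholder names and paths).
-- The full backup is restored WITH NORECOVERY so the differential can be applied.
RESTORE DATABASE SalesDB
    FROM DISK = N'\\backupshare\prod\SalesDB_full.bak'
    WITH REPLACE, NORECOVERY;

-- Apply the most recent differential, then bring the database online.
RESTORE DATABASE SalesDB
    FROM DISK = N'\\backupshare\prod\SalesDB_diff.bak'
    WITH RECOVERY;
```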
Never Restore QA From Development
Some developers like to have a QA server that they restore from development periodically, and then tell the QA people to test their work on the QA server.
The only time I allow that scenario is when they’re going to take the same approach with the production server. If their production upgrade plan is to take down production and restore the dev database onto it, then restoring QA from dev is a valid approach. Otherwise, they’re cheating!
Restoring QA from production forces the developers to script out their deployments instead of making point-and-click table changes in SSMS.
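A scripted deployment can also check its preconditions instead of assuming dev and production look alike. A trivial illustration (the table and column are made up):

```sql
-- A scripted change verifies what's already there before altering anything
-- (dbo.Orders and ShippedDate are made-up names for illustration).
IF NOT EXISTS (SELECT 1 FROM sys.columns
               WHERE object_id = OBJECT_ID(N'dbo.Orders')
                 AND name = N'ShippedDate')
BEGIN
    ALTER TABLE dbo.Orders ADD ShippedDate datetime NULL;
END;
```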
The Perfect Test/QA Server is Identical to Production
Keeping identical sets of CPUs, memory and storage in both environments means you can run performance tests with confidence before a new dev query knocks the production server over. With matching hardware, query execution plans in each environment should be identical. If the QA server has a different CPU count or a different number of RAID arrays from production, your query plans can differ, making it harder to predict production performance.
For example, I worked in one environment where the servers were:
- Production – dual quad-core Xeons (8 cores total) with 4 RAID arrays for data, plus separate arrays for logs and TempDB
- QA – four single-core Xeons (4 cores total) with 1 giant SATA array for data, plus separate arrays for logs and TempDB
Unfortunately, when the SQL Server engine builds an execution plan for a query, it builds the plan differently based on the number of CPUs and the number of filegroups/files involved. In this environment, the DBA had never restored QA from the production environment, and the production database had 4 filegroups (one for each array) whereas QA only had 1. The queries and indexes were tuned on the QA box, but production performance never seemed to match up with expectations. Keeping identical hardware and identical database configurations mitigates that risk.
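A quick way to catch that kind of drift is to run the same sanity checks on both servers and compare the results. These use standard catalog views and DMVs; nothing here is environment-specific:

```sql
-- Run on both QA and production and compare the output:
-- the CPU count the optimizer sees, and the filegroup/file layout.
SELECT cpu_count FROM sys.dm_os_sys_info;

SELECT fg.name  AS filegroup_name,
       COUNT(*) AS file_count
FROM sys.filegroups AS fg
JOIN sys.database_files AS df
    ON df.data_space_id = fg.data_space_id
GROUP BY fg.name;
```

Run the second query in the database you care about; if the filegroup names or file counts don’t match between environments, your execution plans may not match either.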
In the real world, we can’t always afford to keep these two environments identical. I’ve also used the hand-me-down approach:
- Production is the latest and greatest hardware
- Disaster Recovery is the hand-me-down from production
- QA is the hand-me-down from DR
- Development is the hand-me-down from QA
The drawback with this approach is that every time you buy a new production environment, you’re taking four times as many outages as you’d like – you have to rebuild and shuffle servers all over the place. Manpower isn’t free.
Virtualization is a four-letter word to database administrators, but it’s got a huge advantage here. If the DR, QA, and dev servers are all virtual servers, then it’s very easy to shuffle them around to different pieces of host hardware. The hand-me-down process becomes much simpler:
- Put the new hardware into production
- Take the old production hardware and install your favorite virtualization software
- Migrate your instances around to rebalance dev/QA/DR load with a few mouse clicks
- Change the number of virtual CPUs and RAM in each instance to take advantage of the new underlying hardware
Presto, everybody’s got better gear and you’ve got a free weekend.
“But wait!” you say. “Virtual servers perform differently than the physical SQL Server we’re using in production!” Correct, but you’re already using shuffled-around hand-me-down hardware that performs differently than production. Take a 3-5% hit on performance, and save yourself days of labor. Never rebuild a dev/QA/DR server again. I’ll take free weekends in exchange for a 3-5% dev/QA/DR performance hit anytime.
I haven’t talked enough about virtualization lately, and it’s high time I fixed that. I’ll be blogging this week about what virtualization means for the DBA. First up: why your sysadmin wants to virtualize your servers.