In a perfect world, my test/QA servers get restored nightly from production. Let’s say every night at 9pm, the production full backups kick off, and they’re finished by 10pm. At 11pm, the QA box kicks off a restore job that grabs the latest full backups off the file share and restores them. The production backups are written to a file share, never local storage, so there’s no additional overhead on the production box for this restore process.
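The nightly restore job described above can be sketched in a few lines: find the newest full backup for each database on the share, then build the RESTORE statement for the QA box to run. This is a minimal illustration, not the author's actual job; the share layout and the `ProdDB_FULL_20240102_2100.bak` naming convention are assumptions.

```python
import glob
import os

def latest_full_backup(share_path, database):
    """Return the newest full backup file for a database on the share.

    Assumes files are named like ProdDB_FULL_20240102_2100.bak
    (hypothetical convention; the timestamp sorts lexicographically).
    """
    pattern = os.path.join(share_path, f"{database}_FULL_*.bak")
    backups = sorted(glob.glob(pattern))
    return backups[-1] if backups else None

def build_restore_command(database, backup_file):
    """Build the T-SQL RESTORE statement the QA job would execute."""
    return (
        f"RESTORE DATABASE [{database}] "
        f"FROM DISK = N'{backup_file}' "
        f"WITH REPLACE, RECOVERY;"
    )
```

In practice you would hand the generated statement to sqlcmd or a SQL Server Agent job step on the QA server.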
Now when QA tests deployment scripts in the QA environment, they can be reasonably sure that the scripts will perform the same way in production. Sometimes developers and QA staff assume that production is exactly like development: they’ll make small tweaks over time to the dev server and schema, and their deployment scripts will fail because the production server is missing those tweaks. If the QA server is restored every night, that possibility of failure is reduced.
Developers love this fresh-QA-box approach because it gives them fresh data every morning for testing. Want to find out how a new query would perform with real production data? Development environments are rarely refreshed from production because people might lose work, but an automated QA restore system means there’s always a non-production box with the freshest data possible.
The next perk is that now I’ve got an automated fire drill restore system going. Every night, I know for sure that my production backups worked. Unfortunately, this can also bite me in the rear: if my production backups fail for some reason, then my QA system is unusable until I fix it. Backup failures suddenly become much more time-sensitive, and I may not have enough time to rerun the production backups. To prevent problems, I recommend keeping at least two days of backups online so that you can restore from the previous (and successful) backup if the normal one fails. I wasn’t ever ambitious enough to implement an automated restore-the-second-backup script because my production backups so rarely failed.
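The author notes he never automated the restore-the-second-backup fallback, but the logic is simple enough to sketch: if the newest backup on the share is older than a night, assume last night's backup job failed and fall back to the previous (known-good) one. The file-naming convention here is the same hypothetical one as before.

```python
from datetime import datetime, timedelta

def pick_backup_to_restore(backup_files, now, max_age_hours=24):
    """Pick the newest backup; if it is stale (last night's backup job
    likely failed), fall back to the previous one and flag it.

    backup_files: filenames like 'ProdDB_FULL_20240102_2100.bak',
    sorted oldest-to-newest (hypothetical naming convention).
    Returns (filename, is_fallback).
    """
    def taken_at(name):
        # Parse the timestamp out of the hypothetical file name.
        stamp = name.rsplit(".", 1)[0].split("_FULL_", 1)[1]
        return datetime.strptime(stamp, "%Y%m%d_%H%M")

    newest = backup_files[-1]
    if now - taken_at(newest) <= timedelta(hours=max_age_hours):
        return newest, False
    # Newest backup is stale: restore the previous night's backup
    # instead, and let the flag trigger an alert to the DBA.
    return backup_files[-2], True
```

The `is_fallback` flag is the important part: the QA box stays usable, but someone still gets told the production backup needs attention.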
This approach doesn’t work if:
- You’ve got sensitive data in production (credit card data, medical data, and other things you don’t want developers to have free rein over)
- You can’t restore your production database fast enough (in which case, you probably want to work on tuning that anyway in order to meet your RPO and RTO)
- Your QA box isn’t big enough to handle production (in which case you don’t know that dev code will scale to production anyway)
- Your QA team needs 24/7 access to the QA box (in which case, I have questions)
- You don’t do full backups nightly (if you’re doing diffs, you’ll just need a more complex auto-restore script like sp_DatabaseRestore)
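To see why diffs complicate the script: the restore has to find the newest full backup, then the newest differential taken after that full — which is the kind of sequencing sp_DatabaseRestore automates. A rough sketch of that ordering logic, using bare timestamps in place of real file names:

```python
def plan_diff_restore(fulls, diffs):
    """Given sorted lists of full and differential backup timestamps
    (e.g. '20240101_2100'), return the restore sequence: the newest
    full, plus the newest diff taken after that full, if any.
    """
    base_full = fulls[-1]
    # A differential is only usable with the full it was based on,
    # so diffs older than the chosen full are discarded.
    usable_diffs = [d for d in diffs if d > base_full]
    plan = [("FULL", base_full)]
    if usable_diffs:
        plan.append(("DIFF", usable_diffs[-1]))
    return plan
```

On the SQL Server side, that plan translates to restoring the full WITH NORECOVERY and then the differential WITH RECOVERY.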
Never Restore QA From Development
Some developers like to have a QA server that they restore from development periodically, and then tell the QA people to test their work on the QA server.
The only time I allow that scenario is when they’re going to take the same approach with the production server. If their production upgrade plan is to take down production and restore the dev database onto it, then restoring QA from dev is a valid approach. Otherwise, they’re cheating!
Restoring QA from production forces the developers to script out their deployments instead of point-and-click-table-changes in SSMS.
The Perfect Test/QA Server is Identical to Production
Keeping identical sets of CPUs, memory, and storage in both environments means you can run performance testing with confidence before a new dev query knocks the production server over. With matching hardware, query execution plans in each environment should be identical. If the QA server has a different CPU count or memory size, your query plans can differ, making it harder to predict production performance.
For example, I worked in one environment where the servers were configured as:
- Production – dual quad-core CPUs (2×4, 8 cores total) with 128GB RAM
- QA – four single-core CPUs (4×1, 4 cores total) with 16GB RAM
Unfortunately, when the SQL Server engine builds an execution plan for a query, it builds that plan differently based on the number of CPUs and the amount of memory available. In this environment, the queries and indexes were tuned for the QA box, but production performance never seemed to match up with their expectations. Keeping identical hardware and identical database configurations mitigates that risk.
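One way to catch that kind of drift is to compare the plan-affecting numbers on both servers side by side. The sketch below uses made-up keys populated by hand; on a real server they would come from queries like `SELECT cpu_count FROM sys.dm_os_sys_info` and `sp_configure`.

```python
def config_mismatches(prod, qa):
    """Compare plan-affecting settings between two servers and return
    the keys where they differ, with (prod, qa) value pairs."""
    return {
        key: (prod[key], qa[key])
        for key in prod
        if key in qa and prod[key] != qa[key]
    }

# Values mirror the mismatched environment described above.
prod = {"cpu_count": 8, "max_memory_gb": 128, "maxdop": 0, "cost_threshold": 50}
qa   = {"cpu_count": 4, "max_memory_gb": 16,  "maxdop": 0, "cost_threshold": 50}
```

Running `config_mismatches(prod, qa)` on these values flags `cpu_count` and `max_memory_gb` — exactly the two differences that made the QA-tuned plans misleading.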
In the real world, we can’t always afford to keep these two environments identical, especially with infrastructure-as-a-service cloud providers, where we’re paying for every moment that a resource sits idle. Everything’s a compromise. The closer you can make test/QA to production, the more accurate your testing will be – but the more you’ll spend doing it.