Blog

Fellow DBA Jason Crider asked me about my ideal dev, test and production SQL Server environments.

The Perfect Test/QA Server Has Nightly Restores from Production

In a perfect world, I like setting up a QA box that’s got enough space to handle my production database.  The QA box’s storage can be much slower than production, like using large/cheap SATA drives instead of small/expensive SAS drives.

Every night at 9pm, the production full backups kick off, and they’re finished by 10pm.  The backups are written to a file share.  At 11pm, the QA box kicks off a restore job that grabs the latest full backups off the file share and restores them.

Now when QA tests deployment scripts in the QA environment, they can be reasonably sure that the scripts will perform the same way in production.  Sometimes developers and QA staff assume that production is exactly like development: they’ll make small tweaks over time to the dev server and schema, and their deployment scripts will fail because the production server is missing those tweaks.  If the QA server is restored every night, that possibility of failure is reduced.

Developers love this fresh-QA-box approach because it gives them fresh data every morning for testing.  Want to find out how a new query would perform with real production data?  Development environments are rarely refreshed from production because people might lose work, but an automated QA restore system means there’s always a non-production box with the freshest data possible.

The next perk is that now I’ve got an automated fire drill restore system going.  Every night, I know for sure that my production backups worked.  Unfortunately, this can also bite me in the rear: if my production backups fail for some reason, then my QA system is unusable until I fix it.  Backup failures suddenly become much more time-sensitive, and I may not have enough time to rerun the production backups.  To prevent problems, I recommend keeping at least two days of backups online so that you can restore from the previous (and successful) backup if the normal one fails.  I wasn’t ever ambitious enough to implement an automated restore-the-second-backup script because my production backups so rarely failed.

This approach doesn’t work if:

  • You’ve got sensitive data in production (credit card data, medical data, and other things you don’t want developers to have free reign over)
  • You can’t restore your production database fast enough (in which case you should be checking out backup compression tools like LiteSpeed)
  • Your QA box isn’t big enough to handle production (in which case you don’t know that dev code will scale to production anyway)
  • Your QA team needs 24/7 access to the QA box (in which case you might consider multiple QA boxes)
  • You don’t do full backups nightly (if you’re doing diffs, you’ll just need a more complex auto-restore script)

Never Restore QA From Development

Some developers like to have a QA server that they restore from development periodically, and then tell the QA people to test their work on the QA server.

The only time I allow that scenario is when they’re going to take the same approach with the production server.  If their production upgrade plan is to take down production and restore the dev database onto it, then restoring QA from dev is a valid approach.  Otherwise, they’re cheating!

Restoring QA from production forces the developers to script out their deployments instead of point-and-click-table-changes in SSMS.

The Perfect Test/QA Server is Identical to Production

Keeping identical sets of CPUs, memory and storage in both environments means that you can run performance testing with confidence before a new dev query knocks the production server over.  This means query execution plans on each environment should be identical.  If the QA server has a different CPU count or different number of RAID arrays from production, your query plans can be different, making it harder to predict production performance.

For example, I worked in one environment where the servers were:

  • Production – dual quad-core Xeons (8 cores total) with 4 raid arrays for data, plus separate arrays for logs and TempDB
  • QA – four single-core Xeons (4 cores total) with 1 giant SATA array for data, plus separate arrays for logs and TempDB

Unfortunately, when the SQL Server engine builds an execution plan for a query, it’ll build differently based on the number of CPUs and the number of filegroups/files involved.  In this environment, the DBA had never restored QA from the production environment, and the production database had 4 filegroups (one for each array) whereas QA only had 1 filegroup.  The queries and indexes were tuned for the QA box, but production performance never seemed to match up with their expectations.  Keeping identical hardware and identical database configurations mitigates that risk.

In the real world, we can’t always afford to keep these two environments identical.  I’ve also used the hand-me-down approach:

  • Production is the latest and greatest hardware
  • Disaster Recovery is the hand-me-down from production
  • QA is the hand-me-down from DR
  • Development is the hand-me-down from QA

The drawback with this approach is that every time you buy a new production environment, you’re taking four times as many outages as you’d like to – you have to rebuild and shuffle servers all over the place.  Manpower isn’t free.

Enter Virtualization

Virtualization is a four-letter word to database administrators, but it’s got a huge advantage here.  If the DR, QA, and dev servers are all virtual servers, then it’s very easy to shuffle them around to different pieces of host hardware.  The hand-me-down process becomes much simpler:

  • Put the new hardware into production
  • Take the old production hardware and install your favorite virtualization software
  • Migrate your instances around to rebalance dev/QA/DR load with a few mouse clicks
  • Change the number of virtual CPUs and RAM in each instance to take advantage of the new underlying hardware

Presto, everybody’s got better gear and you’ve got a free weekend.

“But wait!” you say. “Virtual servers perform differently than the physical SQL Server we’re using in production!”  Correct, but you’re already using shuffled-around hand-me-down hardware that performs differently than production.  Take a 3-5% hit on performance, and save yourself days of labor.  Never rebuild a dev/QA/DR server again.  I’ll take free weekends in exchange for a 3-5% dev/QA/DR performance hit anytime.

I haven’t talked enough about virtualization lately, and it’s high time I fixed that.  I’ll be blogging this week about what virtualization means for the DBA.  First up: why your sysadmin wants to virtualize your servers.

↑ Back to top
  1. It seems like the nightly refresh of QA environment is key, even if you don’t have a production environment yet (as in a greenfield development project). Then, once the team makes that first release, you don’t have to change anything but the source of the backup to restore from in order to invoke your ideal process.

  2. I wish I could do the nightly refresh. The problem is that the ERP system I administer here would also require a daily application to run to re-sync the Visual FoxPro and SQL databases. I haven’t been able to figure out how to automate that.

    Oh, and Brent I am regularly amazed and humbled by the amount of time you put into blogging. It inspires me to write more.

  3. I loved the backup-auto-restore, when it’s working, it’s beautiful!

    Another not so fun scenario is external Database hosting for production while QA/DEV are in-house
    Then I had to compress (or LiteSpeed/SQL Backup), download via FTP hoping it’s fast enough, then restore in-house

    Downloading 6GB (compressed) nightly is no fun, imagine some bigger files!

  4. I love the idea of the nightly restore – but my production environment is just way too big, and now that I’ve suddenly been thrust into the world of SAN management (hence my sudden departure from the internets), I’m even more sensitive to the space I’m allocating.

    We really haven’t sorted out a good solution just yet. Our QA folks have managed to make “skinny” versions of the databases, but obviously this doesn’t allow for equal performance testing, only functionality. They still keep one fully loaded QA environment, but we aren’t in that perfect realm of matching hardware.

    That said, thanks for yet another insightful post!

  5. I played around with this idea in the past, but always had issues on the secondary box that the .bak files were being restored on. I wrote a custom script but could never quite get it to work. Guess I’ll be playing with it again.

  6. Brent and everyone, is it NOT considered “Best Practice” to put SQL Test and Production environments (databases) on the same hardware or even the same VM server?
    One vendor doesn’t see anything wrong with this. I work at a large hospital. Thanks eveyone, Cheryl

    • Cheryl – putting test and production on the same hardware is a bad idea for safety purposes. If for some reason someone screws up and picks the wrong database or has the wrong connection string set up, it’s easier to trash records on the production database. Setting them up as different virtual OS instances on the same VM server may not be so bad, though – if you’re sharing hardware anyway, it doesn’t really matter if they’re on the same host. You want to be careful to avoid performance bottlenecks, but that’s true no matter where you put them in virtual machines.

  7. what about the stored procedures that are different in prod and qa?

    • Laurie – can you give me an example? Usually your code should be identical between servers, and it should fetch configuration variables from data sources (like config tables). Having different code from prod to QA to dev introduces risk for problems.

  8. Hi Brent,

    We recently just ordered 2 brand new servers: HP Prliant DL380 G7 – Xeon X 5690 3.46 Ghz X 2 with 144GB Memory and a 160gb Fusion iODrive for TempDB. these 2 new servers will replace the current production server and DR. This is for a central data warehouse environment.

    As we don’t have enough capex to buy the third one right now, my question is:

    is it better to use these 2 new servers as one for Production and one for DR or one for production and one for Test/QA?

    In worst situation, a down time of 2-3 hours is OK as our reporting servers are on defferent OLAP servers.

    what is the best way to utilize these 2 new servers?

    Thanks a lot.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>