How to Set Up Your Own Mastering Class VM

When you buy one of my Live Class Season Passes, you can save a lot of money by building your own VM to follow along with the labs. Or, you might just wanna re-run the labs later to see if you can do a better job, or if you’re still mastering a concept.

You’re going to need:

  • A server
  • The Stack Overflow database
  • The indexes & scripts set up
  • Ideally, a backup of the database, plus an Agent job to rapidly restore it between labs
  • SQLQueryStress

Let’s talk through each piece.

Building a SQL Server

To get a rough idea of how much hardware to use, let’s look at the AWS EC2 VMs I give the class students:

  • Mastering Index Tuning & Mastering Query Tuning use an i3.xlarge: 4 cores, 30GB RAM, local NVMe SSD storage
  • Mastering Server Tuning uses an i3.2xlarge: 8 cores, 61GB RAM, local NVMe storage – we use more cores & RAM here because we run heavier stress tests, since we’re doing server-level discussions

Each server needs at least 500GB local SSD space to deal with the 350GB Stack Overflow database, index creation space, TempDB, backups, and restores. It needs to be fast storage, too: for a rough idea, you’ll need to be able to restore the ~350GB backup in 15 minutes or less. (I’m just using restore speed as one example here – you’ll be doing lots of storage-intensive tasks, like creating indexes on large tables.)

If you want to check your storage speed before downloading the Stack database, run a test with CrystalDiskMark. You want at least 1,000 MB/sec for sequential reads and writes. For reference, my laptop’s SSD hits those numbers, and it would be fine for the labs.

Can you get by with less hardware? Sure, but of course your performance will be different than what we’re seeing in class. I actually don’t think that’s a bad thing – every server out there is different – but just be aware that it’ll pose additional challenges for you if you try something like 4 cores, 8GB RAM. Your index creations will be terribly slow, and you probably won’t be able to keep up in class.

Can you use more hardware? Yep, and as long as you’re not caching the entire database in RAM, you’ll probably still have the same basic challenges that we tackle in all of the classes.

After building the server, install SQL Server and SQL Server Management Studio (SSMS).

Getting the Stack Overflow database

Download the 2018-06 version of the Stack Overflow database: 38GB torrent (magnet). If you’re not familiar with BitTorrent, here are detailed instructions – just make sure to get the Stack version mentioned earlier in this paragraph, because that’s how you’ll reproduce the demos the same way. Your company may block BitTorrent – many do – but it’s the most efficient way to distribute a >300GB database. There aren’t a lot of file hosts willing to share a file that big, heh. You may need to download it from home.

When you extract it, it’ll expand to a ~304GB SQL Server database that you can attach. There are other smaller versions of the database too, but be aware that you’ll get radically different query plans, and some of the demos won’t work the same way since we’re querying for specific date ranges.
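For example, attaching the extracted files looks something like this – a sketch that assumes you extracted to E:\MSSQL\DATA and that the database is named StackOverflow. Match the FILENAME entries to whatever files the torrent actually extracts (there may be more than one data file):

```sql
-- Attach the extracted Stack Overflow database files.
-- File paths and names here are illustrative - list every data file
-- you extracted, plus the log file.
CREATE DATABASE StackOverflow ON
    (FILENAME = N'E:\MSSQL\DATA\StackOverflow.mdf'),
    (FILENAME = N'E:\MSSQL\DATA\StackOverflow_log.ldf')
FOR ATTACH;
```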

It ships in SQL Server 2008 compatibility level, but you usually wanna learn on the most recent Cardinality Estimator (CE). To set that, in SSMS, right-click on the database, go into Options, and change Compatibility Level to match the version you’re running – but not SQL Server 2019, which behaves quite differently. Hold off on that compat level until SQL Server 2019 is released. I’m updating the classes with more modules for SQL Server 2019’s behavior, but if you try it now, you’re going to get wildly unpredictable behavior compared to the rest of your classmates.
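If you’d rather script it than click through SSMS, the equivalent T-SQL is just this – substituting your own compat level (140 here, for SQL Server 2017):

```sql
-- Raise the compatibility level so queries use the newer Cardinality Estimator.
-- 140 = SQL Server 2017; use 130 for 2016, 120 for 2014.
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 140;

-- Verify the change took effect.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'StackOverflow';
```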

Setting Up the Stored Procs and Indexes

We use the First Responder Kit scripts for performance analysis, and we use the below procs to help the demos along. Create these in the Stack Overflow database:

Don’t install that in production, bucko.

Next, run the index creation scripts below. It’s going to take a while, depending on how fast your lab’s storage is. After creating these indexes, we’ll take a backup of the database so that you can restore it each time to set back to a known starting point – rather than creating the indexes every time you restart labs, which can take tens of minutes.
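To give you a flavor of why this takes a while: the scripts build nonclustered indexes on the big tables, along the lines of the illustrative example below. This is not the actual class script – run the real one – but on 100M+ row tables like Posts, statements like this are exactly where fast storage pays off:

```sql
-- Illustrative only - run the class's actual index scripts, not this.
-- Building a nonclustered index on a huge table reads and sorts the
-- whole table, which is why slow storage makes this painful.
CREATE INDEX IX_Posts_OwnerUserId
    ON dbo.Posts (OwnerUserId);
```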

Back It Up and Set Up a Restore Job

During class, between each lab, the students restore their database to this starting point. For sheer performance, you want to run the backup across 4 files – SQL Server backups & restores can actually go faster when you stripe them across more files, even when writing to the same storage.
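A striped backup along those lines might look like this – a sketch assuming an E:\MSSQL\BACKUP folder and a database named StackOverflow:

```sql
-- Stripe the backup across 4 files for faster throughput,
-- even when all 4 files land on the same drive.
BACKUP DATABASE StackOverflow
TO  DISK = N'E:\MSSQL\BACKUP\StackOverflow_1.bak',
    DISK = N'E:\MSSQL\BACKUP\StackOverflow_2.bak',
    DISK = N'E:\MSSQL\BACKUP\StackOverflow_3.bak',
    DISK = N'E:\MSSQL\BACKUP\StackOverflow_4.bak'
WITH COMPRESSION, STATS = 10;
```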

Then, set up an Agent job to restore them. Here’s the script I use – just modify the file path locations for your backups. Note that mine also has a step to set Cost Threshold for Parallelism and MAXDOP back to normal defaults for the VM we’re working with in the cloud.

In my example below, I’m doing my restores from E:\MSSQL\BACKUP – you may need to modify your backup path and file names.
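As a sketch of what that Agent job’s step runs – the paths match the E:\MSSQL\BACKUP example, and the parallelism values (50 and 4) are placeholders for whatever your VM’s normal settings are:

```sql
-- Restore over the top of the existing database from the 4 striped files.
RESTORE DATABASE StackOverflow
FROM DISK = N'E:\MSSQL\BACKUP\StackOverflow_1.bak',
     DISK = N'E:\MSSQL\BACKUP\StackOverflow_2.bak',
     DISK = N'E:\MSSQL\BACKUP\StackOverflow_3.bak',
     DISK = N'E:\MSSQL\BACKUP\StackOverflow_4.bak'
WITH REPLACE, STATS = 10;

-- Set parallelism back to known defaults between labs.
-- These are advanced options, so enable those first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;  -- placeholder value
EXEC sp_configure 'max degree of parallelism', 4;        -- placeholder value
RECONFIGURE;
```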

Test your restore job to get an idea of how long it’ll take. If it takes more than 15 minutes, you probably want a faster VM: with a 20-30 minute restore, you’re going to have a tough time keeping up in class when you’re doing storage-intensive things like creating indexes.

SQLQueryStress for load testing

SQLQueryStress is an easy, free, open source load testing utility. Download the latest version (zip), and extract it to a folder. You don’t have to run a setup or anything – it’s just a single .exe file that you can run whenever you want to do a load test.

Then, download my load test files and extract those to the same folder. Your folder will end up looking like this:

SQLQueryStress folder

And that’s it! See you in class, and hope you have as much fun as Vadim had.


12 Comments

  • How about running a docker of SQL Server? It makes jumping between versions for demos and tests so easy!

    • Alex – we don’t jump between versions for the demos, so it’s kinda irrelevant.

      • Sure, but one day maybe you’ll want to update to a newer CU/version, and maybe compare between them, etc. 🙂

        • Nah, that’s not even remotely on the radar.

          I get that you’re a Docker fan, but here, that’s a solution looking for a problem. 😉

          • It’s funny, as I wasn’t a docker fan until really recently, when I did have to change versions for an open-source project I’m helping out with. Directly installing and upgrading versions was such a pain, not to mention trying to clean them up to switch back. Then I tried docker and was instantly converted, heh (admittedly, after some pain figuring out the initial setup). I think that for any test/demo server it’s much easier to set up, but of course it’s definitely not a must 🙂

          • Exactly – the pain of the setup means that it’s worse, not better, for solutions when you need to get hundreds of training students ready. Docker is the exact opposite of what I need here. I’m not trying to teach them Docker, so using containers would actually cost me time and money.

            It kills me when I hear the Docker and K8s crowd say things like, “it’s super easy once you get past setting it up!” Yeah, okay, great. Real helpful.

          • Well, the pain of setup is due to lack of knowledge, not the setup itself (talking docker and not K8 here). Once you have a working command, and especially on Linux, you basically run one command to install docker, and another command to get and run the latest version of the DB you need, and that’s it.
            But yeah, in this case I get what you’re saying, it’s probably simpler with a regular installation.

          • “Once you have a working command” sums it up pretty well.

            Reminds me of the old Steve Martin joke: “if you want to be a millionaire, it’s really easy. First, get a million dollars.”

          • Hehehe good point. That working command is reusable and can be shared to everyone who needs it without the need for everyone to come up with a million dollars, though 😛

  • …Bob’s bank. I got your money. White suit, right-hand pocket.

Not the same bit or album, but that bit from Comedy Is Not Pretty reminded me of the one from A Wild and Crazy Guy.

  • Thank you for the detailed instructions!

    I’m making use of the recommended AWS EC2 VM (second time round it was easier than the first time I attempted it…old dog learning new tricks, woof, it seems!)

    I’m gearing up for Monday’s class. Tonight is my second round of setting up…

Mental note to myself and maybe anyone else visiting this page: EPHEMERAL storage is a word worth remembering. Once you’ve shut down your EC2 instance, everything you’ve got stored there goes whoosh – gone, baby, gone.

    This time round:
    1. Restore the database
    2. Run the scripts above [make a pot of coffee and consume while this completes]
    3. Backup the database
    4. Zip the backup
    5. Move the backup to the C drive
    6. Then and only then: shut down the instance

