If you want to run the labs on your own (either during class, or later), you’re going to need:

  • A fast SQL Server
  • The right Stack Overflow database
  • An Agent job to rapidly restore the database between labs
  • SQLQueryStress

The setup is exactly the same for all of my Mastering classes, so as long as you’ve set up for any of ’em, you’re good to go. If this is your first class, let’s talk through each piece.

Building a SQL Server

In class, we use AWS EC2 VMs:

  • Mastering Index Tuning, Query Tuning, and Parameter Sniffing use an i3.xlarge: 4 cores, 30GB RAM, local NVMe SSD storage.
  • Mastering Server Tuning uses an i3.2xlarge: 8 cores, 61GB RAM, local NVMe storage. We use more cores & RAM here because we run heavier stress tests, since we’re doing server-level discussions.

Each server needs at least 300GB local SSD space to deal with the ~200GB Stack Overflow database and index creation space. It needs to be fast storage, too: for a rough idea, you’ll need to be able to restore the backup in 15 minutes or less. (I’m just using restore speed as one example here – you’ll be doing lots of storage-intensive tasks, like creating indexes on large tables.)

Before you install anything on your VM, run a test with CrystalDiskMark. You want at least 1,000 MB/sec for sequential reads and 400 MB/sec for sequential writes. Results for an AWS i3.xlarge like we use in class are shown at right: if you can’t get the top line’s results to be at least that high, stop – you’re not going to be able to keep up in class.
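If you’d rather test from the command line, Microsoft’s free DiskSpd utility can produce comparable throughput numbers. This is just a sketch under my own assumptions – that diskspd.exe is in your path and that D: is the drive SQL Server will live on – so adjust to taste:

```bat
:: Sequential reads: 512KB blocks, 4 threads, 8 outstanding I/Os per thread,
:: 30 seconds, OS and hardware caching disabled, against a 10GB test file
diskspd.exe -b512K -d30 -t4 -o8 -Sh -w0 -c10G D:\disktest.dat

:: Same test, but 100% sequential writes
diskspd.exe -b512K -d30 -t4 -o8 -Sh -w100 -c10G D:\disktest.dat
```

Look for the MB/s totals in the output and compare them against the 1,000 MB/sec read and 400 MB/sec write targets above. Delete D:\disktest.dat when you’re done.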

If you want to use an Azure VM, use the L8s v2 instance type with 8 cores, 64GB RAM, and 1.8TB of local NVMe storage. It’s the smallest instance type Microsoft offers with NVMe storage, which is what you’ll need to keep up with the labs. Be aware that local NVMe storage erases itself whenever the VM is shut down.

Can you use faster hardware? Yep, and as long as you’re not caching the entire database in RAM, you’ll probably still have the same basic challenges that we tackle in all of the classes.

After building the server, install:

  • SQL Server 2017 or 2019 Developer Edition (Express Edition won’t work)
  • During the install, check the box for “Grant Perform Volume Maintenance Task” – this lets SQL Server restore databases way faster, which will be important during the class
  • Use the default collation during install, SQL_Latin1_General_CP1_CI_AS (don’t get fancy with binary collations)
  • Use at least 4 equally sized TempDB data files, each at least 10GB in size (more files are fine, but definitely get at least 4 to avoid PFS/SGAM contention during the labs)
  • The latest Cumulative Update
  • The most recent SQL Server Management Studio (and it really does have to be the most recent one)
  • The most recent First Responder Kit scripts in the master database – use Install-Core-Blitz-With-Query-Store.sql
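If you’d rather script the TempDB setup than click through the installer, it looks something like this – a sketch only, and the E:\MSSQL\DATA path is my assumption, so point it at your own fast drive:

```sql
-- Resize the primary tempdb data file and add three more, all equally sized,
-- so you end up with at least 4 x 10GB data files.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 10GB, FILEGROWTH = 1GB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = N'E:\MSSQL\DATA\tempdev2.ndf', SIZE = 10GB, FILEGROWTH = 1GB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = N'E:\MSSQL\DATA\tempdev3.ndf', SIZE = 10GB, FILEGROWTH = 1GB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = N'E:\MSSQL\DATA\tempdev4.ndf', SIZE = 10GB, FILEGROWTH = 1GB);
```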

Getting the Stack Overflow database

In class, we use the 2018-06 version of the Stack Overflow database, but I’ve customized it to include a lot of indexes. Download it from the links below (only visible to logged-in Mastering class students):

That’s 4 files because striping a backup across 4 files generally makes the backup go faster (as do the restores). You’ll need all 4 of them in order to restore the database. If you’re working in a cloud VM, save these backup files on the permanent storage of your VM, not the local ephemeral storage (which is erased each time your VM shuts down).
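For reference, restoring a backup striped across 4 files just means listing all four in one RESTORE statement. The file names below are placeholders I made up – match them to the files you actually downloaded, and check the logical file names with RESTORE FILELISTONLY first:

```sql
-- All four stripes are required; SQL Server reads them in parallel.
RESTORE DATABASE StackOverflow
FROM DISK = N'E:\MSSQL\BACKUP\StackOverflow_1of4.bak',
     DISK = N'E:\MSSQL\BACKUP\StackOverflow_2of4.bak',
     DISK = N'E:\MSSQL\BACKUP\StackOverflow_3of4.bak',
     DISK = N'E:\MSSQL\BACKUP\StackOverflow_4of4.bak'
WITH REPLACE, STATS = 10; -- STATS prints progress every 10%
```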

Set Up a Restore Job

Don’t just restore it once manually: you’ll need to automate this process. During class, between each lab, the students restore their database to this starting point. To do that, set up an Agent job to restore it. Here’s the script I use – just modify the backup path and file names to match yours. In my example below, I’m restoring from E:\MSSQL\BACKUP.

If you’re working in the cloud, put the backup files on your permanent storage, but restore the database onto your ephemeral (fast) storage. That way, whenever you start your SQL Server VM up for the first time in a while, you can load the data into the fast storage.

Note that mine also has a step to set Cost Threshold for Parallelism and MAXDOP back to normal defaults for the VM we’re working with in the cloud. I also set compatibility level to 140 (SQL Server 2017) – that’s just because I do most of my client work on 2016-2017, so I want the database to behave that way. If you do most of your work on a different version, you can use a different compat level. We’ll talk during the class about how some labs need specific compat levels.
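That settings step looks roughly like this – the specific values are examples for an 8-core cloud VM, not universal recommendations, so use whatever defaults make sense for your hardware:

```sql
-- Reset server-level settings the labs may have changed.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
EXEC sys.sp_configure N'max degree of parallelism', 0;
RECONFIGURE;

-- Put the database back at the compat level you work with day to day.
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 140; -- SQL Server 2017
```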

Run your restore job to get an idea of how long it’ll take – if it takes more than 15 minutes, you probably want to use a faster VM to be able to keep up in class. If the restore takes, say, 20-30 minutes, you’re going to have a tough time keeping up when you’re doing storage-intensive things like creating indexes.

SQLQueryStress for load testing

SQLQueryStress is an easy, free, open source load testing utility. Download the latest version (zip), and extract it to a folder. You don’t have to run a setup or anything – it’s just a single .exe file that you can run whenever you want to do a load test.

Then, download my load test files and extract those to the same folder. Your folder will end up looking like this:

SQLQueryStress folder

And that’s it! See you in class!

47 Comments

  • The new version of the DropIndexes stored procedure uses the STRING_SPLIT function, which is only available at compatibility level 130 (SQL Server 2016) and up. Don’t be like me and leave the compatibility level at the default of 2008, or you are going to have a bad time…

  • With the Black Friday Sale, I’m guessing this page is going to see quite a bit more activity 🙂

    Brent – do you currently build your lab machines with any automation tools? If so, would you be willing to share what you’ve got on github?

    If not, I’ll probably start work on defining the lab server in a Packer image. I’ll post a link here once there’s something reasonably workable that others can use.

  • I get $150 / month Azure credits w/ MSDN Enterprise. Should I expect that to cover my VM costs for a 3-day mastering class?
    Will that be a reasonable substitute for buying VMs w/ my BrentOzar.com subscription? I need the Azure VM experience anyway.

  • What is the expected cost to have this VM up and running for 3-4 days and do all the labs?

    • Charles – sorry, I really have no idea what your infrastructure costs. Some folks have access to a ton of compute power at their jobs, or have spare lab machines that they’ve built for learning purposes. Other folks have been buying their own used server to save money.

  • oswaldo.morales
    December 6, 2019 9:38 am

    Brent, I am taking the MIT on 12/18 – 20. I am setting up my virtual machine myself. How long in advance do you recommend me building my virtual environment? I suppose I have to pay daily for my Amazon box. Thanks.

    • If it’s an Amazon EC2 VM, you can build it at any time, and then just shut it down when you’re not using it. You only pay for compute while it’s on (although you pay for storage all the time.) Just make sure you don’t build the VM in a way that causes it to terminate when it’s shut down.

      • Brent, I am getting my EC2 instance ready. Does it matter what version of windows server? 2012, 2016, 2019?
        Also, they offer it with SQL Standard, should I pick that one? Or just build a Windows box and then download and install SQL developer for free? Thanks.

        • The version of Windows doesn’t matter for the class. About the rest, make sure to read this post carefully. Thanks!

          • Hi Brent, one question about EC2 box volumes. The server I built has a persistent 30 GB C:\ drive and an ephemeral 800 GB D:\ drive. I put the StackOverflow database on D:\ but it was lost when I stopped the instance. Do you recommend adding a persistent volume, or just using the ephemeral one and leaving the instance running all the time? Thanks!

          • Ozzy – I would do the ephemeral drives and set up an Agent job to restore the databases when you want to work on them. I tried to make that clear in the post, too.

  • Just FYI: the SQL job had a minor issue: The configuration option 'cost threshold for parallelism' does not exist, or it may be an advanced option. [SQLSTATE 42000] (Error 15123) The configuration option 'max degree of parallelism' does not exist, or it may be an advanced option.

    • Ok, you understand how to fix that, right?

    • These are advanced options so you have to turn that on first (and back off because you’re a good DBA).

      USE master;
      EXEC sys.sp_configure 'show advanced options', 1;
      RECONFIGURE;

      EXEC sys.sp_configure 'cost threshold for parallelism', 50;
      EXEC sys.sp_configure 'max degree of parallelism', 0;
      RECONFIGURE;

      EXEC sys.sp_configure 'show advanced options', 0;
      RECONFIGURE;

  • parikshit.paliwal
    December 13, 2019 5:42 pm

    I downloaded the Stack Overflow database (40 GB zipped). It has 4 data files and 1 log file. When I try to unzip using 7-Zip, it unzips only the first data file. It is not unzipping the others. Are they corrupted?

  • Anyone have a combination of Azure VM size and managed\attached disk (P20/P30/etc) that hits those IO numbers?

    • I ended up using an L8s_v2. Note that the storage is ephemeral so if you shut it down you lose your data. You also have to add the drive at startup and format it in disk management. I just made sure that the backup was copied to the C:\ drive in case I lose it or something happens (I have one on the NVMe drive too with a job for the restore.)

      • Thanks Ryan…what region was that in…East US?

        • I put mine in West US 2, but you can also use these regions:
          East US, East US 2, West Europe, SE Asia, and North Europe.
          The docs also recommend Windows 2019 with this series, so I used W19 and SQL17 Developer.

          • Cool. I think I have mine working…not sure how, but I ended up with an extra 1TB SSD drive that persists, in addition to the 2TB NVMe. Hope this is helping others as well 🙂

  • Steve Scheffler
    December 17, 2019 3:55 pm

    Wow, thanks for the pointer to the L8s series Ryan, missed that azure announcement!

    How do you know that the NVMe drive is ephemeral? I know the smaller D: temp storage one always is, but didn’t see a notice on this in the docs.

  • The links to the torrent/magnet seem to not find any files to download. Should I be using the 52GB version from the detailed blog?

    • What exact error are you getting? I just tried and it’s working fine.

      • When I use magnet:?xt=urn:btih:a2c6a416758acc885d21bf90e0833f6c79a12e92&dn=StackOverflow-SQL-Server-201806 I don’t get anyone seeding the file, but when I use magnet:?xt=urn:btih:1fa77dfa302484dedbf06a03cb19f10eafe0af57&dn=StackOverflow-SQL-Server-201912 I do. Is there any problem using the newer one?

        • Try the torrent file rather than the magnet. I’m seeding with multiple seed boxes, so you may just need to hold tight for a few minutes to let seeding start. Use the exact version specified in the post. Thanks!

  • Brent and Everyone, I am scheduled to attend Brent’s Mastering Index Tuning class on 02/14/2020 and trying to set up my own VM. I am not able to choose the VM size “L8s” – it is greyed out for me.
    Can someone help me?

    • Siva – unfortunately no, I can’t do free personal support on Azure. You get a discounted rate if you build your own VM, but the reason you get the discount is that you do the setup work yourself. If you need my assistance to build the VM, contact me via the site about upgrading your membership to include the VM pricing.

      • Thanks Brent. Will try it out a little bit myself and upgrade my membership if I am not able to set it up.

    • Victor Hugo Barajas Sosa
      February 12, 2020 10:45 am

      Hi Siva. L8s VM is not available in all regions, make sure to select a region where the VM is available like East US.

    • Siva, you probably got this sorted months ago, but just for people scanning these comments in the future…

      In Azure you want the Standard L8s_v2 (8 vcpus, 64 GiB memory), and it is only available in some regions. You’ll have to shop around.

  • Andrey Zavadskiy
    April 21, 2020 1:28 pm

    Stored procedure DropIndexes fails when you have a different collation on the SQL Server instance and the StackOverflow database. The latter has SQL_Latin1_General_CP1_CI_AS. To fix this, specify the collation before the subqueries from #ExceptIndexNames.
    Line 34:
    AND UPPER(o.name) COLLATE SQL_Latin1_General_CP1_CI_AS NOT IN (SELECT IndexName FROM #ExceptIndexNames);
    Line 53:
    AND UPPER(i.name) COLLATE SQL_Latin1_General_CP1_CI_AS NOT IN (SELECT IndexName FROM #ExceptIndexNames);

  • Hi,
    Do I need a login in order to run the first lab for testing purposes?
    When I use the Lab 1 page link I am forwarded to:
    but there is no Lab 1 setup script that I could copy from that page.
    I haven’t enrolled in the course yet, but I already set up my hardware and I am trying to test it.
    Would it be possible to run the first lab for testing purposes without having a login?

  • On using an Azure VM: “It’s the smallest instance type Microsoft offers with NVMe storage, which is what you’ll need to keep up with the labs.”

    Why NVMe? Is it the amount of throughput that it offers? Could you not achieve the same goals with Premium SSD attached disks?

    • Marc – yes, but achieving that goal with Premium SSD will be expensive. Even the largest Premium SSD (P80, 32TB) only hits 900 MB/sec theoretical, so you would need to get a couple and pool them together as described in this post:


      Or, you could just, you know, follow my instructions and get the speed for free. 😉

        • Ahhh, I see. I was trying to take advantage of a production-like server that I already have set up for scalability testing. I didn’t know the throughput difference was that large. I am still reading up on NVMe.

        Thank you. Looking forward to Tuesday.

        • Yeah, companies are usually shocked at how slow and expensive storage is in the cloud. Makes for a pretty fun learning journey, and good times for us database folks!

  • Thomas Lovie
    July 20, 2020 7:10 pm

    Does this version of the StackOverflow database pass a dbcc checkdb cleanly? I’m wondering if I need to re-download it.

