If you want to run the labs on your own (either during class, or later), you’re going to need:

  • A server
  • The Stack Overflow database
  • The indexes & scripts set up
  • Then ideally, back the database up, and set up an Agent job to rapidly restore the database between labs
  • SQLQueryStress

Let’s talk through each piece.

Building a SQL Server

In class, we use AWS EC2 VMs:

  • Mastering Index Tuning & Mastering Query Tuning use an i3.xlarge: 4 cores, 30GB RAM, local NVMe SSD storage.
  • Mastering Server Tuning uses an i3.2xlarge: 8 cores, 61GB RAM, local NVMe storage. We use more cores & RAM here because we run heavier stress tests, since we’re doing server-level discussions

Each server needs at least 500GB local SSD space to deal with the 350GB Stack Overflow database, index creation space, TempDB, backups, and restores. It needs to be fast storage, too: for a rough idea, you’ll need to be able to restore the ~350GB backup in 15 minutes or less. (I’m just using restore speed as one example here – you’ll be doing lots of storage-intensive tasks, like creating indexes on large tables.)

Before you install anything on your VM, run a test with CrystalDiskMark. You want at least 1,000 MB/sec for sequential reads and 400 MB/sec for sequential writes. Results for an AWS i3.xlarge like we use in class is shown at right: if you can’t get the top line’s results to be at least that high, stop: you’re not going to be able to keep up in class.

If you want to use an Azure VM, use the L8s v2 instance type with 8 cores, 64GB RAM, and 1.8TB of local NVMe storage. It’s the smallest instance type Microsoft offers with NVMe storage, which is what you’ll need to keep up with the labs.

Can you use faster hardware? Yep, and as long as you’re not caching the entire database in RAM, you’ll probably still have the same basic challenges that we tackle in all of the classes.

After building the server, install:

  • SQL Server 2017 or 2019 Developer Edition (Express Edition won’t work)
  • Use the default collation during install, SQL_Latin1_General_CP1_CI_AS (don’t get fancy with binary collations)
  • The latest Cumulative Update
  • The most recent SQL Server Management Studio (and it really does have to be the most recent one)

Getting the Stack Overflow database

Download the 2018-06 version of the Stack Overflow database: 38GB torrent (magnet.) If you’re not familiar with BitTorrent, here are detailed instructions, but just make sure to get the Stack version mentioned earlier in this paragraph – it’s important so you reproduce demos the same way. Your company may block BitTorrent – many do – but it’s the most efficient way to get a >300GB database out there. There aren’t a lot of file hosts willing to share a file that big, heh. You may need to download it from home.

When you extract it, it’ll expand to a ~304GB SQL Server database that you can attach. There are other smaller versions of the database too, but be aware that you’ll get radically different query plans, and some of the demos won’t work the same way since we’re querying for specific date ranges.

It ships in SQL 2008 compatible format, but you usually wanna learn on the most recent Cardinality Estimator (CE). To set that, in SSMS, right-click on the database, go into Options, and change Compatibility Level to the most current version you’re on. SQL Server 2019 does do some things dramatically differently – sometimes better, sometimes worse – and we’ll talk about that during class.

Setting Up the Stored Procs and Indexes

We use the First Responder Kit scripts for performance analysis, and we use the below procs to help the demos along. Create these in the Stack Overflow database:

Don’t install that in production, bucko.

Next, run the index creation scripts below. It’s going to take a while, depending on how fast your lab’s storage is: on my crazy-fast 2019 MacBook Pro, it took 26 minutes. I know, that’s a long time – but you only have to do this once. The way I’ve built this lab, I have you create ALL of the possible indexes up front, and then when it’s time to do each lab, I just drop the specific indexes that you won’t need for that one lab. This means you can restore the database quickly to get back to the base config, drop specific indexes, and be good to go. (No, you don’t wanna try to avoid the restore – during some of the labs, we actually alter the database’s config, like add or remove columns from tables.)

After creating these indexes, we’ll take a backup of the database so that you can restore it each time to set back to a known starting point.

Back It Up and Set Up a Restore Job

During class, between each lab, the students restore their database to this starting point. For sheer performance, you want to run the backup across 4 files – SQL Server backups & restores can actually go faster when you stripe them across more files, even when writing to the same storage.

Here’s a quick and dirty backup script – just change the drive/folder for your machine:

Then, set up an Agent job to restore them. Here’s the script I use – just modify the file path locations for your backups. In my example below, I’m doing my restores from E:\MSSQL\BACKUP – you may need to modify your backup path and file names.

Note that mine also has a step to set Cost Threshold for Parallelism and MAXDOP back to normal defaults for the VM we’re working with in the cloud. I also set compatibility level to 140 (SQL Server 2017) – that’s just because I do most of my client work on 2016-2017, so I want the database to behave that way. If you do most of your work on a different version, you can use a different compat level. We’ll talk during the class about how some labs need specific compat levels.

Test your restore job to get an idea of how long it’ll take – if it takes more than 15 minutes, you probably want to use a faster VM to be able to keep up in class. If the restore takes, say, 20-30 minutes, you’re going to have a tough time keeping up when you’re doing storage-intensive things like creating indexes.

SQLQueryStress for load testing

SQLQueryStress is an easy, free, open source load testing utility. Download the latest version (zip), and extract it to a folder. You don’t have to run a setup or anything – it’s just a single .exe file that you can run whenever you want to do a load test.

Then, download my load test files and extract those to the same folder. Your folder will end up looking like this:

SQLQueryStress folder

Run the First Lab for Testing Purposes

Let’s act like it’s time for you to do the first lab in class. Follow the setup instructions on the Lab 1 page. You’re basically going to:

  1. Restore your database
  2. Copy/paste the Lab 1 setup script from that page, and run that
  3. EXEC usp_IndexLab1_Setup
  4. Open SQLQueryStress, click File, Load, IndexLab1, and hit Go

Your SQLQueryStress should start running, racking up iterations like this:

It’s totally okay if you get some exceptions like I’m getting in that above screenshot. If you get 100% exceptions, that means there’s something wrong with your setup, like you didn’t apply all of the setup scripts. Click the little … button to see what the exceptions are, and troubleshoot those.

You can click on the … button to view the exceptions at any time without stopping the test. These exceptions are totally normal and okay:

  • Deadlocks
  • Divide by zero errors
  • Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements

If you see any other kinds of errors, there’s probably a setup problem. Stop and double-check your work.

You don’t have to wait for it to finish – your goal is just to make sure that it starts running without producing 100% errors. It won’t be fast, believe me! That’s the point of these labs, to really push your machine hard.

You can cancel SQLQueryStress at any time. You may hit a bug in SQLQueryStress where you close it, but it’s still running – you may need to check in Task Manager if it’s still showing up even when it’s not running. It kinda disappears like a zombie process. In that case, you’ll want to end-task it:

And that’s it! See you in class!

40 Comments. Leave new

  • The new version of the DropIndexes Stored Procedure uses the string_split function that is only available for compatibility level 2016 and up. Dont be like me and leave the compatibility level at the default of 2008 or you are going to have a bad time…

  • With the Black Friday Sale, I’m guessing this page is going to see quite a bit more activity 🙂

    Brent – do you currently build your lab machines with any automation tools? If so, would you be willing to share what you’ve got on github?

    If not, I’ll probably start work on defining the lab server in a Packer image. I’ll post a link here once there’s something reasonably workable that others can use.

  • I get $150 / month Azure credits w/ MSDN Enterprise. Should I expect that to cover my VM costs for a 3-day mastering class?
    Will that be a reasonable substitute for buying VMs w/ my BrentOzar.com subscription? I need the Azure VM experience anyway.

  • What is the expected cost to have this VM up and running for 3-4 days and do all the labs?

    • Charles – sorry, I really have no idea what your infrastructure costs. Some folks have access to a ton of compute power at their jobs, or have spare lab machines that they’ve built for learning purposes. Other folks have been buying their own used server to save money.

  • oswaldo.morales
    December 6, 2019 9:38 am

    Brent, I am taking the MIT on 12/18 – 20. I am setting up my virtual my machine myself. How long in advance do you recommend me building my virtual environment? I suppose I have to pay daily for my amazon box. Thanks.

    • If it’s an Amazon EC2 VM, you can build it at any time, and then just shut it down when you’re not using it. You only pay for compute while it’s on (although you pay for storage all the time.) Just make sure you don’t build the VM in a way that causes it to terminate when it’s shut down.

      • Brent, I am getting my EC2 instance ready. Does it matter what version of windows server? 2012, 2016, 2019?
        Also, they offer it with SQL Standard, should I pick that one? Or just build a Windows box and then download and install SQL developer for free? Thanks.

        • The version of Windows doesn’t matter for the class. About the rest, make sure to read this post carefully. Thanks!

          • Hi Brent, one question about EC2 box volumes. The server I built has a persistent 30 GB C:\ drive and an ephemeral 800 D:\ drive. I put the StackOverflow database on D:\ but it was lost when I stopped the instance. Do you recommend me adding a persistent volume? or just use the ephemeral and leave the instance running all the time. Thanks!

          • Ozzy – I would do the ephemeral drives and set up an Agent job to restore the databases when you want to work on them. I tried to make that clear in the post, too.

  • Miguel Ramirez
    December 11, 2019 8:19 am

    Just FYI: the SQL job had a minor issue: The configuration option ‘cost threshold for parallelism’ does not exist, or it may be an advanced option. [SQLSTATE 42000] (Error 15123) The configuration option ‘max degree of parallelism’ does not exist, or it may be an advanced option.

    • Ok, you understand how to fix that, right?

    • These are advanced options so you have to turn that on first (and back off because you’re a good DBA).

      USE master;
      EXEC sys.sp_configure ‘show advanced option’, ‘1’;

      EXEC sys.sp_configure N’cost threshold for parallelism’, N’50’;
      EXEC sys.sp_configure N’max degree of parallelism’, N’0′;
      EXEC sys.sp_configure ‘show advanced option’, ‘0’;

  • parikshit.paliwal
    December 13, 2019 5:42 pm

    I downloaded the Stackoverflow database (40 GB) Zipped size. It has 4 data files and 1 log file. When I try to unzip using 7z zip software it unzipped only the first data file. It is not unzipping the other. Are they corrupted?

  • Anyone have a combination of Azure VM size and managed\attached disk (P20/P30/etc) that hits those IO numbers?

    • I ended up using an L8s_v2. Note that the storage is ephemeral so if you shut it down you lose your data. You also have to add the drive at startup and format it in disk management. I just made sure that the backup was copied to the C:\ drive in case I lose it or something happens (I have one on the NVMe drive too with a job for the restore.)

      • Thanks Ryan…what region was that in…East US?

        • I put mine in West US2, but you can also use these.
          East US, East US 2, West Europe, SE Asia, and North Europe.
          It also recommended to use Windows 2019 with this series so I used W19 and SQL17 Developer.

          • Cool. I think I have mine working…not sure how but I ended up with an extra 1TB SSD drive that persists, in addition to the 2TB NvMe. Hope this is helping others as well 🙂

  • Steve Scheffler
    December 17, 2019 3:55 pm

    Wow, thanks for the pointer to the L8s series Ryan, missed that azure announcement!

    How do you know that the NVMe drive is ephemeral? I know the smaller D: temp storage one always is, but didn’t see a notice on this in the docs.

  • The links to the torrent/magnet seem to not find any files to download. Should I be using the 52GB version from the detailed blog?

    • What exact error are you getting? I just tried and it’s working fine.

      • When I use magnet:?xt=urn:btih:a2c6a416758acc885d21bf90e0833f6c79a12e92&dn=StackOverflow-SQL-Server-201806 I don’t get anyone seeding the file, but when I use magnet:?xt=urn:btih:1fa77dfa302484dedbf06a03cb19f10eafe0af57&dn=StackOverflow-SQL-Server-201912 I do. Is there any problem using the newer one?

        • Try the torrent file rather than the magnet. I’m seeding with multiple seed boxes, so you may just need to hold tight for a few minutes to let seeding start. Use the exact version specified in the post. Thanks!

  • Siva Ramasamy
    February 9, 2020 9:29 am

    Brent and Everyone, I am scheduled to attend Brent’s Mastering Index tuning class on 02/14/2020 and trying to setup my own VM. I am not able to choose the VMSize “l8s”. It is greyed out for me.
    Can someone help me?

    • Siva – unfortunately no, I can’t do free personal support on Azure. You get a discounted rate if you build your own VM, but the reason you get the discount is that you do the setup work yourself. If you need my assistance to build the VM, contact me via the site about upgrading your membership to include the VM pricing.

      • Siva Ramasamy
        February 9, 2020 7:55 pm

        Thanks Brent. Will try it out a little bit myself and upgrade my membership if I am not able to set it up.

    • Victor Hugo Barajas Sosa
      February 12, 2020 10:45 am

      Hi Siva. L8s VM is not available in all regions, make sure to select a region where the VM is available like East US.

  • Andrey Zavadskiy
    April 21, 2020 1:28 pm

    Stored procedure DropIndexes fails when you have different collation on SQL Server and StackOverflow database. The last has SQL_Latin1_General_CP1_CI_AS. To fix this you need to specify collation before subqueries from #ExceptIndexNames.
    Line 34:
    AND UPPER(o.name) COLLATE SQL_Latin1_General_CP1_CI_AS NOT IN (SELECT IndexName FROM #ExceptIndexNames);
    Line 53:
    D UPPER(i.name) COLLATE SQL_Latin1_General_CP1_CI_AS NOT IN (SELECT IndexName FROM #ExceptIndexNames);

  • Hi,
    Do I need to have a login in order to Run the First Lab for Testing Purposes ?
    When I use the link Lab 1 page I am forwarded to:
    but there is no Lab 1 setup script that I could copy from that page.
    I haven’t enrolled for the course yet but I already set up my hardware and I am trying to test it.
    Would it be possible to Run the First Lab for Testing Purposes without having a login ?


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.