If you want to run the labs on your own (either during class, or later), you’re going to need:

  • A server
  • The Stack Overflow database
  • The indexes & scripts set up
  • Then ideally, back the database up, and set up an Agent job to rapidly restore the database between labs
  • SQLQueryStress

Let’s talk through each piece.

Building a SQL Server

In class, we use AWS EC2 VMs:

  • Mastering Index Tuning & Mastering Query Tuning use an i3.xlarge: 4 cores, 30GB RAM, local NVMe SSD storage
  • Mastering Server Tuning uses an i3.2xlarge: 8 cores, 61GB RAM, local NVMe storage – we use more cores & RAM here because the server-level discussions involve heavier stress tests

Each server needs at least 500GB local SSD space to deal with the 350GB Stack Overflow database, index creation space, TempDB, backups, and restores. It needs to be fast storage, too: for a rough idea, you’ll need to be able to restore the ~350GB backup in 15 minutes or less. (I’m just using restore speed as one example here – you’ll be doing lots of storage-intensive tasks, like creating indexes on large tables.)

If you want to check your storage speed before downloading the Stack database, run a test with CrystalDiskMark. You want at least 1,000 MB/sec for sequential reads and writes – restoring ~350GB in 15 minutes works out to roughly 400 MB/sec of sustained throughput, and a restore has to read the backup files and write the data files at the same time, so you want plenty of headroom beyond that. As an example, my laptop's SSD tests well above those numbers, so it would be fine for the labs.

Can you get by with less hardware? Sure, but of course your performance will be different from what we're seeing in class. I actually don't think that's a bad thing – every server out there is different – but just be aware that it'll pose additional challenges if you try something like 4 cores and 8GB RAM: your index creations will be terribly slow, and you probably won't be able to keep up in class.

Can you use more hardware? Yep, and as long as you’re not caching the entire database in RAM, you’ll probably still have the same basic challenges that we tackle in all of the classes.

After building the server, install SQL Server Developer Edition and the latest version of SQL Server Management Studio (SSMS).

Getting the Stack Overflow database

Download the 2018-06 version of the Stack Overflow database: 38GB torrent (magnet). If you're not familiar with BitTorrent, here are detailed instructions – just make sure to get the exact version mentioned above, since that's what makes the demos reproduce the same way. Your company may block BitTorrent – many do – but it's the most efficient way to distribute a >300GB database. There aren't a lot of file hosts willing to share a file that big, heh. You may need to download it from home.

When you extract it, it’ll expand to a ~304GB SQL Server database that you can attach. There are other smaller versions of the database too, but be aware that you’ll get radically different query plans, and some of the demos won’t work the same way since we’re querying for specific date ranges.
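You can attach it through the SSMS GUI, or with T-SQL along these lines. The file names and paths below are assumptions – check your extracted folder and adjust to match (the full-size database may ship as several data files):

```sql
-- Attach the extracted Stack Overflow files. Paths and file names here
-- are assumed examples – list every .mdf/.ndf/.ldf you actually extracted.
CREATE DATABASE StackOverflow ON
    (FILENAME = N'E:\MSSQL\DATA\StackOverflow.mdf'),
    (FILENAME = N'E:\MSSQL\DATA\StackOverflow_log.ldf')
FOR ATTACH;
```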

It ships in SQL Server 2008 compatibility level, but you usually wanna learn on the most recent Cardinality Estimator (CE). To set that in SSMS, right-click on the database, go into Options, and change Compatibility Level to the most current version you're on – but not SQL Server 2019, which behaves quite differently. Hold off on that compat level until SQL Server 2019 is released: I'm updating the classes with more modules for SQL Server 2019's behavior, but if you try it now, you're going to get wildly unpredictable behavior compared to the rest of your classmates.
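If you'd rather script that than click through SSMS, it's a one-liner. This assumes the database is named StackOverflow and that you're on SQL Server 2017 (compatibility level 140) – adjust both to match your setup:

```sql
-- Use the newer Cardinality Estimator by raising the compat level.
-- 140 = SQL Server 2017; use 130 on 2016. Per the note above, hold off
-- on 150 (SQL Server 2019) for now.
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 140;
```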

Setting Up the Stored Procs and Indexes

We use the First Responder Kit scripts for performance analysis, and we use the below procs to help the demos along. Create these in the Stack Overflow database:

Don’t install that in production, bucko.
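If you haven't used the First Responder Kit before, these are the kinds of calls you'll make constantly during the labs. sp_BlitzCache and sp_BlitzIndex are procs from the kit; the parameter values shown are just common starting points, not class requirements:

```sql
-- Find the queries burning the most CPU in the plan cache:
EXEC dbo.sp_BlitzCache @SortOrder = 'cpu';

-- Review index usage and missing-index requests in the lab database:
EXEC dbo.sp_BlitzIndex @DatabaseName = 'StackOverflow';
```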

Next, run the index creation scripts below. It’s going to take a while, depending on how fast your lab’s storage is. After creating these indexes, we’ll take a backup of the database so that you can restore it each time to set back to a known starting point – rather than creating the indexes every time you restart labs, which can take tens of minutes.
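Those scripts are ordinary CREATE INDEX statements against the big tables. Purely to illustrate the shape of the work – this is not one of the actual class indexes – a single one looks like:

```sql
-- Illustrative only: the real lab scripts define their own index set.
-- On a ~300GB database, even one index like this can take minutes on
-- fast NVMe, which is why we back up the finished result instead of
-- rebuilding the indexes between labs.
CREATE INDEX IX_LastAccessDate ON dbo.Users (LastAccessDate);
```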

Back It Up and Set Up a Restore Job

During class, between each lab, the students restore their database to this starting point. For sheer performance, you want to run the backup across 4 files – SQL Server backups & restores can actually go faster when you stripe them across more files, even when writing to the same storage.
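A minimal version of that striped backup looks like this. The E:\MSSQL\BACKUP path matches the restore example later in this post, and COMPRESSION is my own assumption – it usually shrinks the files and speeds up both the backup and the restore, at the cost of some CPU:

```sql
-- Stripe the backup across 4 files so it (and the restore) can go faster.
BACKUP DATABASE StackOverflow
    TO DISK = N'E:\MSSQL\BACKUP\StackOverflow_1.bak',
       DISK = N'E:\MSSQL\BACKUP\StackOverflow_2.bak',
       DISK = N'E:\MSSQL\BACKUP\StackOverflow_3.bak',
       DISK = N'E:\MSSQL\BACKUP\StackOverflow_4.bak'
    WITH COMPRESSION, STATS = 10;
```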

Then, set up an Agent job to restore them. Here’s the script I use – just modify the file path locations for your backups. Note that mine also has a step to set Cost Threshold for Parallelism and MAXDOP back to normal defaults for the VM we’re working with in the cloud.

In my example below, I’m doing my restores from E:\MSSQL\BACKUP – you may need to modify your backup path and file names.
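Here's a sketch of what that job step can look like, assuming the four striped files from the backup example above. The Cost Threshold and MAXDOP values are placeholder assumptions – set them to whatever your VM's normal defaults are:

```sql
-- Restore over the existing database to reset to the known starting point.
RESTORE DATABASE StackOverflow
    FROM DISK = N'E:\MSSQL\BACKUP\StackOverflow_1.bak',
         DISK = N'E:\MSSQL\BACKUP\StackOverflow_2.bak',
         DISK = N'E:\MSSQL\BACKUP\StackOverflow_3.bak',
         DISK = N'E:\MSSQL\BACKUP\StackOverflow_4.bak'
    WITH REPLACE, STATS = 10;

-- Put server-level settings back to the class defaults.
-- The 50 and 4 below are example values, not recommendations.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```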

Test your restore job to get an idea of how long it’ll take – if it takes more than 15 minutes, you probably want to use a faster VM to be able to keep up in class. If the restore takes, say, 20-30 minutes, you’re going to have a tough time keeping up when you’re doing storage-intensive things like creating indexes.

SQLQueryStress for load testing

SQLQueryStress is an easy, free, open source load testing utility. Download the latest version (zip), and extract it to a folder. You don’t have to run a setup or anything – it’s just a single .exe file that you can run whenever you want to do a load test.

Then, download my load test files and extract those to the same folder. Your folder will end up looking like this:

(Screenshot: the SQLQueryStress folder after extracting both downloads)

And that’s it! See you in class.

1 Comment

  • The new version of the DropIndexes stored procedure uses the STRING_SPLIT function, which is only available at compatibility level 130 (SQL Server 2016) and up. Don't be like me and leave the compatibility level at the default of 2008, or you're going to have a bad time…
