Using Stack Overflow Queries to Generate Workloads

One of my favorite questions is, “How can I generate workloads to run against SQL Server for testing?”

Step 1: get the StackOverflow.com database. This database has a relatively simple schema, just a few tables, real-world data distributions, and enough rows that you can generate seriously slow queries. You can use any size (small, medium, or large) and they’ll all work.
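If you restore it from a backup rather than attaching the data files directly, it’s a plain RESTORE. Here’s a minimal sketch; the paths and logical file names below are assumptions, so check yours with RESTORE FILELISTONLY first:

-- Minimal restore sketch. Paths and logical file names are assumptions;
-- run RESTORE FILELISTONLY against your backup to get the real ones.
RESTORE DATABASE StackOverflow
FROM DISK = N'C:\Backups\StackOverflow.bak'
WITH MOVE N'StackOverflow' TO N'C:\Data\StackOverflow.mdf',
     MOVE N'StackOverflow_log' TO N'C:\Data\StackOverflow_log.ldf',
     STATS = 10;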

Step 2: get the top user queries. Over at Data.StackExchange.com, users write their own queries to learn things about Stack’s data. I’ve taken a dozen highly-voted queries and turned them into stored procedures, all of which take a single integer parameter. This way I can call ’em with a single random number and get constantly varying queries.
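To give you a feel for the pattern (this is a hypothetical illustration, not one of the actual dozen procs), each proc wraps one query and takes a single integer that varies the work it does:

-- Hypothetical illustration of the pattern: one proc, one query,
-- one integer parameter that changes what the query returns.
CREATE PROCEDURE dbo.usp_TopUsersByReputation
    @Limit INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT TOP (@Limit)
           Id, DisplayName, Reputation, Location
    FROM   dbo.Users
    ORDER BY Reputation DESC;
END
GO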

Step 3: call them randomly with SQLQueryStress. The last stored proc in the above script is usp_RandomQ, which uses a random number generator to pick one of the stored procs and run it. I use a recompile hint on usp_RandomQ because I don’t want his metrics sticking around in the plan cache – the real workload is the various stored procs that he calls. Just set up SQLQueryStress to repeatedly call usp_RandomQ from, say, a dozen threads, and presto, your SQL Server will fall over.
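If you want to see the shape of that dispatcher without downloading the script, here’s a minimal sketch; the usp_Q* names are placeholders standing in for the real dozen procs:

-- Minimal sketch of the usp_RandomQ pattern. The usp_Q1..usp_Q3 names
-- are placeholders; the real script dispatches across the full dozen.
CREATE PROCEDURE dbo.usp_RandomQ
    WITH RECOMPILE  -- keep the dispatcher's metrics out of the plan cache
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Id INT = ABS(CHECKSUM(NEWID())) % 1000000 + 1; -- random parameter
    DECLARE @Pick INT = ABS(CHECKSUM(NEWID())) % 3;         -- random proc

    IF @Pick = 0      EXEC dbo.usp_Q1 @Id;
    ELSE IF @Pick = 1 EXEC dbo.usp_Q2 @Id;
    ELSE              EXEC dbo.usp_Q3 @Id;
END
GO

In SQLQueryStress, set the query to EXEC dbo.usp_RandomQ, crank the thread count up to a dozen, and let it loop.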

I use these three steps in combination all the time in my sessions. Enjoy!


6 Comments

  • Any suggestions for those of us running SQL Server on a laptop, in a VM? Maybe a 10 or 20 GB db?

  • Hey Brent,
    Have you ever thought about making some of your Dell DBA Days tests available for everyone to run? Folks who were interested in participating could submit results for their particular servers, and then everyone could compare the results of different system configurations… This could be very interesting. If there was enough participation, we might get to see some patterns in the results showing how some tests are more likely to be CPU bound, memory bound, I/O bound, etc. Of course, folks could just get an idea of how these tests scale on various setups. I think that would be awesome… except for the hassle of collecting results…

    Anyway, folks could submit OS version, SQL Server version, server, number and make/model of CPU(s), number of cores allocated to SQL Server, and amount of RAM allocated to SQL Server (I am not sure how far down the rabbit hole of “details to collect” to go… do we ask for the type and speed of RAM, and the mode (Optimizer, Mirror, etc.)? Ugh!). Then some basic storage configuration details.

    It might look like this:
    Windows Server 2012 R2 – SQL Server 2012 SP3 Cumulative Update #4
    Dell R720 – Two Intel E5-2670 v1, 16 cores allocated, 128 GB RAM allocated
    MDF – Four 1 TB 7200 RPM SAS2 6Gb/s drives in RAID 10 configuration
    LDF – Two 480 GB Intel S3500 SSDs SATA 6Gb/s in RAID 1 configuration
    tempdb – Two 240 GB Intel S3500 SSDs SATA 6Gb/s in RAID 1 configuration

    Then, magically, when all this data is in a database, we could graph it, chart it, and have all kinds of fun…!

    • Kris – yeah, it’s just a matter of how much labor we can put in for free. We totally love sharing our work with the public, but even just broadcasting all this (coming up with the sessions, instrumenting our tests, doing the webcasts) takes days of our time. Right now we kinda draw the line here – hopefully that provides you some value in free training anyway!

  • I recently wrote a tool called ChaosLoad (https://github.com/gavdraper/ChaosLoad). It does something similar, but it lets you define the scripts you want to run in JSON, allowing more configuration of how frequently each should run.

