Using Stack Overflow Queries to Generate Workloads

One of my favorite questions is, “How can I generate workloads to run against SQL Server for testing?”

Step 1: get the StackOverflow.com database. This database has a relatively simple schema, just a few tables, real-world data distributions, and enough rows that you can generate seriously slow queries. You can use any size (small, medium, or large) and they’ll all work.
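If you restore it from a backup rather than attaching the data files directly, it’s a plain RESTORE. Here’s a minimal sketch; the paths and logical file names below are assumptions, so check yours with RESTORE FILELISTONLY first:

-- Minimal restore sketch. Paths and logical file names are assumptions;
-- run RESTORE FILELISTONLY against your backup to get the real ones.
RESTORE DATABASE StackOverflow
FROM DISK = N'C:\Backups\StackOverflow.bak'
WITH MOVE N'StackOverflow' TO N'C:\Data\StackOverflow.mdf',
     MOVE N'StackOverflow_log' TO N'C:\Data\StackOverflow_log.ldf',
     STATS = 10;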

Step 2: get the top user queries. Over at Data.StackExchange.com, users write their own queries to learn things about Stack’s data. I’ve taken a dozen highly-voted queries and turned them into stored procedures, all of which take a single integer parameter. This way I can call ’em with a single random number and get constantly varying queries.
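To give you a feel for the pattern (this is a hypothetical illustration, not one of the actual dozen procs), each proc wraps one query and takes a single integer that varies the work it does:

-- Hypothetical illustration of the pattern: one proc, one query,
-- one integer parameter that changes what the query returns.
CREATE PROCEDURE dbo.usp_TopUsersByReputation
    @Limit INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT TOP (@Limit)
           Id, DisplayName, Reputation, Location
    FROM   dbo.Users
    ORDER BY Reputation DESC;
END
GO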

Step 3: call them randomly with SQLQueryStress. The last stored proc in the above script is usp_RandomQ, which uses a random number generator to pick one of the stored procs and run it. I use a recompile hint on usp_RandomQ because I don’t want his metrics sticking around in the plan cache – the real workload is the various stored procs that he calls. Just set up SQLQueryStress to repeatedly call usp_RandomQ from, say, a dozen threads, and presto, your SQL Server will fall over.
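If you want to see the shape of that dispatcher without downloading the script, here’s a minimal sketch; the usp_Q* names are placeholders standing in for the real dozen procs:

-- Minimal sketch of the usp_RandomQ pattern. The usp_Q1..usp_Q3 names
-- are placeholders; the real script dispatches across the full dozen.
CREATE PROCEDURE dbo.usp_RandomQ
    WITH RECOMPILE  -- keep the dispatcher's metrics out of the plan cache
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Id INT = ABS(CHECKSUM(NEWID())) % 1000000 + 1; -- random parameter
    DECLARE @Pick INT = ABS(CHECKSUM(NEWID())) % 3;         -- random proc

    IF @Pick = 0      EXEC dbo.usp_Q1 @Id;
    ELSE IF @Pick = 1 EXEC dbo.usp_Q2 @Id;
    ELSE              EXEC dbo.usp_Q3 @Id;
END
GO

In SQLQueryStress, set the query to EXEC dbo.usp_RandomQ, crank the thread count up to a dozen, and let it loop.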

I use these three steps in combination all the time in my sessions. Enjoy!


6 Comments

  • Any suggestions for those of us running SQL Server on a laptop, in a VM? Maybe a 10 or 20 GB db?

  • Hey Brent,
    Have you ever thought about making some of your Dell DBA Days tests available for everyone to run? Folks who were interested in participating could submit results for their particular servers, and then everyone could compare the results of different system configurations… This could be very interesting. If there was enough participation, we might get to see some patterns in the results showing how some tests are more likely to be CPU bound, memory bound, I/O bound, etc. Of course, folks could just get an idea of how these tests scale on various setups. I think that would be awesome… except for the hassle of collecting results…

    Anyway, folks could submit OS version, SQL Server version, server, number and make/model of CPU(s), number of cores allocated to SQL Server, and amount of RAM allocated to SQL Server (I am not sure how far down the rabbit hole of “details to collect” to go… do we ask for the type and speed of RAM, and the mode (Optimizer, Mirror, etc.)? Ugh!). Then some basic storage configuration details.

    It might look like this:
    Windows Server 2012 R2 – SQL Server 2012 SP3 Cumulative Update #4
    Dell R720 – Two Intel E5-2670 v1, 16 cores allocated, 128 GB RAM allocated
    MDF – Four 1 TB 7200 RPM SAS2 6Gb/s drives in RAID 10 configuration
    LDF – Two 480 GB Intel S3500 SSDs SATA 6Gb/s in RAID 1 configuration
    tempdb – Two 240 GB Intel S3500 SSDs SATA 6Gb/s in RAID 1 configuration

    Then, magically, when all this data is in a database, we could graph it, chart it, and have all kinds of fun…!

    • Kris – yeah, it’s just a matter of how much labor we can put in for free. We totally love sharing our work with the public, but even just broadcasting all this (coming up with the sessions, instrumenting our tests, doing the webcasts) takes days of our time. Right now we kinda draw the line here – hopefully that provides you some value in free training anyway!

  • I recently wrote a tool called ChaosLoad (https://github.com/gavdraper/ChaosLoad). It does something similar, but it lets you define the scripts you want to run in JSON, allowing more configuration of how frequently each should run.

