So you wanna run a load test against SQL Server.
One thing I need you to understand first: you have to provide the database and the queries. All of the tools in this post except the last one are designed to help you run queries, but they don’t include the queries. The whole idea with load testing is that you’re trying to mimic your own workloads. If you’re just trying to test a server with generic workloads, start with my post, “How to Check Performance on a New SQL Server.”
Single-Query Load Tests
These tools let you write a query, and then run that query lots of times to generate load. Of course, this doesn’t really mimic production workloads, which will run lots of different queries concurrently. You can kinda mimic that with my technique to call random queries from a stored procedure, but even with that, we’re talking about a pretty low-tech technique here.
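To show the rough shape of that random-query trick, here’s a minimal sketch. The procedure name, tables, and queries below are made up for illustration – substitute queries from your own workload:

```sql
-- Hypothetical sketch: one proc that picks a random query each time it's called.
-- Swap in queries and tables from your own workload.
CREATE OR ALTER PROCEDURE dbo.usp_RandomQuery
AS
BEGIN
    SET NOCOUNT ON;
    -- NEWID() gives a fresh random value on every call
    DECLARE @Pick INT = ABS(CHECKSUM(NEWID())) % 3;

    IF @Pick = 0
        SELECT TOP 100 * FROM dbo.Users ORDER BY Reputation DESC;
    ELSE IF @Pick = 1
        SELECT COUNT(*) FROM dbo.Posts WHERE Score > 10;
    ELSE
        SELECT TOP 100 * FROM dbo.Comments ORDER BY CreationDate DESC;
END;
```

Then you point your single-query load testing tool at `EXEC dbo.usp_RandomQuery;` and every session gets a different mix of queries. It’s still low-tech – the mix is uniformly random, not weighted like real production traffic.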
Microsoft RML Utilities and OStress (download – instructions) – very barebones command line tool that lets you run a query across a number of sessions, for a number of iterations. No graphical user interface here – it’s very stripped-down and non-intuitive, but that’s also kind of the draw: it’s fast and simple.
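To give you an idea of what barebones looks like, here’s a typical OStress call – the server, database, and proc names are placeholders, so swap in your own:

```
ostress.exe -Slocalhost -E -dStackOverflow -Q"EXEC dbo.usp_MyTestQuery;" -n25 -r50 -q -oC:\temp\ostress
```

Here, -n is the number of simultaneous connections, -r is how many times each connection runs the query, -E uses Windows authentication, -q is quiet mode (discards result sets), and -o is where the log output lands.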
(Open source) SQLQueryStress (download – instructions) – graphical utility originally written by Adam Machanic, and now maintained by Erik Ejlskov Jensen. I find this way easier to use, but it’s also slower than OStress. When you have a really well-tuned workload that returns hundreds or thousands of rows per query, it’s not unusual to hit ASYNC_NETWORK_IO waits with SQLQueryStress when it’s unable to keep up with digesting the results, something I don’t see as often with OStress.
Load Replay Tests
These tools capture a workload (either from production, or from a group of users clicking around in dev/QA), and then help you replay that same workload again and again, either on the same SQL Server or different servers.
(Open source) WorkloadTools (download and instructions) – Gianluca Sartori’s command line tools to collect a workload (either with tracing or XE), analyze the data, and replay the workload in real time. If I was a full time DBA in charge of SQL Server load testing at a company, this would be the one I’d start with.
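To give you a feel for how WorkloadTools works: you run SqlWorkload.exe with a JSON config that pairs a Listener (what captures the workload) with one or more Consumers (what the captured queries get fed into). The config shape below is a sketch based on the project’s README – server names are placeholders, and you should check the repo for the current schema:

```
{
    "Controller": {
        "Listener": {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo": { "ServerName": "PROD01" }
        },
        "Consumers": [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo": { "ServerName": "TEST01", "DatabaseName": "MyDB" }
            }
        ]
    }
}
```

That setup captures queries on PROD01 via Extended Events and replays them against TEST01 in real time – swap the consumer type to analyze instead of replay.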
Microsoft Profiler (built in) – in theory, you can use Profiler to capture a trace of queries in production, and then replay that trace against another server. In practice, it doesn’t work well because it doesn’t accurately mimic the concurrency issues you hit in production. I don’t recommend this as a load testing solution.
Microsoft Distributed Replay (instructions) – this tool aimed to solve Profiler’s weaknesses by capturing a Profiler-style trace, but then replaying that trace in a coordinated fashion across multiple load test servers, all aimed at the same SQL Server. Before you go too far down this road, watch Gail Shaw’s Pluralsight course on it, and read the tales of woe in the Microsoft forums. The learning curve on this is pretty steep – think days and weeks of setup & experimentation time to get this working really well, not hours.
Microsoft Database Experimentation Assistant (download – instructions) – this builds atop Distributed Replay to let you run A/B tests with different server settings, indexes, even SQL Server versions, and then tells you which queries got better (and which ones got worse.) The drawback here is that you gotta get Distributed Replay working before you can progress to this point.
(Open source) HammerDB (download – instructions) – can be used either to run a standard synthetic workload (like a TPC benchmark), or your own custom queries. This one can test not only SQL Server, but also MySQL, Oracle, PostgreSQL, and more. I would only go down this road if I was a performance tuning specialist in a large company with lots of databases – this thing is really specialized.
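If you do go the HammerDB route, the command-line interface (hammerdbcli) drives everything with Tcl commands. This is a rough sketch of what building and running a TPC-C-style schema against SQL Server looks like – the exact parameter names vary by version, so treat this as a starting point and check HammerDB’s own docs:

```
# Sketch of a hammerdbcli session - verify parameter names for your version
dbset db mssqls                    ;# target SQL Server
dbset bm TPC-C                     ;# pick the benchmark
diset tpcc mssqls_count_ware 10    ;# size: 10 warehouses
buildschema                        ;# create and load the test database
loadscript                         ;# load the driver script
vuset vu 8                         ;# 8 virtual users
vucreate
vurun                              ;# run the workload
vudestroy
```

The nice part is that the same script structure works across the other database platforms HammerDB supports – only the dbset/diset values change.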
Regardless of which tool you choose, remember to use the tips from my post, “How to Load Test a Database Application.”
How does WorkloadTools compare to DEA, assuming we already went through the initial pain (and it sure was a pain) of getting Distributed Replay running?
I never had a project use DEA successfully, so I couldn’t tell you, heh.
LOL! Hopefully I’ll be able to provide better feedback on our adventures in a month or two.
If it takes you a month or two, that’s your first clue. 😉
Hahahaha nah that’s due to other priorities, but still, yeah it takes a while. Although, frankly, working with IT teams for the whole snapshots/backups back-and-forth dance for proper comparisons takes even longer (in large organizations and with large databases).
If you had budget to spend on a paid workload testing tool, what route would you go?
I’d probably defer that to the QA team – it’s a better fit for them to test the workload from end to end (including the app servers and front end.)
They do that now, but not at any load. We get high batch rates at certain times of the year. From 4K batches/sec to over 12K batches/sec in the past year, and likely to go higher this year. It’d be nice to have a way to prove what the DB can handle for provisioning.
Gotcha – if it’s just a curiosity thing, then I go with free tools. If it’s a “business needs the answer” thing, then usually the business wants to know that same thing across the entire stack – thus the deferral to the QA team to build something like that. (The database team shouldn’t be the ones driving load tests for the front end of the app.)
Think it’s more of a…I know they aren’t going to build it in QA…so how do I keep fingers pointed away from my DBs during the next busy time. So yeah, likely free tools for my own proving ground if I can’t encourage end-to-end load testing. Thx for the input
We have a loading application for stress testing and want to make sure that we are applying a “like production on a bad day” load to our test server. Which value tells us that we applied a similar load to production?
1. Transactions / Second
2. # Active Sessions
3. Batch Requests / Second
1. Load-runner application that runs production-like activity against our test system. A configurable number of users can be launched simultaneously.
2. The test system has a Performance Monitor data collector looking at CPU, Memory, Disk Queueing, and Batch Requests/Second, to show how the server performed.
We know on a “bad day” our values in Production are:
1. Transactions / Second = 12,560
2. # Active Sessions = 18
3. Batch Requests / Second = 87
And what we saw after our load runner test of 250 simultaneous users was:
1. Transactions / Second = 5,723
2. # Active Sessions = 9
3. Batch Requests / Second = 10
Which tells me that our % of Production load for each looks like:
1. Transactions / Second = 45.6%
2. # Active Sessions = 50%
3. Batch Requests / Second = 11.5%
QUESTION: Do we need to DOUBLE our number of load runner simultaneous users, or do we need to multiply it BY 10, to ensure we are applying a “bad day” test load to our test system??
Thanks Laura Prowicz
Laura – this is kinda beyond what I can solve in a blog post comment, but it’s exactly the kind of work I do as part of my consulting.
Yeah, after I left this message I read your article about how it’s impossible to replicate a production load due to the variety of activities and the data that exists or doesn’t exist. Unfortunately we are in a mad dash to migrate the production databases to this new server, so I suppose we will do our best to bring it to its knees and figure out whether that means it can or can’t handle the traffic. Thanks for your response!
Have you ever tried JMeter for workload testing? I think that application could also be a good alternative. For example, it’s possible to fake some kinds of business logic by adding preprocessing scripts and loop/if controllers.
See also: https://freshfork.github.io/dbAnimator