
It sounds really simple:
- Capture a bunch of production queries
- Replay those same queries over in a load test environment
So what’s the big deal? Why is it so hard to find good guidance on how to pull this off? How hard can it be?
Problem 1: many queries aren’t repeatable.
When you try to replay this query:
1 2 |
DELETE ProcessingQueue WHERE JobID = 12345; |
The rows have to actually be there in the table, or else your delete isn’t doing any work. That means you can’t replay that same query several times in a row. You’re going to have to restore the database after each load test, starting it back at the same point.
Deletes aren’t the only commands with this problem, either: for any query with a WHERE clause, we need the data to exist in the state we want:
1 2 3 |
SELECT * FROM ProcessingQueue WHERE IsProcessed = 0 AND JobID = 12345; |
In both cases, the rows for JobID 12345 have to already exist, so for a successful load test, our setup steps at the beginning of the post need to look more like:
- Take a backup of the production database(s)
- Capture a bunch of production queries
- Restore that same backup to the load test environment
- Replay those same queries over in a load test environment
(I can already hear the security crowd groaning, saying, “No no no, you’re never allowed to restore production data somewhere else.” Put a sock in it: we’re already talking about capturing queries in production, and as you know, those have personally identifiable data already. Read through til the end of the post before you get the flamethrower out.)
Problem 2: amplification introduces false blocking.
Managers think that to simulate more load, they can just take the production queries and replay them multiple times, simultaneously, from the replay tool. We’ve already talked about how you can’t reliably replay deletes, but even inserts and updates cause a problem.
Say we’re load testing Stack Overflow queries, and our app does this:
1 2 3 |
UPDATE dbo.Users SET Reputation = Reputation + 1 WHERE Id = 22656; |
If try to simulate more load by running that exact same query from 100 different sessions simultaneously, we’re just going to end up with lock contention on that particular user’s row. We’ll be troubleshooting a blocking problem, not the problem we really have when 100 different users run that same query.
We really need to simulate activity across lots of DIFFERENT users, so you end up writing code to randomize parts of the load test. Now our load test routine looks like:
- Take a backup of the production database(s)
- Capture a bunch of production queries
- Analyze the production queries to see which ones need to be randomized, and write that code (or require it as part of the development process)
- Restore that same backup to the load test environment
- Replay those same queries over in a load test environment using a tool that can randomize the parameter calls
It’s hard work, but it’ll pay off because we can reuse our load test over time, right? Well, wait..
Problem 3: production is running yesterday’s code.
Load testing is (theoretically) useful to know that our new code changes are going to perform well when we deploy ’em. However, if we’re capturing production queries…they’re not running the new code yet, nor does the production database have our new columns and tables.

So now our testing routine looks like:
- Take a backup of the production database(s)
- Capture a bunch of production queries
- Analyze the production queries to see which ones need to be randomized, and write that code (or require it as part of the development process)
- Restore that same backup to the load test environment
- Apply the deployment scripts for our new code
- Replay
those same queriesa mix of old code and our new production code over in a load test environment using a tool that can randomize the parameter calls
That’s why it’s so hard to load test databases.
Everybody starts down the analysis path, gets about an hour into the planning, and realizes it’s still a big challenge in 2019. When you take all of these problems together, plus the risks of using production data in development, the capture-and-replay technique just flat out doesn’t make financial sense for application load tests.
But I was careful there to say application load tests.
Capture-and-replay load tests can still make sense when you want to test new hardware, different compatibility levels, or new indexing strategies. Just make sure to restore the database to the right point in time, and don’t try to amplify your workloads lest you introduce blocking problems.
9 Comments. Leave new
I thought SQL had a tool to do this, Distributed Replay. Does Microsoft make any more tools for this scenario? Essentially, we are trying to replicate a blocking scenario on dev environment.
Don – unfortunately, the limitations described in the post exactly match Distributed Replay.
Or maybe using DEA https://docs.microsoft.com/en-us/sql/dea/database-experimentation-assistant-overview?view=sql-server-2017
Fernando – yep, same limitations as Distributed Replay, and the same problem described in the post. Not a solution either, sadly.
That’s why we automate the application and capture the database activity for performance analysis. Randomization is built in.
Exactly – if you can guess what users are going to do, in what quantities, then you can reliably produce load and repeat it.
Although if you can precisely guess what users are going to do, well, that’s another blog post altogether!
That’s like Marty McFly and Biff with the Sports Almanac going back in time! 😉
[…] Brent Ozar shares some of the trouble you might run into when database load testing: […]
There is one more difficulty not mentioned and in my opinion, it is the most difficult to reproduce: concurrency and contention. Consider a production environment that has four application servers all hitting the same database. You can capture the queries that these servers send to SQL, but can you replay these queries to simulate the same concurrency and contention that exists between the four application servers in production? Unless you can find a way to do this, you will hear “It worked in the load test.”