How to Load Test a Database Application

Development
9 Comments
Caution
I stay in the nicest hotels

It sounds really simple:

  1. Capture a bunch of production queries
  2. Replay those same queries over in a load test environment

So what’s the big deal? Why is it so hard to find good guidance on how to pull this off? How hard can it be?

Problem 1: many queries aren’t repeatable.

When you try to replay this query:

The rows have to actually be there in the table, or else your delete isn’t doing any work. That means you can’t replay that same query several times in a row. You’re going to have to restore the database after each load test, starting it back at the same point.

Deletes aren’t the only commands with this problem, either: for any query with a WHERE clause, we need the data to exist in the state we want:

In both cases, the rows for JobID 12345 have to already exist, so for a successful load test, our setup steps at the beginning of the post need to look more like:

  1. Take a backup of the production database(s)
  2. Capture a bunch of production queries
  3. Restore that same backup to the load test environment
  4. Replay those same queries over in a load test environment

(I can already hear the security crowd groaning, saying, “No no no, you’re never allowed to restore production data somewhere else.” Put a sock in it: we’re already talking about capturing queries in production, and as you know, those have personally identifiable data already. Read through til the end of the post before you get the flamethrower out.)

Problem 2: amplification introduces false blocking.

Managers think that to simulate more load, they can just take the production queries and replay them multiple times, simultaneously, from the replay tool. We’ve already talked about how you can’t reliably replay deletes, but even inserts and updates cause a problem.

Say we’re load testing Stack Overflow queries, and our app does this:

If try to simulate more load by running that exact same query from 100 different sessions simultaneously, we’re just going to end up with lock contention on that particular user’s row. We’ll be troubleshooting a blocking problem, not the problem we really have when 100 different users run that same query.

We really need to simulate activity across lots of DIFFERENT users, so you end up writing code to randomize parts of the load test. Now our load test routine looks like:

  1. Take a backup of the production database(s)
  2. Capture a bunch of production queries
  3. Analyze the production queries to see which ones need to be randomized, and write that code (or require it as part of the development process)
  4. Restore that same backup to the load test environment
  5. Replay those same queries over in a load test environment using a tool that can randomize the parameter calls

It’s hard work, but it’ll pay off because we can reuse our load test over time, right? Well, wait..

Problem 3: production is running yesterday’s code.

Load testing is (theoretically) useful to know that our new code changes are going to perform well when we deploy ’em. However, if we’re capturing production queries…they’re not running the new code yet, nor does the production database have our new columns and tables.

There goes your week
There goes your week

So now our testing routine looks like:

  1. Take a backup of the production database(s)
  2. Capture a bunch of production queries
  3. Analyze the production queries to see which ones need to be randomized, and write that code (or require it as part of the development process)
  4. Restore that same backup to the load test environment
  5. Apply the deployment scripts for our new code
  6. Replay those same queries a mix of old code and our new production code over in a load test environment using a tool that can randomize the parameter calls

That’s why it’s so hard to load test databases.

Everybody starts down the analysis path, gets about an hour into the planning, and realizes it’s still a big challenge in 2019. When you take all of these problems together, plus the risks of using production data in development, the capture-and-replay technique just flat out doesn’t make financial sense for application load tests.

But I was careful there to say application load tests.

Capture-and-replay load tests can still make sense when you want to test new hardware, different compatibility levels, or new indexing strategies. Just make sure to restore the database to the right point in time, and don’t try to amplify your workloads lest you introduce blocking problems.

Previous Post
How Should We Show Statistics Histograms in sp_BlitzIndex?
Next Post
SQL Server Perfmon counters that are still interesting in 2019

9 Comments. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.