SQL Server load testing: synchronized tests versus random ones

Our QA department uses Mercury TestDirector to do load testing of our .NET web apps.  The basic idea is that TestDirector runs through a set of URLs to load test an application, and it can gradually scale up the number of simultaneous sessions until the datacenter is filled with smoke.  When the fire alarms go off, subtract one from the number of sessions, and that’s the maximum load the application can sustain.  (Okay, not really, but you get the idea.)

I don’t know the internals of how TestDirector does its thing, but while the QA department does their load testing, I capture my own Perfmon statistics and gather a SQL trace of all queries with duration > 1500 milliseconds.  Granted, the maximum application performance would be higher if I weren’t running the trace, but I like being able to see behind the scenes to know which queries are causing the worst load.
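A server-side trace filter along these lines would capture only the slow queries. This is just a sketch: the @TraceID variable is assumed to come from an earlier sp_trace_create call (not shown), and note that SQL Server 2005 and later store trace durations in microseconds, so a 1,500 ms threshold becomes 1500000.

```sql
-- Sketch: filter an existing server-side trace down to queries
-- lasting 1,500 ms or more. Assumes @TraceID was returned by a
-- prior sp_trace_create call.
DECLARE @Duration bigint;
SET @Duration = 1500000;  -- 1,500 ms, in microseconds (SQL 2005+)

EXEC sp_trace_setfilter
    @traceid             = @TraceID,
    @columnid            = 13,        -- Duration column
    @logical_operator    = 0,         -- AND
    @comparison_operator = 4,         -- greater than or equal
    @value               = @Duration;
```

The duration filter has to be passed as a bigint variable rather than an integer literal, or sp_trace_setfilter rejects it with a data type mismatch.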

In the trace, I noticed that different QA testers set up their tests differently.  Most of the time, the users were randomized: all of the sessions were active simultaneously, but they weren’t all doing exactly the same thing at exactly the same time.  While some users were logging in, other users would be printing a report, and other users would be doing data entry.  At first, I thought that was a good way to do load testing because it gave a picture of what the user activity levels would look like in real life.

Later (and I suspect, accidentally) they set up a test so that each simultaneous user was doing exactly the same thing at exactly the same time.  Ten users would hit the login page simultaneously, then they would all hit the menu page, then they would all do the same function at the same millisecond.

The load test results from that synchronized testing were dramatically slower in parts.  The synchronized tests pointed out which parts of the app had concurrency problems, which ones caused high CPU loads on the web server, which ones hit the SQL Server hardest, etc.

When I analyzed the load test results, I looked at the following metrics from Perfmon side by side with my SQL trace results, both sorted by date/time order:

High CPU use on the web server – during periods of very high CPU use on the web front end, I looked at the queries in the trace.  I noticed that a login stored procedure showed up with a low number of reads, low number of writes, and low CPU load, but the duration was very long, usually over 30 seconds.  When I ran that same stored procedure from Management Studio, it always finished in under a second.  I even tried running it over and over in several loops in different sessions, and it still always finished fast.  I looked at the results of the stored proc, and it was returning 7,000 lines of XML.  I deduced that the web front end was having difficulty parsing and processing that XML, and it wasn’t able to digest the results fast enough.  When I pointed this out to the app team, they researched and discovered that the test user’s access wasn’t configured correctly.

Heavy blocking on the SQL Server – when SQL reported lots of locks, I correlated that to the queries that were running at the time.  A stored procedure was doing “while” loops to find unused ID numbers in a table, and was doing the whole thing inside a transaction.  I showed them how to rewrite the query with a single update statement, thereby avoiding the loops and transaction.
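The rewrite looked something like this sketch; the table and column names here are invented for illustration. A single UPDATE claims an unused ID atomically, so there is no loop and no long-held transaction:

```sql
-- Sketch with hypothetical names: claim one unused ID in a single
-- atomic statement instead of looping inside a transaction.
-- UPDLOCK/READPAST lets concurrent sessions skip rows that are
-- already being claimed instead of blocking on them.
DECLARE @ClaimedID int;

UPDATE TOP (1) dbo.IdPool WITH (UPDLOCK, READPAST)
SET    InUse      = 1,
       @ClaimedID = IdNumber
WHERE  InUse = 0;
```

Assigning the variable inside the SET clause means the claimed ID comes back in the same statement that marks it used, with no window for another session to grab it in between.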

High CPU use on the SQL Server – for a brief time in the load test, SQL was maxing out the CPUs.  SQL CPU use is always my first concern, because if an application goes live with high CPU use, the only way to fix it afterward is to throw very expensive hardware and licensing at it.  I checked what queries were running at the time, and it turned out that a stored procedure was building dynamic SQL and not caching execution plans.  Because the query was pretty complex, it was taking a lot of time to recompile.  I explained how to write it in a way that the execution plans could be cached.
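The fix is roughly this shape (table and parameter names invented for the sketch): pass the variable parts to sp_executesql as parameters instead of concatenating them into the string, so every execution matches the same cached plan.

```sql
-- Plan-cache-unfriendly: a new string means a new compile per value, e.g.
-- EXEC (N'SELECT ... WHERE CustomerID = ' + @CustomerIDString);

-- Plan-cache-friendly sketch: one parameterized string, one cached plan.
DECLARE @sql nvarchar(max);
SET @sql = N'SELECT OrderID, OrderDate
             FROM   dbo.Orders
             WHERE  CustomerID = @CustomerID';

EXEC sp_executesql
     @sql,
     N'@CustomerID int',   -- parameter definition list
     @CustomerID = 42;
```

Because the query text is byte-for-byte identical on every call, SQL Server compiles it once and reuses the plan instead of burning CPU on recompiles.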

In summary: I didn’t notice these things when the tests were randomized, because when they weren’t happening in clusters, they disappeared into the background noise.  When they ran simultaneously in large quantities, it was much easier to pick out concurrency and scaling problems.  Database administrators who work with a QA team should ask QA to run their tests both ways.  (The QA folks seem to appreciate it, just knowing that someone is analyzing the results this closely.)


7 Comments

  • >> I deduced that the web front end was having difficulty parsing and processing
    >> that XML, and it wasn’t able to digest the results fast enough. When I pointed
    >> this out to the app team, they researched and discovered that the test user’s
    >> access wasn’t configured correctly.

    So, I’m having a similar problem: I have a stored procedure that always runs in a few seconds or less in Management Studio but takes about half an hour to run from my .NET program. When you say the “test user’s access wasn’t configured properly”, does that mean your application’s user, or the actual user the application uses at the database-connection level? If it’s the latter, are there any more details you could give about what was misconfigured for that user?

    Reply
  • Ah, that problem pops up a lot. In my example about the test user’s access, that referred to the application’s configuration, not the database – but that’s probably not the issue you’re looking at.

    There are two things you want to check. First, does the query involve parameters? If so, SQL may be compiling an execution plan for one set of parameters, and that plan just doesn’t work as well when the query is called with a different set of parameters.

    Second, you want to see that both the SSMS query and your .NET query are being executed the same way. Run a trace in both cases, and what you’ll sometimes find is that the .NET program is running the query via sp_executesql or preparing the SQL first. If that’s the case, they’re probably ending up with different execution plans.

    The key in both cases is the execution plan – the way that the engine is allocating resources and running the query. So how do you fix it? One way is to put the code in a stored procedure, which ensures the same execution plan is always used. Another way is to diagnose the .NET code and get it to send the query the best way, like either using or not using prepared SQL.
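    For the parameter-sniffing case specifically, one common workaround (sketched here with invented object names) is to copy the parameters into local variables, so the optimizer builds a general-purpose plan instead of one tailored to the first caller’s values:

    ```sql
    -- Hypothetical example: masking parameter sniffing with a local variable.
    CREATE PROCEDURE dbo.GetOrdersByCustomer
        @CustomerID int
    AS
    BEGIN
        DECLARE @LocalCustomerID int;
        SET @LocalCustomerID = @CustomerID;  -- optimizer can't sniff this value

        SELECT OrderID, OrderDate
        FROM   dbo.Orders
        WHERE  CustomerID = @LocalCustomerID;
    END;
    ```

    The trade-off is that the plan is built for an “average” value rather than the actual one, so test it against your real data distribution before relying on it.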

    Reply
  • I have a performance issue.
    We load data from an Oracle database into SQL Server using a data-integrator tool.
    The load previously took 6 hours to delete the tables and insert the data from the Oracle server into the SQL Server tables; now it’s taking 8–9 hours to process almost the same load.
    I have checked for fragmentation, and I update statistics every week. There’s no blocking, but there are around 56,000 locks, which clear up after some time.
    Please advise me on how to improve SQL Server performance and decrease the load time.

    Reply
    • Nikki – there’s a lot of possibilities here. The best way to get help on something like this is to narrow down the question as tightly as possible, then post it on http://StackOverflow.com or http://ServerFault.com. Include as much example code as you can so that others can quickly run it and get an idea of what you’re asking, and they might even post your code back with fixes. If you don’t get good answers within a day or two, let me know and I’ll take a look at the question to see if there’s a way we can get better answers.

      Reply
  • Brent, thank you for the immediate response.

    Reply
  • The stored procedure sp_blitz solved the performance error. Thank you!

    Reply
  • I had a stored procedure where I was querying datetime fields, and it was taking more than 100 seconds. But when I ran the same query in Management Studio (not executing the stored procedure, just the statements inside it), it took less than 2 seconds. Finally I found out that SQL Server had a “parameter sniffing” problem: I had to declare local variables of the same types as the stored procedure’s parameters and use those in the statement. The “High CPU use on the web server” part reminded me of that. It took me a day to find out why the stored proc was running so slowly 🙂 Good article.

    Reply
