Our QA department uses Mercury TestDirector to load test our .NET web apps. The basic idea is that TestDirector runs through a set of URLs against an application, gradually scaling up the number of simultaneous sessions until the datacenter fills with smoke. When the fire alarms go off, subtract one from the number of sessions, and that’s the maximum load the application can sustain. (Okay, not really, but you get the idea.)
I don’t know the internals of how TestDirector does its thing, but while the QA department does their load testing, I capture my own Perfmon statistics and gather a SQL trace of all queries with duration > 1500 milliseconds. Granted, the maximum application performance would be higher if I wasn’t running the trace, but I like being able to see behind the scenes to know what queries are causing the worst load.
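For anyone who wants to do the same kind of digging, here’s a minimal sketch of reading a server-side trace file back into a query window and keeping only the slow statements. The file path is made up, and I’m assuming SQL Server 2005 or later, where the Duration column is stored in microseconds:

```sql
-- Read the trace file and keep only statements slower than 1,500 ms.
-- (Path is hypothetical; Duration is in microseconds on SQL 2005+.)
SELECT  TextData, StartTime, Duration, CPU, Reads, Writes
FROM    sys.fn_trace_gettable('C:\Traces\LoadTest.trc', DEFAULT)
WHERE   Duration > 1500 * 1000
ORDER BY StartTime;
```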
In the trace, I noticed that different QA testers set up their tests differently. Most of the time, the users were randomized: all of the sessions were active simultaneously, but they weren’t all doing exactly the same thing at exactly the same time. While some users were logging in, other users would be printing a report, and other users would be doing data entry. At first, I thought that was a good way to do load testing because it gave a picture of what the user activity levels would look like in real life.
Later (and I suspect, accidentally) they set up a test so that each simultaneous user was doing exactly the same thing at exactly the same time. Ten users would hit the login page simultaneously, then they would all hit the menu page, then they would all do the same function at the same millisecond.
The results from that synchronized testing showed dramatically slower response times in parts of the application. The synchronized tests pointed out which parts of the app had concurrency problems, which ones caused high CPU loads on the web server, which ones hit the SQL Server hardest, and so on.
When I analyzed the load test results, I looked at the following metrics from Perfmon side by side with my SQL trace results, both sorted in date/time order:
High CPU use on the web server – during periods of very high CPU use on the web front end, I looked at the queries in the trace. I noticed that a login stored procedure showed up with a low number of reads, low number of writes, and low CPU load, but the duration was very long, usually over 30 seconds. When I ran that same stored procedure from Management Studio, it always finished in under a second. I even tried running it over and over in several loops in different sessions, and it still always finished fast. I looked at the results of the stored proc, and it was returning 7,000 lines of XML. I deduced that the web front end was having difficulty parsing and processing that XML, and it wasn’t able to digest the results fast enough. When I pointed this out to the app team, they researched and discovered that the test user’s access wasn’t configured correctly.
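If you want to sanity-check a suspect procedure the same way, running it interactively with statistics turned on makes the comparison easy. The procedure name and parameter below are hypothetical, just to show the shape of the test:

```sql
-- Re-run the suspect proc from Management Studio with timing and I/O stats,
-- to confirm the database engine itself finishes quickly.
-- (Proc name and parameter are made up.)
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

EXEC dbo.usp_Login @UserName = N'loadtest01';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```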
Heavy blocking on the SQL Server – when SQL reported lots of locks, I correlated that to the queries that were running at the time. A stored procedure was doing “while” loops to find unused ID numbers in a table, and it was doing the whole thing inside a transaction, so it held locks for the entire hunt. I showed them how to rewrite the query with a single update statement, thereby avoiding the loops and the long-running transaction.
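I can’t post their actual code, but the general shape of that rewrite looks something like this. The table and column names are hypothetical; the point is that one set-based statement claims the lowest free number instead of walking the table row by row inside an open transaction:

```sql
-- Hypothetical schema: dbo.IdPool(IdNumber int PRIMARY KEY, InUse bit).
-- Claim the lowest unused ID in one statement instead of a WHILE loop.
DECLARE @Claimed TABLE (IdNumber int);

WITH NextFree AS
(
    SELECT TOP (1) IdNumber, InUse
    FROM   dbo.IdPool WITH (UPDLOCK, READPAST)  -- skip rows other sessions have locked
    WHERE  InUse = 0
    ORDER BY IdNumber
)
UPDATE NextFree
SET    InUse = 1
OUTPUT inserted.IdNumber INTO @Claimed;

SELECT IdNumber AS ClaimedId FROM @Claimed;
```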
High CPU use on the SQL Server – for a brief time in the load test, SQL was maxing out its CPUs. SQL CPU use is always my first concern, because once a program goes live with high CPU use, the only way to fix it is to throw very expensive hardware and licensing at it. I checked what queries were running at the time, and it turned out that a stored procedure was building dynamic SQL in a way that kept execution plans from being reused. Because the query was pretty complex, it was taking a lot of time to recompile on every call. I explained how to write it so that the execution plans could be cached.
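The usual fix for that pattern is to parameterize the dynamic SQL with sp_executesql instead of concatenating literal values into the string, so one cached plan gets reused. The table and parameter names below are made up; it’s just a sketch of the before-and-after:

```sql
DECLARE @CustomerId int;
DECLARE @sql nvarchar(max);
SET @CustomerId = 42;

-- Cache-unfriendly: every distinct literal value compiles a brand-new plan.
SET @sql = N'SELECT OrderId, OrderDate FROM dbo.Orders WHERE CustomerId = '
         + CAST(@CustomerId AS nvarchar(20));
EXEC (@sql);

-- Cache-friendly: the value becomes a parameter, so the plan can be reused.
SET @sql = N'SELECT OrderId, OrderDate FROM dbo.Orders WHERE CustomerId = @CustomerId';
EXEC sp_executesql @sql,
                   N'@CustomerId int',
                   @CustomerId = @CustomerId;
```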
In summary: I didn’t notice these things when the tests were randomized, because when the problems weren’t happening in clusters, they disappeared into the background noise. When they ran simultaneously in large numbers, it was much easier to spot concurrency and scaling problems. Database administrators who work with a QA team should ask QA to run their tests both ways. (The QA folks seem to appreciate it, just knowing that someone is analyzing the results this closely.)