Ever heard someone say this? “I tuned a query, and I made it 50% faster!”
Whenever I hear that I wonder how the person measured performance, and what their test runs were like.
Measuring query performance can be tricky. Say you run a query and it takes sixty seconds to return data in SQL Server Management Studio. You make one change and you run it again, and it takes thirty seconds. Did you just improve the runtime by 50%? Or did the second query run faster because the data it needed was already in memory, and the first query had the disadvantage of having to read all the data from disk?
You won’t be able to show that you truly improved query performance unless you have a good tool to measure with. Luckily, the same tool also helps you make sure you’re doing a fair comparison between the queries.
Two Simple Tools: STATISTICS IO and STATISTICS TIME
To start measuring your queries, run the following statements in SQL Server Management Studio:
```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
```
Running these statements causes SQL Server to print performance information to the ‘Messages’ tab of your session window. This impacts only the session you run the statements in; it does not change a server-level setting.
You can turn off these options at any time by running:
```sql
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```
Turning on the STATISTICS IO option for your session causes SQL Server to track and print information about how SQL Server is interacting with your storage when you run queries.
The most useful numbers you will get are:
- Logical Reads: How many 8K pages were read to satisfy the query. This count includes pages that were read from memory AND pages that were read from disk. (Pages read from disk show up as either physical reads or read-ahead reads.)
- Physical Reads: How many 8K pages were read from disk, because they were not already in memory. These pages are left in memory after the query is done for as long as SQL Server can keep them there.
- Read-Ahead Reads: How many 8K pages were pre-fetched from disk for the query. These are physical reads, but they are optimized for performance. Learn more about read-ahead here.
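Since every logical read is a single 8K page, read counts translate directly into data volume. As a back-of-the-envelope illustration (the 40866 figure here is just an example read count, not a number you need to reproduce):

```sql
-- Each logical read is one 8 KB page, so reads convert directly to data volume.
-- 40866 is an example read count; 8 KB per page, 1024 KB per MB.
SELECT 40866 * 8 / 1024.0 AS approx_mb_read;  -- roughly 319 MB touched
```

This is handy for putting a read count in perspective: cutting reads in half means touching half as much data.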
How to Use STATISTICS IO
Use the STATISTICS IO setting for measuring significant changes. If you have thousands of reads and cut them in half, this helps you measure that.
Avoid using STATISTICS IO to measure fine-tuned improvements: you typically don’t care about a difference of a couple of pages read, unless those pages are read so frequently that they represent a significant amount of CPU burn. (In that case, you probably want to think about whether there are other ways to solve the problem in your application environment outside of SQL Server, anyway.)
Don’t worry too much if physical reads + read-ahead reads don’t always add up the way you expect relative to logical reads. SQL Server has optimizations that mean the numbers won’t always reconcile exactly.
STATISTICS IO also reports a “scan count” for your IO. I don’t find this number very useful for big-picture comparisons, so if you’re just starting out, don’t worry about it!
Good news: STATISTICS TIME is even easier to use than STATISTICS IO.
When you turn on the STATISTICS TIME setting, SQL Server will print messages explaining how much CPU time was used when executing the query. This is a great way to compare query performance because it’s a server-side metric. When you just look at the elapsed time in SQL Server Management Studio there are all sorts of other things that come into play, like how long it took to return the data to your client, and how long it took to stream the results into the UI.
You will notice that there are some slight variations in CPU time when you run a query multiple times. Again, we want to look at bigger numbers, and when you’re really getting into testing you probably want to average the number you get over multiple query runs.
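If you want averages without running the query by hand over and over, one option is the sys.dm_exec_query_stats dynamic management view, which accumulates totals per cached plan. Here’s a rough sketch (note that the DMV reports times in microseconds, and the numbers reset when a plan leaves the cache):

```sql
-- Sketch: average CPU and elapsed time per execution for cached query plans.
-- sys.dm_exec_query_stats reports times in microseconds; divide by 1000 for ms.
SELECT TOP (10)
    qs.execution_count,
    qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
    qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```

This gives you averages across every execution since the plan was cached, which smooths out run-to-run variation nicely.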
How are Parallel Queries Measured by STATISTICS TIME?
When you look at CPU time for a query, you’re seeing the total amount used across all CPU cores involved. This means that for a parallel query, total CPU time may well be greater than the total elapsed time.
This isn’t a problem, and it’s actually a good thing when it comes to comparing performance!
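If you ever want to take parallelism out of the picture entirely while testing, you can force a serial plan with a MAXDOP query hint. For example, using the kind of aggregate query we’ll tune below:

```sql
-- Force a serial plan so CPU time and elapsed time line up more closely.
SELECT MONTH(DateKey) AS [Month], SUM(SalesAmount) AS [Total Sales]
FROM dbo.FactOnlineSales
WHERE YEAR(DateKey) = 2007
GROUP BY MONTH(DateKey)
OPTION (MAXDOP 1);
```

Just remember that a serial run isn’t representative of production behavior if the real workload runs the query in parallel.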
Measuring Performance Improvements: An Example
Let’s take a look at a quick example. I am running this example on SQL 2012 RTM, Developer Edition.
First, I use the ContosoRetailDW sample database and turn on my STATISTICS IO and STATISTICS TIME set options.
```sql
USE ContosoRetailDW;
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
```
Next, I create a temporary covering index to use for the query I want to tune.
```sql
CREATE INDEX test_ix_FactOnlineSales_DateKey_Incl_SalesAmount
    ON dbo.FactOnlineSales (DateKey)
    INCLUDE (SalesAmount);
GO
```
Here’s my problem query. I have a user who has been running this, and they’ve asked me if there’s a way to make it better.
```sql
SELECT MONTH(DateKey) AS [Month], SUM(SalesAmount) AS [Total Sales]
FROM dbo.FactOnlineSales
WHERE YEAR(DateKey) = 2007
GROUP BY MONTH(DateKey);
```
When I first run the query, this is the output:
```
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 6 ms.

(12 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactOnlineSales'. Scan count 3, logical reads 40866, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 3219 ms, elapsed time = 1670 ms.
```
Here are the key pieces of information we want to know:
First, I did 40,866 logical reads and burned 3,219 milliseconds of CPU. I did not do any physical or read-ahead reads; my data was already in cache. (Creating the covering index read the pages into memory for me!) Since my elapsed time is roughly half of my CPU time, I can tell that this query went parallel.
Knowing that all the data was in cache is important. If I am just doing logical reads, then comparing CPU time between multiple runs is fair.
I look at the query and focus in on this part as the easiest thing to change:
WHERE YEAR(DateKey) = 2007
By wrapping a function around the date and then comparing it, I’m forcing SQL Server to scan every row of my covering index, apply the function, and then compare it to the literal value I provided.
I decided to rewrite this one line in the query and try a new version. My revised query is:
```sql
SELECT MONTH(DateKey) AS [Month], SUM(SalesAmount) AS [Total Sales]
FROM dbo.FactOnlineSales
WHERE DateKey BETWEEN '2007-01-01' AND '2007-12-31'
GROUP BY MONTH(DateKey);
```
Giving this a whirl, I get the results:
```
SQL Server parse and compile time:
   CPU time = 7 ms, elapsed time = 7 ms.

(12 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactOnlineSales'. Scan count 3, logical reads 13032, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1734 ms, elapsed time = 916 ms.
```
Once again, I did all logical reads, so I’m not comparing a query that had to do physical IO and/or read-aheads against one that did not.
Let’s compare these numbers:
- Run 1: 40,866 logical reads and 3,219 milliseconds of CPU.
- Run 2: 13,032 logical reads and 1,734 milliseconds of CPU.
Voila! I can now show that with a one-line change, I reduced logical reads by more than 65% and cut CPU burn by about 46% in my test run, and I have the numbers to prove it.
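One last bit of housekeeping: since that covering index was created just for this test, you’ll want to drop it when you’re done experimenting.

```sql
-- Clean up the temporary test index.
DROP INDEX test_ix_FactOnlineSales_DateKey_Incl_SalesAmount
    ON dbo.FactOnlineSales;
GO
```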
If you’d like the deep info on how CPU Time is measured, you’ll enjoy Bob Dorr’s posts on timers in SQL Server.