In our last episode, we were running into problems with these two queries:
1 2 3 4 5 6 7 8 9 |
SELECT LastAccessDate, Id, DisplayName, Age FROM dbo.Users WHERE LastAccessDate BETWEEN '2018-08-31' AND '2018-09-01' ORDER BY LastAccessDate; SELECT LastAccessDate, Id, DisplayName, Age FROM dbo.Users WHERE CAST(LastAccessDate AS DATE) = '2018-08-31' ORDER BY LastAccessDate; |
When SQL Server saw the function in the second query’s WHERE clause, it punted out with a 1-row estimate, which caused us problems. SQL Server did an index seek + key lookup where it wasn’t appropriate.
Well, if we’re not allowed to change the query, we could improve the situation by building a wider covering index. Right now we have this index:
1 |
CREATE INDEX IX_LastAccessDate_Id ON dbo.Users(LastAccessDate, Id); |
But let’s say we add either of these indexes:
1 2 3 4 5 6 7 |
CREATE INDEX IX_LastAccessDate_Id_Includes ON dbo.Users(LastAccessDate, Id) INCLUDE (DisplayName, Age); GO CREATE INDEX IX_LastAccessDate_Id_DisplayName_Age ON dbo.Users(LastAccessDate, Id, DisplayName, Age); GO |
The resulting leaf pages of the index will look like this:
Now, look, dear reader: I know you. You and I go way back. I know how you think, and your first reaction is going to be to scroll down to the comment section and start banging away loudly on your keyboard, debating the difference between putting columns in the include versus putting them in the keys of the index. Hold that thought. We’re going to get to that in the next post in the series, I swear on my standing desk. For now, just roll with this as-is.
Add either one of those indexes – seriously, I don’t care which one you pick, but just only pick one – and the query plans now look identical for ANY date range, even for the entire table:
Ah, the magic of covering indexes. Make an index wide enough, and the queries will beat a path to your door. Now, SQL Server doesn’t have to worry about the tipping point because this index works perfectly no matter how many rows we’re bringing back.
So the plans are the same, right?
Well, no. Come closer, dear reader, and click on the Messages tab. SQL Server is hiding a dirty secret from you:
The bottom query actually read more pages even though it returned the same number of rows! Why is that? Well, hover your mouse over each index seek operator and look at the number of rows read. Here’s the top query:
In the top one, where our dates are clearly defined, SQL Server was able to seek directly into the index at the right point, read the 24,380 matching rows, and then bail out. In the bottom one, however:
SQL Server was like me at the Chinese buffet: it either didn’t know where to start, or it didn’t know when to finish, because it put way more rows on its plate than it really needed to. (For the record, even when crab rangoons are bad, they’re still good.) We’ll cover the reason for that (a residual predicate) later in this series.
For now, don’t sweat 60-70 logical reads between friends. (Also, are you going to finish those rangoons?) Don’t get hung up on this index seek not being as good as it could possibly be, because it’s still way the hell better than the 73,225 logical reads we were getting in the last post, and better than the 7,405 reads that a table scan would require.
60 reads isn’t a big deal,
but here are 3 things that are.
Here are three things you SHOULD sweat:
- The estimated number of rows (1) is still completely inaccurate, which will wreak havoc on further operations in your query, like if you join to additional tables
- Even when you’re looking at an actual plan, the cost percentages shown are still estimates – SQL Server never goes back and recalculates “actual” query costs
- The estimated query costs are pretty much irrelevant to reality:
SQL Server’s saying the top query is 97% of the cost of the batch, and the bottom query is 3%. That’s flat out not true: the bottom query does MORE reads than the top query! The whole reason the bottom query shows a lowly 3% cost is all back to SQL Server’s inaccurate estimate that only 1 row would return.
If the bottom query’s function really did only produce 1 row, AND if SQL Server was smart enough to only read the rows it really needed to read (which isn’t the case with this function), then sure, the cost of the bottom query would be cheaper. But it’s not – so the costs, much like my insistence that I can quit crab rangoons whenever I want, is a lie.
Now if you’ll excuse me, I have an errand to run. When I’m done with that, we’ll talk about whether columns should go in an index’s key or in the includes.
3 Comments. Leave new
I have been reading your blog occasionally and using the First Responder Kit for years, but I just subscribed to your blog last week which I think I should have done this a long time ago. A pleasant regret.
YOU ROCK MAN.
Thank you.
Hahaha, you’re welcome!
So, first query is more effective? Thats greate!