When you see “index seek” on an execution plan, that doesn’t mean SQL Server is jumping to exactly the row you’re looking for. It only means that SQL Server is seeking on the first column of the index.
This is especially misleading on indexes where the first column isn’t very selective.
To explain, I’ll take one of the big Stack Overflow databases and create this index on the Users table:
CREATE INDEX Reputation_Location_DisplayName
ON dbo.Users(Reputation, Location, DisplayName);
And then I’ll run this query:
WHERE Reputation = 1
AND DisplayName = N'Brent Ozar';
The actual execution plan shows an index seek:
But hover your mouse over the index seek to see the popup details:
Look at “Number of Rows Read” – we read 6,044,557 rows to produce exactly 0 rows. That’s not what you or I would really call an index seek – we read 2/3 of the entire table!
The problem is that the term “Index Seek” only refers to how we access the first column of the index. At the bottom of the screenshot, the term “Seek Predicates” indicates that we did indeed seek to Reputation = 1, but unfortunately, there are millions of users with that reputation.
A little higher up in the screenshot, the term “Predicate” indicates that we have a leftover predicate that we couldn’t seek into. I wish this was labeled as a scan predicate because we’re scanning all the rows that match our Seek Predicate – we’re scanning them all because they’re not in an order that helps our search.
Sure, technically DisplayName is in the index – but it’s not the second column in the index, so it doesn’t really matter whether it’s in the key or the includes.
When you see an Index Seek in a plan, before you jump to conclusions that it’s a good usage of the index, compare the Number of Rows Read versus the Actual Number of Rows. If your query is reading way more rows than it’s actually producing, you might be able to dramatically improve performance by tweaking the indexes.
This is a scenario I wouldn’t have thought of. Not for performance reasons but curiosity reasons i’m going to try to play with it and see if a scan would have been better in this case than the seek it used.
Wouldn’t it be nice to have an index type that did not depend on the order of the columns? Perhaps a new index structure could accomplish this efficiently.
Sure, columnstore indexes are completely different, and you can learn how those perform by trying a similar experiment with those.
Hi Brent, may be i got it wrong but you said display name is not the second column in the index – i think it is second column unless you wanted to say its not the first (key column) of the index. Sorry if I misinterpreted it, but i got your point of key position for the displayname column position
Reread the post again more carefully. DisplayName is indeed not the second column of the index. It’s the third.
For better seeks and statistics, my default behavior has been to specify the most selective column first on non-clustered compound indexes – but always look out for high row counts in the exec plan.
Hi Brent, I have read that the best way to order the index columns is to put the most selective column first. This is good for access.
But what about the index update (maintenance). What is the best way to order columns in the index in regard to their update probability? Example: name, and status. The name almost never change, but the status changes a lot.
What is the best order here:
1- status, name
2- name, status
That’s a great question, and we talk about it in my Fundamentals of Index Tuning and Mastering Index Tuning courses.
The database schema I inherited has many “DeletedOn” columns to enable soft-deleting of rows. I just felt a little flutter of horror/excitement in the pit of my stomach that there may be substantial improvements available, since many of our indexes on these tables use “DeletedOn” as the leading index column!
Wouldn’t a columnstore index in this situation also pull and compare millions of row ids? More efficient than pulling an entire row, but still much less efficient than even any index leading with displayName?
Why not download the database to test your hypothesis and find out? That’s why I use open source databases for all my work – so you can get answers quickly without waiting on me.
(That’s the polite way of saying you shouldn’t ask me to do your work for you, hee hee!)
Really nice one Brent, it will certainly make us review a lot of “urban legends”