Somewhere along the way in your career, you were told that:
- Index seeks are quick, lightweight operations
- Table scans are ugly, slow operations
And ever since, you’ve kept an eye on your execution plans looking for those performance-intensive clustered index scans. When you see ’em, you go root ’em out, believing you’ve got a performance problem.
Thing is, … they lied to you. Seeks aren’t necessarily good, nor are scans necessarily bad. To straighten you out, we’re going to walk through a series of demos.
First up: a “bad” index scan.
Let’s start with this query – using the Stack Overflow 2013 (50GB) database:
1 2 3 4 5 |
SET STATISTICS IO ON; GO SELECT COUNT(*) FROM dbo.Users; GO |
The resulting execution plan has a clustered index scan – reading the whole table, 2.5M rows:
That first line, SET STATISTICS IO ON, turns on a set of messages that will show you the number of logical reads SQL Server performs when executing your query. Those are measured in 8KB pages. The output shows up in the Messages tab:
SQL Server had to read 44,530 8KB pages (the whole table) to count all of the rows.
Okay, so that’s a “bad” scan – because it reads all of the pages – although I could argue that it’s really more of a bad query. (Do you really need an accurate, up-to-the-second count of the number of rows? Probably not – you could probably cache that sucker in the app tier and use the same results for hours at a time. But I digress.)
But here’s a “good” index scan.
Take this query:
1 |
SELECT TOP 10 * FROM dbo.Users; |
We’re telling SQL Server, “Just give me the first 10 rows – I don’t care whether they’re the first ten, or the last ten, or just some random ten you pulled out of your back end.” The execution plan for this one is also a table scan:
See the teeny tiny arrows? That means only a limited number of rows came out – which sounds like a good thing, but here’s the real indicator that it was a good thing:
SQL Server only had to read 5 8KB pages – that’s 40KB for those of you with my math skills – and that’s hardly anything at all! Heck, that’s in the exact range as our next demo.
Here’s a “good” index seek.
1 |
SELECT * FROM dbo.Users WHERE Id = 26837; |
It’s a super-efficient operation, only reading 3 8KB pages – not that far off the “good” index scan we did a second ago!
That’s the kind of operation you think about when you see the word “seek” on a plan – but that’s only one kind of example. Try this one on for size:
Here’s a “bad” index seek.
As you mentally build an execution plan for this, you should know that the lowest Id in the StackOverflow.Users export is -1. They all go positive from there.
1 |
SELECT * FROM dbo.Users WHERE Id > -1000; |
That query reads back every single user in the table.
You would probably consider that a table scan, but that’s not what the plan shows:
It’s technically a “seek” because SQL Server is seeking to a particular value, and then reading out the rows from there. Logical reads shows more proof that it’s reading the entire table:
That’s a lot of pages. That’s not what you think of when I say “seek.”
Here’s what “seek” and “scan” really mean.
Seek means SQL Server knows what specific row it’s going to start at. That’s it.
- It might be the first row in the table
- It might be the last row in the table
- It might read out just one row
- It might read out all of the rows
- It might have been performed multiple times (hover your mouse over the operator in the plan and look at “number of executions” for an indication)
Scan means SQL Server is going to start at either end of the index (min or max) and start reading from there. That’s it.
- It might read out all the rows (which is what you usually expect, but…)
- It might only read out one row
An index seek operation isn’t necessarily good, nor is an index scan inherently bad.
To cast judgment, dig deeper.
Let’s take this query:
1 2 3 |
SELECT * FROM dbo.Users WHERE Id > -1000 AND Reputation < 0; |
Its execution plan does a clustered index seek – it’s going to jump to what happens to be the first row in the table (Id -1) and read through all of the rows in the entire table, looking for ones who have a reputation < 0:
So it’s a seek – but is it worthy of attention? Do I need to do something about that operator? To figure out, hover your mouse over the seek operator and check out the tooltip:
And look at:
- Number of Rows Read: 2,465,713 – meaning SQL Server had to examine this many rows in order to find the juicy rows that you actually wanted.
- Actual Number of Rows: 0 – meaning its search was fruitless.
When SQL Server has to examine a lot of data to find not a lot of results, that’s a slow way to access the data. You might be fine with that – this might be a query that never runs – but if you want it to run faster, this is a place you might want to focus your indexing improvements. It doesn’t matter whether it’s a seek or a scan – if you have to read a lot of noise to get a little bit of signal, that’s a problem.
9 Comments. Leave new
Well written sir! On a similar note, “A large number of logical reads isn’t always bad, and a small number of logical reads isn’t always good” http://www.sql-server-performance.com/logical-reads/
Mr. Ozar:
I realize your intent was to illustrate a really bad scan, but your readers might wish to know that for a COUNT operation SQL Server uses the clustered index only if there are no others. If there are nonclustered indexes, SQL Server will choose the smallest for the COUNT.
Dan – correct, that’s part of the reasons why I called it a bad scan. Thanks!
[…] Brent Ozar shows that index seeks are not always better than index scans: […]
Just had a sort-of-similar scenario when I was running a training session on identifying parameter sniffing issues using Query Store that I thought was worth sharing.
For the first execution of my query I passed a param that I knew would return 5 million rows. The optimizer created a plan that utilised a straight NC index SCAN. Pretty slow, but OK for the example.
I then recompiled the query and executed it with a different parameter that I knew would only return 20 rows. The result was a index SEEK, KEY LOOKUP, and NESTED-LOOP join. This was very fast for the 20-row execution, however performance was naturally TERRIBLE for the 5-million row execution due to the NESTED-LOOP join.
So scan’s aren’t always bad, and seeks aren’t always good!
In this query SELECT * FROM dbo.Users WHERE Id > -1000; why wouldn’t SQL server know from its statistics that this will bring the whole table back and therefore choose to do a scan instead (I realise these are the same thing based on the logical reads) but it would be clearer to the viewer of the execution plan whats going on surely?
Because SQL Server doesn’t choose plans based on how clear the visual method will be to the tuner. 😉
So we have Microsoft to blame 😉
And to be clear – remember that stats only update when ~20% of the data changes (and this number decreases with newer versions & trace flags. Plus, users can turn off stats updates.