Using our good old buddy the StackOverflow demo database, let’s look at the Users table. I want to get the first 10 users in the table ordered by Id, the primary clustered key, which is an identity field.
Here’s the actual execution plan:
Ooo – clustered index scan – that’s bad, right?
Not so fast. Click on the plan to view it in PasteThePlan.com, and hover your mouse over the Clustered Index Scan:
We didn’t scan the whole clustered index – just part of it.
And you’ve always heard seeks are great, right?
Let’s test that out too with a very similar query:
That query plan gets a clustered index seek. Perhaps we’ve been told that seeks are better than scans. So is this query less impactful than the first query? Absolutely not – this query takes a heck of a long time to run, and if you click on the plan and hover your mouse over that seek operator, you’ll see a lot more than 10 rows get returned.
The only difference between seeks and scans: where you start.
Seek means we know the starting point’s value. The starting value might happen to be the first row in the table, but that’s irrelevant. Seek means we know the value – in the case of our latter query, we knew the Id would be 1. Sure, as human beings, you and I know that we started our identity field with a value of 1 – but SQL Server doesn’t know that, nor does it know for sure that you never set identity insert on, and backloaded a bunch of data with negative identity fields.
Scan means we start at one end of the index, regardless of what value we find there.
Neither seek nor scan has anything to do with where we stop reading. We can do a scan that only reads a few 8KB pages, and we can do a seek that reads the entire table.
This has two big performance tuning implications. First, when you see a plan, you can’t just look at seek vs scan. Second, when you read the index usage DMVs, you can’t judge efficiency based solely on seek vs scan.