Scan The Man
There seems to be a perpetual battle waged against the Index Scan.
At some point it was declared that scans were inferior to seeks, and all energy should be dedicated to eradicating them.
Much like asking why the whole plane isn’t made out of the stuff the black box is made out of, you start to wonder why the Index Scan is even an operator.

Obviously, if Microsoft cared about performance, there would be no Index Scans.
Right?
Right?
I bet Postgres would never scan an index.
Being Practical
It should be pretty easy to prove Seek Supremacy once and for all. Then you can literally (LITERALLY!) ignore every other operator in a query plan and focus the entirety of your being on this unknown astral plane toward mocking those without the mental capacity to receive your wisdom.
About Index Scans.
Let’s start with a query.
1 2 3 4 |
SELECT COUNT(*) FROM dbo.Users AS u JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id; |
It has no predicates. It’s just joining two tables together in total.
In the query plan, we have two scans. It’s a two scan plan.

For the Seekists out there; fear not, your moment of ascension is at hand.
1 2 3 4 |
SELECT COUNT(*) FROM dbo.Users AS u WITH ( FORCESEEK ) JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id; |
This plan must surely be superior.

Not All Who Scan Are Lost
In this case, the Two Scan Plan does a bit better.

The source of all those extra reads, and likely the extra CPU time for the seek plan is…
Well, it’s in the seek.

You can see why the optimizer chose the scan in the first place.
Hang on though, maybe we picked the wrong table to force a seek on.
1 2 3 4 |
SELECT COUNT(*) FROM dbo.Users AS u JOIN dbo.Posts AS p WITH ( FORCESEEK ) ON p.OwnerUserId = u.Id; |
This must be the better choi-

Important Information
The reason we get this error here is because we don’t have an index on the Posts table with OwnerUserId as a key column.
This is where most Earth Peoples start to get annoyed — they usually do have an index. In fact, if consulting has taught me anything, you have 17 of the same index with _dta_ somewhere in the name on the data you wish to seek into.
Alright then. Let’s add an index.
1 2 |
CREATE INDEX ix_maybe ON dbo.Posts ( OwnerUserId ); |
If we re-run our query without hints, what happens?

We still get two scans!
Comparing all three plans (we can force a seek on Posts now, because we have an index on OwnerUserId), the seeks still aren’t doing so hot.

Now what?
Hopefully you learned that not every seek is great, and not every scan is awful.
If you’re joining two tables together, often a single scan of the data is the wisest choice.
The worst part of a plan may not always be obvious, and it may not always be the method that the optimizer chooses to access data with.
Thanks for reading!
13 Comments. Leave new
“I bet Postgres would never scan an index.” ISWYDT, nice.
Yeah, if we’re being honest, I wrote this whole post as a vehicle for that.
I am going to assume the Postgres would never scan an index line is specifically directed at me.
No, it’s directed at my favorite martian 🙂
Very nice.
I don’t think it’s cases like this where people fight scans, it’s cases where your expected results are only a portion of the total rows of the table, yet the optimizer seems to decide to scan everything anyway. I was just looking at a query last week where it did a scan on a 40 million row table to return 20 thousand rows. LOL
Chris — sure! That’s a case I’d probably wanna take a look at too. But it’s a far cry from the “let no scan go unpunished” mentality out there.
Out of curiosity, what was the root cause on that?
Thanks!
In that particular case, I believe it was a problem where the developer was trying to do too many joins at once.
I ended up suggesting they break it down a bit, trying to separate the “filtering logic” types of joins from the “presentation logic” types of joins.
Good call. I typically find it helps to separate relational data from informational data.
Seeks read pages (8KB), I thought I read scans read extents (64KB).
At the hardware level, the costs are almost the same.
So a query that reads more than 1/8 the pages is more efficient as a scan.
Also every seek needs to traverse the B-tree (more IO) while scans only
look in the B-tree for the first and last items in the scan. Usually the B-tree
nodes are cached in memory, but the query optimizer has a hard time
costing that factor.
Can anyone post a link to some explanation of exactly how a seek/scan actually works, for a mere mortal….
Btw. You can make the plane out of the same stuff as the black box, which is actually red.
As long as you don’t want the ability to fit engines and load fuel and passengers ?
[…] Well the plan shows seeks, but as Erik Darling recently pointed out, seeks aren’t always a good thing. […]