Then Why Doesn’t SQL Always Seek?

Scan The Man

There seems to be a perpetual battle waged against the Index Scan.

At some point it was declared that scans were inferior to seeks, and all energy should be dedicated to eradicating them.

Much like asking why the whole plane isn’t made out of the stuff the black box is made out of, you start to wonder why the Index Scan is even an operator.

This is so dumb.

Obviously, if Microsoft cared about performance, there would be no Index Scans.

Right?

Right?

I bet Postgres would never scan an index.

Being Practical

It should be pretty easy to prove Seek Supremacy once and for all. Then you can literally (LITERALLY!) ignore every other operator in a query plan and focus the entirety of your being on this unknown astral plane toward mocking those without the mental capacity to receive your wisdom.

About Index Scans.

Let’s start with a query.

It has no predicates. It’s just joining two tables together in total.

In the query plan, we have two scans. It’s a two scan plan.

HOUR BACK GET IT?

For the Seekists out there; fear not, your moment of ascension is at hand.

This plan must surely be superior.

Prepare to be humbled

Not All Who Scan Are Lost

In this case, the Two Scan Plan does a bit better.

Not off to a good start.

The source of all those extra reads, and likely the extra CPU time for the seek plan is…

Well, it’s in the seek.

Quackin’ crazy!

You can see why the optimizer chose the scan in the first place.

Hang on though, maybe we picked the wrong table to force a seek on.

This must be the better choi-

I need to sleep more.

Important Information

The reason we get this error here is because we don’t have an index on the Posts table with OwnerUserId as a key column.

This is where most Earth Peoples start to get annoyed — they usually do have an index. In fact, if consulting has taught me anything, you have 17 of the same index with _dta_ somewhere in the name on the data you wish to seek into.

Alright then. Let’s add an index.

If we re-run our query without hints, what happens?

Darn Merges and Scans

We still get two scans!

Comparing all three plans (we can force a seek on Posts now, because we have an index on OwnerUserId), the seeks still aren’t doing so hot.

Hrmpf.

Now what?

Hopefully you learned that not every seek is great, and not every scan is awful.

If you’re joining two tables together, often a single scan of the data is the wisest choice.

The worst part of a plan may not always be obvious, and it may not always be the method that the optimizer chooses to access data with.

Thanks for reading!

Previous Post
New SQL Server Management Studio 17.5: It’s Classified
Next Post
[Video] Office Hours 2018/2/14 (With Transcriptions)

13 Comments. Leave new

  • “I bet Postgres would never scan an index.” ISWYDT, nice.

    Reply
  • I am going to assume the Postgres would never scan an index line is specifically directed at me.

    Reply
  • Very nice.

    Reply
  • I don’t think it’s cases like this where people fight scans, it’s cases where your expected results are only a portion of the total rows of the table, yet the optimizer seems to decide to scan everything anyway. I was just looking at a query last week where it did a scan on a 40 million row table to return 20 thousand rows. LOL

    Reply
    • Chris — sure! That’s a case I’d probably wanna take a look at too. But it’s a far cry from the “let no scan go unpunished” mentality out there.

      Out of curiosity, what was the root cause on that?

      Thanks!

      Reply
      • Chris Harshman
        February 16, 2018 1:22 pm

        In that particular case, I believe it was a problem where the developer was trying to do too many joins at once.
        I ended up suggesting they break it down a bit, trying to separate the “filtering logic” types of joins from the “presentation logic” types of joins.

        Reply
  • Seeks read pages (8KB), I thought I read scans read extents (64KB).
    At the hardware level, the costs are almost the same.
    So a query that reads more than 1/8 the pages is more efficient as a scan.

    Also every seek needs to traverse the B-tree (more IO) while scans only
    look in the B-tree for the first and last items in the scan. Usually the B-tree
    nodes are cached in memory, but the query optimizer has a hard time
    costing that factor.

    Reply
  • Can anyone post a link to some explanation of exactly how a seek/scan actually works, for a mere mortal….

    Reply
  • Btw. You can make the plane out of the same stuff as the black box, which is actually red.
    As long as you don’t want the ability to fit engines and load fuel and passengers ?

    Reply
  • […] Well the plan shows seeks, but as Erik Darling recently pointed out, seeks aren’t always a good thing. […]

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.