“Surely this one will get a clustered index scan.”

I love building demos for our training classes because I’m constantly in a battle of the minds with SQL Server. I try to guess what he’s going to do next, and I love it when he surprises me.

I started by building a new parameter sniffing demo with the Stack Overflow database, and in this case using the 50GB 2013 size. I wanted to demo a search stored procedure looking for the first few words of question titles.

I created this index:

It’s filtered just for questions (PostTypeId = 1) mostly because I wanted attendees to be able to build this index to follow along. I originally started with an unfiltered index on Title alone, but it took minutes even on my faster machines, so Erik suggested filtering it just for questions. Brilliant – that’d come in handy later when I wanted to show the perils of autoparameterization and forced parameterization.

Then I checked for the most popular first word in questions:

The 20 most popular first words of Stack questions:

The 20 most popular first words of a Stack question

Perfect! “How” was way above and beyond other words, and it represented a pretty good chunk of the table.

I crafted a query that would sometimes use the Title index with a key lookup, and sometimes use a clustered index scan instead:

I highlighted both queries, hit F5, and prepared to bask in the awesomeness of two different query plans.

And then something kinda odd happened. I couldn’t get a clustered index scan:

I wish I could quit you

Both of the actual query plans do key lookups. No, it’s not plan reuse, either.

  • Clustered index scan: 4,196,893 logical reads
  • “How” query above: 3,973,708 (just on the Posts table alone)
  • “Why” query above: 426,492 (so sure, okay, that should probably use the index)

Even if I use an incredibly broad WHERE clause, including any question that starts with the most popular letter (H), SQL Server 2017 is in a very serious relationship with this index:

The index that can't say no

It still does key lookups, resulting in 6,159,424 logical reads.

<sigh>

It’s not just the filtered index, either: here are the actual plans with an unfiltered index on PostTypeId, Title, and the actual plans with an index on Title, PostTypeId. They all stubbornly insist on using the index, to their detriment.

I could (should?) probably turn this into a demo about how estimate accuracy is really important, but…I don’t wanna waste attendee time creating a 1GB index on a 37GB table to prove that point. Instead, here’s a blog post. You’re welcome.

Previous Post
Azure SQL DB is Slow: Do I Need to Buy More DTUs?
Next Post
[Video] Office Hours 2018/12/5 (With Transcriptions)

25 Comments. Leave new

  • Hi guys, is that because your SELECT includes the column Score, which is not part of the index? (in the Key Lookup the Output List reports [StackOverflow2013].[dbo].[Posts].Score)
    Thanks,
    Elisa

    Reply
  • Hrm … Could you try Include in the clustered index ?

    Reply
  • So it’s the bad estimation you blame? Or what’s the explanation for such behavior?

    Reply
  • Can you force SQL Server to use the clustered index as a scan by adding a hint, or will it still just use the clustered index as a lookup (and consider it “used” because it’s referenced)?

    If you can force it to use the index as a scan, and it performs better than the original plan, is SQL Server smart enough to start using that plan even if you remove the index hint?

    Reply
    • Brook – that’s the thing I love about doing my demos with the Stack Overflow database. You can download it and try these exact same experiments yourself. It’s a really fun way to learn more about the engine (especially without taking someone else’s word for something.) Go for it! It’s a hoot.

      Reply
  • I think it’s a bug in SQL. Last month I was looking at a stored procedure that was running slow. It hit a table with approximately 4-5 million rows and was doing an index seek while reporting that it estimated 4.5 million rows or so would be looked at. I forgot the exact term from SSMS. I’ve seen this with smaller tables as well. 6000 row table, index seek and it reports 5598 rows looked at.

    Performance wise it seems to act like an index scan, but it’s reported as a seek.

    Reply
    • Henry Stinson
      January 4, 2019 5:36 pm

      Actually, SQL Server (and Oracle) will ignore indexes on small tables (I’ve heard < 10,000 rows), because it's just as fast to scan the table. So an example with 6,000 rows doesn't apply. Apples and Orangutans.

      Reply
  • Hi Brent,

    Try adding more columns in your select Clause, I have vague memory of the same problem when I wanted to showcase key lookup vs index scan.

    Because right now you are only changing the depth of your query with different where clause, you need to change width as well.

    Reply
  • Maybe a silly question, but how up to date are your stats? Just asking as estimates being that far off point in that direction.

    Reply
  • I would check the definition of the clustering key to see what columns it’s covering for the non-clustered index.

    Reply
    • It’s just ID. (You can also download the Stack database yourself to check this – that’s part of the fun of open source.) I like your line of thinking though!

      Reply
  • Henry Stinson
    January 4, 2019 5:41 pm

    What version of SQL Server is Stack db using? (So we can test using same)

    Reply
  • I’ve noticed the same on SQL 2016. In 2005 it was sometimes a challenge to turn a clustered index scan into a seek. On 2016 I see index seeks with large percentages of the rows of a table. Still not sure exactly how it’s doing it.

    Reply
  • Henry Stinson
    January 7, 2019 11:06 am

    I do not have stack dB, so haven’t tried this, but what if exec plan cache was cleared and the queries run in opposite order? Sql server started, in some version — maybe 2012 or 2014 , converting adhoc queries to parameterized queries, so the optimizer just looks up the previous execution plan. I know in a stored process one can use option recompile to force a fresh execution plan. If data skew and the where-value warrant, a different query plan might be developed.

    Reply
    • Hentry – at this point, go ahead and hit the brakes on further comments until you’ve actually played with the Stack database. It’s completely free, and you can test out your assumptions there. That’s why I work so hard to do all this with completely open, reusable tools – you can learn so much by running your own experiments. Start your journey today!

      Reply
  • Henry Stinson
    January 7, 2019 11:07 am

    DBCC ClearCache ?

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
{"cart_token":"","hash":"","cart_data":""}