“Surely this one will get a clustered index scan.”

Last Updated December 7, 2018

I love building demos for our training classes because I’m constantly in a battle of the minds with SQL Server. I try to guess what he’s going to do next, and I love it when he surprises me.

I started by building a new parameter sniffing demo with the Stack Overflow database, and in this case using the 50GB 2013 size. I wanted to demo a search stored procedure looking for the first few words of question titles.

I created this index:

CREATE INDEX IX_Title_Filtered ON dbo.Posts(Title) WHERE PostTypeId = 1;
GO

1 2	CREATE INDEX IX_Title_Filtered ON dbo.Posts(Title) WHERE PostTypeId = 1; GO

It’s filtered just for questions (PostTypeId = 1) mostly because I wanted attendees to be able to build this index to follow along. I originally started with an unfiltered index on Title alone, but it took minutes even on my faster machines, so Erik suggested filtering it just for questions. Brilliant – that’d come in handy later when I wanted to show the perils of autoparameterization and forced parameterization.

Then I checked for the most popular first word in questions:

SELECT TOP 100 
    SUBSTRING(p.Title, 1, CHARINDEX(' ', p.Title)) AS FirstWord,
    COUNT(DISTINCT p.Id) AS Questions 
  FROM dbo.Posts p
  WHERE p.PostTypeId = 1
    AND CHARINDEX(' ', p.Title) > 0
  GROUP BY SUBSTRING(p.Title, 1, CHARINDEX(' ', p.Title))
  ORDER BY COUNT(DISTINCT p.Id) DESC;
GO

SELECT TOP 100

SUBSTRING(p.Title, 1, CHARINDEX(' ', p.Title)) AS FirstWord,

COUNT(DISTINCT p.Id) AS Questions

FROM dbo.Posts p

WHERE p.PostTypeId = 1

AND CHARINDEX(' ', p.Title) > 0

GROUP BY SUBSTRING(p.Title, 1, CHARINDEX(' ', p.Title))

ORDER BY COUNT(DISTINCT p.Id) DESC;

The 20 most popular first words of Stack questions:

The 20 most popular first words of a Stack question

Perfect! “How” was way above and beyond other words, and it represented a pretty good chunk of the table.

I crafted a query that would sometimes use the Title index with a key lookup, and sometimes use a clustered index scan instead:

/* This should be a clustered index scan: */
SELECT Title, Id, Score
  FROM dbo.Posts
  WHERE PostTypeId = 1
    AND Title LIKE 'How %'
  ORDER BY Score DESC;

/* This should be an index seek plus key lookup: */
SELECT Title, Id, Score
  FROM dbo.Posts
  WHERE PostTypeId = 1
    AND Title LIKE 'Why %'
  ORDER BY Score DESC;

/* This should be a clustered index scan: */

SELECT Title, Id, Score

FROM dbo.Posts

WHERE PostTypeId = 1

AND Title LIKE 'How %'

ORDER BY Score DESC;

/* This should be an index seek plus key lookup: */

SELECT Title, Id, Score

FROM dbo.Posts

WHERE PostTypeId = 1

AND Title LIKE 'Why %'

ORDER BY Score DESC;

I highlighted both queries, hit F5, and prepared to bask in the awesomeness of two different query plans.

And then something kinda odd happened. I couldn’t get a clustered index scan:

Both of the actual query plans do key lookups. No, it’s not plan reuse, either.

Clustered index scan: 4,196,893 logical reads
“How” query above: 3,973,708 (just on the Posts table alone)
“Why” query above: 426,492 (so sure, okay, that should probably use the index)

Even if I use an incredibly broad WHERE clause, including any question that starts with the most popular letter (H), SQL Server 2017 is in a very serious relationship with this index:

It still does key lookups, resulting in 6,159,424 logical reads.

<sigh>

It’s not just the filtered index, either: here are the actual plans with an unfiltered index on PostTypeId, Title, and the actual plans with an index on Title, PostTypeId. They all stubbornly insist on using the index, to their detriment.

I could (should?) probably turn this into a demo about how estimate accuracy is really important, but…I don’t wanna waste attendee time creating a 1GB index on a 37GB table to prove that point. Instead, here’s a blog post. You’re welcome.

Azure SQL DB is Slow: Do I Need to Buy More DTUs?

[Video] Office Hours 2018/12/5 (With Transcriptions)

25 Comments. Leave new

Elisa
December 7, 2018 9:15 am

Hi guys, is that because your SELECT includes the column Score, which is not part of the index? (in the Key Lookup the Output List reports [StackOverflow2013].[dbo].[Posts].Score)
Thanks,
Elisa

Reply
- Brent Ozar
  December 7, 2018 9:20 am
  
  That’s the point, I’m trying to get SQL Server to switch over to a clustered index scan (rather than a bunch of expensive key lookups to get Score.)
  
  Reply
Scott
December 7, 2018 9:53 am

Hrm … Could you try Include in the clustered index ?

Reply
- Brent Ozar
  December 7, 2018 11:36 am
  
  Scott – I’m not sure what you mean – can you elaborate?
  
  Reply
- Elisa
  December 7, 2018 3:03 pm
  
  I Think Scott means using the INCLUDE clause in the IX_Title_Filtered index creation in order to include the Score column
  
  Reply
  - Brent Ozar
    December 7, 2018 3:57 pm
    
    Ah – no, I was trying to demo when SQL Server would flip over to a clustered index scan. (Often when you’re writing training, you’re not trying to fix problems – you’re trying to SHOW them, hahaha.)
    
    Reply
Jiri
December 7, 2018 10:48 am

So it’s the bad estimation you blame? Or what’s the explanation for such behavior?

Reply
- Brent Ozar
  December 7, 2018 11:37 am
  
  Jiri – like I wrote in the last paragraph, I would love to go into more details with more demos, but sometimes I gotta stop work at the end of the day. 😉
  
  Reply
Brooke
December 7, 2018 11:36 am

Can you force SQL Server to use the clustered index as a scan by adding a hint, or will it still just use the clustered index as a lookup (and consider it “used” because it’s referenced)?

If you can force it to use the index as a scan, and it performs better than the original plan, is SQL Server smart enough to start using that plan even if you remove the index hint?

Reply
- Brent Ozar
  December 7, 2018 11:37 am
  
  Brook – that’s the thing I love about doing my demos with the Stack Overflow database. You can download it and try these exact same experiments yourself. It’s a really fun way to learn more about the engine (especially without taking someone else’s word for something.) Go for it! It’s a hoot.
  
  Reply
  - Brooke
    December 7, 2018 11:40 am
    
    Ha – it’s Friday and my lazy is showing!
    
    Reply
Alen
December 10, 2018 10:10 am

I think it’s a bug in SQL. Last month I was looking at a stored procedure that was running slow. It hit a table with approximately 4-5 million rows and was doing an index seek while reporting that it estimated 4.5 million rows or so would be looked at. I forgot the exact term from SSMS. I’ve seen this with smaller tables as well. 6000 row table, index seek and it reports 5598 rows looked at.

Performance wise it seems to act like an index scan, but it’s reported as a seek.

Reply
- Henry Stinson
  January 4, 2019 5:36 pm
  
  Actually, SQL Server (and Oracle) will ignore indexes on small tables (I’ve heard < 10,000 rows), because it's just as fast to scan the table. So an example with 6,000 rows doesn't apply. Apples and Orangutans.
  
  Reply
  - Brent Ozar
    January 6, 2019 5:59 am
    
    Henry – that’s simply not even remotely correct, and a quick experiment will disprove that.
    
    Reply
Zikato
December 12, 2018 8:47 am

Hi Brent,

Try adding more columns in your select Clause, I have vague memory of the same problem when I wanted to showcase key lookup vs index scan.

Because right now you are only changing the depth of your query with different where clause, you need to change width as well.

Reply
Rick
January 4, 2019 4:26 am

Maybe a silly question, but how up to date are your stats? Just asking as estimates being that far off point in that direction.

Reply
- Brent Ozar
  January 4, 2019 6:08 am
  
  Nah, they’re up to date with full scan, and the Stack Overflow database export doesn’t change all that often. 😉
  
  Reply
Eric Russell
January 4, 2019 7:16 am

I would check the definition of the clustering key to see what columns it’s covering for the non-clustered index.

Reply
- Brent Ozar
  January 4, 2019 7:17 am
  
  It’s just ID. (You can also download the Stack database yourself to check this – that’s part of the fun of open source.) I like your line of thinking though!
  
  Reply
Henry Stinson
January 4, 2019 5:41 pm

What version of SQL Server is Stack db using? (So we can test using same)

Reply
- Brent Ozar
  January 4, 2019 5:43 pm
  
  Harry – as I wrote in the post, SQL Server 2017.
  
  Reply
Alen
January 7, 2019 10:32 am

I’ve noticed the same on SQL 2016. In 2005 it was sometimes a challenge to turn a clustered index scan into a seek. On 2016 I see index seeks with large percentages of the rows of a table. Still not sure exactly how it’s doing it.

Reply
Henry Stinson
January 7, 2019 11:06 am

I do not have stack dB, so haven’t tried this, but what if exec plan cache was cleared and the queries run in opposite order? Sql server started, in some version — maybe 2012 or 2014 , converting adhoc queries to parameterized queries, so the optimizer just looks up the previous execution plan. I know in a stored process one can use option recompile to force a fresh execution plan. If data skew and the where-value warrant, a different query plan might be developed.

Reply
- Brent Ozar
  January 7, 2019 12:07 pm
  
  Hentry – at this point, go ahead and hit the brakes on further comments until you’ve actually played with the Stack database. It’s completely free, and you can test out your assumptions there. That’s why I work so hard to do all this with completely open, reusable tools – you can learn so much by running your own experiments. Start your journey today!
  
  Reply
Henry Stinson
January 7, 2019 11:07 am

DBCC ClearCache ?

Reply