I love building demos for our training classes because I’m constantly in a battle of the minds with SQL Server. I try to guess what he’s going to do next, and I love it when he surprises me.
I started by building a new parameter sniffing demo with the Stack Overflow database, and in this case using the 50GB 2013 size. I wanted to demo a search stored procedure looking for the first few words of question titles.
I created this index:
1 2 |
CREATE INDEX IX_Title_Filtered ON dbo.Posts(Title) WHERE PostTypeId = 1; GO |
It’s filtered just for questions (PostTypeId = 1) mostly because I wanted attendees to be able to build this index to follow along. I originally started with an unfiltered index on Title alone, but it took minutes even on my faster machines, so Erik suggested filtering it just for questions. Brilliant – that’d come in handy later when I wanted to show the perils of autoparameterization and forced parameterization.
Then I checked for the most popular first word in questions:
1 2 3 4 5 6 7 8 9 |
SELECT TOP 100 SUBSTRING(p.Title, 1, CHARINDEX(' ', p.Title)) AS FirstWord, COUNT(DISTINCT p.Id) AS Questions FROM dbo.Posts p WHERE p.PostTypeId = 1 AND CHARINDEX(' ', p.Title) > 0 GROUP BY SUBSTRING(p.Title, 1, CHARINDEX(' ', p.Title)) ORDER BY COUNT(DISTINCT p.Id) DESC; GO |
The 20 most popular first words of Stack questions:
Perfect! “How” was way above and beyond other words, and it represented a pretty good chunk of the table.
I crafted a query that would sometimes use the Title index with a key lookup, and sometimes use a clustered index scan instead:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
/* This should be a clustered index scan: */ SELECT Title, Id, Score FROM dbo.Posts WHERE PostTypeId = 1 AND Title LIKE 'How %' ORDER BY Score DESC; /* This should be an index seek plus key lookup: */ SELECT Title, Id, Score FROM dbo.Posts WHERE PostTypeId = 1 AND Title LIKE 'Why %' ORDER BY Score DESC; |
I highlighted both queries, hit F5, and prepared to bask in the awesomeness of two different query plans.
And then something kinda odd happened. I couldn’t get a clustered index scan:
Both of the actual query plans do key lookups. No, it’s not plan reuse, either.
- Clustered index scan: 4,196,893 logical reads
- “How” query above: 3,973,708 (just on the Posts table alone)
- “Why” query above: 426,492 (so sure, okay, that should probably use the index)
Even if I use an incredibly broad WHERE clause, including any question that starts with the most popular letter (H), SQL Server 2017 is in a very serious relationship with this index:
It still does key lookups, resulting in 6,159,424 logical reads.
<sigh>
It’s not just the filtered index, either: here are the actual plans with an unfiltered index on PostTypeId, Title, and the actual plans with an index on Title, PostTypeId. They all stubbornly insist on using the index, to their detriment.
I could (should?) probably turn this into a demo about how estimate accuracy is really important, but…I don’t wanna waste attendee time creating a 1GB index on a 37GB table to prove that point. Instead, here’s a blog post. You’re welcome.
25 Comments. Leave new
Hi guys, is that because your SELECT includes the column Score, which is not part of the index? (in the Key Lookup the Output List reports [StackOverflow2013].[dbo].[Posts].Score)
Thanks,
Elisa
That’s the point, I’m trying to get SQL Server to switch over to a clustered index scan (rather than a bunch of expensive key lookups to get Score.)
Hrm … Could you try Include in the clustered index ?
Scott – I’m not sure what you mean – can you elaborate?
I Think Scott means using the INCLUDE clause in the IX_Title_Filtered index creation in order to include the Score column
Ah – no, I was trying to demo when SQL Server would flip over to a clustered index scan. (Often when you’re writing training, you’re not trying to fix problems – you’re trying to SHOW them, hahaha.)
So it’s the bad estimation you blame? Or what’s the explanation for such behavior?
Jiri – like I wrote in the last paragraph, I would love to go into more details with more demos, but sometimes I gotta stop work at the end of the day. 😉
Can you force SQL Server to use the clustered index as a scan by adding a hint, or will it still just use the clustered index as a lookup (and consider it “used” because it’s referenced)?
If you can force it to use the index as a scan, and it performs better than the original plan, is SQL Server smart enough to start using that plan even if you remove the index hint?
Brook – that’s the thing I love about doing my demos with the Stack Overflow database. You can download it and try these exact same experiments yourself. It’s a really fun way to learn more about the engine (especially without taking someone else’s word for something.) Go for it! It’s a hoot.
Ha – it’s Friday and my lazy is showing!
I think it’s a bug in SQL. Last month I was looking at a stored procedure that was running slow. It hit a table with approximately 4-5 million rows and was doing an index seek while reporting that it estimated 4.5 million rows or so would be looked at. I forgot the exact term from SSMS. I’ve seen this with smaller tables as well. 6000 row table, index seek and it reports 5598 rows looked at.
Performance wise it seems to act like an index scan, but it’s reported as a seek.
Actually, SQL Server (and Oracle) will ignore indexes on small tables (I’ve heard < 10,000 rows), because it's just as fast to scan the table. So an example with 6,000 rows doesn't apply. Apples and Orangutans.
Henry – that’s simply not even remotely correct, and a quick experiment will disprove that.
Hi Brent,
Try adding more columns in your select Clause, I have vague memory of the same problem when I wanted to showcase key lookup vs index scan.
Because right now you are only changing the depth of your query with different where clause, you need to change width as well.
Maybe a silly question, but how up to date are your stats? Just asking as estimates being that far off point in that direction.
Nah, they’re up to date with full scan, and the Stack Overflow database export doesn’t change all that often. 😉
I would check the definition of the clustering key to see what columns it’s covering for the non-clustered index.
It’s just ID. (You can also download the Stack database yourself to check this – that’s part of the fun of open source.) I like your line of thinking though!
What version of SQL Server is Stack db using? (So we can test using same)
Harry – as I wrote in the post, SQL Server 2017.
I’ve noticed the same on SQL 2016. In 2005 it was sometimes a challenge to turn a clustered index scan into a seek. On 2016 I see index seeks with large percentages of the rows of a table. Still not sure exactly how it’s doing it.
I do not have stack dB, so haven’t tried this, but what if exec plan cache was cleared and the queries run in opposite order? Sql server started, in some version — maybe 2012 or 2014 , converting adhoc queries to parameterized queries, so the optimizer just looks up the previous execution plan. I know in a stored process one can use option recompile to force a fresh execution plan. If data skew and the where-value warrant, a different query plan might be developed.
Hentry – at this point, go ahead and hit the brakes on further comments until you’ve actually played with the Stack database. It’s completely free, and you can test out your assumptions there. That’s why I work so hard to do all this with completely open, reusable tools – you can learn so much by running your own experiments. Start your journey today!
DBCC ClearCache ?