Life Is Messy
Demo queries have this nasty habit of being clean. Even using a pit of despair like Adventure Works or Wide World Importers, it’s easy to craft demo queries that fit the scenario you need to make yourself look like a genius. Stack Overflow, in all its simplicity, makes this even easier (lucky me!) because there’s nothing all that woogy or wonky to dance around.
While working with a client recently — yes, Brent lets me talk to paying customers — we found a rather tough situation. They were using Windowing functions over one group of columns to partition and order by, but the where clause was touching a totally different group of columns.
The query plan wasn’t happy.
Users weren’t happy.
I was still dizzy from being on a boat.
If you’ve been reading the blog for a while, you may remember this post from about two years ago. Over there, we talked about a POC index, a term popularized by Itzik Ben-Gan.
But how does that work when your query has other needs?
Let’s meet our query!
DECLARE @Score INT, @rn INT;

SELECT @Score = p.Score,
       @rn = ROW_NUMBER() OVER ( PARTITION BY p.OwnerUserId
                                 ORDER BY p.Score DESC, p.CreationDate DESC )
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND   p.CommunityOwnedDate IS NULL
AND   p.LastActivityDate >= '2016-01-01';
We have a Windowing function that partitions and orders by three columns, and a where clause that uses three other columns. If we stick a POC index on the Posts table that prioritizes performance of the Windowing function, what happens? I’m going to put the three where clause columns in the include list to avoid troubleshooting key lookups later.
CREATE UNIQUE NONCLUSTERED INDEX ix_helper
    ON dbo.Posts (OwnerUserId, Score DESC, CreationDate DESC, Id)
    INCLUDE (PostTypeId, CommunityOwnedDate, LastActivityDate);
Now when I run the query, the plan comes with (you guessed it!) a missing index request.
The missing index request is for nearly the EXACT OPPOSITE INDEX we just added. Oh boy.
/*
Missing Index Details from SQLQuery12.sql - NADAULTRA\SQL2016E.StackOverflow (sa (67))
The Query Processor estimates that implementing the following index could improve the query cost by 96.8793%.
*/

/*
USE [StackOverflow]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Posts] ([CommunityOwnedDate],[PostTypeId],[LastActivityDate])
INCLUDE ([CreationDate],[OwnerUserId],[Score])
GO
*/
96.8%! I must be a bad DBA. I made a backwards index. I hope someone automates this soon.
Okay, so, let’s create an index close in spirit to our original index. Just, y’know, backwards.
CREATE UNIQUE NONCLUSTERED INDEX ix_helper2
    ON dbo.Posts (CommunityOwnedDate, PostTypeId, LastActivityDate, Id)
    INCLUDE (CreationDate, OwnerUserId, Score);
When we re-run our query, what happens?
Let’s pause here for a minute. Stuff like this can seem witchcrafty when it’s glossed over in a blog post.
The index I created is awesome for the Windowing function, and the index that SQL registered as missing was awesome for the where clause.
When I have both indexes, SQL chooses the where-clause-awesome-index because it judges the query will be cheaper to deal with when it can easily seek and filter out rows from the key of the nonclustered index, and then pass only those rows along to the Windowing function.
Now, it can still do this with the Windowing-function-awesome-index, because the where clause columns are included, just not as efficiently as when they’re key columns.
The trade-off here is a Sort operation to get the rows into partition and order by order for the Windowing function, but SQL estimates that sorting that filtered-down set of rows will still be far cheaper than the alternative.
If you’re query tuning with a small amount of data, you’ll take a look at these query costs, stick with the where clause awesome index, and go get extra drunk for doing a wicked good job.
Here they are back to back.
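If you want to reproduce that comparison yourself, one way is to force each index in turn with a table hint. This is only a sketch, assuming both ix_helper and ix_helper2 exist as created earlier in the post. The INDEX hints are an addition for testing, and the plan shapes noted in the comments are what you should expect to see rather than a guarantee.

```sql
-- Sketch: run the same query once per index so the two plans can be compared.
-- The INDEX hints are for testing only; they are not part of the original query.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @Score INT, @rn INT;

-- Windowing function index: rows should arrive already ordered,
-- so no Sort is expected, but the where clause is checked row by row.
SELECT @Score = p.Score,
       @rn = ROW_NUMBER() OVER ( PARTITION BY p.OwnerUserId
                                 ORDER BY p.Score DESC, p.CreationDate DESC )
FROM dbo.Posts AS p WITH (INDEX (ix_helper))
WHERE p.PostTypeId = 1
AND   p.CommunityOwnedDate IS NULL
AND   p.LastActivityDate >= '2016-01-01';

-- Where clause index: seek to the qualifying rows,
-- then expect a Sort ahead of the Windowing function.
SELECT @Score = p.Score,
       @rn = ROW_NUMBER() OVER ( PARTITION BY p.OwnerUserId
                                 ORDER BY p.Score DESC, p.CreationDate DESC )
FROM dbo.Posts AS p WITH (INDEX (ix_helper2))
WHERE p.PostTypeId = 1
AND   p.CommunityOwnedDate IS NULL
AND   p.LastActivityDate >= '2016-01-01';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Turn on actual execution plans before running it, and change the date literal in both queries to pull in more or less data.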
What happens when we include more data?
Going back a year further, to 2015, the costs are close to even. The Sortless plan costs about 159 query bucks, and the Sorted plan costs about 124 query bucks.
Going back to 2013, the Sortless plan now costs 181 query bucks, the Sorted plan costs 243 query bucks, and the Sort spills to disk.
So what’s the point?
Missing index requests don’t always have your long term health in mind when they pop up. Some may; others may just be a shot and a beer to get your query past a hangover.
If I go back and run the ‘2013’ query with only the original index on there (the one that helps the Windowing function), there’s still a missing index request, but with a lower value (75% rather than 98%). Part of this is due to how costs are estimated and where SQL expects the sort to happen (disk vs memory).
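To try that with only the original index in place, one option is to disable the second index rather than drop it. This is a sketch and an assumption on my part about how you'd set it up; it keeps the definition of ix_helper2 around so it can be rebuilt afterwards.

```sql
-- Sketch: take the where clause index out of play without losing its definition.
ALTER INDEX ix_helper2 ON dbo.Posts DISABLE;

DECLARE @Score INT, @rn INT;

-- The '2013' version of the query, with only ix_helper available.
SELECT @Score = p.Score,
       @rn = ROW_NUMBER() OVER ( PARTITION BY p.OwnerUserId
                                 ORDER BY p.Score DESC, p.CreationDate DESC )
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND   p.CommunityOwnedDate IS NULL
AND   p.LastActivityDate >= '2013-01-01';

-- A disabled index has to be rebuilt before it can be used again.
ALTER INDEX ix_helper2 ON dbo.Posts REBUILD;
```

The rebuild reads the whole table, so expect it to take a while on a database the size of Stack Overflow.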
In our case, the Sort was a bit of a time bomb. At first, it didn’t matter. As we included more data, it got worse. This is the kind of challenge that a lot of developers face as their app goes from a couple hundred clients to a couple thousand clients, and exactly the kind of thing our Critical Care helps with.
Thanks for reading!
Brent says: this isn’t just about missing index hints in query plans, either: it’s also a great example of why you have to be a little bit careful with the missing index DMV recommendations, too. sp_BlitzIndex would report this index as missing, and you won’t know which queries are asking for it (or whether they’ve gotten better or worse). Every now and then, you’ll add a missing index and performance will actually get worse, so you’ve also gotta be looking at your top resource-intensive queries via sp_BlitzCache. In this example, after you’ve added Clippy’s index, the now-slower query would show up in sp_BlitzCache with no missing index hints, and you’d need to know how to hand-craft your own.
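For a rough look at what the missing index DMVs have recorded for the Posts table, a query along these lines works. It's a minimal sketch against the standard sys.dm_db_missing_index_* views, not what sp_BlitzIndex runs internally, and it shows the same limitation: nothing in the output says which queries made the request.

```sql
-- Sketch: missing index requests recorded for dbo.Posts in the current database.
-- These DMVs are cleared when SQL Server restarts.
SELECT mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_total_user_cost,
       migs.avg_user_impact -- estimated percent improvement, like the plan's 96.8793%
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID()
AND   mid.object_id = OBJECT_ID(N'dbo.Posts')
ORDER BY migs.avg_user_impact DESC;
```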