Indexing for Windowing Functions

Hooray Windowing Functions

They do stuff that used to be hard to do, or took weird self-joins or correlated sub-queries with triangular joins to accomplish. That’s when there’s a standalone inequality predicate, usually for getting a running total.

With Windowing Functions, a lot of the code complexity and inefficiency is taken out of the picture, but they still work better if you feed them some useful indexes.

What kind of index works best?

In general, what’s been termed a POC Index by Itzik Ben-Gan and documented to some extent here.

POC stands for Partition, Order, Covering. When you look at your code, you want to first index any columns you’re partitioning on, then any columns you’re ordering by, and then cover (with an INCLUDE) any other columns you’re calling in the query.

Note that this is the optimal indexing strategy for Windowing Functions, and not necessarily for the query as a whole. Supporting other operations may lead you to design indexes differently, and that’s fine.

Everyone loves a demo

Here’s a quick example with a little extra something extra for the indexing witches and warlocks out there. I’m using the Stack Exchange database, which you can find out how to make your favorite new test database here.

The query above is running on the Posts table which only has a Clustered Index on the Id column, that does us absolutely no good here. There are tons of access operations and logical reads. Taking a look at the plan doesn’t offer much:

I am a plan. Love me.
I am a plan. Love me.

Let’s try a POC index to fix this up. I’m keeping ViewCount in the key because we’re aggregating on it. You can sometimes get away with just using it as an INCLUDE column instead.

We can note with a tone of obvious and ominous foreshadowing that creating this index on the entire table takes about 15 seconds. Insert culturally appropriate scary sound effects here.

Here’s what the plan looks like running the query again:

I'm a Scorpio. I like Datsuns and Winston 100s.
I’m a Scorpio. I like Datsuns and Winston 100s.

That key lookup is annoying.

Not all key lookups are due to output columns. Some of them are predicates.
Not all key lookups are due to output columns. Some of them are predicates.

We did a good job of reducing a lot of the ickiness from before:

But we’re not happy. Why? Because we’re DBAs. Or developers. Or we just have to use computers, which are the worst things ever invented.

Behold the filtered index

 

Cool. This index only takes about three seconds to create. Marinate on that.

This query is so important and predictable that we can roll this out for it. How does it look now?

Well, but, WHY?
Well, but, WHY?

That key lookup is still there, and now 100% of the estimated magickal query dust cost. For those keeping track at home, this is the entirely new missing index SQL Server thinks will fix your relationship with your dad:

But we took a nice chunk out of the IO and knocked a little more off the CPU, again.

What can we do here?

Include!

 

Running the query one last time, we finally get rid of that stinky lookup:

Bully for you!
Bully for you!

And we’re still at the same place for IO:

What did we learn? Windowing functions are really powerful T-SQL tools, but you still need to be hip to indexing to get the most out of them.

Check out our free resources on Windowing Functions here.

If you like this sort of thing, you might be interested in our Advanced Querying & Indexing class this August in Portland, OR.

Jeff Moden talks at length about triangular joins here (registration required).

Previous Post
Announcing the Dell DBA Days: August 25-28
Next Post
The Nine Circles of Developer Hell

6 Comments. Leave new

  • That last optimization looks like an optimizer bug. The INCLUDE ([PostTypeId], [Score]) should not be necessary because the predicate is satisfied by the index filter. Maybe worth trying with TF4199.

    Reply
  • Nice post…totally digging your writing style.

    Reply
  • Thomas Franz
    June 18, 2015 2:58 am

    Can you explain why it matters, if I append the ORDER BY to the SUM function (mathematical it’s nonsens (1 + 2 + 3 is equal to 3 + 2 + 1) but without it SQL Server makes some costly table spools)

    Reply
    • The SUM() function uses the default ROWS UNBOUNDED PRECEDING, which means that the results will actually change based on the ORDER BY clause. In this case, the query is creating a running total of view counts by creation date.

      Reply
  • Chris Baigent
    June 18, 2015 7:46 am

    Really interesting post. I do love some windowing functions knowledge!

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.