Filtered Indexes: Just Add Includes

I found a quirky thing recently

While playing with filtered indexes, I noticed something odd. By ‘playing with’ I mean ‘calling them horrible names’ and ‘admiring the way other platforms implemented them‘.

I sort of wrote about a similar topic in discussing indexing for windowing functions. It turns out that a recent annoyance could also be solved by putting the column my filter expression is predicated on in the included columns definition. That’s the fanciest sentence I’ve ever written, BTW. If you want more, don’t get your hopes up.

Ready for Horrible

Let’s create our initial index. As usual, we’re using the Stack Overflow database. We’ll look at a small group of users who have a Reputation over 400k. I dunno, it’s a nice number. There are like 8 of them.

With that in place, we’ll run some queries that should make excellent use of our thoughtful and considerate index.

If you were a betting organism, which ones would you say use our index? Money on the table, folks! Step right up!

Yes, Sorta, No, No.
Yes, Sorta, No, No.

That didn’t go well at all. Only the first query really used it. The second query needed a key lookup to figure out the less than filter, and the last two not only ignored it, but told me I need to create an index. The nerve!

Send me your money

Let’s make our index better:

Run those queries again. You don’t even have to recompile them.

Can't you tell by the way I run every time you make eyes at me?
Can’t you tell by the way I run every time you make eyes at me?

They all magically found a way to use our New and Improved index.

What was the point?

When I first started caring about indexes, and filtering them, I would get so mad when these precious little Bloody Mary recipes didn’t get used.

I followed all the rules!
There were no errors!

But why oh why didn’t SQL use my filtered indexes for even smaller subsets of the filter condition? It seemed insane to me that SQL would know the filter for the index is on (x > y), but wouldn’t use them even if (z > x).

The solution was to put the filtered column in the include list. This lets SQL generate statistics on the column, and much like getting rid of the predicate key lookup, allows you to search within the filtered index subset for even more specific information.

Previous Post
How many servers can one person manage?
Next Post
We’re now a Microsoft Gold Partner.

27 Comments. Leave new

  • Great tip!

    Reply
  • Allan S. Hansen
    December 8, 2015 7:26 am

    Whoa – nice “hacky” information – makes them so much more useful.

    Reply
  • I’ve always thought filters acted oddly sometimes and my index was ignored when I thought it should have been used – I suspect I’ve been burned by this more times than I realize. Nice find, and something I’ll be on the lookout for going forward when I have an index that isn’t getting used where it should be!

    Reply
  • Jeff Zieleniuk
    December 8, 2015 11:29 am

    I was literally messing with filtered indexes this morning and noticed the same thing. I stepped away from it for a while, then this article showed up. Talk about great timing. Thanks for the info!

    Reply
  • I feel obligated to provide this Public Service Announcement when I see people discussing Filtered Indexes.
    We added a filtered index to a table that was going to seriously improve a query. And it did! Unfortunately, Sql Server won’t allow any modifications to a table with a filtered index when the object doing the DML (stored procedure) was compiled with QUOTED IDENTIFIER OFF.
    So, our SELECTS were great, but came at the expense of no INSERTS, UPDATES, or DELETES.

    That’s quirky.

    Reply
  • Thank you, thank you, thank you. Just want I needed, been starring at the same fat filtered index for 2 days with a long list of includes. It’s missing the filtered column!

    Reply
  • Thanks for the post.

    “This lets SQL generate statistics on the column…”

    Funny, I would have thought statistics would be generated when you created the index anyway, with or without the include.

    Reply
  • Now that’s a good catch. Thank you!

    Reply
  • Thanks for the explanation! This had me puzzled, but I had shrugged and moved on.

    I’d come across this when including the filtered column in the result set.

    We have a bunch of tables with an IsActive column and 90% of the queries specify IsActive = 1. I didn’t understand why the filtered index wasn’t being used when IsActive wasn’t in the result set or the included columns, but just shrugged and added it to the included column.

    Reply
  • I always include the filtered column in my indexes since I realized, that – even if I filter for equal – it needs to make key lookups for the filter column.

    Example:
    CREATE INDEX idx_test ON dbo.t_users (userid, name) WHERE deleted = 1;
    — key lookup for deleted (instead of a constant scan)
    SELECT userid, name, deleted
    FROM dbo.t_users t
    WHERE t.deleted = 1

    Reply
    • Actually Thomas, your index is not covering, thats why you need the key lookup.
      Secondly why do you need to pull the deleted column when you already know it from your predicate ? makes no sense to me.

      Reply
      • I know, that it is not covering without the deleted column. On the other hand it could replace it in the query optimizer (because I used an equal).

        Regarding the deleted flag: think of a grid control in the user interface, which shows the [deleted] as column / checkbox, even if the user filtered for it (or it is filtered by default). And manually replacing it with 1 in the select part of the query makes it harder to read / more error prone.

        Reply
  • Erik,

    Your original index contained *only* the DisplayName and Id columns for rows where Reputation > 400000. The index *did not contain* the value of Reputation in each row. This is the key point.

    As a consequence, evaluating a more restrictive predicate on Reputation necessarily involves a Key Lookup to fetch and test the actual value of Reputation associated with the filtered index entry.

    To put it another way: It is just not sufficient to know that Reputation > 400000 to say if Reputation > 500000 or not 🙂

    This is why Reputation needs to be in the filtered index explicitly – to avoid a lookup when the query predicate does not match the filter predicate exactly.

    The choice between a lookup and table/CI scan is the normal cost-based one, depending on the number of rows expected.

    Reply
  • Why won’t this work if I use the predicate column as the first column of the index itself?

    Reply
    • it would work – as long you not filter for e.g. Display Name.

      SELECT [u].[Id], [u].[DisplayName]
      FROM [dbo].[Users] AS [u]
      WHERE [u].[Reputation] > 400001
      AND DisplayName = ‘Miller’

      would lead to a pseudo seek (it makes an Index seek to find the first row, wich is > 400001) and than scan the whole remaining index for Millers.

      If you would put the Reputation as last column it would act equal to an include column, but the index is bigger, because the Reputation will be stored on the key pages + data pages, while with include it will be stored only in the data pages. This may be not a big deal for a int or bigint column, but consider e.g. the adress or some other big varchar columns

      Reply
      • It doesn’t make sense to me. I think there is nothing wrong with the pseudo seek.

        As far as I can tell, putting the predicate column as (last) key column makes no difference: it is not used, that is, in my case.

        I have a very large table with only two records satisfying a predicate. It’s an int column with two unique values. All other records have value 0.

        I have a unique filtered index on these two values of this form:
        WHERE ( predicatecolumn >= minimumpredicatevalue )

        When I use the following WHERE clause in my query,

        WHERE ( predicatecolumn >= minimumpredicatevalue )
        __AND ( predicatecolumn <= maximumpredicatevalue )

        the filtered index will _only_ be used if I add the predicatecolumn as included column.
        If I add it as a key column to the index, it is not used, not as last column and not even as first. In those case a Clustered Index Scan is used on the large table!

        I don't get that. I would expect the filtered index to be used in all those cases, because all necessary information is in it, no matter what way I add the predicate column.

        What is wrong with my reasoning?

        Reply
        • I’m not sure if I understand your example:

          – you have a intCol which is either 726 or 16907.
          – you have idx1 with WHERE intCol >= 726
          – you have idx2 with WHERE intCol >= 16907
          – you querying the table with where intCol >=726 and intCol <= 16907

          This would not make any sense to me (neither the filtered indexes – there you should use an intCol = x) nor the query itself.

          Reply
          • intCol = 0 for 10 millions of records
            intCol = 1 for 1 record
            intCol = 2 for 1 record

            I have a filtered index with
            WHERE ( predicatecolumn >= minimumpredicatevalue )

            I query on the intCol using
            WHERE ( predicatecolumn >= minimumpredicatevalue ) — 1
            __AND ( predicatecolumn <= maximumpredicatevalue ) — 2

            just to make sure no unwanted values are retrieved, should they ever have been entered.

            I don't see why I should create two separate filtered indexes, as you seem to suggest.

            I don't see why the _only_ way the filtered index would be used for this, is when the intCol is added as an Included column and not as first key column of the index itself.

          • I wonder that the filtered index would be used in any case, because the AND intCol = 1).

            Did you try to use only WHERE intCol between 1 and 2 (at least in the query but it should work in the index filter condition too)?

          • >> ”
            I wonder that the filtered index would be used in any case, because the AND intCol = 1).

            Did you try to use only WHERE intCol between 1 and 2 (at least in the query but it should work in the index filter condition too)?
            ” <<

            What do you mean by "AND intCol = 1)"?. I don't see me having written that.

            WHERE clause is all I used, so I do not see why it works _only_ if I include intCol as Include column and not as first key column in the filtered index.

          • replace the 1 by your minimumpredicatevalue and the 2 by your maximumpredicatevalue.

            But to be honest, I have no idea, why it should not work, if intCol is the first column in the (filtered) index instead of only an included column (except some internal optimizations)

  • A question regarding filtered indexes and UPDATE statements:

    Why have I to include the updated columns into the filtered index (otherwise it would perform a keylookup).

    Example:
    UPDATE tbl SET last_newsletter_date = GetDate() WHERE email is not null and subscripe = 1
    needs an index
    CREATE INDEX tbl_ix ON tbl (email) INCLUDE (subscripe, last_newsletter_date)
    WHERE email is not null and subscripe = 1

    Without last_newsletter_date it would use the index (if there are only few rows affected) + keylookup for the last_newsletter_date or makes a clustered index scan

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.