Filtered Indexes: Just Add Includes

Last Updated 6 years ago

I found a quirky thing recently

While playing with filtered indexes, I noticed something odd. By ‘playing with’ I mean ‘calling them horrible names’ and ‘admiring the way other platforms implemented them‘.

I sort of wrote about a similar topic in discussing indexing for windowing functions. It turns out that a recent annoyance could also be solved by putting the column my filter expression is predicated on in the included columns definition. That’s the fanciest sentence I’ve ever written, BTW. If you want more, don’t get your hopes up.

Ready for Horrible

Let’s create our initial index. As usual, we’re using the Stack Overflow database. We’ll look at a small group of users who have a Reputation over 400k. I dunno, it’s a nice number. There are like 8 of them.

CREATE UNIQUE NONCLUSTERED INDEX Users_400k_Club
 ON dbo.Users ( DisplayName, Id )
 WHERE Reputation > 400000;

CREATE UNIQUE NONCLUSTERED INDEX Users_400k_Club

ON dbo.Users ( DisplayName, Id )

WHERE Reputation > 400000;

With that in place, we’ll run some queries that should make excellent use of our thoughtful and considerate index.

--Will I Nill I?
SELECT u.Id, u.DisplayName
FROM   dbo.Users AS u
WHERE  u.Reputation > 400000;

SELECT u.Id, u.DisplayName
FROM   dbo.Users AS u
WHERE  u.Reputation > 400000
AND    u.Reputation < 450000; 

SELECT u.Id, u.DisplayName 
FROM dbo.Users AS u 
WHERE u.Reputation > 400001;

SELECT u.Id, u.DisplayName
FROM   dbo.Users AS u
WHERE  u.Reputation > 500000;

--Will I Nill I?

SELECT u.Id, u.DisplayName

FROM dbo.Users AS u

WHERE u.Reputation > 400000;

SELECT u.Id, u.DisplayName

FROM dbo.Users AS u

WHERE u.Reputation > 400000

AND u.Reputation < 450000;

SELECT u.Id, u.DisplayName

FROM dbo.Users AS u

WHERE u.Reputation > 400001;

SELECT u.Id, u.DisplayName

FROM dbo.Users AS u

WHERE u.Reputation > 500000;

If you were a betting organism, which ones would you say use our index? Money on the table, folks! Step right up!

That didn’t go well at all. Only the first query really used it. The second query needed a key lookup to figure out the less than filter, and the last two not only ignored it, but told me I need to create an index. The nerve!

Send me your money

Let’s make our index better:

CREATE UNIQUE NONCLUSTERED INDEX Users_400k_Club
 ON dbo.Users ( DisplayName, Id )
 INCLUDE ( Reputation )
 WHERE Reputation > 400000
 WITH ( DROP_EXISTING = ON );

CREATE UNIQUE NONCLUSTERED INDEX Users_400k_Club

ON dbo.Users ( DisplayName, Id )

INCLUDE ( Reputation )

WHERE Reputation > 400000

WITH ( DROP_EXISTING = ON );

Run those queries again. You don’t even have to recompile them.

Can't you tell by the way I run every time you make eyes at me? — Can’t you tell by the way I run every time you make eyes at me?

They all magically found a way to use our New and Improved index.

What was the point?

When I first started caring about indexes, and filtering them, I would get so mad when these precious little Bloody Mary recipes didn’t get used.

I followed all the rules!
There were no errors!

But why oh why didn’t SQL use my filtered indexes for even smaller subsets of the filter condition? It seemed insane to me that SQL would know the filter for the index is on (x > y), but wouldn’t use them even if (z > x).

The solution was to put the filtered column in the include list. This lets SQL generate statistics on the column, and much like getting rid of the predicate key lookup, allows you to search within the filtered index subset for even more specific information.

How many servers can one person manage?

We’re now a Microsoft Gold Partner.

27 Comments. Leave new

Koen Verbeeck
December 8, 2015 7:20 am

Great tip!

Reply
Allan S. Hansen
December 8, 2015 7:26 am

Whoa – nice “hacky” information – makes them so much more useful.

Reply
Ryan McCauley
December 8, 2015 11:18 am

I’ve always thought filters acted oddly sometimes and my index was ignored when I thought it should have been used – I suspect I’ve been burned by this more times than I realize. Nice find, and something I’ll be on the lookout for going forward when I have an index that isn’t getting used where it should be!

Reply
Jeff Zieleniuk
December 8, 2015 11:29 am

I was literally messing with filtered indexes this morning and noticed the same thing. I stepped away from it for a while, then this article showed up. Talk about great timing. Thanks for the info!

Reply
Paul Barbin
December 8, 2015 11:56 am

I feel obligated to provide this Public Service Announcement when I see people discussing Filtered Indexes.
We added a filtered index to a table that was going to seriously improve a query. And it did! Unfortunately, Sql Server won’t allow any modifications to a table with a filtered index when the object doing the DML (stored procedure) was compiled with QUOTED IDENTIFIER OFF.
So, our SELECTS were great, but came at the expense of no INSERTS, UPDATES, or DELETES.

That’s quirky.

Reply
- Joseph Hayes
  December 8, 2015 2:34 pm
  
  Excellent. Thanks 🙂
  
  Reply
- Joseph Hayes
  December 8, 2015 2:36 pm
  
  That comment went to the wrong place. I no computer good.
  What I meant to say to you, Donger, is Wow, that is a whole new level of quirky?
  
  Reply
  - Joseph Hayes
    December 8, 2015 2:43 pm
    
    My God. Why can I not edit these comments?
    
    Reply
donger
December 8, 2015 12:03 pm

Thank you, thank you, thank you. Just want I needed, been starring at the same fat filtered index for 2 days with a long list of includes. It’s missing the filtered column!

Reply
Hakim
December 8, 2015 12:21 pm

Thanks for the post.

“This lets SQL generate statistics on the column…”

Funny, I would have thought statistics would be generated when you created the index anyway, with or without the include.

Reply
- Erik Darling
  December 8, 2015 2:45 pm
  
  Create, maybe. Use? Well…
  
  Reply
Rebecca
December 8, 2015 1:20 pm

Now that’s a good catch. Thank you!

Reply
Toby Harman
December 8, 2015 6:27 pm

Thanks for the explanation! This had me puzzled, but I had shrugged and moved on.

I’d come across this when including the filtered column in the result set.

We have a bunch of tables with an IsActive column and 90% of the queries specify IsActive = 1. I didn’t understand why the filtered index wasn’t being used when IsActive wasn’t in the result set or the included columns, but just shrugged and added it to the included column.

Reply
Thomas Franz
December 9, 2015 2:20 am

I always include the filtered column in my indexes since I realized, that – even if I filter for equal – it needs to make key lookups for the filter column.

Example:
CREATE INDEX idx_test ON dbo.t_users (userid, name) WHERE deleted = 1;
— key lookup for deleted (instead of a constant scan)
SELECT userid, name, deleted
FROM dbo.t_users t
WHERE t.deleted = 1

Reply
- Paw Jershauge
  December 10, 2015 7:16 am
  
  Actually Thomas, your index is not covering, thats why you need the key lookup.
  Secondly why do you need to pull the deleted column when you already know it from your predicate ? makes no sense to me.
  
  Reply
  - Thomas Franz
    December 10, 2015 6:02 pm
    
    I know, that it is not covering without the deleted column. On the other hand it could replace it in the query optimizer (because I used an equal).
    
    Regarding the deleted flag: think of a grid control in the user interface, which shows the [deleted] as column / checkbox, even if the user filtered for it (or it is filtered by default). And manually replacing it with 1 in the select part of the query makes it harder to read / more error prone.
    
    Reply
Paul White
January 20, 2016 2:26 am

Erik,

Your original index contained *only* the DisplayName and Id columns for rows where Reputation > 400000. The index *did not contain* the value of Reputation in each row. This is the key point.

As a consequence, evaluating a more restrictive predicate on Reputation necessarily involves a Key Lookup to fetch and test the actual value of Reputation associated with the filtered index entry.

To put it another way: It is just not sufficient to know that Reputation > 400000 to say if Reputation > 500000 or not 🙂

This is why Reputation needs to be in the filtered index explicitly – to avoid a lookup when the query predicate does not match the filter predicate exactly.

The choice between a lookup and table/CI scan is the normal cost-based one, depending on the number of rows expected.

Reply
- Erik Darling
  January 20, 2016 8:59 am
  
  There you go again, saying it much better than I did. Thanks Paul.
  
  Reply
TuneIn
January 22, 2016 8:24 am

Why won’t this work if I use the predicate column as the first column of the index itself?

Reply
- Thomas Franz
  January 22, 2016 9:49 am
  
  it would work – as long you not filter for e.g. Display Name.
  
  SELECT [u].[Id], [u].[DisplayName]
  FROM [dbo].[Users] AS [u]
  WHERE [u].[Reputation] > 400001
  AND DisplayName = ‘Miller’
  
  would lead to a pseudo seek (it makes an Index seek to find the first row, wich is > 400001) and than scan the whole remaining index for Millers.
  
  If you would put the Reputation as last column it would act equal to an include column, but the index is bigger, because the Reputation will be stored on the key pages + data pages, while with include it will be stored only in the data pages. This may be not a big deal for a int or bigint column, but consider e.g. the adress or some other big varchar columns
  
  Reply
  - TuneIn
    January 22, 2016 10:08 am
    
    It doesn’t make sense to me. I think there is nothing wrong with the pseudo seek.
    
    As far as I can tell, putting the predicate column as (last) key column makes no difference: it is not used, that is, in my case.
    
    I have a very large table with only two records satisfying a predicate. It’s an int column with two unique values. All other records have value 0.
    
    I have a unique filtered index on these two values of this form:
    WHERE ( predicatecolumn >= minimumpredicatevalue )
    
    When I use the following WHERE clause in my query,
    
    WHERE ( predicatecolumn >= minimumpredicatevalue )
    __AND ( predicatecolumn <= maximumpredicatevalue )
    
    the filtered index will _only_ be used if I add the predicatecolumn as included column.
    If I add it as a key column to the index, it is not used, not as last column and not even as first. In those case a Clustered Index Scan is used on the large table!
    
    I don't get that. I would expect the filtered index to be used in all those cases, because all necessary information is in it, no matter what way I add the predicate column.
    
    What is wrong with my reasoning?
    
    Reply
    - Thomas Franz
      January 23, 2016 5:40 am
      
      I’m not sure if I understand your example:
      
      – you have a intCol which is either 726 or 16907.
      – you have idx1 with WHERE intCol >= 726
      – you have idx2 with WHERE intCol >= 16907
      – you querying the table with where intCol >=726 and intCol <= 16907
      
      This would not make any sense to me (neither the filtered indexes – there you should use an intCol = x) nor the query itself.
      
      Reply
      - TuneIn
        January 25, 2016 3:28 am
        
        intCol = 0 for 10 millions of records
        intCol = 1 for 1 record
        intCol = 2 for 1 record
        
        I have a filtered index with
        WHERE ( predicatecolumn >= minimumpredicatevalue )
        
        I query on the intCol using
        WHERE ( predicatecolumn >= minimumpredicatevalue ) — 1
        __AND ( predicatecolumn <= maximumpredicatevalue ) — 2
        
        just to make sure no unwanted values are retrieved, should they ever have been entered.
        
        I don't see why I should create two separate filtered indexes, as you seem to suggest.
        
        I don't see why the _only_ way the filtered index would be used for this, is when the intCol is added as an Included column and not as first key column of the index itself.
      - Thomas Franz
        January 25, 2016 3:55 am
        
        I wonder that the filtered index would be used in any case, because the AND intCol = 1).
        
        Did you try to use only WHERE intCol between 1 and 2 (at least in the query but it should work in the index filter condition too)?
      - TuneIn
        January 25, 2016 4:00 am
        
        >> ”
        I wonder that the filtered index would be used in any case, because the AND intCol = 1).
        
        Did you try to use only WHERE intCol between 1 and 2 (at least in the query but it should work in the index filter condition too)?
        ” <<
        
        What do you mean by "AND intCol = 1)"?. I don't see me having written that.
        
        WHERE clause is all I used, so I do not see why it works _only_ if I include intCol as Include column and not as first key column in the filtered index.
      - Thomas Franz
        January 25, 2016 4:05 am
        
        replace the 1 by your minimumpredicatevalue and the 2 by your maximumpredicatevalue.
        
        But to be honest, I have no idea, why it should not work, if intCol is the first column in the (filtered) index instead of only an included column (except some internal optimizations)
Thomas Franz
January 27, 2016 3:46 am

A question regarding filtered indexes and UPDATE statements:

Why have I to include the updated columns into the filtered index (otherwise it would perform a keylookup).

Example:
UPDATE tbl SET last_newsletter_date = GetDate() WHERE email is not null and subscripe = 1
needs an index
CREATE INDEX tbl_ix ON tbl (email) INCLUDE (subscripe, last_newsletter_date)
WHERE email is not null and subscripe = 1

Without last_newsletter_date it would use the index (if there are only few rows affected) + keylookup for the last_newsletter_date or makes a clustered index scan

Reply