Filtered Statistics Follow-up

During our pre-con in Seattle

A really sharp lady brought up using filtered statistics, and for good reason. She has some big tables, and with at most 201 histogram steps, you can miss out on a lot of information about data distribution when you have millions or billions of rows in a table. There’s simply not enough room to describe it all accurately, even with a full scan of the stats.

The good news: filtered statistics can totally help you isolate chunks of data and provide more granular details.

The bad news: well, they probably don’t do exactly what you want them to do.

A little setup

To make math easy, we’re going to stick a million rows in our table, with 10% (100,000) of them having a BIT flag set to 1.
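A rough sketch of what I mean (the table, column, and statistics names here are mine, and the cross join against sys.messages is just one convenient way to crank out a million rows):

CREATE TABLE dbo.FlagTest
(
    Id INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    IsSpecial BIT NOT NULL
);

/* A million rows, 10% of them with the flag set */
INSERT dbo.FlagTest (IsSpecial)
SELECT CASE WHEN x.n % 10 = 0 THEN 1 ELSE 0 END
FROM
(
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.messages AS m
    CROSS JOIN sys.messages AS m2
) AS x;

/* Filtered statistics that only describe the flagged rows */
CREATE STATISTICS s_FlagTest_IsSpecial_Filtered
ON dbo.FlagTest (IsSpecial)
WHERE IsSpecial = 1
WITH FULLSCAN;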

With that out of the way, and our filtered statistic created, let’s see what happens when we query it. Filtered statistics behave just like filtered indexes when you run ad hoc queries: when you pass in literals that match the filter, they get used.
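For example, an ad hoc query against that table with a hard-coded 1. I’m assuming the undocumented 3604/2363 pair for the trace flags mentioned below:

SELECT COUNT_BIG(*) AS FlaggedRows
FROM dbo.FlagTest AS ft
WHERE ft.IsSpecial = 1
OPTION (QUERYTRACEON 3604, QUERYTRACEON 2363); /* print which stats the optimizer loaded */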

Note the fancy trace flags to output the statistics examined and used by our query. These only work on SQL Server 2014 and up. If you want to look at this sort of information on older versions, have a look at a blog post by Mr. Black Magic himself, Paul White. When we check the Messages tab for output, part of it looks like this:

I have all my stats IDs memorized. Don’t you?

Alright, so which stats object is that?
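One way to match the IDs up to names is to peek at sys.stats (your stats_id values may come out differently depending on creation order):

SELECT s.stats_id,
       s.name,
       s.auto_created,
       s.has_filter,
       s.filter_definition
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.FlagTest');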

Choices, choices

Okay, cool. Our stats object with an id of 3 is indeed the filtered one. If you go back and flip the 1 to a 0, the query will use stats #2, which isn’t filtered. At least as of this writing, SQL Server doesn’t generate filtered statistics on its own.

Where things get a little yucky is with variables. Check this out:
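Something along these lines, swapping the literal for a local variable:

DECLARE @IsSpecial BIT = 1;

SELECT COUNT_BIG(*) AS FlaggedRows
FROM dbo.FlagTest AS ft
WHERE ft.IsSpecial = @IsSpecial
OPTION (QUERYTRACEON 3604, QUERYTRACEON 2363);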

This breaks everything, and the query falls back to using the system statistic.

You stink.

As you’d expect, the estimates fall apart here as well because we’re using a local variable. The way around this, for better or worse, is a RECOMPILE hint. This ‘fixes’ everything. As long as you don’t run this code a lot, you probably won’t break a CPU coming up with a plan this simple.
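Here’s the same variable query with the hint bolted on, sticking with my demo names:

DECLARE @IsSpecial BIT = 1;

SELECT COUNT_BIG(*) AS FlaggedRows
FROM dbo.FlagTest AS ft
WHERE ft.IsSpecial = @IsSpecial
OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 2363); /* recompile lets the optimizer see the variable's value */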

Correctimate

Where things get super awkward

Stored procedures and parameterized dynamic SQL can be sniffed. Well, their parameters can be, anyway. Groovy. But stored procedures have a hard time using filtered indexes when the parameters passed in make up all or part of the WHERE clause. This is a plan caching issue: if your query plan is produced using a filtered index, SQL Server may not be able to reuse it when a value outside of the filter is passed in. Take our BIT search, for example; a plan using the filtered index for 1 can’t be used to find 0, because those rows aren’t in the index. This is where filtered indexes and statistics diverge.

Filtered statistics can still be used for cardinality estimation, and they can make parameter sniffing a bit more of an issue. Let’s procedurize our code and see why.
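Here’s a minimal version of the procedure against my demo table, plus the first call (names are mine, not gospel):

CREATE PROCEDURE dbo.CountSpecial
    @IsSpecial BIT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT COUNT_BIG(*) AS MatchingRows
    FROM dbo.FlagTest AS ft
    WHERE ft.IsSpecial = @IsSpecial;
END;
GO

/* First call: the sniffed value satisfies the filter */
EXEC dbo.CountSpecial @IsSpecial = 1;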

The first run works fine. We use our filtered stats, we get the good cardinality estimate. Mission accomplished.

Compile!

At least until the next run:
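Which is a call with a value that falls outside the filter, against the same hypothetical procedure:

/* Second call: reuses the plan (and the estimate) compiled for 1 */
EXEC dbo.CountSpecial @IsSpecial = 0;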

What happened to our guessing game?!

If we switch over to the query plan, we can see just how off our estimates are. We’re getting the cardinality estimate for the 100k rows. Again, this makes sense. We had a parameter, it was sniffed, and a plan was compiled for it. The plan gets reused for the second call to the stored procedure, by design. It doesn’t matter a lot here because the system stats object (id 2) has the same information. We’d run into the same parameter sniffing problem regardless of the filtered statistics object. It’s just funny to me that it will get used at all.

But is this what we want?

Helpful or not?

The answer is a lukewarm ‘sometimes’.

For ad hoc queries that either don’t use variables, or can be recompiled, filtered statistics can certainly help.

For stored procedures, they can exacerbate parameter sniffing problems, or you may end up with your filtered stats not being used at all if the plan is first compiled with values that don’t fit the filter definition. That’s probably not what you wanted to hear. Unfortunately, there are no hints to force SQL Server to use a particular statistics object, only indexes.

Thanks for reading!

3 Comments

  • Thanks for a well-written, pertinent, and succinct article.

  • Untested, but I believe the filtered stats’ predicate can be added to the sproc as a literal:
    WHERE fs2.c2 = 1 AND …
    The filtered stats should be considered, as long as the conjunctive predicate (i.e. what would be after the above AND…) does not fold into the filtered statistics’ predicate and as long as SQL Server does not force parameterization.

  • By my testing, the same results occur with non-filtered stats.

