Filtered Statistics Follow-up

During our pre-con in Seattle

A really sharp lady brought up using filtered statistics, and for good reason. She has some big tables, and with at most 201 histogram steps, you can miss out on a lot of information about data distribution when you have millions or billions of rows in a table. There’s simply not enough room to describe it all accurately, even with a full scan of the stats.

The good news: filtered statistics can totally help you isolate chunks of data and provide more granular details.

The bad news: well, they probably don’t do exactly what you want them to do.

A little setup

To make math easy, we’re going to stick a million rows in our table, with 10% (100,000) of them having a BIT flag set to 1.
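A rough sketch of what I mean (the table, column, and statistics names here are mine, and the cross join against sys.messages is just one convenient way to crank out a million rows):

CREATE TABLE dbo.FlagTest
(
    Id INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    IsSpecial BIT NOT NULL
);

/* A million rows, 10% of them with the flag set */
INSERT dbo.FlagTest (IsSpecial)
SELECT CASE WHEN x.n % 10 = 0 THEN 1 ELSE 0 END
FROM
(
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.messages AS m
    CROSS JOIN sys.messages AS m2
) AS x;

/* Filtered statistics that only describe the flagged rows */
CREATE STATISTICS s_FlagTest_IsSpecial_Filtered
ON dbo.FlagTest (IsSpecial)
WHERE IsSpecial = 1
WITH FULLSCAN;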

With that out of the way, and our filtered statistic created, let’s see what happens when we query it. Filtered statistics behave just like filtered indexes when you run ad hoc queries: when you pass in literals that match the filter, they get used.
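For example, an ad hoc query against that table with a hard-coded 1. I’m assuming the undocumented 3604/2363 pair for the trace flags mentioned below:

SELECT COUNT_BIG(*) AS FlaggedRows
FROM dbo.FlagTest AS ft
WHERE ft.IsSpecial = 1
OPTION (QUERYTRACEON 3604, QUERYTRACEON 2363); /* print which stats the optimizer loaded */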

Note the fancy trace flags to output the statistics examined and used by our query. These only work on SQL Server 2014 and up. If you want to look at this sort of information on older versions, have a look at a blog post by Mr. Black Magic himself, Paul White. When we check the Messages tab for output, part of it looks like this:

I have all my stats IDs memorized. Don’t you?

Alright, so which stats object is that?
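One way to match the IDs up to names is to peek at sys.stats (your stats_id values may come out differently depending on creation order):

SELECT s.stats_id,
       s.name,
       s.auto_created,
       s.has_filter,
       s.filter_definition
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.FlagTest');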

Choices, choices

Okay, cool. Our stats object with an id of 3 is indeed the filtered one. If you go back and flip the 1 to a 0, the query will use stats #2, which isn’t filtered. At least as of this writing, SQL Server doesn’t generate filtered statistics on its own.

Where things get a little yucky is with variables. Check this out:
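Something along these lines, swapping the literal for a local variable:

DECLARE @IsSpecial BIT = 1;

SELECT COUNT_BIG(*) AS FlaggedRows
FROM dbo.FlagTest AS ft
WHERE ft.IsSpecial = @IsSpecial
OPTION (QUERYTRACEON 3604, QUERYTRACEON 2363);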

This breaks everything, and the query falls back to using the system statistic.

You stink.

As you’d expect, the estimates fall apart here as well because we’re using a local variable. The way around this, for better or worse, is a RECOMPILE hint. This ‘fixes’ everything. As long as you don’t run this code a lot, you probably won’t break a CPU coming up with a plan this simple.
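Here’s the same variable query with the hint bolted on, sticking with my demo names:

DECLARE @IsSpecial BIT = 1;

SELECT COUNT_BIG(*) AS FlaggedRows
FROM dbo.FlagTest AS ft
WHERE ft.IsSpecial = @IsSpecial
OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 2363); /* recompile lets the optimizer see the variable's value */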

Correctimate

Where things get super awkward

Stored procedures and parameterized dynamic SQL can be sniffed. Well, their parameters can be, anyway. Groovy. But stored procedures have a hard time using filtered indexes when the parameters passed in make up all or part of the WHERE clause. This is a plan caching issue: if your query plan is produced using a filtered index, SQL Server may not be able to reuse it when a value outside of the filter is passed in. Take our BIT search, for example; a plan using the filtered index for 1 can’t be used to find 0, because those rows aren’t in the index. This is where filtered indexes and statistics diverge.

Filtered statistics can still be used for cardinality estimation, and they can make parameter sniffing a bit more of an issue. Let’s procedurize our code and see why.
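Here’s a minimal version of the procedure against my demo table, plus the first call (names are mine, not gospel):

CREATE PROCEDURE dbo.CountSpecial
    @IsSpecial BIT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT COUNT_BIG(*) AS MatchingRows
    FROM dbo.FlagTest AS ft
    WHERE ft.IsSpecial = @IsSpecial;
END;
GO

/* First call: the sniffed value satisfies the filter */
EXEC dbo.CountSpecial @IsSpecial = 1;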

The first run works fine. We use our filtered stats, we get the good cardinality estimate. Mission accomplished.

Compile!

At least until the next run:
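Which is a call with a value that falls outside the filter, against the same hypothetical procedure:

/* Second call: reuses the plan (and the estimate) compiled for 1 */
EXEC dbo.CountSpecial @IsSpecial = 0;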

What happened to our guessing game?!

If we switch over to the query plan, we can see just how off our estimates are. We’re getting the cardinality estimate for the 100k rows. Again, this makes sense. We had a parameter, it was sniffed, and a plan was compiled for it. The plan gets reused for the second call to the stored procedure, by design. It doesn’t matter a lot here because the system stats object (id 2) has the same information. We’d run into the same parameter sniffing problem regardless of the filtered statistics object. It’s just funny to me that it will get used at all.

But is this what we want?

Helpful or not?

The answer is a lukewarm ‘sometimes’.

For ad hoc queries that either don’t use variables, or can be recompiled, filtered statistics can certainly help.

For stored procedures, they can exacerbate parameter sniffing problems, or you may end up with your filtered stats not being used at all if the plan is first compiled with values that don’t fit the filter definition. That’s probably not what you wanted to hear. Unfortunately, there are no hints to force SQL Server to use a particular statistics object, only indexes.

Thanks for reading!

3 Comments

  • Thanks for a well-written, pertinent, and succinct article.

  • Untested, but I believe the filtered stats’ predicate can be added to the sproc as a literal:
    WHERE fs2.c2 = 1 AND …
    The filtered stats should be considered, as long as the conjunctive predicate (i.e. what would be after the above AND…) does not fold into the filtered statistics’ predicate and as long as SQL Server does not force parameterization.

  • By my testing, the same results occur with non-filtered stats.

