Stats Week: Do Query Predicates Affect Histogram Step Creation?

Last Updated 4 years ago

Auto Create Statistics is your friend

It’s not perfect, but 99% of the time I’d rather have imperfect statistics than no statistics. This question struck me as interesting, because the optimizer will totally sniff parameters to compile an initial plan. If you don’t have index statistics, or system statistics already on a column in a WHERE clause, SQL is generally kind enough to create a statistics object for you when the query is compiled.

So I thought to myself: Would SQL create an initial histogram based on the compile-time parameter? It might be nice if it did, since it could potentially get the best possible information about predicate cardinality from a direct hit on a histogram step.

Here’s a quick test that shows, no, SQL doesn’t give half a care about that. It creates the same histogram no matter what. 1000 rows should do the trick. I’m making both columns NOT NULL here, because I want to make one my PK, and I want to make sure there’s no histogram step for NULL values in the other. I’m not going to index my date column here, I’m going to let SQL generate statistics automatically.

SELECT
    ISNULL([x].[ID], 0) AS [ID] ,
    ISNULL(CAST([x].[DateCol] AS DATE), '1900-01-01') AS [HireDate]
INTO
    [dbo].[AutoStatsTest]
FROM
    ( SELECT TOP 1000
        ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) ,
        DATEADD(HOUR, [m].[message_id], GETDATE())
      FROM
        [sys].[messages] AS [m] ) [x] ( [ID], [DateCol] );

ALTER TABLE [dbo].[AutoStatsTest] ADD CONSTRAINT [pk_t1_id] PRIMARY KEY CLUSTERED ([ID]);

1

2

3

4

5

6

7

8

9

10

11

12

13

SELECT

ISNULL([x].[ID], 0) AS [ID] ,

ISNULL(CAST([x].[DateCol] AS DATE), '1900-01-01') AS [HireDate]

INTO

[dbo].[AutoStatsTest]

FROM

( SELECT TOP 1000

ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) ,

DATEADD(HOUR, [m].[message_id], GETDATE())

FROM

[sys].[messages] AS [m] ) [x] ( [ID], [DateCol] );

ALTER TABLE [dbo].[AutoStatsTest] ADD CONSTRAINT [pk_t1_id] PRIMARY KEY CLUSTERED ([ID]);

First, let’s check in on what values we have

I’m going to run one query that will generate a histogram, but it’s guaranteed to return all of the table data. I want to see what SQL comes up with for histogram hits and missing, here.

SELECT *
FROM [dbo].[AutoStatsTest] AS [ast]
WHERE [ast].[HireDate] >= '1900-01-01'

1

2

3

SELECT *

FROM [dbo].[AutoStatsTest] AS [ast]

WHERE [ast].[HireDate] >= '1900-01-01'

We have our histogram, and I’ll use a clunky DBCC command to show me to it. Below is a partial screen cap, up to a point of interest.

SQL created a histogram with direct hits on 04/30, and then 05/02. That means it doesn’t have a step for 05/01, but it does postulate that there are 22 rows with a date of 05/01 in the RANGE_ROWS column.

I went ahead and dropped that table and re-created it. Next we’ll run the same query, but we’ll pass in 05/01 as an equality value.

SELECT *
FROM [dbo].[AutoStatsTest] AS [ast]
WHERE [ast].[HireDate] = '2016-05-01'

1

2

3

SELECT *

FROM [dbo].[AutoStatsTest] AS [ast]

WHERE [ast].[HireDate] = '2016-05-01'

And, long story short, it creates the exact same histogram as before.

Is this good? Is this bad?

Well, at least it’s reliable. I’m not sure how I feel about it otherwise.

You can try creating filtered indexes or statistics on really important segments of data, if you really need SQL to have granular information about it. Otherwise, you’ll have to trust in the secret, and sometimes not so secret sauce, behind the cardinality estimator.

Thanks for reading!

Brent says: the more I work with SQL Server, the more I’m filled with optimism about the oddest things. When I read Erik’s idea about the exact histogram step, though, I thought, “Nopetopus.”

2 Comments. Leave new

Isaac Koss
April 20, 2016 1:22 pm

Awesome new signature!

Are system Statistics usually an indication of missing Indexes?? Thanks!

Reply
- Erik Darling
  April 20, 2016 2:48 pm
  
  Great question! Usually not, but there could be some interesting things to look for. I’d start with BlitzIndex, and see if there are any high value missing index requests, and then I’d see if any high read queries are hitting that table/column in BlitzCache.
  
  Reply