Are Index ‘Included’ Columns in Your Multi-Column Statistics?

Last Updated July 13, 2017

Internals Cat populates the density vector

When you create an index in SQL Server with multiple columns, behind the scenes it creates a related multi-column statistic for the index. This statistic gives SQL Server some information about the relationship between the columns that it can use for row estimates when running queries.

But what if you use ‘included’ columns in the index? Do they get information recorded in the statistics?

Here’s my Index

To test, we’ll create an index with multiple key columns and included columns. I’m using a restored copy of the StackOverflow database.

/* FavoriteCount and LastEditDate aren't in the index key */
/* They'll just hang out in the leaf of the index */
CREATE INDEX ix_Posts_OwnerUserId_PostTypeId_INCLUDES 
    ON dbo.Posts (OwnerUserId, PostTypeId) 
    INCLUDE (FavoriteCount, LastEditDate);
GO

/* FavoriteCount and LastEditDate aren't in the index key */

/* They'll just hang out in the leaf of the index */

CREATE INDEX ix_Posts_OwnerUserId_PostTypeId_INCLUDES

ON dbo.Posts (OwnerUserId, PostTypeId)

INCLUDE (FavoriteCount, LastEditDate);

First Stop: sys.Stats and Friends

The sys.stats DMV lets me query metadata out about my statistics, and I can join up to find out how many columns it has, what order they are in, when they were updated, and all sorts of info:

SELECT so.name,
    st.name,
    st.stats_id,
    sc.stats_column_id,
    c.name as column_name,
    st.auto_created,
    st.filter_definition,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.steps,
    sp.modification_counter
FROM sys.stats AS st
JOIN sys.stats_columns AS sc on st.object_id=sc.object_id and st.stats_id=sc.stats_id
JOIN sys.columns as c on sc.object_id=c.object_id and sc.column_id=c.column_id
JOIN sys.objects as so on st.object_id=so.object_id
CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) sp
WHERE so.name='Posts'
ORDER by so.name, st.stats_id, sc.stats_column_id;
GO

SELECT so.name,

st.name,

st.stats_id,

sc.stats_column_id,

c.name as column_name,

st.auto_created,

st.filter_definition,

sp.last_updated,

sp.rows,

sp.rows_sampled,

sp.steps,

sp.modification_counter

FROM sys.stats AS st

JOIN sys.stats_columns AS sc on st.object_id=sc.object_id and st.stats_id=sc.stats_id

JOIN sys.columns as c on sc.object_id=c.object_id and sc.column_id=c.column_id

JOIN sys.objects as so on st.object_id=so.object_id

CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) sp

WHERE so.name='Posts'

ORDER by so.name, st.stats_id, sc.stats_column_id;

Here’s the first few columns of the results:

This doesn’t show FavoriteCount or LastEdit date. My index key columns are in the statistic, in the same order they appear in the index.

What About DBCC SHOW_STATISTICS?

We can see a representation of the statistic with more information. We just plug in the name of the table and the statistic to DBCC SHOW_STATISTICS, like this:

DBCC SHOW_STATISTICS ('Posts', 'ix_Posts_OwnerUserId_PostTypeId_INCLUDES');
GO

1 2	DBCC SHOW_STATISTICS ('Posts', 'ix_Posts_OwnerUserId_PostTypeId_INCLUDES'); GO

This returns a few result sets. We get information about the header of the statistics, some density information about the relationship between the columns, and a histogram that shows the distribution of rows for the leading column in the statistic.

That Density Vector Has Three Rows!

Check that out, the density vector has a third row to help SQL Server estimate data distribution for OwnerUserId, PostTypeId, and the Id column. We didn’t specify the Id column anywhere in our index!

The Id column is the key of the clustered index on this table. SQL Server decided that it would also be interesting to know the density information of that as a third column.

What if the Index Was Unique? Would it Still Sneak In the Clustering Key?

The index above doesn’t have unique data, but let’s test a different index:

CREATE UNIQUE INDEX ix_VoteTypes_Name on dbo.VoteTypes (Name);
GO

DBCC SHOW_STATISTICS ('VoteTypes', 'ix_VoteTypes_Name');
GO

CREATE UNIQUE INDEX ix_VoteTypes_Name on dbo.VoteTypes (Name);

DBCC SHOW_STATISTICS ('VoteTypes', 'ix_VoteTypes_Name');

Here’s the density vector:

SQL Server decided that tracking the density information for the combination of Name and Id was still useful, even though this is a unique index and SQL Server didn’t have to sneak that column into the key of the index to make it unique behind the scenes.

Included Columns Weren’t in the Density Vector, but My Clustering Key Was!

In other words, included columns won’t be in the density vector unless they happen to also be in the key of your clustered index.

And Party and Alt Shift

Should You Be a SQL Server DBA? (video)

3 Comments. Leave new

Robin
April 24, 2015 8:29 pm

I was sure your cluster key came free with your nonclustered indexes. Would that explain it appearing as part of the density vector?

Reply
- Kendra Little
  April 27, 2015 8:53 am
  
  It’s a little different. In terms of nonclustered indexes, the clustering key may be “free” in the included columns– it’s not always in the key of the nonclustered index.
  
  I think in terms of statistics, it always makes sense for the optimizer to have a way to estimate the number of related rows to the clustering key in case it has to do row lookups, regardless of that key’s position in the index.
  
  Reply
James Lupolt
May 1, 2015 8:43 am

Thanks for this post. It’s always good to encourage people to get hands-on and look at this stuff.

On a related note, there are a couple questions on DBA.se about cases where sys.stats_columns doesn’t accurately show the column order within a statistics object:

http://dba.stackexchange.com/questions/94533/is-sys-stats-columns-incorrect

http://dba.stackexchange.com/questions/91118/stats-column-id-and-index-column-id-do-not-update-with-physical-order-of-cluster

There’s also a Connect item here:

https://connect.microsoft.com/SQLServer/feedback/details/1163126

Reply

Are Index ‘Included’ Columns in Your Multi-Column Statistics?

Here’s my Index

First Stop: sys.Stats and Friends

What About DBCC SHOW_STATISTICS?

That Density Vector Has Three Rows!

What if the Index Was Unique? Would it Still Sneak In the Clustering Key?

Included Columns Weren’t in the Density Vector, but My Clustering Key Was!

Related

3 Comments. Leave new

Leave a Reply Cancel reply

Hi! I’m Brent Ozar.

Are Index ‘Included’ Columns in Your Multi-Column Statistics?

Here’s my Index

First Stop: sys.Stats and Friends

What About DBCC SHOW_STATISTICS?

That Density Vector Has Three Rows!

What if the Index Was Unique? Would it Still Sneak In the Clustering Key?

Included Columns Weren’t in the Density Vector, but My Clustering Key Was!

Related

3 Comments. Leave new

Leave a Reply Cancel reply

Hi! I’m Brent Ozar.

Get Free SQL Stuff