Are Index ‘Included’ Columns in Your Multi-Column Statistics?

Internals Cat populates the density vector
Internals Cat populates the density vector

When you create an index in SQL Server with multiple columns, behind the scenes it creates a related multi-column statistic for the index. This statistic gives SQL Server some information about the relationship between the columns that it can use for row estimates when running queries.

But what if you use ‘included’ columns in the index? Do they get information recorded in the statistics?

Here’s my Index

To test, we’ll create an index with multiple key columns and included columns. I’m using a restored copy of the StackOverflow database.

First Stop: sys.Stats and Friends

The sys.stats DMV lets me query metadata out about my statistics, and I can join up to find out how many columns it has, what order they are in, when they were updated, and all sorts of info:

Here’s the first few columns of the results:

statistics-columns-order

This doesn’t show FavoriteCount or LastEdit date. My index key columns are in the statistic, in the same order they appear in the index.

What About DBCC SHOW_STATISTICS?

We can see a representation of the statistic with more information. We just plug in the name of the table and the statistic to DBCC SHOW_STATISTICS, like this:

This returns a few result sets. We get information about the header of the statistics, some density information about the relationship between the columns, and a histogram that shows the distribution of rows for the leading column in the statistic.

dbcc show statistics output

 

That Density Vector Has Three Rows!

Check that out, the density vector has a third row to help SQL Server estimate data distribution for OwnerUserId, PostTypeId, and the Id column. We didn’t specify the Id column anywhere in our index!

The Id column is the key of the clustered index on this table. SQL Server decided that it would also be interesting to know the density information of that as a third column.

What if the Index Was Unique? Would it Still Sneak In the Clustering Key?

The index above doesn’t have unique data, but let’s test a different index:

Here’s the density vector:

dbcc show statistics output unique index

SQL Server decided that tracking the density information for the combination of Name and Id was still useful, even though this is a unique index and SQL Server didn’t have to sneak that column into the key of the index to make it unique behind the scenes.

Included Columns Weren’t in the Density Vector, but My Clustering Key Was!

In other words, included columns won’t be in the density vector unless they happen to also be in the key of your clustered index.

Previous Post
And Party and Alt Shift
Next Post
Should You Be a SQL Server DBA? (video)

3 Comments. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.