
When you create an index in SQL Server with multiple columns, behind the scenes it creates a related multi-column statistic for the index. This statistic gives SQL Server some information about the relationship between the columns that it can use for row estimates when running queries.
But what if you use ‘included’ columns in the index? Do they get information recorded in the statistics?
Here’s my Index
To test, we’ll create an index with multiple key columns and included columns. I’m using a restored copy of the StackOverflow database.
1 2 3 4 5 6 |
/* FavoriteCount and LastEditDate aren't in the index key */ /* They'll just hang out in the leaf of the index */ CREATE INDEX ix_Posts_OwnerUserId_PostTypeId_INCLUDES ON dbo.Posts (OwnerUserId, PostTypeId) INCLUDE (FavoriteCount, LastEditDate); GO |
First Stop: sys.Stats and Friends
The sys.stats DMV lets me query metadata out about my statistics, and I can join up to find out how many columns it has, what order they are in, when they were updated, and all sorts of info:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
SELECT so.name, st.name, st.stats_id, sc.stats_column_id, c.name as column_name, st.auto_created, st.filter_definition, sp.last_updated, sp.rows, sp.rows_sampled, sp.steps, sp.modification_counter FROM sys.stats AS st JOIN sys.stats_columns AS sc on st.object_id=sc.object_id and st.stats_id=sc.stats_id JOIN sys.columns as c on sc.object_id=c.object_id and sc.column_id=c.column_id JOIN sys.objects as so on st.object_id=so.object_id CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) sp WHERE so.name='Posts' ORDER by so.name, st.stats_id, sc.stats_column_id; GO |
Here’s the first few columns of the results:
This doesn’t show FavoriteCount or LastEdit date. My index key columns are in the statistic, in the same order they appear in the index.
What About DBCC SHOW_STATISTICS?
We can see a representation of the statistic with more information. We just plug in the name of the table and the statistic to DBCC SHOW_STATISTICS, like this:
1 2 |
DBCC SHOW_STATISTICS ('Posts', 'ix_Posts_OwnerUserId_PostTypeId_INCLUDES'); GO |
This returns a few result sets. We get information about the header of the statistics, some density information about the relationship between the columns, and a histogram that shows the distribution of rows for the leading column in the statistic.
That Density Vector Has Three Rows!
Check that out, the density vector has a third row to help SQL Server estimate data distribution for OwnerUserId, PostTypeId, and the Id column. We didn’t specify the Id column anywhere in our index!
The Id column is the key of the clustered index on this table. SQL Server decided that it would also be interesting to know the density information of that as a third column.
What if the Index Was Unique? Would it Still Sneak In the Clustering Key?
The index above doesn’t have unique data, but let’s test a different index:
1 2 3 4 5 |
CREATE UNIQUE INDEX ix_VoteTypes_Name on dbo.VoteTypes (Name); GO DBCC SHOW_STATISTICS ('VoteTypes', 'ix_VoteTypes_Name'); GO |
Here’s the density vector:
SQL Server decided that tracking the density information for the combination of Name and Id was still useful, even though this is a unique index and SQL Server didn’t have to sneak that column into the key of the index to make it unique behind the scenes.
Included Columns Weren’t in the Density Vector, but My Clustering Key Was!
In other words, included columns won’t be in the density vector unless they happen to also be in the key of your clustered index.
3 Comments. Leave new
I was sure your cluster key came free with your nonclustered indexes. Would that explain it appearing as part of the density vector?
It’s a little different. In terms of nonclustered indexes, the clustering key may be “free” in the included columns– it’s not always in the key of the nonclustered index.
I think in terms of statistics, it always makes sense for the optimizer to have a way to estimate the number of related rows to the clustering key in case it has to do row lookups, regardless of that key’s position in the index.
Thanks for this post. It’s always good to encourage people to get hands-on and look at this stuff.
On a related note, there are a couple questions on DBA.se about cases where sys.stats_columns doesn’t accurately show the column order within a statistics object:
http://dba.stackexchange.com/questions/94533/is-sys-stats-columns-incorrect
http://dba.stackexchange.com/questions/91118/stats-column-id-and-index-column-id-do-not-update-with-physical-order-of-cluster
There’s also a Connect item here:
https://connect.microsoft.com/SQLServer/feedback/details/1163126