Clustered Index key columns in Nonclustered Indexes

Last Updated March 12, 2018

Columnstore Indexes, Indexing, SQL Server

Clustered indexes are fundamental

And I’m not just saying that because Kendra is my spiritual adviser!

They are not ~a copy~ of the table, they are the table, ordered by the column(s) you choose as the key. It could be one. It could be a few. It could be a GUID! But that’s for another time. A long time from now. When I’ve raised an army, in accordance with ancient prophecy.

What I’d like to focus on is another oft-neglected consideration when indexing:

Columns in the clustering key will be in all of your nonclustered indexes

“How do you know?”
“Can I join your army?”
“Why does it do that?”

You may be asking yourself all of those questions. I’ll answer one of them with a demo, and one of them with “no”.

D-D-D-DEMO FACE

USE tempdb;

IF OBJECT_ID('tempdb..ClusterKeyColumnsTest') IS NOT NULL
DROP TABLE tempdb..ClusterKeyColumnsTest

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT  
        IDENTITY (BIGINT, 1,1) AS ID ,  
        ISNULL(CONVERT(DATE, DATEADD(HOUR, -N.N, GETDATE()    )),  '1900-01-01') AS OrderDate ,
        ISNULL(CONVERT(DATE, DATEADD(HOUR, -N.N, GETDATE() + 1)), '1900-01-01') AS ProcessDate ,
        ISNULL(CONVERT(DATE, DATEADD(HOUR, -N.N, GETDATE() + 3)), '1900-01-01') AS ShipDate ,
        REPLICATE(CAST(NEWID() AS NVARCHAR(MAX)), CEILING(RAND() * 10)) AS DumbGUID 
INTO ClusterKeyColumnsTest
FROM    Numbers N
ORDER BY N.N DESC;

ALTER TABLE ClusterKeyColumnsTest ADD CONSTRAINT PK_ClusterKeyColumnsTest PRIMARY KEY CLUSTERED (ID) WITH (FILLFACTOR = 100);

CREATE NONCLUSTERED INDEX IX_ALLDATES ON dbo.ClusterKeyColumnsTest (OrderDate, ProcessDate, ShipDate) WITH (FILLFACTOR = 100);

USE tempdb;

IF OBJECT_ID('tempdb..ClusterKeyColumnsTest') IS NOT NULL

DROP TABLE tempdb..ClusterKeyColumnsTest

;WITH E1(N) AS (

SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL

SELECT NULL ),

E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),

Numbers AS (SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)

SELECT

IDENTITY (BIGINT, 1,1) AS ID ,

ISNULL(CONVERT(DATE, DATEADD(HOUR, -N.N, GETDATE() )), '1900-01-01') AS OrderDate ,

ISNULL(CONVERT(DATE, DATEADD(HOUR, -N.N, GETDATE() + 1)), '1900-01-01') AS ProcessDate ,

ISNULL(CONVERT(DATE, DATEADD(HOUR, -N.N, GETDATE() + 3)), '1900-01-01') AS ShipDate ,

REPLICATE(CAST(NEWID() AS NVARCHAR(MAX)), CEILING(RAND() * 10)) AS DumbGUID

INTO ClusterKeyColumnsTest

FROM Numbers N

ORDER BY N.N DESC;

ALTER TABLE ClusterKeyColumnsTest ADD CONSTRAINT PK_ClusterKeyColumnsTest PRIMARY KEY CLUSTERED (ID) WITH (FILLFACTOR = 100);

CREATE NONCLUSTERED INDEX IX_ALLDATES ON dbo.ClusterKeyColumnsTest (OrderDate, ProcessDate, ShipDate) WITH (FILLFACTOR = 100);

That code will create a pretty rudimentary table of random data. It was between 3-4 GB on my system.

You can see a Clustered Index was created on the ID column, and a Nonclustered Index was created on the three date columns. The DumbGUID column was, of course, neglected as a key column in both. Poor GUID.

Running these queries, the Nonclustered Index on the date columns will be used, because SQL does this smart thing where it takes page count into consideration when choosing an index to use.

SELECT ckct.ID
FROM   dbo.ClusterKeyColumnsTest AS ckct;

SELECT ckct.ID
FROM   dbo.ClusterKeyColumnsTest AS ckct
WHERE  ckct.OrderDate = '2014-12-18';

SELECT ckct.ID, ckct.DumbGUID
FROM   dbo.ClusterKeyColumnsTest AS ckct
WHERE  ckct.OrderDate = '2014-12-18';

SELECT ckct.ID

FROM dbo.ClusterKeyColumnsTest AS ckct;

SELECT ckct.ID

FROM dbo.ClusterKeyColumnsTest AS ckct

WHERE ckct.OrderDate = '2014-12-18';

SELECT ckct.ID, ckct.DumbGUID

FROM dbo.ClusterKeyColumnsTest AS ckct

WHERE ckct.OrderDate = '2014-12-18';

Notice that the only time a Key Lookup was needed is for the last query, where we also select the DumbGUID column. You’d think it would be needed in all three, since the ID column isn’t explicitly named in the key or as an include in the Nonclustered Index.

Key Lookups really are really expensive and you should really fix them. Really.

sp_BlitzIndex® to the rescue

If you find yourself trying to figure out indexes, Kendra’s sp_BlitzIndex® stored procedure is invaluable. It can do this cool thing where it shows you SECRET columns!

Since I’ve already ruined the surprise, let’s look at the indexes on our test table.

EXEC dbo.sp_BlitzIndex
    @DatabaseName = N'tempdb' , 
    @SchemaName = N'dbo' , 
    @TableName = N'ClusterKeyColumnsTest'

EXEC dbo.sp_BlitzIndex

@DatabaseName = N'tempdb' ,

@SchemaName = N'dbo' ,

@TableName = N'ClusterKeyColumnsTest'

Here’s the output. The detail it gives you on index columns and datatypes is really awesome. You can see the ID column is part of the Nonclustered Index, even though it isn’t named in the definition.

One step beyond

Run the code to drop and recreate our test table, but this time add these indexes below instead of the original ones.

ALTER TABLE ClusterKeyColumnsTest
ADD CONSTRAINT [PK_[ClusterKeyColumnsTest]
    PRIMARY KEY CLUSTERED ( ID, OrderDate )
    WITH ( FILLFACTOR = 100 );

CREATE NONCLUSTERED INDEX IX_ALLDATES
    ON dbo.ClusterKeyColumnsTest ( ProcessDate, ShipDate )
    INCLUDE ( DumbGUID )
    WITH ( FILLFACTOR = 100 );

ALTER TABLE ClusterKeyColumnsTest

ADD CONSTRAINT [PK_[ClusterKeyColumnsTest]

PRIMARY KEY CLUSTERED ( ID, OrderDate )

WITH ( FILLFACTOR = 100 );

CREATE NONCLUSTERED INDEX IX_ALLDATES

ON dbo.ClusterKeyColumnsTest ( ProcessDate, ShipDate )

INCLUDE ( DumbGUID )

WITH ( FILLFACTOR = 100 );

Running the same three queries as before, our plans change only slightly. The Key Lookup is gone, and the statement cost per batch has evened out.

You too can be a hero by getting rid of Key Lookups.

But notice that, for the second query, where we’re searching on the OrderDate column, we’re still scanning the Nonclustered Index.

We moved that out of the Nonclustered Index and used it as part of the Clustering Key the second time around. What gives?

Running sp_BlitzIndex® the same as before, OrderDate is now a secret column in the Nonclustered Index.

Fun Boy Three does the best version of Our Lips are Sealed, BTW.

Did we learn anything?

Sure did!

SQL ‘hides’ the columns from the Key of the Clustered Index in Nonclustered Indexes
Since those columns are part of the index, you don’t need to include them in the definition
Secret columns in Nonclustered Indexes can be used to avoid keylookups AND satisfy WHERE clause searches!

Well, at least until all indexes are ColumnStore Indexes.

Kendra says: One of the most common questions I get is whether there’s a penalty or downside if you list the columns from the clustered index in your nonclustered index. You can stop worrying: there’s no penalty.

Consulting Lines: “Write this down, because I’m not going to write it.”

When Did My Azure SQL Database Server Restart?

26 Comments. Leave new

Jerrod
August 12, 2015 12:51 pm

You, sir, are a mind reader! I had scheduled time next week to play with this, to get me some definitive answers (people kept yelling at me about how i couldn’t handle the truth – le sadface). I applaud you for the shortcut <3

Reply
Ray Herring
August 12, 2015 12:53 pm

I agree that explicitly including the clustered column(s) in the non-clustered index is not required. It is interesting (confusing?) that Sys.Index_Columns includes the column(s) if you specify them and does not include them if you don’t specify them.
You can verify that with the SSMS GUI (Index Properties) or a Demo like this.

–Create Index ix_CIColumns_Explicit On dbo.theDate ([DayOfYear], theDate);

–Create Index Ix_CIColumns_Implicit On dbo.theDate ([DayOfYear]);

Select
[Index] = si.name
, [Column] = sc.name

From
sys.indexes As si
Inner Join sys.index_columns As ic
On si.object_id = ic.object_id
And si.index_id = ic.index_id
Inner Join sys.columns As sc
On sc.object_id = si.object_id
And sc.column_id = ic.index_column_id
Where
si.object_id = Object_Id(‘[dbo].[theDate]’, ‘U’)
And si.name In (‘ix_CIColumns_Explicit’, ‘Ix_CIColumns_Implicit’)
Order By
si.name
,sc.name

Results
Index Column
ix_CIColumns_Explicit Id
ix_CIColumns_Explicit theDate
Ix_CIColumns_Implicit Id

Reply
Tony Trus
August 12, 2015 2:59 pm

This topic came up in AQI (Advanced Querying and Indexing) class in Portland (Aug 3-7) and while it’s true that specifying clustered columns in a NCI is unneeded it does not actually cause any harm nor does it cause the index to grow larger in size. It simply makes the hidden column visible. Having better readability so that future developers better understand what columns the creator intended to have coverage on is something I see as value add. The biggest risk I have of exposing these hidden columns is the potential for a DBA years down the road thinking that I did this out of ignorance and criticize it as “ignorance”.

On a side note, for those who haven’t taken that boot camp its worth its weight in gold. Just be prepared to drink through a fire hose because there is no fluff parts of the class.

Reply
- Erik Darling
  August 13, 2015 8:57 am
  
  No disagreements here. What I was aiming for was to show that the key columns from the clustered index aren’t just dead weight in the nonclustered index. Also, choose your clustered index keys carefully, because they really do end up in all of your nonclustered indexes.
  
  Glad you enjoyed the class.
  
  Reply
Martyn
August 13, 2015 4:43 am

You’ve answered my question there Tony, I wanted to check there was no additional cost to adding the secret columns in and that it was still considered good practice for readability.

Thanks.

Reply
Mike
August 13, 2015 4:51 am

What about the following situation:-

If I have a 2 field clustered index and I have a non-clustered index, if by including one of the clustered index fields in the non-clustered index I can declare that it is unique would that benefit SQL as SQL would then know that that combination of fields are unique?

Reply
- Erik Darling
  August 13, 2015 8:58 am
  
  I’m a total sucker for unique indexes, so I’d say go for it, assuming the clustered index data types are narrow ones (integers or dates).
  
  Reply
- Tony Trus
  August 13, 2015 6:17 pm
  
  Adding constraints, foreign keys, (and of course indexes ) will result in some level of overhead in DML. For OLTP databases it’s especially important to look at the DMV’s for reads/writes for objects/indexes to get proof that the cost of your index/constraint is not showing a very high ratio of writes to reads. When this occurs it may be a better idea to experiment with other ways to enforce data integrity.
  
  For select based statements that do not get a trivial plan, the optimizer should formulate a good enough execution plan that takes into account the check constraints / foreign keys / and available indexes to crunch through your result set as best as it can. If the code was hashed together inefficiently the constraints/fk’s/indexes may provide an unmeasurable benefit.
  
  Reply
Anders Pedersen
August 24, 2015 10:24 am

Very cool. But you do mean that it took up 2-3 MB on your system, right? Or am I doing something completely wrong?

Reply
Victor Sosa
August 24, 2016 5:12 pm

Hi Erik.
Great post, thanks for sharing.
One question, if we imagine that demo table is an actual table in a production system, and those are the 3 most common queries executed against it it’d be good to keep the non-clustered index with OrderDate as the first key column, right? I mean, to keep the index seek instead of the index scan.

Thanks!

Reply
- Erik Darling
  August 24, 2016 5:19 pm
  
  Here’s a brainteaser for you:
  
  If the table has 1 million rows in it, would you rather your query:
  Do 1 million index seeks
  Do 1 index scan of 1 million rows?
  
  😀
  
  Reply
  - Victor Sosa
    August 24, 2016 5:46 pm
    
    Hi Erik. Good one. I’m not really sure but let me see if I follow or if I’m completely lost.
    
    1 index scan of 1 million rows would be preferable due to after reading all the index it would be kept in memory unlike if we’re just making index seeks constantly, every index seek would mean to read from disk a few pages. Even though they would be fewer pages with the index seek they would be read from disk and not from memory.
    
    Thanks!
    
    Reply
Ashish Nasre
September 15, 2016 12:11 am

Thank you very much for the details. but i am still confused , why it used non clustered index scan instead clustered index seek.
its fine if clustered index key columns is part of non clustered index but still why it used costly operation means non clustered index scan than clustered index seek.

Reply
- Erik Darling
  September 15, 2016 7:09 am
  
  Hi Ashish,
  
  Scans happen when the optimizer deems them more efficient, and it’s often/mostly/usually correct.
  
  I’ll ask you the same question I asked another commenter: would you rather do 1 million seeks, or one scan of a million rows?
  
  Thanks!
  
  Reply
Richard
July 24, 2019 2:46 am

“Kendra says: One of the most common questions I get is whether there’s a penalty or downside if you list the columns from the clustered index in your nonclustered index. You can stop worrying: there’s no penalty.”

I did a test in the StackOverflow2010 database, wherein I created 2 indexes on the Posts table (clustered index key = Id column): one with Id & CloseDate and the other with just CloseDate. The one with Id & CloseDate is 7% larger. That’s a penalty, right?

Reply
- Brent Ozar
  July 24, 2019 2:47 am
  
  Richard – go ahead and post your code and I’ll see if I can reproduce that. Thanks!
  
  Reply
  - Richard
    July 24, 2019 3:04 am
    
    Oh dear! Math idiocy. It’s bigger, but I don’t think we’ll get too hot & bothered about 0.072%! Still, it IS bigger.
    
    Anyway:
    CREATE NONCLUSTERED INDEX [IX_ClosedDate] ON [dbo].[Posts] ([ClosedDate] ASC)
    CREATE NONCLUSTERED INDEX [IX_Id_ClosedDate] ON [dbo].[Posts] ([Id] ASC,[ClosedDate] ASC)
    
    SELECT
    OBJECT_SCHEMA_NAME(i.OBJECT_ID) AS SchemaName,
    OBJECT_NAME(i.OBJECT_ID) AS TableName,
    i.name AS IndexName,
    i.index_id AS IndexID,
    (8.000 * SUM(a.used_pages))/1024 AS ‘Indexsize(MB)’
    FROM sys.indexes AS i
    JOIN sys.partitions AS p ON p.OBJECT_ID = i.OBJECT_ID AND p.index_id = i.index_id
    JOIN sys.allocation_units AS a ON a.container_id = p.partition_id
    WHERE OBJECT_NAME(i.OBJECT_ID) = ‘Posts’
    GROUP BY i.OBJECT_ID,i.index_id,i.name
    ORDER BY OBJECT_NAME(i.OBJECT_ID),i.index_id
    
    SELECT 100 – ((65.10156250/65.14843750) * 100)
    
    Reply
    - Brent Ozar
      July 24, 2019 3:05 am
      
      Richard – Kendra’s talking about including them – as in, at the END of an index. The two indexes you posted are wildly, wildly different – if you do WHERE ClosedDate = GETDATE(), for example, SQL Server can seek on the first index, but not the second.
      
      Reply
Shaun Austin
August 27, 2019 3:55 am

Hi,
Consider this scenario.

I have an Order table with a clustered index on the ID column (primary key). I’ve noticed that someone has created this non-clustered index on the table:

CREATE NONCLUSTERED INDEX [IX_Order_Dashboard] ON [dbo].[Order]
(
[ID] ASC,
[UserID] ASC,
[StatusID] ASC
)
Note that the first column is ID, which is the clustered key. My initial reaction was that the index is pointless because it’s on the clustered index key, so SQL Server would just do a seek on the clustered index instead.

From looking at the usage stats I can see that SQL Server is frequently seeking, scanning and updating this index. Here’s an example query which does a seek on the non-clustered index(rather than the clustered idnex) on Order

SELECT c.name
FROM customer as c
inner join order as o on c.orderid = o.ID
WHERE c.ID = @customerid

Notably, the non-clustered index is significantly smaller than the clustered index because it only contains 3 columns. Does that meant it will do fewer reads? Well I tested this by disabling the non-clustered index and re-running the query.

What happened? SQL Server now did an index seek on the clustered index, and the number of reads was exactly the same.

So I ask – is this non-clustered index pointless and in fact, actually hurting performance slightly by simply slowing down my updates?

Reply
- Brent Ozar
  August 27, 2019 3:57 am
  
  Shaun – due to the volume of requests I get, I can’t really do free Q&A in the comments, sorry. Your best bet would be to post this with the supporting table structures over at https://dba.stackexchange.com.
  
  Reply
Michael
October 22, 2019 10:05 am

How would I choose between the following index definitions?

CREATE NONCLUSTERED INDEX IX_ALLDATES
ON dbo.ClusterKeyColumnsTest ( ProcessDate, ShipDate )
INCLUDE ( DumbGUID )
WITH ( FILLFACTOR = 100 )

CREATE NONCLUSTERED INDEX IX_ALLDATES
ON dbo.ClusterKeyColumnsTest ( ProcessDate )
INCLUDE ( ShipDate, DumbGUID )
WITH ( FILLFACTOR = 100 )

Reply
- Brent Ozar
  October 22, 2019 10:18 am
  
  Great question – that’s exactly what we cover (pun intended) in my Fundamentals of Index Tuning class. It’s a little beyond what I can do proper justice to in a comment, though.
  
  Reply
Jeff
March 27, 2023 5:16 pm

If the clustered index key is included in the nonclustered indexes, in what position does it sit, in the list of indexed fields? at the beginning or the end?
(CIKey, field1, field2) or
(field1, field2, CIKey)?
Or is it just an “included” column?
That matters, doesn’t it?

Reply
- Brent Ozar
  March 27, 2023 5:47 pm
  
  How did you define the index when it was created? Give an example, since that influences the answer.
  
  Reply
  - Jeff
    March 27, 2023 8:27 pm
    
    This example is from a 3rd party app db. (I hope this is what you mean).
    
    ALTER TABLE [dbo].[Activity] ADD CONSTRAINT [PK_ACTIVITY] PRIMARY KEY CLUSTERED
    ([ActivityGUID] ASC)
    
    CREATE NONCLUSTERED INDEX [IX_14_ACTIVITY] ON [dbo].[Activity]
    ([PolicyGUID] ASC,[StatusCode] ASC,[EffectiveDate] ASC)
    INCLUDE([TransactionGUID])
    
    Reply
    - Brent Ozar
      March 27, 2023 8:41 pm
      
      Your clustered index is on ActivityGUID, but you didn’t put ActivityGUID anywhere in the index definition. Therefore, it’s in the *included* columns (along with TransactionGUID) – it’s not sorted by that column.
      
      For more info on that, check out my Mastering Index Tuning class.
      
      Reply