Clustered Index key columns in Nonclustered Indexes

Clustered indexes are fundamental

And I’m not just saying that because Kendra is my spiritual adviser!

They are not ~a copy~ of the table, they are the table, ordered by the column(s) you choose as the key. It could be one. It could be a few. It could be a GUID! But that’s for another time. A long time from now. When I’ve raised an army, in accordance with ancient prophecy.

What I’d like to focus on is another oft-neglected consideration when indexing:

Columns in the clustering key will be in all of your nonclustered indexes

“How do you know?”
“Can I join your army?”
“Why does it do that?”

You may be asking yourself all of those questions. I’ll answer one of them with a demo, and one of them with “no”.

D-D-D-DEMO FACE

That code will create a pretty rudimentary table of random data. It was between 3-4 GB on my system.

You can see a Clustered Index was created on the ID column, and a Nonclustered Index was created on the three date columns. The DumbGUID column was, of course, neglected as a key column in both. Poor GUID.

Running these queries, the Nonclustered Index on the date columns will be used, because SQL does this smart thing where it takes page count into consideration when choosing an index to use.

Notice that the only time a Key Lookup was needed is for the last query, where we also select the DumbGUID column. You’d think it would be needed in all three, since the ID column isn’t explicitly named in the key or as an include in the Nonclustered Index.

Key Lookups really are really expensive and you should really fix them. Really.
Key Lookups really are really expensive and you should really fix them. Really.

sp_BlitzIndex® to the rescue

If you find yourself trying to figure out indexes, Kendra’s sp_BlitzIndex® stored procedure is invaluable. It can do this cool thing where it shows you SECRET columns!

Since I’ve already ruined the surprise, let’s look at the indexes on our test table.

Here’s the output. The detail it gives you on index columns and datatypes is really awesome. You can see the ID column is part of the Nonclustered Index, even though it isn’t named in the definition.

Ooh, shapes!
Ooh, shapes!

One step beyond

Run the code to drop and recreate our test table, but this time add these indexes below instead of the original ones.

Running the same three queries as before, our plans change only slightly. The Key Lookup is gone, and the statement cost per batch has evened out.

You too can be a hero by getting rid of Key Lookups.
You too can be a hero by getting rid of Key Lookups.

But notice that, for the second query, where we’re searching on the OrderDate column, we’re still scanning the Nonclustered Index.

We moved that out of the Nonclustered Index and used it as part of the Clustering Key the second time around. What gives?

Running sp_BlitzIndex® the same as before, OrderDate is now a secret column in the Nonclustered Index.

Fun Boy Three does the best version of Our Lips are Sealed, BTW.
Fun Boy Three does the best version of Our Lips are Sealed, BTW.

Did we learn anything?

Sure did!

  1. SQL ‘hides’ the columns from the Key of the Clustered Index in Nonclustered Indexes
  2. Since those columns are part of the index, you don’t need to include them in the definition
  3. Secret columns in Nonclustered Indexes can be used to avoid keylookups AND satisfy WHERE clause searches!

Well, at least until all indexes are ColumnStore Indexes.

Kendra says: One of the most common questions I get is whether there’s a penalty or downside if you list the columns from the clustered index in your nonclustered index. You can stop worrying: there’s no penalty.

Previous Post
Consulting Lines: “Write this down, because I’m not going to write it.”
Next Post
When Did My Azure SQL Database Server Restart?

26 Comments. Leave new

  • You, sir, are a mind reader! I had scheduled time next week to play with this, to get me some definitive answers (people kept yelling at me about how i couldn’t handle the truth – le sadface). I applaud you for the shortcut <3

    Reply
  • I agree that explicitly including the clustered column(s) in the non-clustered index is not required. It is interesting (confusing?) that Sys.Index_Columns includes the column(s) if you specify them and does not include them if you don’t specify them.
    You can verify that with the SSMS GUI (Index Properties) or a Demo like this.

    –Create Index ix_CIColumns_Explicit On dbo.theDate ([DayOfYear], theDate);

    –Create Index Ix_CIColumns_Implicit On dbo.theDate ([DayOfYear]);

    Select
    [Index] = si.name
    , [Column] = sc.name

    From
    sys.indexes As si
    Inner Join sys.index_columns As ic
    On si.object_id = ic.object_id
    And si.index_id = ic.index_id
    Inner Join sys.columns As sc
    On sc.object_id = si.object_id
    And sc.column_id = ic.index_column_id
    Where
    si.object_id = Object_Id(‘[dbo].[theDate]’, ‘U’)
    And si.name In (‘ix_CIColumns_Explicit’, ‘Ix_CIColumns_Implicit’)
    Order By
    si.name
    ,sc.name

    Results
    Index Column
    ix_CIColumns_Explicit Id
    ix_CIColumns_Explicit theDate
    Ix_CIColumns_Implicit Id

    Reply
  • This topic came up in AQI (Advanced Querying and Indexing) class in Portland (Aug 3-7) and while it’s true that specifying clustered columns in a NCI is unneeded it does not actually cause any harm nor does it cause the index to grow larger in size. It simply makes the hidden column visible. Having better readability so that future developers better understand what columns the creator intended to have coverage on is something I see as value add. The biggest risk I have of exposing these hidden columns is the potential for a DBA years down the road thinking that I did this out of ignorance and criticize it as “ignorance”.

    On a side note, for those who haven’t taken that boot camp its worth its weight in gold. Just be prepared to drink through a fire hose because there is no fluff parts of the class.

    Reply
    • No disagreements here. What I was aiming for was to show that the key columns from the clustered index aren’t just dead weight in the nonclustered index. Also, choose your clustered index keys carefully, because they really do end up in all of your nonclustered indexes.

      Glad you enjoyed the class.

      Reply
  • You’ve answered my question there Tony, I wanted to check there was no additional cost to adding the secret columns in and that it was still considered good practice for readability.

    Thanks.

    Reply
  • What about the following situation:-

    If I have a 2 field clustered index and I have a non-clustered index, if by including one of the clustered index fields in the non-clustered index I can declare that it is unique would that benefit SQL as SQL would then know that that combination of fields are unique?

    Reply
    • I’m a total sucker for unique indexes, so I’d say go for it, assuming the clustered index data types are narrow ones (integers or dates).

      Reply
    • Adding constraints, foreign keys, (and of course indexes ) will result in some level of overhead in DML. For OLTP databases it’s especially important to look at the DMV’s for reads/writes for objects/indexes to get proof that the cost of your index/constraint is not showing a very high ratio of writes to reads. When this occurs it may be a better idea to experiment with other ways to enforce data integrity.

      For select based statements that do not get a trivial plan, the optimizer should formulate a good enough execution plan that takes into account the check constraints / foreign keys / and available indexes to crunch through your result set as best as it can. If the code was hashed together inefficiently the constraints/fk’s/indexes may provide an unmeasurable benefit.

      Reply
  • Anders Pedersen
    August 24, 2015 10:24 am

    Very cool. But you do mean that it took up 2-3 MB on your system, right? Or am I doing something completely wrong?

    Reply
  • Hi Erik.
    Great post, thanks for sharing.
    One question, if we imagine that demo table is an actual table in a production system, and those are the 3 most common queries executed against it it’d be good to keep the non-clustered index with OrderDate as the first key column, right? I mean, to keep the index seek instead of the index scan.

    Thanks!

    Reply
    • Here’s a brainteaser for you:

      If the table has 1 million rows in it, would you rather your query:
      Do 1 million index seeks
      Do 1 index scan of 1 million rows?

      😀

      Reply
      • Hi Erik. Good one. I’m not really sure but let me see if I follow or if I’m completely lost.

        1 index scan of 1 million rows would be preferable due to after reading all the index it would be kept in memory unlike if we’re just making index seeks constantly, every index seek would mean to read from disk a few pages. Even though they would be fewer pages with the index seek they would be read from disk and not from memory.

        Thanks!

        Reply
  • Thank you very much for the details. but i am still confused , why it used non clustered index scan instead clustered index seek.
    its fine if clustered index key columns is part of non clustered index but still why it used costly operation means non clustered index scan than clustered index seek.

    Reply
    • Hi Ashish,

      Scans happen when the optimizer deems them more efficient, and it’s often/mostly/usually correct.

      I’ll ask you the same question I asked another commenter: would you rather do 1 million seeks, or one scan of a million rows?

      Thanks!

      Reply
  • “Kendra says: One of the most common questions I get is whether there’s a penalty or downside if you list the columns from the clustered index in your nonclustered index. You can stop worrying: there’s no penalty.”

    I did a test in the StackOverflow2010 database, wherein I created 2 indexes on the Posts table (clustered index key = Id column): one with Id & CloseDate and the other with just CloseDate. The one with Id & CloseDate is 7% larger. That’s a penalty, right?

    Reply
    • Richard – go ahead and post your code and I’ll see if I can reproduce that. Thanks!

      Reply
      • Oh dear! Math idiocy. It’s bigger, but I don’t think we’ll get too hot & bothered about 0.072%! Still, it IS bigger.

        Anyway:
        CREATE NONCLUSTERED INDEX [IX_ClosedDate] ON [dbo].[Posts] ([ClosedDate] ASC)
        CREATE NONCLUSTERED INDEX [IX_Id_ClosedDate] ON [dbo].[Posts] ([Id] ASC,[ClosedDate] ASC)

        SELECT
        OBJECT_SCHEMA_NAME(i.OBJECT_ID) AS SchemaName,
        OBJECT_NAME(i.OBJECT_ID) AS TableName,
        i.name AS IndexName,
        i.index_id AS IndexID,
        (8.000 * SUM(a.used_pages))/1024 AS ‘Indexsize(MB)’
        FROM sys.indexes AS i
        JOIN sys.partitions AS p ON p.OBJECT_ID = i.OBJECT_ID AND p.index_id = i.index_id
        JOIN sys.allocation_units AS a ON a.container_id = p.partition_id
        WHERE OBJECT_NAME(i.OBJECT_ID) = ‘Posts’
        GROUP BY i.OBJECT_ID,i.index_id,i.name
        ORDER BY OBJECT_NAME(i.OBJECT_ID),i.index_id

        SELECT 100 – ((65.10156250/65.14843750) * 100)

        Reply
        • Richard – Kendra’s talking about including them – as in, at the END of an index. The two indexes you posted are wildly, wildly different – if you do WHERE ClosedDate = GETDATE(), for example, SQL Server can seek on the first index, but not the second.

          Reply
  • Hi,
    Consider this scenario.

    I have an Order table with a clustered index on the ID column (primary key). I’ve noticed that someone has created this non-clustered index on the table:

    CREATE NONCLUSTERED INDEX [IX_Order_Dashboard] ON [dbo].[Order]
    (
    [ID] ASC,
    [UserID] ASC,
    [StatusID] ASC
    )
    Note that the first column is ID, which is the clustered key. My initial reaction was that the index is pointless because it’s on the clustered index key, so SQL Server would just do a seek on the clustered index instead.

    From looking at the usage stats I can see that SQL Server is frequently seeking, scanning and updating this index. Here’s an example query which does a seek on the non-clustered index(rather than the clustered idnex) on Order

    SELECT c.name
    FROM customer as c
    inner join order as o on c.orderid = o.ID
    WHERE c.ID = @customerid

    Notably, the non-clustered index is significantly smaller than the clustered index because it only contains 3 columns. Does that meant it will do fewer reads? Well I tested this by disabling the non-clustered index and re-running the query.

    What happened? SQL Server now did an index seek on the clustered index, and the number of reads was exactly the same.

    So I ask – is this non-clustered index pointless and in fact, actually hurting performance slightly by simply slowing down my updates?

    Reply
  • How would I choose between the following index definitions?

    CREATE NONCLUSTERED INDEX IX_ALLDATES
    ON dbo.ClusterKeyColumnsTest ( ProcessDate, ShipDate )
    INCLUDE ( DumbGUID )
    WITH ( FILLFACTOR = 100 )

    CREATE NONCLUSTERED INDEX IX_ALLDATES
    ON dbo.ClusterKeyColumnsTest ( ProcessDate )
    INCLUDE ( ShipDate, DumbGUID )
    WITH ( FILLFACTOR = 100 )

    Reply
    • Great question – that’s exactly what we cover (pun intended) in my Fundamentals of Index Tuning class. It’s a little beyond what I can do proper justice to in a comment, though.

      Reply
  • If the clustered index key is included in the nonclustered indexes, in what position does it sit, in the list of indexed fields? at the beginning or the end?
    (CIKey, field1, field2) or
    (field1, field2, CIKey)?
    Or is it just an “included” column?
    That matters, doesn’t it?

    Reply
    • How did you define the index when it was created? Give an example, since that influences the answer.

      Reply
      • This example is from a 3rd party app db. (I hope this is what you mean).

        ALTER TABLE [dbo].[Activity] ADD CONSTRAINT [PK_ACTIVITY] PRIMARY KEY CLUSTERED
        ([ActivityGUID] ASC)

        CREATE NONCLUSTERED INDEX [IX_14_ACTIVITY] ON [dbo].[Activity]
        ([PolicyGUID] ASC,[StatusCode] ASC,[EffectiveDate] ASC)
        INCLUDE([TransactionGUID])

        Reply
        • Your clustered index is on ActivityGUID, but you didn’t put ActivityGUID anywhere in the index definition. Therefore, it’s in the *included* columns (along with TransactionGUID) – it’s not sorted by that column.

          For more info on that, check out my Mastering Index Tuning class.

          Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.