Where Clustered Index Keys Dare

Colonel Clustered

We’ve blogged a couple times about how clustered index key columns get stored in your nonclustered indexes: here and here.

But where they get stored is a matter of weird SQL trivia. You see, it depends on how you define your nonclustered index.

“It Depends”

We all scream for dependencies! Hooray!

If you define your nonclustered index as unique, they get stored down at the leaf level. Depending on how you draw your indexes, that’s either the top or the bottom.

If you define your nonclustered indexes as non-unique, they get stored all throughout your index. Irregardless.

Proofs

We need a couple tables.

What we have here are two nearly identical tables. They only difference is that one has a unique nonclustered index, and one is not unique.

If I run these two queries, I get two nearly(!) identical plans.

Rule The World

But if you’re paying very careful attention, there are slight differences.

In the unique nonclustered index, the predicate on the Id column (which is the PK/CX), is a residual predicate, and the predicate on MoreId is a seek predicate

You’re handsome.

That means we did a seek to the MoreId value we wanted, and then checked the predicate on Id.

In the non-unique nonclustered index, both predicates are seekable.

Cells. Cells. Cells.

What Does This Prove?

Not much on its own, but let’s zoom in a little.

If we look at sp_BlitzIndex, we get different output, too.

INCLUDE!

The unique nonclustered index shoes the Id column as an Include, and is 14.1MB.

That’s not an include.

The non-unique index doesn’t list the Id as an Include, but as a regular column. It’s also slightly larger, at 14.2MB.

These two things, combined with the query plans, should be enough.

Can We Duplicate It?

Here’s another example that’s a bit more explicit. Two identical tables, except for the nonclustered index definitions.

If you examine the plans, they have a similar outcome. The two key columns allow both predicates to be seeked, and the one key/one include index results in a residual.

Say that ten times fast while you’re drunk and alone in your dark office.

What Does It All Mean?

Aside from some SQL Jeopardy points, this isn’t likely to get you much. It’s mostly an implementation point of interest.

SQL Server adds clustered index key columns to nonclustered indexes for many good reasons. It makes key lookups possible, and is probably pretty helpful when checking for corruption.

When people ask you if there’s any penalty to adding clustered index key columns, the answer is (and always has been) no, absolutely not.

They’re going to be there no matter what, but now you know exactly where they’re going to be.

Thanks for reading!

Brent says: oh wow, the predicate-vs-seek-predicate is a really good way of showing how things are ordered in an index. Nice explanation.

Previous Post
Book Review: Microsoft SQL Server Training, Volume 1
Next Post
Informal Poll: What Are Your Favorite Error Log Messages?

5 Comments. Leave new

  • Steve Kratowicz
    May 11, 2018 11:52 am

    Interesting… so I wonder, is the reason for putting the clustered index keys as additional keys in a non-unique clustered index (instead of at the leaf level) a way to make that non-unique index actually unique (when you include the clustered index keys too)…
    And if so, what about if your clustered index is itself non-unique… I guess the uniqueifier would also be a key now in the nonclustered index, not at the leaf…
    This is making me think that in reality, all indexes in SQL server truly are unique in terms of how they are stored. Mind-blower…
    Thanks for the post!

    Reply
    • Erik Darling
      May 11, 2018 11:59 am

      Steve — yeah, and without GUIDs 😉

      Reply
    • I wonder if it is to create an order, so that records with the same key value are not just in random order. Don’t know why that would be a problem, but… There’s always reasons for these little details – it’s figuring out what they are that’s hard though. Thanks Erik for an enlightening post!

      Reply
    • alen teplitsky
      May 25, 2018 8:52 am

      Yes. Say you have a table with 200 million rows. You create a CI on a column with a few ten thousand unique values. SQL would need a way to identify the location of every value.

      This used to be a big thing many years ago when creating a CI on an column other than a column with unique numbers was a big performance no no. Storage and memory were a lot more expensive and the extra space required for this data would have been considered wasted

      Reply
  • Varun Chopra
    May 25, 2018 6:07 am

    Great read!! It’s an interesting take on things. Also, to add If we run DBCC PAGE command on the indexes, for non-unique clustered index we see a new column of Primary key added.
    Thanks for writing!!

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
{"cart_token":"","hash":"","cart_data":""}