Launch week: the Season Pass & Fundamentals Week are 50% off — ends in 17d 11h 01mSee the sale

Performance Tuning

Where Clustered Index Keys Dare

Colonel Clustered

We’ve blogged a couple times about how clustered index key columns get stored in your nonclustered indexes: here and here.

But where they get stored is a matter of weird SQL trivia. You see, it depends on how you define your nonclustered index.

“It Depends”

We all scream for dependencies! Hooray!

If you define your nonclustered index as unique, they get stored down at the leaf level. Depending on how you draw your indexes, that’s either the top or the bottom.

If you define your nonclustered indexes as non-unique, they get stored all throughout your index. Irregardless.

Proofs

We need a couple tables.

What we have here are two nearly identical tables. They only difference is that one has a unique nonclustered index, and one is not unique.

If I run these two queries, I get two nearly(!) identical plans.

Rule The World

But if you’re paying very careful attention, there are slight differences.

In the unique nonclustered index, the predicate on the Id column (which is the PK/CX), is a residual predicate, and the predicate on MoreId is a seek predicate

You’re handsome.

That means we did a seek to the MoreId value we wanted, and then checked the predicate on Id.

In the non-unique nonclustered index, both predicates are seekable.

Cells. Cells. Cells.

What Does This Prove?

Not much on its own, but let’s zoom in a little.

If we look at sp_BlitzIndex, we get different output, too.

INCLUDE!

The unique nonclustered index shoes the Id column as an Include, and is 14.1MB.

That’s not an include.

The non-unique index doesn’t list the Id as an Include, but as a regular column. It’s also slightly larger, at 14.2MB.

These two things, combined with the query plans, should be enough.

Can We Duplicate It?

Here’s another example that’s a bit more explicit. Two identical tables, except for the nonclustered index definitions.

If you examine the plans, they have a similar outcome. The two key columns allow both predicates to be seeked, and the one key/one include index results in a residual.

Say that ten times fast while you’re drunk and alone in your dark office.

What Does It All Mean?

Aside from some SQL Jeopardy points, this isn’t likely to get you much. It’s mostly an implementation point of interest.

SQL Server adds clustered index key columns to nonclustered indexes for many good reasons. It makes key lookups possible, and is probably pretty helpful when checking for corruption.

When people ask you if there’s any penalty to adding clustered index key columns, the answer is (and always has been) no, absolutely not.

They’re going to be there no matter what, but now you know exactly where they’re going to be.

Thanks for reading!

Brent says: oh wow, the predicate-vs-seek-predicate is a really good way of showing how things are ordered in an index. Nice explanation.

Free, 3× a week

Get my new posts by email

Three posts a week, plus a Monday roundup of the best database news from around the web.

5 comments

  1. Interesting… so I wonder, is the reason for putting the clustered index keys as additional keys in a non-unique clustered index (instead of at the leaf level) a way to make that non-unique index actually unique (when you include the clustered index keys too)…
    And if so, what about if your clustered index is itself non-unique… I guess the uniqueifier would also be a key now in the nonclustered index, not at the leaf…
    This is making me think that in reality, all indexes in SQL server truly are unique in terms of how they are stored. Mind-blower…
    Thanks for the post!

    1. I wonder if it is to create an order, so that records with the same key value are not just in random order. Don’t know why that would be a problem, but… There’s always reasons for these little details – it’s figuring out what they are that’s hard though. Thanks Erik for an enlightening post!

    2. Yes. Say you have a table with 200 million rows. You create a CI on a column with a few ten thousand unique values. SQL would need a way to identify the location of every value.

      This used to be a big thing many years ago when creating a CI on an column other than a column with unique numbers was a big performance no no. Storage and memory were a lot more expensive and the extra space required for this data would have been considered wasted

  2. Great read!! It’s an interesting take on things. Also, to add If we run DBCC PAGE command on the indexes, for non-unique clustered index we see a new column of Primary key added.
    Thanks for writing!!

Leave a comment

Your email address will not be published. Required fields are marked *