Does an Index on Just The Clustering Key Ever Make Sense?

Indexing
10 Comments

Say you’ve got a table – in this case, I’m using the Stack Overflow Posts table, which holds all of the questions and answers:

StackOverflow.dbo.Posts

See the clustered index at the bottom? It’s on the Id field, which is an identity – starts at one and goes up to a bajillion.

So it would be really stupid to do this, right?

Why would someone ever create yet another index on Id when you already have the clustered index? I’ll show you why:

Here are the execution plans – all 3 of which use our Id index:

Scantastic

The performance differences are pretty eye-opening:

  • Clustered index scan: 50 seconds, 7,736,170 logical reads
  • Nonclustered index scan: 0.5 seconds, 36,767 logical reads

When SQL Server has to scan the table, it says, “Is there a faster object that I could scan instead? Is there another copy of the table I could use that has less data in it?” Indeed, our nonclustered index just on Id is way smaller, as sp_BlitzIndex shows:

sp_BlitzIndex analyzing index sizes

The clustered index (row 2) is 70GB because it has all the fields, whereas the nonclustered index on Id is a tiny 285MB in comparison. Turns out all those big string fields take up a lot of space on the clustered index. Who knew?

My favorite query plan of the 3, though, is the count WHERE Body IS NOT NULL. You’d think SQL Server would have to go check the Body field, right? But no – the Body field was specified as non-nullable (scroll back up to check the table definition) so SQL Server can simply ignore it. Ahh, the sweet benefits of the right null-vs-not-null table definitions.

The index on dbo.Posts.Id can even be used to speed up foreign key creation on other tables. Illustrating this is just a little trickier since foreign key creations don’t show execution plans (nor do they show up in the plan cache.) To show it, I restarted my instance, created a foreign key between Comments.PostId and Posts.Id, and then ran sp_BlitzCache. (You could also probably show this with Extended Events, but I’m doing this fast, dear reader.)

Score another one for our nonclustered index

I’m not saying that those queries are a good idea – I’m just explaining why someone might have created that index, and what might backfire when you drop it. I’m also not saying you can never drop it, either – I’m just warning you to keep an eye out for what might slow down.

And by the way, if this comes as a surprise, I bet you haven’t seen my free course, How to Think Like the SQL Server Engine. It teaches you the differences between clustered & nonclustered indexes – something everybody thinks they know until I start asking them questions.

Previous Post
So You Wanna Debug SQL Server Part 2
Next Post
[Video] Office Hours 2018/4/18 (With Transcriptions) with Special Guest Drew Furgiuele

10 Comments. Leave new

  • alen teplitsky
    April 20, 2018 8:59 am

    I don’t automatically hate clustered PK’s on an identity column, but i’ve seen tables where the index to column ratio is close to 1:1 because of this

    Reply
  • Eryk Seppanen
    April 20, 2018 11:26 am

    That 3rd one had me scratching my head. Sneaky!

    Reply
  • I was reading through it and suddenly I saw….FREE COURSE!!! 🙂
    Good article also. I would be doing that free course right now because it doesn’t hurt at all to get back up to speed on some of that essential information, but the company wants us to learn more about AWS. Even though I consider your free course and the information it contains may actually be more valuable!

    Reply
  • Great post! We use these a lot for super fast EXISTS and NOT EXISTS checks.

    Reply
  • Aaron Morelli
    April 20, 2018 1:06 pm

    “Stuck on 2008 R2? Roll your own column store indexes! Put a single-col b-tree on every column!”

    Reply
  • I saw this on a database I was working on the other day. The LOB kills you and it is there because of the nvarchar(max) field, Body. Our (max) field has a picture in it. The clustered index has that picture in it causing the CPU to go crazy. Told them to get that picture out of the database. In my case, the execution plans using the cluster key is using 86 percent of the query time.

    Reply
  • Mind Blown!
    Ah Ha!
    When you know it but then Brent paints it together for you, again, and makes you slap yourself and go, “YEAH!”
    Thank You, Brent.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.