Performance Benefits of Unique Indexes


SQL Server loves unique indexes.

Why? Because it’s lazy. Just like you. If you had to spend all day flipping pages around, you’d probably be even lazier. Thank Codd someone figured out how to make a computer do it. There’s some code below, along with some screen shots, but…

TL;DR

SQL is generally pretty happy to get good information about the data it’s holding onto for you. If you know something will be unique, let it know. It will make better plan choices, and certain operations will be supported more efficiently than if you make it futz around looking for repeats in unique data.

There is some impact on inserts and updates as the constraint is checked, but generally it’s negligible, especially when compared to the performance gains you can get from select queries.

So, without further ado!

Q: What was the last thing the Medic said to the Heavy?

A: Demoooooooooo!

We’ll start off by creating four tables. Two with unique clustered indexes, and two with non-unique clustered indexes, that are half the size. I’m just going with simple joins here, since they seem like a pretty approachable subject to most people who are writing queries and creating indexes. I hope.
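The setup went roughly like this. This is a sketch of the idea only; the table names, row counts, and duplicate pattern are my own stand-ins, not the original script:

```sql
-- Sketch only: names, sizes, and data distribution are stand-ins.
CREATE TABLE dbo.UniqueBig   (Id INT NOT NULL);
CREATE TABLE dbo.UniqueSmall (Id INT NOT NULL);
CREATE UNIQUE CLUSTERED INDEX cx_ub ON dbo.UniqueBig   (Id);
CREATE UNIQUE CLUSTERED INDEX cx_us ON dbo.UniqueSmall (Id);

CREATE TABLE dbo.NonUniqueBig   (Id INT NOT NULL);
CREATE TABLE dbo.NonUniqueSmall (Id INT NOT NULL);
CREATE CLUSTERED INDEX cx_nub ON dbo.NonUniqueBig   (Id);
CREATE CLUSTERED INDEX cx_nus ON dbo.NonUniqueSmall (Id);

-- Unique tables: every value appears exactly once.
INSERT dbo.UniqueBig (Id)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2;

INSERT dbo.UniqueSmall (Id)
SELECT Id FROM dbo.UniqueBig WHERE Id <= 500000;

-- Non-unique tables: half the rows, with every value appearing
-- twice, so the optimizer can't assume anything about duplicates.
INSERT dbo.NonUniqueBig (Id)
SELECT (Id + 1) / 2 FROM dbo.UniqueBig WHERE Id <= 500000;

INSERT dbo.NonUniqueSmall (Id)
SELECT (Id + 1) / 2 FROM dbo.UniqueBig WHERE Id <= 250000;
```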

Now that we have our setup, let’s look at a couple queries. I’ll be returning the results to a variable so we don’t sit around waiting for SSMS to display a bunch of uselessness.
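The shape of the two queries, with hypothetical stand-in names: the dbo.Unique* pair has unique clustered indexes on Id, the dbo.NonUnique* pair has non-unique ones.

```sql
SET STATISTICS TIME, IO ON;

DECLARE @Id INT;

-- Both sides carry unique clustered indexes on Id.
SELECT @Id = u1.Id
FROM dbo.UniqueBig AS u1
JOIN dbo.UniqueSmall AS u2
  ON u1.Id = u2.Id
ORDER BY u1.Id;

-- Both sides carry non-unique clustered indexes on Id.
SELECT @Id = n1.Id
FROM dbo.NonUniqueBig AS n1
JOIN dbo.NonUniqueSmall AS n2
  ON n1.Id = n2.Id
ORDER BY n1.Id;
```

Assigning the column to @Id discards the rows as they're produced, so the measurements reflect plan work rather than SSMS rendering time.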

What does SQL do with these?

Ugly as a river dolphin, that one.

Not only does the query against the unique indexes choose a much nicer merge join, it doesn’t even get considered for going parallel. The batch cost is about 1/3, and the sort is fully supported.

The query against non-unique tables requires a sizable memory grant, to boot.

Looking at the STATISTICS TIME and IO output, there’s not much difference in logical reads, but you see the non-unique index used all four cores available on my laptop (4 scans, 1 coordinator thread), and there’s a worktable and workfile for the hash join. Overall CPU time is much higher, though there’s only ever about 100ms difference in elapsed time over a number of consecutive runs.

Fair fight

So, obviously going parallel threw some funk on the floor. If we force a MAXDOP of one to the non-unique query, what happens?
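The cap is just a query hint on the end of the non-unique query (dbo.NonUniqueBig and dbo.NonUniqueSmall here are hypothetical stand-ins for the non-unique tables):

```sql
DECLARE @Id INT;

SELECT @Id = n1.Id
FROM dbo.NonUniqueBig AS n1      -- non-unique clustered index on Id
JOIN dbo.NonUniqueSmall AS n2    -- ditto
  ON n1.Id = n2.Id
ORDER BY n1.Id
OPTION (MAXDOP 1);               -- force a serial plan
```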

You Get Nothing! You Lose! Good Day, Sir!

Yep. Same thing, just single threaded this time. The plan looks a little nicer, sure, but now the non-unique part is up to 85% of the batch cost, from, you know, that other number. You’re not gonna make me say it. This is a family-friendly blog.

Going back to TIME and IO, the only noticeable change is in CPU time for the non-unique query. Still needed a memory grant, still has an expensive sort.

Just one index

The nice thing is that a little uniqueness goes a long way. If we join the unique table to the non-unique join table, we end up with nearly identical plans.
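The mixed join looks like this, again with hypothetical stand-in names, where dbo.UniqueBig has a unique clustered index on Id and dbo.NonUniqueSmall has a non-unique one:

```sql
DECLARE @Id INT;

SELECT @Id = u.Id
FROM dbo.UniqueBig AS u          -- unique clustered index on Id
JOIN dbo.NonUniqueSmall AS n     -- non-unique clustered index on Id
  ON u.Id = n.Id
ORDER BY u.Id;
```

Because the optimizer knows u.Id can’t repeat, it can treat that side as the "one" side of a one-to-many merge join rather than falling back to a many-to-many join.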

You’re such a special flower.

Done and doner

So, you made it to the end. Congratulations. I hope your boss didn’t walk by too many times.

By the way, the year is 2050, the Cubs still haven’t won the world series, and a horrible race of extraterrestrials has taken over the Earth and is using humans as workers to mine gold. Wait, no, that’s something else.

But! Hey! Brains! You have more of them now, if any of this was enlightening to you. If you spaced out and just realized the page stopped scrolling, here’s a recap:

  • Unique indexes: SQL likes ’em
  • You will generally see better plans when the optimizer isn’t concerned with duplicate values
  • There’s not a ton of downside to using them where possible
  • Even one unique index can make a lot of difference when joined with a non-unique index

As an aside, this was all tested on SQL Server 2014. An exercise for Dear Reader: if you have SQL Server 2012, look at the tempdb spills that occur on the sort and hash operations for the non-unique indexes. I’m not including them here because it’s a bit of a detour. It’s probably not the most compelling reason to upgrade, but it’s something to consider. tempdb is way less eager to write to disk these days!

Thanks for reading!

Brent says: I always wanted proof that unique clustered indexes made for better execution plans!


29 Comments

  • Database Antichrist
    August 18, 2015 8:49 am

    +1 TF2 humor!

  • +1 Willy Wonka reference…
    Great article.

  • Is this only a factor if the data in the index is “naturally” unique, or will it also affect indexes created using the ‘make my index unique’ flag?

  • I am really enjoying your posts – keep them (and the Willy Wonka references) coming!

  • John G Hohengarten
    August 19, 2015 9:42 am

    So… if I understand this correctly, are you suggesting that if we have, for example, a Clustered Index on a surrogate key in a Data Warehouse environment, where we know the Surrogate Key is always unique since it’s an IDENTITY column, then making it a Unique Clustered Index (still on the Surrogate Key) will [generally] perform better than the non-unique Clustered Index on the same column?

  • Brad Featherstone
    August 19, 2015 11:12 am

    …”used all four cores available on my laptop (4 scans, 1 coordinator thread)”.
    I don’t see it at all.
    I thought that ‘Scan’ counted the number of times the object was scanned.
    Am I having a bad reading day?

    • STATISTICS IO, for parallel plans, counts things strangely. For simplicity, each core got a thread (1-4). Each thread scanned the object once. So four scans. Then there’s a coordinator thread (0) that gathers rows from each scan. So it counts five, because SQL spawned a thread to manage parallelism.

  • How about unique non-clustered indexes?

    • Same deal with those. Using these tables, you can add the clustered indexes as nonclustered indexes instead, and see the same patterns emerge.

      • To be complete (tested it just now):

        It works the same way if you create a UNIQUE CLUSTERED INDEX instead of the PK.

  • Vince Chapman
    August 21, 2015 7:42 am

    The execution plans when I run the initial two SELECTs are identical; the only way I can get the unique table to use a merge join is to add a hint of MAXDOP 1.

    Is this potentially indicating that my server settings are configured incorrectly?

  • Ha, saw the TF2 humor and had to do a double take.

    The downside seems to be that “UNIQUE” on large tables often doubles the initial index creation time.

    • It can, sure. I think it depends a bit on the data you’re indexing, and how cool your server is. Sometimes sorting in tempdb helps a bit as well.

  • Please edit “2050, the Cubs still haven’t won the world series” this is incorrect 🙂

  • Yuri Leventman
    November 29, 2016 5:50 pm

    Hi Erik,
    What about the known impact of UNIQUE index on CDC operation codes?
    When index is unique and any of its key columns is in the list of tracked columns on e.g. MyCdcTable_CT table, then SQL engine creates so called deferred updates (stated as normal behavior), which are reported as DELETE(1) and INSERT(2) operations vs. an UPDATE(3 & 4) operation on the CDC side.
    Sources:
    https://social.msdn.microsoft.com/Forums/sqlserver/en-US/72b37e54-c22a-4cf9-8997-69372f17f75c/cdc-operation-giving-wrong-result?forum=sqldatabaseengine
    https://support.microsoft.com/en-us/kb/238254
    That means, if you have CDC enabled, say goodbye to UNIQUE indexes.
    Is that correct, or is there still a workaround for base table (not the …._CT one) level updates?
    Thank you.

    • Yuri — so, you’re suggesting that people not use Primary Keys if they’re using CDC? Or unique clustered indexes? Those can result in the split/sort/collapse behavior of the deferred update that you’re talking about, as well.

      As far as fixes and workarounds go, I’m not sure. I don’t work with CDC often (if at all), and don’t keep up to date on it. The KB article you posted is very old, but the forum post is very new. It looks like an ongoing issue, I guess.

      Thanks!

      • Erik,
        thank you for your reply.
        There is no workaround provided/suggested by MS or SQL forums at this point.
        While the impact of deleted/updated PK’s key value(s) is unlikely, under normal circumstances, UNIQUE clustered and non-clustered indexes impact the BI side of data load process on the DWH server (in our case at least).
        There is a KB #302341(Last Review: 12/06/2015 03:30:31), which is not that old, it suggests to use trace flag 8207. Unfortunately that flag works only for a single row update (aka a singleton update).
        So, the possible solution could be either create a scheduled jobs which fixes those 1 and 2 codes to 3 and 4 accordingly (this could be a time/resource sensitive/expensive process though) or to have the BI side data load process to be deferred update case aware, so to speak, and analyze the existence of those updated keys values (like being really DELETED and/or INSERTED) against base table(s) accordingly to avoid a confusion caused by reported CDC operation codes.
        Thank you.

  • Erik/All, I know this is an older post, but I have an interesting situation with a vendor-provided solution. Erik, you indicate that SQL Server loves unique indexes, but this vendor solution seems to add (randomly, in a lot of cases) unnecessary uniqueness by appending the PK (ID) column to the end of several indexes on any given table. Based on my understanding of inserts/updates/deletes, I can’t imagine this is helping out. Can too many unique indexes on a table, trailing with the primary key column for uniqueness, have a negative performance impact?

  • Gregory Liénard
    May 14, 2022 7:46 am

    In our case, I prefer to use non-unique indexes. For instance, we have a table with hundreds of millions of keywords to be analyzed. The keyword is unique (clustered index & primary key). But as we cannot do them all at once, we have a priority column as well. So we create a non-unique, non-clustered index with only the Priority column and the keyword as an included column.

    This gives the following performance advantages:
    1. No additional constraint is checked when adding.
    2. When inserting a keyword, any page which has free space left and contains entries of the same priority can be used, meaning less fragmentation and less maintenance (rebuild/reorganize). Note that we also delete from this table, and priority is just a tinyint, so this happens in most cases.

    • I’m a little confused – are you saying you have a heap plus a nonclustered index, or a clustered index plus a nonclustered index? Or are you thinking that the nonclustered index is the only storage for the table? (It isn’t.)

