Index Key Column Order And Supporting Sorts

Whatever Man

When tuning queries that need to sort large amounts of data, it sometimes makes sense to make the ordering elements the leading key column(s) in your index. That way SQL Server can read the data already ordered by that column, and then access other key and included columns to satisfy the rest of the query, whether that’s joins, predicates, or selected columns.

That doesn’t mean an index with the sorting element first is always the best idea. Depending on how your query looks, you can sometimes put sorting elements at the end of an index definition and still have your sort be supported.

Pick It Up

Let’s start with this index:
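The original index definition didn’t survive extraction. A stand-in that matches the discussion in the comments (the StackOverflow Users table, with CreationDate as the last key column) might look like this; the index name and exact key choices are assumptions:

```sql
CREATE NONCLUSTERED INDEX ix_whatever
    ON dbo.Users (DownVotes, UpVotes, CreationDate);
```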

If we run queries like this, the ORDER BY is unsupported by the index because it’s not the leading key column.
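The original query is missing; a hypothetical example against the index sketched above, where the ORDER BY column is the last key column rather than the first:

```sql
SELECT TOP (100)
       u.Id, u.CreationDate
FROM dbo.Users AS u
ORDER BY u.CreationDate DESC;
/* CreationDate is the third key column, so the index
   doesn't deliver rows in this order: the plan gets a Sort */
```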

Slap happy

But if my query looks like this, it is supported.
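Again the query itself didn’t survive; under the assumed index, one with equality predicates on both preceding key columns would qualify:

```sql
SELECT TOP (100)
       u.Id, u.CreationDate
FROM dbo.Users AS u
WHERE u.DownVotes = 0
  AND u.UpVotes = 1
ORDER BY u.CreationDate DESC;
/* With DownVotes and UpVotes each fixed to a single value, the rows
   in that range are already ordered by CreationDate: no Sort */
```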

Limited Yay

With direct equalities on each of the columns in the key of the index, we can sort by the last column in the index.

Unfortunately, this doesn’t work if we…

  • Have an incomplete WHERE clause
  • Have inequalities (ranges) in the WHERE clause

By incomplete WHERE clause, I mean one that doesn’t utilize all of our index key columns, and by inequalities I mean >, >=, <, <=, and <>.

That means that both of these will result in a Sort operator:
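Sketching the two failing shapes under the assumed (DownVotes, UpVotes, CreationDate) index:

```sql
/* Incomplete WHERE clause: UpVotes isn't filtered, so CreationDate
   isn't globally ordered within the DownVotes = 0 range */
SELECT TOP (100) u.Id
FROM dbo.Users AS u
WHERE u.DownVotes = 0
ORDER BY u.CreationDate;

/* Inequality on a preceding key column: many UpVotes values qualify,
   so again CreationDate order isn't guaranteed across them */
SELECT TOP (100) u.Id
FROM dbo.Users AS u
WHERE u.DownVotes = 0
  AND u.UpVotes > 1
ORDER BY u.CreationDate;
```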

Along with all of these:
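Presumably these were the remaining inequality variants on the preceding key column; each predicate shape below would also force a Sort:

```sql
WHERE u.DownVotes = 0 AND u.UpVotes >= 1  -- range
WHERE u.DownVotes = 0 AND u.UpVotes <  1  -- range
WHERE u.DownVotes = 0 AND u.UpVotes <= 1  -- range
WHERE u.DownVotes = 0 AND u.UpVotes <> 1  -- not-equal is a double range
```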

It would be nice if the only requirement were that the key column prior to the sorting element use an equality predicate, but that’s not so either.

For instance, this will still result in a Sort.
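A hypothetical example of that shape: equality on UpVotes (the key column immediately before CreationDate), but nothing on the leading DownVotes column:

```sql
SELECT TOP (100) u.Id
FROM dbo.Users AS u
WHERE u.UpVotes = 1
ORDER BY u.CreationDate;
/* Without a predicate on the leading DownVotes column, the matching
   rows are scattered across the index, so the plan still needs a Sort */
```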

Crud

But you do have some flexibility with other queries and other sort orders. For example, these will both result in the simple TOP plan that we saw earlier in the post. Again, the predicates need to be equalities to work.
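The two queries didn’t survive extraction, but per the comment thread below they don’t sort by CreationDate. Under the assumed index, sorting by the second key column with an equality on the first fits the description:

```sql
SELECT TOP (100) u.Id
FROM dbo.Users AS u
WHERE u.DownVotes = 0
ORDER BY u.UpVotes;

SELECT TOP (100) u.Id
FROM dbo.Users AS u
WHERE u.DownVotes = 0
ORDER BY u.UpVotes DESC, u.CreationDate DESC;
/* Both orders are supported: the first reads the index forward within
   the DownVotes = 0 range, the second can read it backward */
```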

Stone Cold Wrappin’

I think this is an interesting example of how much index definitions can matter, and how a good index for one variation of a query may not be so great for another variation. There’s probably some allegory about knowing your data or something in here, too.

Thanks for reading!


10 Comments

  • What a great explanation Erik. I read all your blog posts as soon as they come out and you explain them clear as mud. Thank you so much for taking the pain to write these and publish them.

    Warm Regards

    • Erik Darling
      April 18, 2018 1:22 pm

      Thanks, I think? I’d like to explain more clearly than mud, where possible

  • Andrew Tobin
    April 18, 2018 6:48 pm

A lot of these posts lately feel like the kind of material I should know, but have either glossed over, or that isn’t on the tip of my brain when I do things, so I really appreciate it.

One thing I’m struggling with right now is: I don’t think key lookups are bad. I’m supporting a group of people who’ll add new covering indexes because every query or report needs “seeks”, even if they’re run rarely and return 100s of rows. When your indexes are 10x the size of the database, and you have 75 indexes on one table, most of which are “covering” – then it’s getting a bit mad.

    I think that performance can be overrated, if it isn’t balanced out by other considerations. And I get the “disk is cheap”, even if my infrastructure peeps don’t agree – but maintenance isn’t.

    One consideration with the sorting, I’m coming around to the idea that sorting should be done client-side if possible, and logical. I don’t need spills to tempdb, or extra cycles on my server if it’s something simple that can be done on the client. The flip-side to this is, windowing functions, and other operators which require a sort to be done before they can be – because the data needs to be in a way they work with.

    • Andrew – yeah, key lookups can be totally fine if they’re done in small quantities (as a percentage of the overall table.) The StackOverflow.dbo.Posts table is a great example – I fully expect that if any query needs the Body field (a big ol’ wall of text), then it HAS to do a key lookup. I don’t ever want that included in an index, full stop.

    • Erik Darling
      April 19, 2018 7:44 am

      Andrew — gonna ask the obvious question!

      Have you run sp_BlitzIndex against these databases/tables to see what the damage really looks like?

      The sorting thing is largely correct, but there are cases where a sort might get introduced in your plan for other reasons. I have a post coming out about that in the near future.

      Thanks!

      • Andrew Tobin
        April 19, 2018 8:13 pm

        I’ve gotta tell you, I was just thinking “I wonder whether it’s too stalkery/fanboy like to want to send Erik a bottle of drink” – I run the Blitz tools constantly. And I email help@ way too often for Brent to be as kind as he is getting back to me.

        I’m currently consolidating all my indexes against a new Reporting box using it, because I can.

        I’m also really wishing there was a simple tool that allowed me to drag/drop consolidate index columns to merge indexes, and add columns, and drop old indexes. Especially with the tables that have way too many. I half think I should see whether I could add it in to the new Operations Manager thing.

        That being said, I’m learning while doing with indexes… never quite sure I’m getting the ordering right, or covering enough of my plans… because it’s so difficult to keep every plan that hits an index in mind while tweaking – I guess that’s why you monitor “missing indexes” though. Would that give a near duplicate missing index if the sort order was sub-optimal?

  • Brian Boodman
    April 19, 2018 8:17 am

One of my dream features is the ability to create a SQL constraint to mark a column as ascending or descending, on condition that the table has an index whose first column is ordered. This would create a permanent, hypothetical “child” index on [constrained column, all columns within the parent index]. The hypothetical index would be clustered or non-clustered, just like the parent index. Then, any query which tried to use that index would use the clustered index instead.

    That way, a table like “MyTable([ID] [int] IDENTITY(1,1) NOT NULL, [Created] [Date] NOT NULL)” could efficiently handle queries like, “select * from mytable where created = '2018-01-01'” without an index on Created.

    As of now, my choices are:
    1. Add a second index on Created
    2. Use Created, ID as my clustered index and add MyTableCreated to any tables that need to join to it efficiently
    3. Perform some very coarse queries first (e.g., via TableSample), then use those queries to define @minID and @maxID.

    This might not be quite as efficient as having a real index, but in the worst case it could at least use some sort of binary search to limit table reads (similar to #3 above).

  • Eric in Sacramento
    May 1, 2018 10:42 am

    I don’t understand how the last two queries in this blog can result in the simple TOP plan shown above. If CreationDate is sorted in the index last–yet we’re not using an equality filter on all preceding columns in the index–how is it the CreationDate values returned don’t need to first be sorted?

    • Erik Darling
      May 1, 2018 10:53 am

      Eric — give the code another read. I’m not sorting by Creation Date in the last two.

      • Eric in Sacramento
        May 1, 2018 12:24 pm

        Thanks, Erik. It turns out that after reading the last two again and deciding I must’ve worded my question poorly, I actually had to read the code yet again to see what you pointed out. I see that we could also get the same TOP plan by using “…WHERE DownVotes = 1 ORDER BY CreationDate”. BTW, you’ve made a new fan. This blog is so well written with valuable info that even I understood it and look forward to putting it into practice.
