Hey, That’s Not My Sort!

Understand Your Plan

Mr. Optimizer does the rock n roll hoochie koo with Mrs. Optimizer, c. 1953.

When reading query plans, you may sometimes see a sort when you didn’t explicitly ask for data to be sorted.

Sometimes they show up to support order-preserving operators like stream aggregates, merge joins, or segments. Other times, they may show up to help optimize an operation that would otherwise rely on random I/O.

Random I/O is estimated to be much more expensive than sequential I/O. This comes from the optimizer being a bit long in the tooth, and not being hip to storage that doesn’t have design features in common with a record player.

This can be pretty confusing sometimes, so I’m going to highlight five queries and their plans where the optimizer has injected a sort operator, when we haven’t asked for any particular ordering.

The point isn’t whether the Sort is good or bad, or if you need to index to support the ordering that that optimizer has inflicted upon your plan, it’s just to show you why they sometimes show up.

Plan One: The Optimized Key Lookup

I’ll say it one more time in case you’re playing catch up: clustered index key columns are implicitly (or inherently, if you’re feeling rhymey) present in all of your nonclustered indexes. Where they’re stored in the index depends on uniqueness, but if your nonclustered index leads with columns that aren’t your clustered index keys, they’ll like fall out of order in the index pages.

When the optimizer chooses a key lookup plan, it effectively joins the nonclustered index to the clustered index using the clustered index key columns in both.

In some circumstances, the optimizer may decide to sort the output of the nonclustered index to reduce the random I/O involved in the key lookup.

It Just Runs Faster

This is what happened here. The Id column in the Posts table is the PK/CX. The nonclustered index that got used for our query looks like this.

Clearly the Id column isn’t in the definition.

Plan Two: The Sort To Support A Stream Aggregate

The Optimizer has chosen a Stream Aggregate over a Hash Aggregate. Unfortunately (using the same nonclustered index as before), the OwnerUserId column isn’t ordered in any way. It’s only an included column in the leaf level of the index, and not in any particular order, and doesn’t exist in the intermediate pages in a seekable fashion.

 

Islands In The Stream

Where Hash Match aggregates can take data in any order (this doesn’t necessarily make them better, but hang on), Stream Aggregates need everything in order first.

This is a trade off, though. For a Hash Match aggregate, it behaves a bit like a Hash Join in that all the rows have to arrive at the operator before hashing can begin. With a Stream Aggregate, if the data isn’t in index order, it needs to be sorted. The Sort behaves similarly to the Hash Join (huzzah! The trade off!), and all rows have to get to the Sort operator before sorting can begin.

The other thing that sorting and hashing have in common is they’re both memory consuming operators. This is what all the fuss about memory grants is about.

Plan Three: The Sort To Support A Merge Join

Just like a Stream Aggregate, Merge Joins need sorted input. There’s some sense to this plan, in that with two Sorts, you can deduplicate some amount of data prior to sorting. There’s also that Stream Aggregate after the Merge Join, which needs sorted input. If the Sort were after the Merge Join (for some strange reason), there could be a whole lot more data to sort.

 

Wild, huh?

Traffic Jam

The Hash Match is applied to one side of the join to reduce the number of rows to sort and join.

Why not both sides?

Good question.

Plan Four: The Distinct Sort

This one is a little more obvious, but I decided to include it because I think you’re all nice people who deserve mostly complete blog posts.

I hope someday you find them.

Ha ha haaaadhefoiwsoihrgjergopjh.

This wine is only okay.

If you use DISTINCT or GROUP BY in your queries, the optimizer may choose to implement them with a Distinct Sort.

Why not a Sort and then a Stream Aggregate?

Good question.

Maybe it’s both?

Plan Five: The Windowing Function

I know, you think I’m cheating here. You think I’m a big fat cheaty cheater. Liar liar plans on fire.

There’s an order by here, but the Sort has almost nothing to do with it. All of these queries use the same index as before. The order by for CreationDate is fully supported.

You’re a good time

Or at least it would be, if we didn’t have to Partition By OwnerUserId first.

Partition By sorts the data, too. This is why indexing correctly can be important.

The optimizer doesn’t really have any alternatives. In many of the other plans, things could have been done differently. This one is all you.

Or me.

Where were we?

Put On A Happy Face

Now when you see a mysterious Sort operator appear in your query plans, you’ll have some more clues as to why.

Hopefully this helps you out, and helps you make a decision about how to interpret and tune things where necessary.

Remember that the only cure for a SARGable Sort (a Sort that’s not on an Expression) is an index that supports it. Adding or rearranging indexes may help (query rewrites may also help, sometimes…).

Thanks for reading!

Previous Post
Book Review: Inside SQL Server 6.5
Next Post
Parameter Fluid Optimization

10 Comments. Leave new

  • Great post! I always learn so much from good examples that are explained well. Thanks!

    Reply
  • I’d guess that in plan three one Hash Match is enough to keep the Merge Join from becoming many-to-many.

    Reply
  • Nick Rotsching
    May 7, 2018 3:43 pm

    Just ran into this head-scratcher last week and this post came around at a fantastic time! I was hitting a “plan three” scenario with a merge join and couldn’t figure out where on Earth the sort was coming from. Makes a lot of sense when it’s spelled out clearly. Thanks for the insight Erik, very helpful!

    Reply
  • Carlos Bercero
    July 13, 2021 2:39 pm

    Hi Brent!

    This unintended sort is killing me. Is completely random.

    Random Key Lookup are way faster and less resource consuming than the unnecessary sort operation.

    Please tell me there’s a hint or something to force the engine not to do it!

    Best for you my friend!

    Carlos from Abarca

    Reply
  • So I just ran a query with a bunch of unions, and I got a DISTINCT SORT. I also ran the same query but within a CTE to break out the common conditions across all unions to tidy things up a bit… Now all of a sudden, the plan was identical, except that instead of a distinct sort, I got a regular SORT and a STREAM AGGREGATE…
    In this post, under the distinct sort section, you touch on that it’s basically the same thing but no explanation as to why it’s used. Do you have any idea whether one is better than the other in any situation?

    Reply
    • Hi! Unfortunately, personalized query tuning help is a little beyond what we can do in the blog post comments.

      Reply
      • I respect that and I really didn’t mean that I wanted help with query tuning 😛
        I really just wanted to know if there was any real difference between distinct sort and sort with stream aggregate. But if that’s still beyond what can be done as a comment here, no worries! I was mostly just curious, anyway 😀

        Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.