Understand Your Plan

When reading query plans, you may sometimes see a sort when you didn’t explicitly ask for data to be sorted.
Sometimes they show up to support order-preserving operators like stream aggregates, merge joins, or segments. Other times, they may show up to help optimize an operation that would otherwise rely on random I/O.
Random I/O is estimated to be much more expensive than sequential I/O. This comes from the optimizer being a bit long in the tooth, and not being hip to storage that doesn’t have design features in common with a record player.
This can be pretty confusing sometimes, so I’m going to highlight five queries and their plans where the optimizer has injected a sort operator, when we haven’t asked for any particular ordering.
The point isn’t whether the Sort is good or bad, or if you need to index to support the ordering that that optimizer has inflicted upon your plan, it’s just to show you why they sometimes show up.
Plan One: The Optimized Key Lookup
I’ll say it one more time in case you’re playing catch up: clustered index key columns are implicitly (or inherently, if you’re feeling rhymey) present in all of your nonclustered indexes. Where they’re stored in the index depends on uniqueness, but if your nonclustered index leads with columns that aren’t your clustered index keys, they’ll like fall out of order in the index pages.
When the optimizer chooses a key lookup plan, it effectively joins the nonclustered index to the clustered index using the clustered index key columns in both.
1 2 3 4 |
SELECT p.OwnerUserId, SUM(p.Score) AS ScoreSum, MIN(p.CreationDate) AS MinScore, MAX(p.CreationDate) AS MaxScore FROM dbo.Posts AS p WITH ( INDEX = ix_posts_cd ) WHERE p.CreationDate >= '2016-01-01' GROUP BY p.OwnerUserId |
In some circumstances, the optimizer may decide to sort the output of the nonclustered index to reduce the random I/O involved in the key lookup.

This is what happened here. The Id column in the Posts table is the PK/CX. The nonclustered index that got used for our query looks like this.
1 2 |
CREATE INDEX ix_posts_cd ON dbo.Posts (CreationDate) INCLUDE (OwnerUserId); |
Clearly the Id column isn’t in the definition.
Plan Two: The Sort To Support A Stream Aggregate
The Optimizer has chosen a Stream Aggregate over a Hash Aggregate. Unfortunately (using the same nonclustered index as before), the OwnerUserId column isn’t ordered in any way. It’s only an included column in the leaf level of the index, and not in any particular order, and doesn’t exist in the intermediate pages in a seekable fashion.
1 2 3 4 |
SELECT p.OwnerUserId, SUM(p.Score) AS ScoreSum FROM dbo.Posts AS p WHERE p.CreationDate BETWEEN '2008-06-01' AND '2008-08-01' GROUP BY p.OwnerUserId |

Where Hash Match aggregates can take data in any order (this doesn’t necessarily make them better, but hang on), Stream Aggregates need everything in order first.
This is a trade off, though. For a Hash Match aggregate, it behaves a bit like a Hash Join in that all the rows have to arrive at the operator before hashing can begin. With a Stream Aggregate, if the data isn’t in index order, it needs to be sorted. The Sort behaves similarly to the Hash Join (huzzah! The trade off!), and all rows have to get to the Sort operator before sorting can begin.
The other thing that sorting and hashing have in common is they’re both memory consuming operators. This is what all the fuss about memory grants is about.
Plan Three: The Sort To Support A Merge Join
Just like a Stream Aggregate, Merge Joins need sorted input. There’s some sense to this plan, in that with two Sorts, you can deduplicate some amount of data prior to sorting. There’s also that Stream Aggregate after the Merge Join, which needs sorted input. If the Sort were after the Merge Join (for some strange reason), there could be a whole lot more data to sort.
1 2 3 4 |
SELECT COUNT(*) FROM dbo.Posts AS p JOIN dbo.Posts AS p2 ON p.OwnerUserId = p2.OwnerUserId |
Wild, huh?

The Hash Match is applied to one side of the join to reduce the number of rows to sort and join.
Why not both sides?
Good question.
Plan Four: The Distinct Sort
This one is a little more obvious, but I decided to include it because I think you’re all nice people who deserve mostly complete blog posts.
I hope someday you find them.
Ha ha haaaadhefoiwsoihrgjergopjh.

If you use DISTINCT or GROUP BY in your queries, the optimizer may choose to implement them with a Distinct Sort.
1 2 |
SELECT DISTINCT p.LastEditorDisplayName FROM dbo.Posts AS p |
Why not a Sort and then a Stream Aggregate?
Good question.
Maybe it’s both?
Plan Five: The Windowing Function
I know, you think I’m cheating here. You think I’m a big fat cheaty cheater. Liar liar plans on fire.
1 2 3 4 |
SELECT TOP 1000 p.OwnerUserId, DENSE_RANK() OVER ( PARTITION BY p.OwnerUserId ORDER BY p.CreationDate ) AS ranking_something FROM dbo.Posts AS p WHERE p.CreationDate >= '2010-10-30' |
There’s an order by here, but the Sort has almost nothing to do with it. All of these queries use the same index as before. The order by for CreationDate is fully supported.

Or at least it would be, if we didn’t have to Partition By OwnerUserId first.
Partition By sorts the data, too. This is why indexing correctly can be important.
The optimizer doesn’t really have any alternatives. In many of the other plans, things could have been done differently. This one is all you.
Or me.
Where were we?
Put On A Happy Face
Now when you see a mysterious Sort operator appear in your query plans, you’ll have some more clues as to why.
Hopefully this helps you out, and helps you make a decision about how to interpret and tune things where necessary.
Remember that the only cure for a SARGable Sort (a Sort that’s not on an Expression) is an index that supports it. Adding or rearranging indexes may help (query rewrites may also help, sometimes…).
Thanks for reading!
10 Comments. Leave new
Great post! I always learn so much from good examples that are explained well. Thanks!
Hi Lori! Glad you enjoyed it.
Or learned something.
Either way.
I’d guess that in plan three one Hash Match is enough to keep the Merge Join from becoming many-to-many.
Just ran into this head-scratcher last week and this post came around at a fantastic time! I was hitting a “plan three” scenario with a merge join and couldn’t figure out where on Earth the sort was coming from. Makes a lot of sense when it’s spelled out clearly. Thanks for the insight Erik, very helpful!
Glad it helped, Nick. These things sure are puzzling at first.
Hi Brent!
This unintended sort is killing me. Is completely random.
Random Key Lookup are way faster and less resource consuming than the unnecessary sort operation.
Please tell me there’s a hint or something to force the engine not to do it!
Best for you my friend!
Carlos from Abarca
No, sorry sir!
So I just ran a query with a bunch of unions, and I got a DISTINCT SORT. I also ran the same query but within a CTE to break out the common conditions across all unions to tidy things up a bit… Now all of a sudden, the plan was identical, except that instead of a distinct sort, I got a regular SORT and a STREAM AGGREGATE…
In this post, under the distinct sort section, you touch on that it’s basically the same thing but no explanation as to why it’s used. Do you have any idea whether one is better than the other in any situation?
Hi! Unfortunately, personalized query tuning help is a little beyond what we can do in the blog post comments.
I respect that and I really didn’t mean that I wanted help with query tuning 😛
I really just wanted to know if there was any real difference between distinct sort and sort with stream aggregate. But if that’s still beyond what can be done as a comment here, no worries! I was mostly just curious, anyway 😀