Hey, That’s Not My Sort!

Last Updated April 28, 2018

Understand Your Plan

Mr. Optimizer does the rock n roll hoochie koo with Mrs. Optimizer, c. 1953.

When reading query plans, you may sometimes see a sort when you didn’t explicitly ask for data to be sorted.

Sometimes they show up to support order-preserving operators like stream aggregates, merge joins, or segments. Other times, they may show up to help optimize an operation that would otherwise rely on random I/O.

Random I/O is estimated to be much more expensive than sequential I/O. This comes from the optimizer being a bit long in the tooth, and not being hip to storage that doesn’t have design features in common with a record player.

This can be pretty confusing sometimes, so I’m going to highlight five queries and their plans where the optimizer has injected a sort operator, when we haven’t asked for any particular ordering.

The point isn’t whether the Sort is good or bad, or if you need to index to support the ordering that that optimizer has inflicted upon your plan, it’s just to show you why they sometimes show up.

Plan One: The Optimized Key Lookup

I’ll say it one more time in case you’re playing catch up: clustered index key columns are implicitly (or inherently, if you’re feeling rhymey) present in all of your nonclustered indexes. Where they’re stored in the index depends on uniqueness, but if your nonclustered index leads with columns that aren’t your clustered index keys, they’ll like fall out of order in the index pages.

When the optimizer chooses a key lookup plan, it effectively joins the nonclustered index to the clustered index using the clustered index key columns in both.

SELECT   p.OwnerUserId, SUM(p.Score) AS ScoreSum, MIN(p.CreationDate) AS MinScore, MAX(p.CreationDate) AS MaxScore
FROM     dbo.Posts AS p WITH ( INDEX = ix_posts_cd )
WHERE    p.CreationDate >= '2016-01-01'
GROUP BY p.OwnerUserId

SELECT p.OwnerUserId, SUM(p.Score) AS ScoreSum, MIN(p.CreationDate) AS MinScore, MAX(p.CreationDate) AS MaxScore

FROM dbo.Posts AS p WITH ( INDEX = ix_posts_cd )

WHERE p.CreationDate >= '2016-01-01'

GROUP BY p.OwnerUserId

In some circumstances, the optimizer may decide to sort the output of the nonclustered index to reduce the random I/O involved in the key lookup.

This is what happened here. The Id column in the Posts table is the PK/CX. The nonclustered index that got used for our query looks like this.

CREATE INDEX ix_posts_cd
    ON dbo.Posts (CreationDate) INCLUDE (OwnerUserId);

1 2	CREATE INDEX ix_posts_cd ON dbo.Posts (CreationDate) INCLUDE (OwnerUserId);

Clearly the Id column isn’t in the definition.

Plan Two: The Sort To Support A Stream Aggregate

The Optimizer has chosen a Stream Aggregate over a Hash Aggregate. Unfortunately (using the same nonclustered index as before), the OwnerUserId column isn’t ordered in any way. It’s only an included column in the leaf level of the index, and not in any particular order, and doesn’t exist in the intermediate pages in a seekable fashion.

SELECT   p.OwnerUserId, SUM(p.Score) AS ScoreSum
FROM     dbo.Posts AS p
WHERE    p.CreationDate BETWEEN '2008-06-01' AND '2008-08-01'
GROUP BY p.OwnerUserId

SELECT p.OwnerUserId, SUM(p.Score) AS ScoreSum

FROM dbo.Posts AS p

WHERE p.CreationDate BETWEEN '2008-06-01' AND '2008-08-01'

GROUP BY p.OwnerUserId

Where Hash Match aggregates can take data in any order (this doesn’t necessarily make them better, but hang on), Stream Aggregates need everything in order first.

This is a trade off, though. For a Hash Match aggregate, it behaves a bit like a Hash Join in that all the rows have to arrive at the operator before hashing can begin. With a Stream Aggregate, if the data isn’t in index order, it needs to be sorted. The Sort behaves similarly to the Hash Join (huzzah! The trade off!), and all rows have to get to the Sort operator before sorting can begin.

The other thing that sorting and hashing have in common is they’re both memory consuming operators. This is what all the fuss about memory grants is about.

Plan Three: The Sort To Support A Merge Join

Just like a Stream Aggregate, Merge Joins need sorted input. There’s some sense to this plan, in that with two Sorts, you can deduplicate some amount of data prior to sorting. There’s also that Stream Aggregate after the Merge Join, which needs sorted input. If the Sort were after the Merge Join (for some strange reason), there could be a whole lot more data to sort.

SELECT COUNT(*)
FROM   dbo.Posts AS p
JOIN   dbo.Posts AS p2
    ON p.OwnerUserId = p2.OwnerUserId

SELECT COUNT(*)

FROM dbo.Posts AS p

JOIN dbo.Posts AS p2

ON p.OwnerUserId = p2.OwnerUserId

Wild, huh?

The Hash Match is applied to one side of the join to reduce the number of rows to sort and join.

Why not both sides?

Good question.

Plan Four: The Distinct Sort

This one is a little more obvious, but I decided to include it because I think you’re all nice people who deserve mostly complete blog posts.

I hope someday you find them.

Ha ha haaaadhefoiwsoihrgjergopjh.

If you use DISTINCT or GROUP BY in your queries, the optimizer may choose to implement them with a Distinct Sort.

SELECT DISTINCT p.LastEditorDisplayName
FROM dbo.Posts AS p

1 2	SELECT DISTINCT p.LastEditorDisplayName FROM dbo.Posts AS p

Why not a Sort and then a Stream Aggregate?

Good question.

Maybe it’s both?

Plan Five: The Windowing Function

I know, you think I’m cheating here. You think I’m a big fat cheaty cheater. Liar liar plans on fire.

SELECT TOP 1000
       p.OwnerUserId, DENSE_RANK() OVER ( PARTITION BY p.OwnerUserId ORDER BY p.CreationDate ) AS ranking_something
FROM   dbo.Posts AS p
WHERE  p.CreationDate >= '2010-10-30'

SELECT TOP 1000

p.OwnerUserId, DENSE_RANK() OVER ( PARTITION BY p.OwnerUserId ORDER BY p.CreationDate ) AS ranking_something

FROM dbo.Posts AS p

WHERE p.CreationDate >= '2010-10-30'

There’s an order by here, but the Sort has almost nothing to do with it. All of these queries use the same index as before. The order by for CreationDate is fully supported.

Or at least it would be, if we didn’t have to Partition By OwnerUserId first.

Partition By sorts the data, too. This is why indexing correctly can be important.

The optimizer doesn’t really have any alternatives. In many of the other plans, things could have been done differently. This one is all you.

Or me.

Where were we?

Put On A Happy Face

Now when you see a mysterious Sort operator appear in your query plans, you’ll have some more clues as to why.

Hopefully this helps you out, and helps you make a decision about how to interpret and tune things where necessary.

Remember that the only cure for a SARGable Sort (a Sort that’s not on an Expression) is an index that supports it. Adding or rearranging indexes may help (query rewrites may also help, sometimes…).

Thanks for reading!

Book Review: Inside SQL Server 6.5

Parameter Fluid Optimization

10 Comments. Leave new

Lori Fry
May 3, 2018 11:43 am

Great post! I always learn so much from good examples that are explained well. Thanks!

Reply
- Erik Darling
  May 3, 2018 1:11 pm
  
  Hi Lori! Glad you enjoyed it.
  
  Or learned something.
  
  Either way.
  
  Reply
Markus
May 4, 2018 12:22 am

I’d guess that in plan three one Hash Match is enough to keep the Merge Join from becoming many-to-many.

Reply
Nick Rotsching
May 7, 2018 3:43 pm

Just ran into this head-scratcher last week and this post came around at a fantastic time! I was hitting a “plan three” scenario with a merge join and couldn’t figure out where on Earth the sort was coming from. Makes a lot of sense when it’s spelled out clearly. Thanks for the insight Erik, very helpful!

Reply
- Erik Darling
  May 7, 2018 4:55 pm
  
  Glad it helped, Nick. These things sure are puzzling at first.
  
  Reply
Carlos Bercero
July 13, 2021 2:39 pm

Hi Brent!

This unintended sort is killing me. Is completely random.

Random Key Lookup are way faster and less resource consuming than the unnecessary sort operation.

Please tell me there’s a hint or something to force the engine not to do it!

Best for you my friend!

Carlos from Abarca

Reply
- Brent Ozar
  July 13, 2021 2:55 pm
  
  No, sorry sir!
  
  Reply
Max
April 12, 2022 1:22 pm

So I just ran a query with a bunch of unions, and I got a DISTINCT SORT. I also ran the same query but within a CTE to break out the common conditions across all unions to tidy things up a bit… Now all of a sudden, the plan was identical, except that instead of a distinct sort, I got a regular SORT and a STREAM AGGREGATE…
In this post, under the distinct sort section, you touch on that it’s basically the same thing but no explanation as to why it’s used. Do you have any idea whether one is better than the other in any situation?

Reply
- Brent Ozar
  April 13, 2022 2:21 pm
  
  Hi! Unfortunately, personalized query tuning help is a little beyond what we can do in the blog post comments.
  
  Reply
  - Max
    April 13, 2022 2:53 pm
    
    I respect that and I really didn’t mean that I wanted help with query tuning 😛
    I really just wanted to know if there was any real difference between distinct sort and sort with stream aggregate. But if that’s still beyond what can be done as a comment here, no worries! I was mostly just curious, anyway 😀
    
    Reply