We started out the How to Think Like the Engine series with a simple query with a WHERE clause:
Now let’s add an ORDER BY:
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
Here’s the updated plan – note that the query cost has tripled to $17.72 Query Bucks. Let’s dig into why:
We read the plan from right to left, hovering our mouse over each operator. Each operator is kinda like a standalone program that has its own dedicated work to do, and produces a specific output.
The first thing that happens is the Clustered Index Scan.
At the top right, the clustered index scan is exactly the same as it was in the last query. Hover your mouse over that and there are a lot of juicy details that we didn’t really dig into before:
- Predicate: the clustered index scan mini-program is only going to return rows where the LastAccessDate > ‘2014-07-01’. (Pretty nifty how it changed the date format and added the time, right?)
- Estimated Number of Rows: 149,259
- Estimated Number of Rows to be Read: 299,398 (we have to scan the whole table to find the people who match)
- Output List: Id, LastAccessDate (because upstream of this mini-program, other mini-programs are going to need both Id and LastAccessDate. Specifically, the Sort operator – which happens next – is going to need to sort all these rows by LastAccessDate.)
The data flows out of this scan operator, and flows into the next one: Sort.
The second thing that happens is the Sort.
The sort’s input is the 148,328 rows of Id & LastAccessDate that came out of the Clustered Index Scan, and they’re not sorted in any kind of order – but our query asked for them to be ordered by LastAccessDate. That’s where the Sort’s work comes in. Hover your mouse over it to see what’s happening in that mini-program:
A few points of interest:
- Order By (down at the bottom): the main goal of the sort
- Estimated Number of Rows: 149,259
- Estimated I/O Cost: $0.01 (because there’s not much I/O to do if you’re sorting 150K rows)
- Estimated CPU Cost: $11.7752 Query Bucks (because sorting is mostly CPU work)
But note that those estimates above are all based on 149,259 rows. If way more rows came in (or less), then our actual work would have been way more (or less.) This is a good time to stop and mention that you don’t see Actual Cost numbers here in Query Bucks: SQL Server doesn’t go back and re-cost stuff after the work has been completed. Anytime you see a cost – even on an actual plan – it’s just an estimate that was done before the query started. Just like you, SQL Server doesn’t later confess just how over budget or late its work was.
So why did the query cost triple from $6 to $18?
It all comes back to the $11.7752 Query Buck cost of the Sort operator, which is brand new in this plan. Sorting data is hard work.
I jokingly say that SQL Server is the world’s second most expensive place to sort data – second only to Oracle. Next time someone complains about the $7K per core cost of SQL Server Enterprise Edition, remind them that Oracle is $47K per core. $47K. Hoo boy.
If you need to sort data, try doing it in the app tier instead. Developers are really good about scaling out application work, and their app servers don’t cost $7K per core. I’m all about going to bat for my developers to get them more & faster web servers because those are easier/cheaper than scaling out sorting in SQL Server. If the query doesn’t have a TOP, then it probably shouldn’t have an ORDER BY.
What’s that, you say? You’ve never heard Microsoft dispense that advice?
Microsoft, the company that charges $7K per core to do your sorting for you? They haven’t told you about that? Huh. Must have been too busy about telling you how you can now do R, Python, and Java inside the SQL Server, all at the low low price of $7K per core. Funny how that works.