How to Think Like the SQL Server Engine: The Perils of SELECT *

Last Updated October 22, 2019

In our last post, we ran a query with an ORDER BY, but we only got one column in the SELECT:

SELECT Id 
  FROM dbo.Users 
  WHERE LastAccessDate > '2014/07/01' 
  ORDER BY LastAccessDate;

1

2

3

4

SELECT Id

FROM dbo.Users

WHERE LastAccessDate > '2014/07/01'

ORDER BY LastAccessDate;

The estimated cost was about $18 Query Bucks because SQL Server had to:

Scan the entire clustered index, yelling out the Id and LastAccessDate of each row
Sort that list by LastAccessDate

Now, let’s change just one thing about the query – what we’re selecting:

SELECT *
  FROM dbo.Users 
  WHERE LastAccessDate > '2014/07/01' 
  ORDER BY LastAccessDate;

1

2

3

4

SELECT *

FROM dbo.Users

WHERE LastAccessDate > '2014/07/01'

ORDER BY LastAccessDate;

And run both of the queries back to back to get their actual execution plans:

The basic execution plan of both queries is pretty similar, aside from the fact that the bottom one went parallel. SQL Server realized that sorting all this data was going to be a heck of a lot more work, so it split the work across multiple CPU cores.

This query sucks in a few different ways.

SELECT * can read more data. I know what you’re thinking: both queries have to read all of the 8KB pages in the table, right? No – go back to the first post in the series when I introduced the Users table. I mentioned that the AboutMe field, a big ol’ NVARCHAR(MAX), might be so large that it ends up getting pushed off-row: stored in different 8KB pages. Our SELECT Id query didn’t need to read those extra pages – but when we SELECT *, we do.

SELECT * can sort more data. SQL Server can’t just sort the LastAccessDates – it sorts the whole rows around. That means it’s going to need more CPU time, more memory, and do more spills to disk if it gets those memory estimates wrong.

SELECT * can take more time to output. This was always the thing I focused on as a database administrator. I would tell my devs, “Don’t select fields you don’t need, because it takes longer to yell that data out over the network.” Today, that’s the least of my problems: I’m much, much more concerned about the CPU time and memory grants consumed by the sort operator.

Your first clue about just how bad the SELECT *’s sort operator sucks is the 97% cost of the sort – but it’s not 97% of the same query cost. The original query cost was $18 Query Bucks, but check out the cost of the SELECT * – IT’S ALMOST NINE HUNDRED QUERY BUCKS.

If query bucks were real dollars, I could have bought my first car with the cost of this SELECT *.

I hate SELECT *.

It isn’t about the star itself – it’s about lazily getting all of the columns, including ones you don’t need, and then forcing SQL Server to sort them.

And just because you didn’t put an ORDER BY in your query doesn’t mean you don’t have a sort, either – Erik wrote about sorts that get injected into your plan even if you didn’t ask for ’em.

For the next post in the series, we’ll go back to selecting just the Id. It’s not that I’m only going to allow my developers to just select Ids and nothing else – I understand that we gotta get data out of the database. However, when I’m dealing with query tuning, and I see a SELECT * (or a big ginormous list of columns), I’m going to start the tuning process by asking if we really need all of those fields. Even with Entity Framework, you can pick the columns you want. Does it take a little more work on your part? Sure – but you’re a lot cheaper than SQL Server’s CPU licensing or Azure SQL DB’s pricing, especially as your app starts to scale.

So next, let’s get just the Id, but run the query repeatedly.

Keith
October 15, 2019 9:47 am

What’s wrong with

SELECT *
FROM
(SELECT * FROM x.dbo.y
WHERE x = ‘value’
UNION
SELECT * FROM a.dbo.zz
WHERE 123 = ‘othervalue’)
WHERE count(*) = 42
AND y LIKE ‘%value2%’
ORDER BY 1,2,3,4 ASC, 5 DESC?

:-p

Reply
Tom Horner
October 16, 2019 4:49 am

Another reason not to use select * is that if your table definition changes (columns added or moved around), and you forget to change a program expecting a certain sequence of data to be used by the program, the program will stop working. In a previous company I worked for, this SQL was not allowed since many programs may have had to be changed if they used it and the table changed.

Reply
- Brent Ozar
  October 16, 2019 6:25 am
  
  Tom – good point!
  
  Reply
- Keith
  October 16, 2019 9:12 am
  
  This ^
  
  The crappy performance is one thing. someone had a report that did a select * on a 200 something column wide, mostly unicode table that would only use 10 or so of the rows. That’s super annoying. But having to deal with it on every single little update can make a 2 day update take a month and a half.
  
  the people that do this are invariable also the ones who put loose query text into excel workbooks they send all over the network, into SSRS reports all over the network and you can use any search tools to try and find them other than manual grind work.
  
  Reply
Stephen Morris
October 17, 2019 2:22 am

shout out for enterprise edition which can share scans (not quite the same thing as caching results but some similarities)

https://docs.microsoft.com/en-us/sql/relational-databases/reading-pages?view=sql-server-ver15

Reply