You can’t do that on management studio
Recently, while working with a client, I did something in a query that they were mystified by. I didn’t think much of it, but I thought it might be useful to you, dear readers, as well. Along with an explanation.
Here’s a sample query that takes advantage of the same type of trick, but with a few extra bats and worms added in to illustrate a larger point.
1 2 3 4 5 6 7 |
SELECT TOP 100 b.Name, b.UserId, COUNT_BIG(*) AS Coconuts FROM dbo.Badges AS b WHERE b.Date >= '2014-01-01 00:00:00.003' GROUP BY b.Name, b.UserId HAVING COUNT_BIG(*) > 100 ORDER BY Coconuts DESC; |
Can you dig it?
What I did was order by the alias of the COUNT_BIG(*) column, Coconuts.
What they didn’t understand was why that’s legal, but filtering on that alias wouldn’t be legal. A more familiar scenario might be using ROW_NUMBER(); you can ORDER BY it, but not filter on it in the WHERE clause to limit result sets to the TOP N per set. You would have to get an intermediate result in a CTE or temp table and then filter.
When SQL goes to figure out what to do with all this, it doesn’t look at it in the order you typed it. It’s a bit more like this:
8. SELECT
9. DISTINCT
11. TOP
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE/ROLLUP
7. HAVING
10. ORDER BY
12. OFFSET/FETCH
To make that a easier to read:
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE/ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
12. OFFSET/FETCH
And that’s how babies get made
Since the ORDER BY is processed after the SELECT list, ORDER BY can use a column aliased there. You can’t do that in the WHERE clause because it gets processed before SQL does its fancy footwork to get you some stuff to look at.
Here are some examples of what happens when you try to move the alias to different parts of the query.
Using it in the HAVING filter:
1 2 3 4 5 6 7 |
SELECT TOP 100 b.Name, b.UserId, COUNT_BIG(*) AS Coconuts FROM dbo.Badges AS b WHERE b.Date >= '2014-01-01 00:00:00.003' GROUP BY b.Name, b.UserId HAVING Coconuts > 100 ORDER BY Coconuts DESC; |
And again using it in the WHERE clause:
1 2 3 4 5 6 7 |
SELECT TOP 100 b.Name, b.UserId, COUNT_BIG(*) AS Coconuts FROM dbo.Badges AS b WHERE b.Date >= '2014-01-01 00:00:00.003' AND Coconuts > 100 GROUP BY b.Name, b.UserId ORDER BY Coconuts DESC; |
Both result in the same error, give or take a line number. Coconuts is not reference-able at either of these points.
Msg 207, Level 16, State 1, Line 33
Invalid column name 'Coconuts'.
31 Comments. Leave new
This trick is so little known that SQL Prompt does not even support it 🙂
SQL really is not well designed. LINQ has no trouble here. But we’re stuck with SQL and this will never change.
Thanks Erik, I made sure to bookmark this one so I can revisit it from time to time. I need to do that more as more as I get older. 🙂
You and me both. But I’ve always followed Einstein’s advice: “Never memorize what you can look up in books.”
Similarly, I follow Professor Henry Jones’ mantra of “I wrote them down [in my diary] so that I wouldn’t *have* to remember!”, in regards to the Grail Diary in Indiana Jones and the Last Crusade.
What really confuses people is when you you do
SELECT columna,
columnb
from table1
order by 1
So that it will order by columna
I personally don’t use this convention but I work with someone who does
I haaaaaaaaaaaaaate that. Forever.
Agreed. I grumble at my co worker who does it all the time. He just laughs at me
In adhoc queries it’s a wonderful thing, especially on computed columns that you haven’t aliased. In production code though? Bring me your firstborn.
I do that only for quick, disposable queries. By the time I get to the bottom of my query, my fingers are happy that I choose to just type “ORDER BY n”. Gets me to F5 as fast as possible.
But I never, ever, use ordinal aliases in the ORDER BY in any kind of persisting queries. EVER. It’s in the same category as aliasing tables as “a,” “b,” and “c.”
ooohhhhhhhh…… good stuff. Thank you very much. Very cool.
Is the order on how SQL server processes a SQL statement in the Microsoft documentation?
I think it is very helpful. If not, I will copy your list and display it on my desk at work =)
Thanks!!!
There is no order as predicates in WHERE clause can move up and *down* derived tables and so WHERE can execute *after* SELECT in this case.
The unfortunate result is that SELECT CAST(Column AS FLOAT) … WHERE ISNUMERIC(Column) = 1 can fail for VARCHAR column in a derived table.
https://msdn.microsoft.com/en-us/library/ms189499.aspx
It’s also in chapter 1 of the 70-461 training kit. It is definitely a fundamental behavior to understand.
I’ve known about this capability for years, just never could get myself in the habit of using it. The column alias isn’t very useful in the query it is used in; it’s more for the output, which if you think about it, is exactly what the ORDER BY is – an output operator. This gave me a major headache years ago when I first started learning SQL Server after having only know Oracle, where aliases are aliases EVERYWHERE.
The alias isn’t very useful functionally, but it follows the standards of the relational model. Every attribute in a set should have a name.
Yes, ORDER BY is weird and another FYI
SELECT [b].[Name]
FROM [dbo].[Badges] AS [b]
JOIN [dbo].[Badges] AS [b2]
ON [b].[ID] = [b2].[ID]
ORDER BY [Name]
fails on SQL2000 with “Ambiguous column name ‘Name'” but works OK on SQL2005+
SELECT [b].[Name] AS [Name]
FROM [dbo].[Badges] AS [b]
JOIN [dbo].[Badges] AS [b2]
ON [b].[ID] = [b2].[ID]
ORDER BY [Name]
works everywhere.
Try nesting aggregations with window fuctions.
Something like
SELECT islands, SUM(SUM(coconuts) OVER ()
FROM myTable
GROUP BY islands;
Messes with your head 🙂
The SUM(SUM(… thing messes with my head enough as is. The first time I ever saw it I was reading on a train and missed my stop trying to figure it out.
What’s this Sum(Sum(.. thing you speak of? I do like stuff that messes with your head.
My favorite is
WITH –CROSS JOINED Virtual Auxiliary Table of Numbers
L0 AS (SELECT 1 AS c UNION ALL SELECT 1),
L1 AS (SELECT 1 AS c FROM L0 A CROSS JOIN L0 B),
L2 AS (SELECT 1 AS c FROM L1 A CROSS JOIN L1 B),
L3 AS (SELECT 1 AS c FROM L2 A CROSS JOIN L2 B),
L4 AS (SELECT 1 AS c FROM L3 A CROSS JOIN L3 B),
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i FROM L4)
Select i from Nums
Here’s an example of “nested” aggregations:
http://beyondrelational.com/quiz/sqlserver/tsql/2011/questions/nested-aggregate-function.aspx
The SUM,MAX Staff sometime make me thing a lot , so I will like you to get me there!!
Don’t you have to engage in the practice of making the babies before you can throw out the weak ones (THIS…IS…SPARTA !!!). I only ask because I’ve been working through Ben-Gan’s T-SQL tome and it has the cartesian product phase before the on predicate phase. I am mostly curious if there is some short circuitry in this beast that makes your specified ordering occur (and this admittedly doesn’t have any impact on the rest of your post, but I am a curious mind)?
Oh, yeah, there are a few possible in-between steps depending on the type of join you’re doing. My list was just to put the keywords in order for reference.
Sorry to be slightly off topic, but in the example: ‘2014-01-01 00:00:00.003’ – why three milliseconds? I’m guessing this is related to the precision of the data type and you want to avoid where b.date was exactly midnight?
Easy. Dude’s a robot.
Itzik Ben-Gan produced an excellent diagram covering this topic:
http://tsql.solidq.com/books/insidetsql2008/Logical%20Query%20Processing%20Poster.pdf
Lets say we have a query like following:
USE TestDB
GO
SELECT
p.Column1 ,
p.Column2
FROM dbo.Table1 AS p
ORDER BY p.Column3
Like we know, SQL Server create virtual tables as a result of doing some step (JOIN, WHERE, GROUP BY…) and pass that virtual tables to next steps. So, as ORDER BY is processed after SELECT clause, how it sees column that is not selected and included in virtual table after SELECT clause?
V – SQL Server brings all of the fields along through the plan that it needs in order to perform its work.
Why then it is not possible if we include GROUP BY clause with two columns from SELECT list?
V – because if you want to sort by something, you have to group by it too. You may want to show examples if you’d like more specific answers.
First query:
USE TestDB
GO
SELECT
p.Column1 ,
p.Column2
FROM dbo.Table1 AS p
ORDER BY p.Column3
Second query:
USE TestDB
GO
SELECT
p.Column1 ,
p.Column2
FROM dbo.Table1 AS p
GROUP BY p.Column1, p.Column2
ORDER BY p.Column3
If we execute these two queries, second will arise an error.
My question is why is then first possible to execute, or why the second is not allowed for execution? What is really happening internally, what is the distinction?
Your example:
SELECT
p.Column1 ,
p.Column2
FROM dbo.Table1 AS p
GROUP BY p.Column1, p.Column2
ORDER BY p.Column3
Answer for your question:
When you are grouping a resulset of a query, if you want to order ir you must to order by one of the columns you put in your ORDER BY sentence, it doesnt matter if that column will be into your select columns.
SELECT
p.Column1 ,
p.Column2
FROM dbo.Table1 AS p
GROUP BY p.Column1, p.Column2, p.Column3
ORDER BY p.Column3
Also if you select a column that is NOT into your GROUP BY sentence you will have too the same error.
SELECT
p.Column1 ,
p.Column2 ,
p.Column3
FROM dbo.Table1 AS p
GROUP BY p.Column1, p.Column2, p.Column3
ORDER BY p.Column3
Hope it helps you =D