Logical Query Processing

You can’t do that on management studio

Recently, while working with a client, I did something in a query that they were mystified by. I didn’t think much of it, but I thought it might be useful to you, dear readers, as well. Along with an explanation.

Here’s a sample query that takes advantage of the same type of trick, but with a few extra bats and worms added in to illustrate a larger point.

Can you dig it?

What I did was order by the alias of the COUNT_BIG(*) column, Coconuts.

What they didn’t understand was why that’s legal, but filtering on that alias wouldn’t be legal. A more familiar scenario might be using ROW_NUMBER(); you can ORDER BY it, but not filter on it in the WHERE clause to limit result sets to the TOP N per set. You would have to get an intermediate result in a CTE or temp table and then filter.

When SQL goes to figure out what to do with all this, it doesn’t look at it in the order you typed it. It’s a bit more like this:

8. SELECT
9. DISTINCT
11. TOP
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE/ROLLUP
7. HAVING
10. ORDER BY
12. OFFSET/FETCH

To make that a easier to read:

1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE/ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
12. OFFSET/FETCH

And that’s how babies get made

Since the ORDER BY is processed after the SELECT list, ORDER BY can use a column aliased there. You can’t do that in the WHERE clause because it gets processed before SQL does its fancy footwork to get you some stuff to look at.

Here are some examples of what happens when you try to move the alias to different parts of the query.

Using it in the HAVING filter:

And again using it in the WHERE clause:

Both result in the same error, give or take a line number. Coconuts is not reference-able at either of these points.
Msg 207, Level 16, State 1, Line 33
Invalid column name 'Coconuts'.

,
Previous Post
13 Questions to Ask Before You Touch a Database Server
Next Post
Database Connection Hazards with Entity Framework

31 Comments. Leave new

  • This trick is so little known that SQL Prompt does not even support it 🙂

    SQL really is not well designed. LINQ has no trouble here. But we’re stuck with SQL and this will never change.

  • Thomas Strike
    July 14, 2015 9:11 am

    Thanks Erik, I made sure to bookmark this one so I can revisit it from time to time. I need to do that more as more as I get older. 🙂

    • Erik Darling
      July 14, 2015 3:35 pm

      You and me both. But I’ve always followed Einstein’s advice: “Never memorize what you can look up in books.”

      • John G Hohengarten
        August 5, 2015 9:48 am

        Similarly, I follow Professor Henry Jones’ mantra of “I wrote them down [in my diary] so that I wouldn’t *have* to remember!”, in regards to the Grail Diary in Indiana Jones and the Last Crusade.

  • What really confuses people is when you you do
    SELECT columna,
    columnb
    from table1
    order by 1
    So that it will order by columna

    I personally don’t use this convention but I work with someone who does

    • Erik Darling
      July 14, 2015 3:36 pm

      I haaaaaaaaaaaaaate that. Forever.

      • Agreed. I grumble at my co worker who does it all the time. He just laughs at me

        • In adhoc queries it’s a wonderful thing, especially on computed columns that you haven’t aliased. In production code though? Bring me your firstborn.

    • I do that only for quick, disposable queries. By the time I get to the bottom of my query, my fingers are happy that I choose to just type “ORDER BY n”. Gets me to F5 as fast as possible.

      But I never, ever, use ordinal aliases in the ORDER BY in any kind of persisting queries. EVER. It’s in the same category as aliasing tables as “a,” “b,” and “c.”

  • ooohhhhhhhh…… good stuff. Thank you very much. Very cool.

    Is the order on how SQL server processes a SQL statement in the Microsoft documentation?
    I think it is very helpful. If not, I will copy your list and display it on my desk at work =)

    Thanks!!!

  • Steve Mangiameli
    July 14, 2015 10:25 am

    I’ve known about this capability for years, just never could get myself in the habit of using it. The column alias isn’t very useful in the query it is used in; it’s more for the output, which if you think about it, is exactly what the ORDER BY is – an output operator. This gave me a major headache years ago when I first started learning SQL Server after having only know Oracle, where aliases are aliases EVERYWHERE.

  • Yes, ORDER BY is weird and another FYI

    SELECT [b].[Name]
    FROM [dbo].[Badges] AS [b]
    JOIN [dbo].[Badges] AS [b2]
    ON [b].[ID] = [b2].[ID]
    ORDER BY [Name]

    fails on SQL2000 with “Ambiguous column name ‘Name'” but works OK on SQL2005+

    SELECT [b].[Name] AS [Name]
    FROM [dbo].[Badges] AS [b]
    JOIN [dbo].[Badges] AS [b2]
    ON [b].[ID] = [b2].[ID]
    ORDER BY [Name]

    works everywhere.

  • Try nesting aggregations with window fuctions.
    Something like
    SELECT islands, SUM(SUM(coconuts) OVER ()
    FROM myTable
    GROUP BY islands;

    Messes with your head 🙂

  • Don’t you have to engage in the practice of making the babies before you can throw out the weak ones (THIS…IS…SPARTA !!!). I only ask because I’ve been working through Ben-Gan’s T-SQL tome and it has the cartesian product phase before the on predicate phase. I am mostly curious if there is some short circuitry in this beast that makes your specified ordering occur (and this admittedly doesn’t have any impact on the rest of your post, but I am a curious mind)?

    • Erik Darling
      July 14, 2015 3:45 pm

      Oh, yeah, there are a few possible in-between steps depending on the type of join you’re doing. My list was just to put the keywords in order for reference.

  • Signal_Runner
    July 14, 2015 3:57 pm

    Sorry to be slightly off topic, but in the example: ‘2014-01-01 00:00:00.003’ – why three milliseconds? I’m guessing this is related to the precision of the data type and you want to avoid where b.date was exactly midnight?

  • Itzik Ben-Gan produced an excellent diagram covering this topic:

    http://tsql.solidq.com/books/insidetsql2008/Logical%20Query%20Processing%20Poster.pdf

  • Lets say we have a query like following:

    USE TestDB
    GO

    SELECT
    p.Column1 ,
    p.Column2
    FROM dbo.Table1 AS p
    ORDER BY p.Column3

    Like we know, SQL Server create virtual tables as a result of doing some step (JOIN, WHERE, GROUP BY…) and pass that virtual tables to next steps. So, as ORDER BY is processed after SELECT clause, how it sees column that is not selected and included in virtual table after SELECT clause?

    • V – SQL Server brings all of the fields along through the plan that it needs in order to perform its work.

      • Why then it is not possible if we include GROUP BY clause with two columns from SELECT list?

        • V – because if you want to sort by something, you have to group by it too. You may want to show examples if you’d like more specific answers.

          • First query:

            USE TestDB
            GO

            SELECT
            p.Column1 ,
            p.Column2
            FROM dbo.Table1 AS p
            ORDER BY p.Column3

            Second query:

            USE TestDB
            GO

            SELECT
            p.Column1 ,
            p.Column2
            FROM dbo.Table1 AS p
            GROUP BY p.Column1, p.Column2
            ORDER BY p.Column3

            If we execute these two queries, second will arise an error.

            My question is why is then first possible to execute, or why the second is not allowed for execution? What is really happening internally, what is the distinction?

        • Adriana Escamilla
          July 29, 2015 11:19 am

          Your example:

          SELECT
          p.Column1 ,
          p.Column2
          FROM dbo.Table1 AS p
          GROUP BY p.Column1, p.Column2
          ORDER BY p.Column3

          Answer for your question:

          When you are grouping a resulset of a query, if you want to order ir you must to order by one of the columns you put in your ORDER BY sentence, it doesnt matter if that column will be into your select columns.

          SELECT
          p.Column1 ,
          p.Column2
          FROM dbo.Table1 AS p
          GROUP BY p.Column1, p.Column2, p.Column3
          ORDER BY p.Column3

          Also if you select a column that is NOT into your GROUP BY sentence you will have too the same error.

          SELECT
          p.Column1 ,
          p.Column2 ,
          p.Column3
          FROM dbo.Table1 AS p
          GROUP BY p.Column1, p.Column2, p.Column3
          ORDER BY p.Column3

          Hope it helps you =D

Menu
{"cart_token":"","hash":"","cart_data":""}