A Query That Should Be Contradicted

Innocent Enough

I was writing another query, and became enamored with the fact that HAVING will accept IS NULL or IS NOT NULL as a predicate.

What I ended up writing as an example was this query:

Why this query?

I figured the optimizer would take one look at it and bail out with a Constant Scan.

After all, the WHERE clause filters to only NULL UserIds, and this column is NULLable in the Votes table.

The HAVING could only ever produce a NULL. And according to the laws of Logical Query Processing, WHERE is processed earlier than HAVING is.

But that’s not what happens.

Query At Work

And how.

Ach.

Smarter People

Just may kick my butt in the comments about why this doesn’t bail out. My soul and butt are prepared.

Thanks for reading!

Previous Post
New Classes: Dashboard in a Day, Database DevOps, tSQLt, SQL Server Internals, and Avoiding NOLOCK
Next Post
How I Configure SQL Server Management Studio

21 Comments. Leave new

  • The more I learn about Sql Server, I realize it is only as smart as the query you give it. Very powerful tool, proceed with caution.

    • Erik Darling
      July 31, 2018 8:38 am

      Isaac — yeah, a lot of the time I’m amazed it can figure out what people are asking for, and I’m even more amazed at the ways people come up with to ask for things. Hahaha.

  • Sarge Argable
    July 31, 2018 9:58 am

    SUM() makes the column non-SARGable?

    • Marcus Cuvler
      July 31, 2018 11:08 am

      I can’t help but be amuzed at Sarge Argable’s nameor in short S.Argable

    • Erik Darling
      July 31, 2018 11:23 am

      Go on.

      • Quantum Tun-filtering
        August 1, 2018 6:11 am

        SUM() makes the column non-SARGable, and also strips out the NULL values (hence the Warning: NULL value eliminated… message).

        Seems like the optimiser cannot think about the WHILE and the HAVING together (especially with that GROUP BY a-different-column-to-the-one-I’m-filtering-on).
        I wonder if it could connect those 2 clauses if you were GROUPing by the actual UserID. (I’d test myself, but I’m at work. Shhhhh)

  • Graeme Martin
    July 31, 2018 11:19 am

    It is reduced to a Constant Scan, if UserId is marked NOT NULLable.

  • Samuel Jones
    July 31, 2018 11:27 am

    I could be wrong here but I was always taught (and evidence has yet to contradict) that the HAVING clause operates on the resultset so I’m guessing the optimizer isn’t smart enough to notice the contradiction until runtime.

  • What a silly oversight. Please send this patch for the SUM function to your bros at Microsoft.
    if (identity.FullName == “Erik Darling”) return null;

  • Ashish Bhatia
    August 1, 2018 1:37 am

    Same kind of query on my local box resulted in Constant Scan (SQL Server 2012 and table has some 500 rows).

  • Ashish Bhatia
    August 1, 2018 4:34 am

    Oh, column in my case is not nullable.

  • Steven Hibble
    August 1, 2018 12:32 pm

    I don’t know why, but it looks like SUM, AVG, etc. are all converted into something like:
    CASE WHEN COUNT_BIG(Col) = 0 THEN NULL ELSE SUM(Col) END

    MIN and MAX work as expected. COUNT/COUNT_BIG do *not* work (seems weird). I’m guessing that that rewrite above of SUM can’t be pushed down because COUNT_BIG can’t be optimized away.

  • https://docs.microsoft.com/en-us/sql/t-sql/functions/sum-transact-sql?view=sql-server-2017
    Microsoft’s documentation for the SUM operator states:
    “Null values are ignored.”

  • Chris Stewart
    August 6, 2018 3:57 pm

    Using the http://data.stackexchange.com/ tool to run your query, I first checked that there was a stats object on the dbo.Votes.UserID field:
    SELECT *
    FROM sys.stats AS s
    INNER JOIN sys.stats_columns AS sc
    ON s.object_id = sc.object_id
    AND s.stats_id = sc.stats_id
    INNER JOIN sys.syscolumns AS sc2
    ON s.object_id = sc2.id
    AND sc.column_id = sc2.colid
    WHERE s.object_id = object_id(‘dbo.Votes’)
    Is there a stats object on that column in your copy? If there isn’t one, then obviously different work would need to be done by the optimizer.

    Now for running your query, to test the opposite case – if you flip the IS NULL/IS NOT NULL checks, and CAST the UserID as a BIGINT to prevent arithmetic overflow, like so:
    SELECT v.PostId, SUM(cast(v.UserId as bigint)) AS whatever
    FROM dbo.Votes AS v
    WHERE v.UserId IS not NULL
    GROUP BY v.PostId
    HAVING SUM(cast(v.UserId as bigint)) IS NULL;
    You get basically the same plan, where the FILTER operator applies the HAVING after you get the WHERE data, because it’s still possible for a SUM to provide a NULL value. However, in the revised statement the COMPUTE SCALAR operator for the BIGINT cast comes before the HASH MATCH operator, and there isn’t a second COMPUTE SCALAR operator, signifying the resulting in a more efficient plan. That COMPUTE SCALAR operator from your query is
    [Expr1001] = Scalar Operator(CASE WHEN [Expr1006]=(0) THEN NULL ELSE [Expr1007] END)
    This doesn’t exist in the plan for the flipped query.

    In both cases in the execution plan the expected row count is 1 and the actual is 0. However, in the STATISTICS IO results there is no actual worktable work being done:
    (0 row(s) returned)

    Table ‘Votes’. Scan count 7, logical reads 650511, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    I’m not sure if this part is due to the nature of the site vs clearing cache and running it clean.

    It looks like HAVING SUM bails in the case for IS NULL vs the case for IS NOT NULL, but it still needs to do the work to get the data before applying the filter.

Menu
{"cart_token":"","hash":"","cart_data":""}