A Query That Should Be Contradicted

Last Updated July 20, 2018

Innocent Enough

I was writing another query, and became enamored with the fact that HAVING will accept IS NULL or IS NOT NULL as a predicate.

What I ended up writing as an example was this query:

SELECT   v.PostId, SUM(v.UserId) AS whatever
FROM     dbo.Votes AS v
WHERE    v.UserId IS NULL
GROUP BY v.PostId
HAVING   SUM(v.UserId) IS NOT NULL;

SELECT v.PostId, SUM(v.UserId) AS whatever

FROM dbo.Votes AS v

WHERE v.UserId IS NULL

GROUP BY v.PostId

HAVING SUM(v.UserId) IS NOT NULL;

Why this query?

I figured the optimizer would take one look at it and bail out with a Constant Scan.

After all, the WHERE clause filters to only NULL UserIds, and this column is NULLable in the Votes table.

The HAVING could only ever produce a NULL. And according to the laws of Logical Query Processing, WHERE is processed earlier than HAVING is.

But that’s not what happens.

Query At Work

Ach.

Smarter People

Just may kick my butt in the comments about why this doesn’t bail out. My soul and butt are prepared.

Thanks for reading!

New Classes: Dashboard in a Day, Database DevOps, tSQLt, SQL Server Internals, and Avoiding NOLOCK

How I Configure SQL Server Management Studio

21 Comments. Leave new

Isaac Koss
July 31, 2018 8:30 am

The more I learn about Sql Server, I realize it is only as smart as the query you give it. Very powerful tool, proceed with caution.

Reply
- Erik Darling
  July 31, 2018 8:38 am
  
  Isaac — yeah, a lot of the time I’m amazed it can figure out what people are asking for, and I’m even more amazed at the ways people come up with to ask for things. Hahaha.
  
  Reply
Sarge Argable
July 31, 2018 9:58 am

SUM() makes the column non-SARGable?

Reply
- Marcus Cuvler
  July 31, 2018 11:08 am
  
  I can’t help but be amuzed at Sarge Argable’s nameor in short S.Argable
  
  Reply
- Erik Darling
  July 31, 2018 11:23 am
  
  Go on.
  
  Reply
  - Quantum Tun-filtering
    August 1, 2018 6:11 am
    
    SUM() makes the column non-SARGable, and also strips out the NULL values (hence the Warning: NULL value eliminated… message).
    
    Seems like the optimiser cannot think about the WHILE and the HAVING together (especially with that GROUP BY a-different-column-to-the-one-I’m-filtering-on).
    I wonder if it could connect those 2 clauses if you were GROUPing by the actual UserID. (I’d test myself, but I’m at work. Shhhhh)
    
    Reply
    - Erik Darling
      August 1, 2018 7:09 pm
      
      QT — hm. Well, when you’ve had a chance to test, lemme know!
      
      Reply
Graeme Martin
July 31, 2018 11:19 am

It is reduced to a Constant Scan, if UserId is marked NOT NULLable.

Reply
- Erik Darling
  July 31, 2018 11:23 am
  
  That won’t work out too well on a column with NULLs in it…
  
  Reply
  - Graeme Martin
    July 31, 2018 1:01 pm
    
    Absolutely. Just an academic answer, nothing practical. Like the question… 😛
    
    Reply
    - Erik Darling
      August 1, 2018 7:10 pm
      
      OUCH DUDE
      
      Reply
Samuel Jones
July 31, 2018 11:27 am

I could be wrong here but I was always taught (and evidence has yet to contradict) that the HAVING clause operates on the resultset so I’m guessing the optimizer isn’t smart enough to notice the contradiction until runtime.

Reply
Billy Ray
July 31, 2018 3:36 pm

What a silly oversight. Please send this patch for the SUM function to your bros at Microsoft.
if (identity.FullName == “Erik Darling”) return null;

Reply
- Erik Darling
  August 1, 2018 7:10 pm
  
  Billy — that’d hurt more if you returned 0.
  
  Reply
Ashish Bhatia
August 1, 2018 1:37 am

Same kind of query on my local box resulted in Constant Scan (SQL Server 2012 and table has some 500 rows).

Reply
Ashish Bhatia
August 1, 2018 4:34 am

Oh, column in my case is not nullable.

Reply
Steven Hibble
August 1, 2018 12:32 pm

I don’t know why, but it looks like SUM, AVG, etc. are all converted into something like:
CASE WHEN COUNT_BIG(Col) = 0 THEN NULL ELSE SUM(Col) END

MIN and MAX work as expected. COUNT/COUNT_BIG do *not* work (seems weird). I’m guessing that that rewrite above of SUM can’t be pushed down because COUNT_BIG can’t be optimized away.

Reply
- Steven Hibble
  August 1, 2018 1:01 pm
  
  Of course, Paul White wrote something about this: http://sqlblog.com/blogs/paul_white/archive/2011/02/27/bug-slow-sums-and-averages.aspx
  
  Reply
Angela
August 1, 2018 6:58 pm

https://docs.microsoft.com/en-us/sql/t-sql/functions/sum-transact-sql?view=sql-server-2017
Microsoft’s documentation for the SUM operator states:
“Null values are ignored.”

Reply
- Erik Darling
  August 1, 2018 7:11 pm
  
  GO ON
  
  Reply
Chris Stewart
August 6, 2018 3:57 pm

Using the http://data.stackexchange.com/ tool to run your query, I first checked that there was a stats object on the dbo.Votes.UserID field:
SELECT *
FROM sys.stats AS s
INNER JOIN sys.stats_columns AS sc
ON s.object_id = sc.object_id
AND s.stats_id = sc.stats_id
INNER JOIN sys.syscolumns AS sc2
ON s.object_id = sc2.id
AND sc.column_id = sc2.colid
WHERE s.object_id = object_id(‘dbo.Votes’)
Is there a stats object on that column in your copy? If there isn’t one, then obviously different work would need to be done by the optimizer.

Now for running your query, to test the opposite case – if you flip the IS NULL/IS NOT NULL checks, and CAST the UserID as a BIGINT to prevent arithmetic overflow, like so:
SELECT v.PostId, SUM(cast(v.UserId as bigint)) AS whatever
FROM dbo.Votes AS v
WHERE v.UserId IS not NULL
GROUP BY v.PostId
HAVING SUM(cast(v.UserId as bigint)) IS NULL;
You get basically the same plan, where the FILTER operator applies the HAVING after you get the WHERE data, because it’s still possible for a SUM to provide a NULL value. However, in the revised statement the COMPUTE SCALAR operator for the BIGINT cast comes before the HASH MATCH operator, and there isn’t a second COMPUTE SCALAR operator, signifying the resulting in a more efficient plan. That COMPUTE SCALAR operator from your query is
[Expr1001] = Scalar Operator(CASE WHEN [Expr1006]=(0) THEN NULL ELSE [Expr1007] END)
This doesn’t exist in the plan for the flipped query.

In both cases in the execution plan the expected row count is 1 and the actual is 0. However, in the STATISTICS IO results there is no actual worktable work being done:
(0 row(s) returned)

Table ‘Votes’. Scan count 7, logical reads 650511, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

I’m not sure if this part is due to the nature of the site vs clearing cache and running it clean.

It looks like HAVING SUM bails in the case for IS NULL vs the case for IS NOT NULL, but it still needs to do the work to get the data before applying the filter.

Reply