SUM, AVG, and arithmetic overflow

You Shoulda Brought A Bigger Int

Sometimes you run a query, and everything goes fine.

For a while. For example, if I run this query in the 2010 copy of Stack Overflow, it finishes pretty quickly, and without error.
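
Picture something like this: a plain SUM over an int column. The table and column here are stand-ins (I'm assuming the Stack Overflow Users table), not necessarily the exact demo query:

    /* SUM over an int column does its math in int,
       so the running total tops out around 2.14 billion. */
    SELECT SUM(u.Reputation) AS total_reputation
    FROM dbo.Users AS u;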

If I run this query in the full version, it runs for a minute and a half, and then returns an error:
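
The error is SQL Server's standard arithmetic overflow message, which reads something like:

    Msg 8115, Level 16, State 2, Line 1
    Arithmetic overflow error converting expression to data type int.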

They Shoot Horses, Don’t They?

Now, we’ve had COUNT_BIG for ages. Heck, we even need to use it in indexed views when they perform an aggregation. More recently, we got DATEDIFF_BIG. These functions allow for larger integers to be handled without additional math that I hate.
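
For example, an indexed view with a GROUP BY has to include COUNT_BIG(*) before SQL Server will let you create a clustered index on it. A quick sketch, with made-up object names:

    CREATE VIEW dbo.PostVoteCounts
    WITH SCHEMABINDING
    AS
    SELECT v.PostId,
           COUNT_BIG(*) AS VoteCount /* plain COUNT(*) would block the index */
    FROM dbo.Votes AS v
    GROUP BY v.PostId;
    GO

    CREATE UNIQUE CLUSTERED INDEX cx_PostVoteCounts
        ON dbo.PostVoteCounts (PostId);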

If one day you got an email that this query suddenly started failing, and you had to track down the error message, what would you do?

I’d probably wish I wore my dark pants and start running DBCC CHECKDB.

I kid, I kid.

There are no pants here.

Anyway, it’s easy enough to fix.

You just have to lob a convert to bigint in there.
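
Something like this, with stand-in names for whatever your query is actually summing:

    /* Converting inside the aggregate makes SUM do its math in bigint,
       so the running total has room to grow past the int ceiling. */
    SELECT SUM(CONVERT(bigint, u.Reputation)) AS total_reputation
    FROM dbo.Users AS u;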

Data Grows, Query Breaks

This is a pretty terrible situation to have to deal with, especially if you have a lot of queries that perform aggregations like this.

Tracking them all down and adding in fixes is arduous, and there’s no solution out there that doesn’t require changing code.

I was going to open a User Voice item about this, but I wanted to get reader feedback first, because there are a few different ways to address this that I can think of off the top of my head.

  1. Add _BIG functions for SUM, AVG, etc.
  2. Add cultures similar to CONVERT to indicate int, bigint, or decimal
  3. Change the functions to automatically convert when necessary
  4. Make the default output a BIGINT

There are arguments for and against all of these, but leave a comment with which one you think would be best, or with an idea of your own.

Thanks for reading!

15 Comments

  • Emanuele Meazzo
    October 30, 2018 9:59 am

    I don’t want the output to always be bigint, because I don’t want to deal with implicit bigint to int conversions when using those results. At the same time, converting all the numbers I send to those functions to bigint isn’t ideal (the aggregation is “big”, not the single value), so having _BIG versions for all the aggregation functions would be the ideal solution for me.

  • Jeffrey Khalil
    October 30, 2018 11:11 am

    It would be cool if SQL could implicitly convert the aggregate result to _BIG if it detects the error.

  • My vote goes to option 3, having the function automatically perform a conversion when necessary, that way a bigint isn’t returned unnecessarily, but the function won’t, after diligently plugging away for a good while, burst into flames and die in a spectacular fashion.

  • I like the cultures “sum(p.Sales as bigint)”. The auto-conversion is less work for writing, but it’s hiding the data type from you.

  • Well….
    Options 1 and 2 seem equivalent to the current solution to me (requiring manual intervention), and I don’t necessarily always want bigint unless I need bigint (which crosses off option 4).

    So I vote for option 3!

  • I think SUM and AVG should be treated differently. AVG seems easy: as Chad says, use a bigger datatype to hold the internal working total so it doesn’t overflow. SUM already returns a bigger datatype than the input for most types of whole number. I guess, when they introduced BIGINT in SQL 2000, they decided changing SUM to return BIGINT for an INT input wouldn’t help prevent an overflow if you were still putting the answer into an INT. I think a SUM_BIG would be best because it would be like the other _BIG functions.

    Option 5: Just make all integers 64-bit in the engine! Until we need 72-bit integers. Then introduce a 72-bit REALLYBIGINT, at which point we have to explicitly cast the type.

    Seriously though: I’m a proponent of 1 or 2. Auto conversions could have a negative impact on applications dependent on the data. If overflow breaks something, you end up searching everywhere anyway.

    It’s “You’re gunna need a bigger int”. Just saying.

  • Jonathan Brune
    October 30, 2018 12:40 pm

    I vote #4, but any of them would be an improvement. Memory and storage are much cheaper than they were in the days of SQL 6.5; there’s not much downside to using BIGxxx instead of xxx.

    I don’t like 1 or 2, as that requires me to know up front whether it’s possible that the calculation will overflow, and if I really don’t know, I’ll end up using XXX_BIG everywhere just in case. Which would then act like 4, and always return BIGINT, like it or not. So if 3 can be done without too much impact, then 3; otherwise 4.

  • Having _BIG versions for all the aggregation functions would be the best solution for me and it seems to be more consistent with other _BIG changes.

  • Paw Jershauge
    October 31, 2018 3:21 am

    Option 1: Would and should be the correct way to handle this, in my opinion.
    Option 2: Would be nice if it were Sum(table.column AS bigint).
    Option 3:
    a. How in the world would this ever work? What would the metadata for the resultset be… would the sum column’s output datatype be set to SQL_VARIANT, because we don’t know what data type will be returned? No thank you…
    b. If the output datatype is based on the input datatype automatically, what should the rule be?
    1. INT in and BIGINT out?
    2. BIGINT in and ??????? out
    Option 4: No thank you. The developer needs to be in control of the datatype.

    • Paw Jershauge
      October 31, 2018 3:47 am

      Regarding my answer to option 3, I was assuming that we were looking for a solution where there was no overflow exception.

  • Mikael Eriksson
    November 1, 2018 3:02 am

    I guess option 3 will probably make SUM() nondeterministic, so my vote would go to option 1 for SUM().
    The result of AVG() can never overflow the input data type, so the fact that AVG() fails is a bug that needs to be fixed internally and reported separately.

  • Tarhib IT Limited
    March 7, 2024 9:02 am

    You might be summing up a large number of positive values in a column. If the sum exceeds the maximum value allowed by the data type for that column, overflow occurs, and the result might be incorrect (often a negative number).
