User-defined scalar functions suck – even when they don’t access data.

T-SQL
19 Comments

The performance of scalar functions sucks hard. Let’s see it in action using the Stack Overflow database – any size will work. I’ll set things up first in case you want to follow along:

I’m purposely setting my Cost Threshold for Parallelism to be low here because I want to demonstrate what happens when a query goes parallel, but I want to use short demo queries so you’re not twiddling your thumbs as you wait to reproduce this stuff on your Pentium 3.

I’m using compat level 140 (2017), but any compat level prior to 150 (2019) will work. (SQL Server 2019 has dramatically different effects, sometimes not so good.)

Let’s create a scalar function that just returns the number 1. Nothing more, nothing less:

Then test its overhead by turning on actual execution plans and running these two queries back-to-back:

The execution plan looks like no big deal – one of ’em costs 49%, and the other is 51% – what’s a couple of percent between friends?

Scalar function execution plans, SQL Server 2017

But remember, the percentages are based on the estimated costs – even when you’re looking at an actual plan. The percentages are simply garbage and meaningless – execution plans are full of lies. However, there’s one interesting thing that does stand out: the plan with the function doesn’t have parallelism. Until SQL Server 2019, if your query has a scalar function anywhere in it, your entire query plan goes single-threaded. Ouch. (Ironically, the contents of the function are allowed to go multi-threaded, so…yay.)

A better indicator of performance here is in the Messages tab:

  • Without the function: 93 milliseconds elapsed time, 299 ms CPU time
  • With the function: 6,427 ms elapsed time, 5,375 ms CPU time

Ouch. tl;dr – even if your scalar function doesn’t touch tables, it still cripples performance by forcing serial processing, blowing up your CPUs, and obfuscating your query plans. Scalar user-defined functions: not even once.

Previous Post
5 Questions to Ask When You Upgrade SQL Server
Next Post
Things SQL Server Setup Doesn’t Do – And Should

19 Comments. Leave new

  • Jose C. Ortiz
    May 28, 2019 9:30 am

    so is rather preferable to use cross apply ?

    Reply
    • Jose – if you’re using a scalar function, it doesn’t really matter how you refer to it in a query: you’re still screwed.

      If you’re considering CROSS APPLY, you’re most likely thinking of a different kind of function: table-valued functions. Those have a totally different set of caveats.

      Reply
  • Stephen Mandeville
    May 28, 2019 9:36 am

    SQL Server execution plans are like relationships…they are full of lies.

    lol

    been saying this for years to the devss don’t touch function in set based processing

    Reply
  • I’ve spent a lot of the past couple of years extracting UDFs from code. Apologies if this is a stupid question but why do proprietary functions not suffer the same fate. Are they not the same under the hood?

    Reply
    • Joe – this post is about proprietary (user-defined) functions, so I think what you’re asking is, “Why don’t system functions suffer the same performance fate as user-defined functions?” The short answer there is that Microsoft can do performance optimizations on system functions (building them into the engine itself) that they can’t do for our wild and crazy functions that can be unpredictable.

      Reply
  • Terry Stoneberg
    May 28, 2019 9:56 am

    Does it matter if the scalar function is in the WHERE clause instead of the SELECT clause?

    Reply
  • Bryan Rebok
    May 28, 2019 11:02 am

    Brent, I agree with your point 100%.

    In testing your example locally, I believe that I found something interesting. If you create another comparable function WITH SCHEMABINDING and then run the COUNT query for both functions, the performance is the same.

    However, if you run a query that selects the DISTINCT results, the performance of the schema bound function is comparable to a query that doesn’t call the function at all. The performance of the function without schema binding is still extremely slow.

    — here is the baseline
    SELECT DISTINCT 1
    FROM dbo.Users;

    — still excruciatingly slow
    SELECT DISTINCT dbo.Returns1(1)
    FROM dbo.Users;

    — much faster than the non-schema bound version and comparable to the query without the function call
    SELECT DISTINCT dbo.Returns1WithSchemaBinding(1)
    FROM dbo.Users;

    Reply
    • That’s great, except, uh, that’s different results than my query. As long as you’re willing to change the query results, you can often make queries faster. 😀

      Reply
      • Bryan Rebok
        May 28, 2019 11:40 am

        Agreed. Changing the query results is probably not the best thing. 🙂

        I was just trying to show that not all query patterns in SQL Server seem to exhibit the scalar UDF slowness when (1) the function doesn’t select from tables and (2) schema binding is enabled on the function. Removing the DISTINCT keyword from the queries in my test resulted in similar performance, where the schema-bound scalar UDF was much faster than the non-schema bound scalar UDF.

        Reply
  • AlessandroD
    May 29, 2019 12:52 am

    Scalar user-defined functions: bug by design… 🙂
    SQL Server is a db engine, so it’s a bug, nothing else

    Reply
  • Not UDF’s, but similar issues with reports putting RANK on the columns (legacy, they won’t re-write…you know the drill). I’m trying to gently push them into at least dumping the raw data into temp tables first, but for now, they know this query (overnight report fortunately), will be slow.

    Reply
  • Kevin Kelso
    May 29, 2019 8:10 am

    Your reference to Pentium 3 reminded me of one of the best videos ever created. Enjoy. https://www.youtube.com/watch?v=qpMvS1Q1sos

    Reply
  • This is a load of BS. Time to get over the hysteria about UDFs. Same with cursors for that matter. They can be useful tools just as well as abused tools. They are not inherently evil.

    Reply
  • I occurred a strange error:

    Msg 596, Level 21, State 1, Line 0 Cannot continue the execution because the session is in the kill state.

    It seems that error is generated when I use a cursor like :

    DECLARE dbCursor CURSOR FOR SELECT (SELECT TOP 1 VAL FROM dbo.fn_Table(‘n:’,dbo.fn_Scalar) ORDER BY VAL) from table

    If I taken the select to run it, it is ok.

    If in cursor I execute only fn_Table OR fn_Scalar, it is ok.

    If in cursor I execute BOTH functions, I have the above error.

    In older versions of SQL server (than 2019) the cursor is running properly.

    Could you, please, give me an idea?

    Thank you.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.