How Scalar User-Defined Functions Slow Down Queries

T-SQL
10 Comments

When your query has a scalar user-defined function in it, SQL Server may not parallelize it and may hide the work that it’s doing in your execution plan.

To show it, I’ll run a simple query against the Users table in the Stack Overflow database.

I don’t have an index on Reputation, so SQL Server has to sort all of the Users by their Reputation. That’s a CPU-intensive operation, so SQL Server automatically parallelizes it across multiple CPU cores:

The racing stripes on the plan operators indicate that the operations went parallel. Another way to see that is by using SET STATISTICS TIME ON, which adds CPU time and execution time information to the Messages tab of SSMS:

See how CPU time is higher than elapsed time? That’s an indication that the query went parallel. Because work was distributed across multiple cores, SQL Server was able to get 969 milliseconds of work done in just 357 milliseconds.

But when we add a scalar user-defined function…

Say our users want to see their names and locations formatted a little more nicely. Instead of two columns that say “Brent Ozar” and “San Diego”, they want a single column that says “Brent Ozar from San Diego”. And I don’t want to put that concatenation logic all over the place in every query I write, so I encapsulate it in a function:

The results are a little easier on the eyes:

Now, when I use that function inside the query, the query technically works fine:

But the execution plan is missing a little something:

And the statistics time output shows that it still needed a lot of CPU, but since it didn’t go parallel, it took longer on the clock:

If you dig deeply enough in the execution plan properties, SQL Server notes that it couldn’t build a valid parallel execution plan, but it doesn’t say why:

There are a few ways to fix this.

One way is to inline the contents of your function – literally copy/paste the scalar function’s contents directly into your query:

This query goes parallel, proving that the concatenation and coalescing isn’t what was stopping us from going parallel – it was the presence of a scalar user-defined function:

Statistics time output shows that we went parallel and ran faster overall:

 

Another way to fix it is to upgrade to SQL Server 2019 and set your database to 2019 compatibility level. One of 2019’s most ambitious features, code named Froid, aims to automatically inline scalar user-defined functions without you having to rewrite them.

When I set my database into 2019 compat level, the query with the scalar function goes parallel again:

And statistics time output shows the performance improvement:

However, SQL Server 2019’s scalar function inlining comes with a huge number of drawbacks. Your query can actually go slower, or your scalar function may use features that SQL Server 2019 refuses to inline.

To fix this problem:

Want to learn more about troubleshooting these kinds of issues?

Check out my How I Use the First Responder Kit class. I walk you through using sp_BlitzWho, sp_BlitzFirst, sp_BlitzIndex, and many of the other scripts in the open source First Responder Kit.

If you’ve got a Live Class Season Pass, you can drop in on this class or any of my live online classes at any time. Just head to my current training page, check out the schedule at the bottom of the page, and grab the calendar files for the classes you’re interested in. You don’t have to register ahead of time – just drop in anytime I’m streaming.

Or, if you’ve got a Recorded Class Season Pass, you can hop in and watch the recordings in your account at any time.

See you in class!

Previous Post
How Bad Statistics Cause Bad SQL Server Query Performance
Next Post
When You’re Troubleshooting Blocking, Look at Query #2, Too.

10 Comments. Leave new

  • How does your above UDF perform when WITH SCHEMA BINDING is enabled within the function? Does that enable parallelism?

    Reply
  • How about if you create a table-valued function to do the same same function, and applying it to the query?

    Reply
    • (-1 for failing to check my grammar before hitting post)

      Reply
      • Sure, go ahead and give that a shot – that’s why I use open source databases for my demos, so y’all can follow along and try your own experiments. (I cover that answer in my Mastering Query Tuning class, but it’s tough to do justice to it here in the comments.)

        Reply
  • So scalar UDF when can they be used safely???

    Reply
  • Bertram Gilfoyle
    June 20, 2022 2:03 pm

    Hi Brent,

    great post. Thanks for pointing this issue out. One follow up question/thought: Since the cost of the UDF is set to 0 inappropriateley, does this mean that the query optimizer does some “stupid” stuff? For example, if high costs would be associated with the UDF, then the optimizer would first try to reduce the amount of tuples on which the UDF is applied, right? However, if the cost is set to 0, the optimizer thinks that he can apply the UDF to all tuples altough some of them are not needed, right? Hence, the processing time is bad because we are doing some work that is not necessary.

    From DB2 I know that there are some parameters, which can be used to control the behavior of UDFs. Is there something similar for Microsoft SQL Server?

    Many thanks in advance! Looking forward to your response!

    Reply
  • Just ran into this issue when moving a SQL 2016 server VM to newer hardware on a new VM host. The modern HCI VM hosts have much faster CPUs, so there were quite a few less physical CPU cores, resulting in massive increases in runtimes for any stored procs using a scalar function inside a query with a decent sized row count. Inlining the function code obviously fixes the issue, and we were able to do this for the longest running queries to get runtimes back to reasonable levels. I’m wondering if it makes sense to convert scalar functions to procedural code inside an assembly and then mass replace the hundreds of places scalar functions are still being called with the procedural code methods. Replacing all the function calls with inline code would take forever, and with the amount of code on this server, a version upgrade could take many months of testing. Any idea if we’d see the same issue with a call to a method in an assembly as we see with the scalar functions?

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.