Don’t Use Scalar User-Defined Functions in Computed Columns.

Last Updated May 6, 2020

Scalar functions in computed columns cause all queries that hit that table to execute serially. But it gets worse!

Scalar functions in computed columns
cause index maintenance to go single-threaded.

If you’re running Expensive Edition, index rebuilds can be both online and parallel. That’s pretty cool, because it keeps all your gadgets and gizmos mostly available during the whole operation, and the parallel bit usually makes things faster.

That is, unless you have a computed column in there that references a scalar function. I decided to write my test function to not perform any data access so it could be persisted. It’s dead simple, and I’m tacking it on to a column in the PostLinks table of the Stack Overflow database.

CREATE FUNCTION dbo.PIDMultiplier (@pid int)
RETURNS INT
    WITH RETURNS NULL ON NULL INPUT,
         SCHEMABINDING
AS
    BEGIN
        DECLARE @Out BIGINT;
        SELECT  @Out = @pid * 2
        RETURN @Out;
    END;
GO

ALTER TABLE dbo.PostLinks
ADD Multiplied AS dbo.PIDMultiplier(PostId) PERSISTED;

CREATE FUNCTION dbo.PIDMultiplier (@pid int)

RETURNS INT

WITH RETURNS NULL ON NULL INPUT,

SCHEMABINDING

BEGIN

DECLARE @Out BIGINT;

SELECT @Out = @pid * 2

RETURN @Out;

END;

ALTER TABLE dbo.PostLinks

ADD Multiplied AS dbo.PIDMultiplier(PostId) PERSISTED;

For this one, all we have to do is turn on actual execution plans and rebuild the index, then drop the column and rebuild again.

ALTER TABLE dbo.PostLinks REBUILD WITH ( ONLINE = ON );

ALTER TABLE dbo.PostLinks DROP COLUMN Multiplied;

ALTER TABLE dbo.PostLinks REBUILD WITH ( ONLINE = ON );

ALTER TABLE dbo.PostLinks DROP COLUMN Multiplied;

Here are my execution plans. The rebuild I ran when the table had my computed column in it stayed serial.

Parallel, sans computed column:

Dude, you're getting a parallel. — Dude, you’re getting a parallel.

They cause DBCC CHECKDB
to go single-threaded, too.

Probably the most important maintenance item you should be doing, aside from backups, is running DBCC CHECKDB. Seriously, if you’re not doing them both, start today. Ola Hallengren has basically done all the work for you. Back when I had a real job, I used his scripts everywhere.

Before we were so rudely interrupted by a soap box, we were talking about parallelism. This part was a little bit more complicated, but don’t worry, you don’t have to follow along. Just look at the pretty pictures. Sleep now. Yes. Sleep.

The first couple times I tried, the DBCC check never went parallel. Since I’m on my laptop, and not a production server, I can set Cost Threshold for Parallelism to 0. You read that right, ZE-RO! Hold onto your drool dish.

With that set, I fire up Ye Olde Oaken sp_BlitzTrace so I can capture everything with Extended Events. You’ll need all three commands, but you’ll probably have to change @SessionId, and you may have to change @TargetPath. Run the first command to start your session up.

EXEC dbo.sp_BlitzTrace @SessionId = 61 , @Action = 'start' , @TargetPath = 'c:\temp\' , @TraceParallelism = 1 , @TraceExecutionPlansAndKillMyPerformance = 1 

EXEC dbo.sp_BlitzTrace @Action = 'stop' 

EXEC dbo.sp_BlitzTrace @Action = 'read'

EXEC dbo.sp_BlitzTrace @SessionId = 61 , @Action = 'start' , @TargetPath = 'c:\temp\' , @TraceParallelism = 1 , @TraceExecutionPlansAndKillMyPerformance = 1

EXEC dbo.sp_BlitzTrace @Action = 'stop'

EXEC dbo.sp_BlitzTrace @Action = 'read'

With that running, toss in your DBCC command. I’m only using DBCC CHECKTABLE here to simplify. Rest assured, if you run DBCC CHECKDB, the CHECKTABLE part is included. The only checks that DBCC CHECKDB doesn’t run are CHECKIDENT and CHECKCONSTRAINT. Everything else is included.

DBCC CHECKTABLE('dbo.PostLinks') WITH NO_INFOMSGS, ALL_ERRORMSGS

ALTER TABLE dbo.PostLinks
ADD Multiplied AS dbo.PIDMultiplier(PostId) PERSISTED;

DBCC CHECKTABLE('dbo.PostLinks') WITH NO_INFOMSGS, ALL_ERRORMSGS

ALTER TABLE dbo.PostLinks

ADD Multiplied AS dbo.PIDMultiplier(PostId) PERSISTED;

Run DBCC CHECKTABLE, add the computed column back, and then run it again. When those finish, run the sp_BlitzTrace commands to stop and read session data. You should see execution plans for each run, and they should be way different.

So even DBCC checks are serialized. Crazy, right? I’d been hearing about performance hits to varying degrees when running DBCC checks against tables with computed columns for a while, but never knew why. There may be a separate reason for regular computed columns vs. ones that reference scalar functions. When I took the equivalent SQL out of a function, the DBCC check ran parallel.

ALTER TABLE dbo.PostLinks ADD Multiplied AS PostId * 2 PERSISTED;

1	ALTER TABLE dbo.PostLinks ADD Multiplied AS PostId * 2 PERSISTED;

Of course, those online index rebuilds running single threaded might be a blessing in disguise, if you haven’t patched SQL recently.

I don’t have much of a grand closing paragraph here. These things can seriously mess you up for a lot of reasons. If you’re a vendor, please get away from using scalar functions, and please please don’t use them in computed columns.

Thanks for reading!

It’s Now Easier to Query sp_BlitzFirst’s Historical Tables

SQL Server Agent is an application server.

27 Comments. Leave new

Graeme Martin
January 26, 2016 1:43 pm

DECLARE @Out INT; –?

Reply
- Erik Darling
  January 26, 2016 2:29 pm
  
  You’d be surprised.
  
  Reply
Alex Friedman
January 27, 2016 7:32 am

Expensive Edition heh

Reply
Christian
January 27, 2016 7:54 am

You say ” . If you’re a vendor, please get away from using scalar functions, […]”
What do you think about using scalar functions in check constraints?

Reply
- Erik Darling
  January 27, 2016 8:11 am
  
  I feel that you would probably win a marksman badge for shooting yourself in the foot.
  
  Reply
Haris Khan
January 27, 2016 11:02 am

Scalar Functions ALWAYS cause any queries using them to run single-threaded. It’s horrific in computed columns because the function basically causes the query to run RBAR. Why would anyone think this is a smart idea?

Reply
- Jeff Moden
  July 26, 2017 3:27 pm
  
  If the computed column is PERSISTED, it’s not the problem that you would think.
  
  Reply
  - Erik Darling
    July 26, 2017 3:33 pm
    
    The computed column that I use in this demo is persisted.
    
    As far as regular query use goes, perhaps not, but you should see what happens when you insert a bunch of rows 🙂
    
    Reply
    - Jeff Moden
      July 26, 2017 3:44 pm
      
      Agreed but it’s either going to happen from the table or it’s going to happen in code. iSFs will certainly make such a thing faster but if the column is supposed to change when someone updates other columns, it’s one way to ensure that happens.
      
      I DO agree that computed columns can be a royal PITA when it comes to table changes especially if you’ve made the mistake of functions that call functions but sometimes the promise of future pain is worth simplicity in the present. 😉 It’s a whole lot better than some nub that want’s to build the formula into a regular view and then query on it. 😉
      
      Reply
      - Erik Darling
        July 26, 2017 3:54 pm
        
        It would be really nice if iTVFs were allowed in computed columns as long as they had a TOP 1 in them (with some schemabinding guarantees), since there’s apparently no way to fix scalar functions.
      - Jeff Moden
        July 26, 2017 4:40 pm
        
        What would be even better would be if MS were to fix scalar functions to make them more practical without having to do some trick to make it fast.
      - Brent Ozar
        July 27, 2017 6:42 am
        
        Jeff – yeah, amen to that!
    - Jeff Moden
      July 26, 2017 4:45 pm
      
      Understood. I was replying to Haris. Computed columns have served me well when I needed them to. It’s like voting for POTUS… sometimes (all the time with POTUS), you have to choose between the lesser of two evils. 😉
      
      Reply
Cody
June 15, 2016 1:28 am

I was caught out by this today. It’s garbage because now you have to include all of your simple computed columns as part of the table definition, rather than simply referencing the UDF in one place.

If you have lots of computed columns that reference other computed columns that’s a lot of copy and pasting to create a technical debt / maintenance nightmare.

Not fixed in SQL 2016 either.

Reply
- VM
  July 19, 2017 8:00 pm
  
  And you want to do this because you want to bury business logic so deep that no one can figure out how the application works?
  
  Reply
- Jeff Moden
  July 26, 2017 3:30 pm
  
  As a bit of a sidebar and except when you make the mistake of shrinking a database, fragmentation doesn’t matter. I’ve not rebuilt my indexes since 17 Jan 2016 and performance actually got better over the first 3 months of that.
  
  Reply
  - Erik Darling
    July 26, 2017 3:35 pm
    
    I generally agree that rebuilds shouldn’t be run as regular maintenance for performance, but I would say that for good reasons to rebuild (like changing a setting, definition, or fill factor of an index), this will be a problem.
    
    Reply
    - Jeff Moden
      August 6, 2017 10:01 am
      
      Heh… “Fill Factor”. Prior to 2016, I would have agreed. Now, not so much. I’ve found that indexes will create an effective “natural fill factor” due to page splits if you just leave them alone. The extra space produced by the page splits is normally used quite nicely when other data is added to the table because NCIs are normally not temporal or otherwise ever increasing in nature. Although I have a personal hatred for GUIDs (actually a misnomer since the advent of Type 4 GUIDs), even those benefit from just leaving the indexes alone to do what they were designed to do.
      
      Again, the exception to that rule is the horrific method they used to shrink a database. I don’t know what else they did there to screw things to the floor so badly but that’s the only time that I’ll rebuild or reorganize indexes anymore and seems to work very nicely whether you have a heavy OLTP, batch, or combination of predominate queries. Brent was definitely sucking the right kind of bong water when he first made that suggestion for SAN based systems. 😉
      
      Reply
      - Erik Darling
        August 7, 2017 12:39 pm
        
        Hey Jeff — yeah, Fill Factor was just an example of an option you can change with a Rebuild. It’s not something I run around changing.
        
        Thanks!
      - Stephen Cena
        May 31, 2024 12:11 pm
        
        “Fill factor” feels like the darkness control on your toaster. Yes it’s there. Yes it gives you the illusion you have control over something you shouldn’t. We turn it anyways hoping that we get the toast we oh so want and end up just burning ourselves.
Daniel C O'Brien
November 27, 2017 2:55 pm

Is there a way to create a computed column that queries another table without using a UDF? I realize it would not be deterministic and could not be persisted. I’m on SQL Server 2016 if it makes any difference.

Reply
- Erik Darling
  November 27, 2017 3:00 pm
  
  No, but that doesn’t mean there’s not a better way to do what you want. Try asking the question on dba.stackexchange.com.
  
  Reply
Daniel C O'Brien
November 27, 2017 3:32 pm

Thanks – here’s the question if you have thoughts – https://dba.stackexchange.com/questions/191843/referencing-another-table-from-a-computed-column.

Reply
Jeff Moden
June 24, 2023 6:54 pm

It’s long overdue and I kept forgetting to post this. I first have to apologize because my previous posts doubted the claim of this article. Looking back at them, I even confused myslef. 🙁

I did my own independent tests for the claim that “Scalar functions in computed columns
cause index maintenance to go single-threaded.”

And, as I think I stated on another similar post here in the world of OZAR, I have verified with both test code and production code that if you have computed column that uses a scalar UDF, then everything (that I’ve tried) having to do with that table is going single threaded even when the column is not referenced directly.

Belated but very strong “Thank you” goes out to Erik, Brent, and a couple of threadizens on these related articles.

That claim is 100% correct and I stand corrected.

Reply
- Brent Ozar
  June 26, 2023 9:47 am
  
  Doesn’t that suck? It’s so terrible. One of those hilarious gotchas that catches people by surprise.
  
  Reply
  - Jeff Moden
    June 27, 2023 4:26 pm
    
    Heh… And all that fancy stuff they built in to revert to the best plan boils down to which wheel you want missing on your coding skateboard. Lordy.
    
    Thanks again for what you and former team members have done and continue to do for the community.
    
    Reply
UDF (User Defined Functions) and Serialization – EugeneChiang.com
February 21, 2024 11:16 pm

[…] https://www.brentozar.com/archive/2016/01/another-reason-why-scalar-functions-in-computed-columns-is… […]

Reply