T-SQL & Development

Using Triggers to Replace Scalar UDFs on Computed Columns

By Brent Ozar · October 20, 2020 · 46 comments

Your database is riddled with computed columns whose definition includes a scalar user-defined function. Even up to & including SQL Server 2019, which boasts faster scalar function processing, any table that includes a scalar function cause all access to that table to go single-threaded. In that case, a trigger can actually be a great replacement.

No, wait, come back. Triggers have a pretty bad reputation amongst the database community because:

You can pile a lot of work into them, leading to surprise slowdowns at scale
You can hold locks open for a long time as you work through a lot of tables
Troubleshooting can be tricky since authors rarely log errors generated in triggers
They almost always have a bug involving multi-record processing

But there are some places where a trigger is actually the least-worst fit, and this is one of ’em. To show the problem, I’ll take the mid-size 50GB Stack Overflow database and do a simple count of the number of rows:

USE StackOverflow2013;
GO
SELECT SUM(1) FROM dbo.Users;
GO

USE StackOverflow2013;

SELECT SUM(1) FROM dbo.Users;

The execution plan goes parallel:

But add a computed column to the table, and then run our count again:

CREATE OR ALTER FUNCTION dbo.fn_GetSeniority(@Reputation INT)
RETURNS VARCHAR(10)
AS
    BEGIN
	DECLARE @Seniority VARCHAR(10)
    IF @Reputation > 10000
		SET @Seniority = 'Senior';
	ELSE
		SET @Seniority = 'Junior';
	RETURN @Seniority;
    END;
GO
ALTER TABLE dbo.Users 
	ADD Seniority AS dbo.fn_GetSeniority(Reputation);
GO
SELECT SUM(1) FROM dbo.Users;

CREATE OR ALTER FUNCTION dbo.fn_GetSeniority(@Reputation INT)

RETURNS VARCHAR(10)

BEGIN

DECLARE @Seniority VARCHAR(10)

IF @Reputation > 10000

SET @Seniority = 'Senior';

ELSE

SET @Seniority = 'Junior';

RETURN @Seniority;

END;

ALTER TABLE dbo.Users

ADD Seniority AS dbo.fn_GetSeniority(Reputation);

SELECT SUM(1) FROM dbo.Users;

And the query can no longer go parallel, even though it’s not calling the function:

If we look at the query plan’s properties, we can see that SQL Server was unable to build a parallel execution plan, although it’s not obvious as to why:

We can fix that with a trigger.

Instead of basing the Seniority column off a computed, user-defined function, we can:

Remove the computed column
Add a real column
Keep it up to date with a trigger

Here’s how:

ALTER TABLE dbo.Users 
	DROP COLUMN Seniority;
GO
ALTER TABLE dbo.Users 
	ADD Seniority VARCHAR(10);
GO
/* Seed the starting values */
UPDATE dbo.Users
	SET Seniority = dbo.fn_GetSeniority(Reputation);
GO
CREATE OR ALTER TRIGGER Users_Seniority ON dbo.Users
AFTER INSERT, UPDATE
AS  
BEGIN
UPDATE u
	SET Seniority = dbo.fn_GetSeniority(u.Reputation)
	FROM dbo.Users u
	INNER JOIN inserted i ON u.Id = i.Id;
END;  
GO

ALTER TABLE dbo.Users

DROP COLUMN Seniority;

ALTER TABLE dbo.Users

ADD Seniority VARCHAR(10);

/* Seed the starting values */

UPDATE dbo.Users

SET Seniority = dbo.fn_GetSeniority(Reputation);

CREATE OR ALTER TRIGGER Users_Seniority ON dbo.Users

AFTER INSERT, UPDATE

BEGIN

UPDATE u

SET Seniority = dbo.fn_GetSeniority(u.Reputation)

FROM dbo.Users u

INNER JOIN inserted i ON u.Id = i.Id;

END;

When I insert or update the users table, the trigger goes single-threaded as it updates the Seniority column – but that’s it. Now, when I run a count again, the count can go multi-threaded:

Sure, I would rather have the application owners recompile their app and keep the Seniority column up to date themselves as they do inserts & updates, but I often don’t have that luxury – especially when lots of applications are touching the Users table, and they’re all accustomed to the Seniority computed column just being constantly updated, automatically.

This post isn’t meant to be a start-to-finish tutorial on how to write the fastest triggers – it’s just a quick 500 words to remind you that in this scenario, triggers are better than the alternative of having scalar UDFs as computed columns.

Free, 3× a week

Get my new posts by email

Three posts a week, plus a Monday roundup of the best database news from around the web.

46 comments

Dave_W

October 20, 2020 at 9:36 am

Since the Seniority value should not change if the Reputation column isn’t updated, you can avoid running the Seniority update when users are changing their profile data, etc. Just put the update in a block protected by the conditional “IF ( UPDATE (Reputation) ) ” to only run the update if the Reputation column was inserted/updated. This would be most beneficial if you have a heavyweight function that’s dependent on an oft-unchanged column.

Reply
1. Brent Ozar
  
  October 20, 2020 at 10:04 am
  
  Dave_W – make sure you read the linked post about the bug I find in most triggers. You’re in for a nasty surprise. 😉
  
  Reply
  1. Dave_W
    
    October 20, 2020 at 11:16 am
    
    Totally get the set-based SQL issue – something I’ve found in code handed to me as well. Just saying that if someone updates half the rows in the table, but not the Reputation column, there’s no need to take the performance hit of running the Seniority update for the updated rows. Nothing in the BOL for UPDATE() says it doesn’t work for a trigger invoked by a multi-row update.
    
    Reply
  2. Dave_W
    
    October 21, 2020 at 11:07 am
    
    Okay, I’ve read your new post about UPDATE(). I’ll forgive you for not realizing I’m one of the 0.01% of the population who know what UPDATE() actually does. You thought I was trying to horizontally partition the trigger – only run it for records with changed values. In actuality, I was trying to vertically partition the trigger – don’t run it if the Reputation column is not in the UPDATE SQL statement. Or am I not as smrt as I thunk I are?
    
    Reply
Viking Student since 2017

October 20, 2020 at 10:30 am

Since in-line table valued functions aren’t parallelism inhibitors, would converting the function also work?

Reply
1. Jeff Moden
  
  October 20, 2020 at 11:00 pm
  
  @Viking,
  
  I’m pretty sure that you cannot use an iTVF in a computed column. Only scalar expressions are allowed for computed columns. To use an iTVF, you’d need to us a subquery, and that’s not allowed. I don’t know of a way to use an iTVF without it being part of a FROM clause.
  
  If you know a way, man… please share it because I’d love to use it.
  
  Reply
  1. Viking Student since 2017
    
    October 21, 2020 at 8:14 am
    
    Ah…sorry to disappoint. I haven’t needed to use a function in a calculated column and wasn’t aware that you couldn’t use an iTVF.
    
    Reply
Eddy Neal

October 20, 2020 at 11:37 am

Depending on the definition of the computed column, could you alternatively mark it as persisted and enable trace flag 176?

Reply
1. Jeff Moden
  
  October 21, 2020 at 6:51 am
  
  Cool tip. I’m going to give that a try on my test box and see what digs. Thanks.
  
  Reply
2. Jeff Moden
  
  October 21, 2020 at 7:52 am
  
  Ok… I just read up on trace flag 176 and, while it does deal with computed columns, it doesn’t appear to have anything to do with the problem that Brent described in the article. I’m not sure why you think it might be the fix for a computed column that’s based on a Scalar UDF denying the ability for queries against the table to go parallel. Please explain?
  
  Reply
  1. Ben Thul
    
    October 21, 2020 at 8:48 am
    
    If the column is persisted, presumably the function needn’t be evaluated at run time and so its property as a parallelism inhibitor would seem moot. Something to test I suppose but it seems reasonable to me.
    
    Reply
    1. Brent Ozar
      
      October 21, 2020 at 8:49 am
      
      Ben – bad news. Your assumption is incorrect.
      
      Reply
  2. Eddy Neal
    
    October 21, 2020 at 2:05 pm
    
    From what I’d been reading, trace 176 disabled the expansion of persisted computed column definitions. This is supposed to stop the Scalar UDF definition from making its way into the query plan, which in turn prevents the query plan from being serialised.
    
    There’s an SO answer with a bit more detail and a link to an older article describing the behaviour: https://dba.stackexchange.com/a/187370.
    
    I’ve only got access to an Azure SQL instance at the moment, so can’t set the trace flag to test this though.
    
    Ben – I was able to confirm while gathering the first query plan that any reference to a Scalar UDF (even if the computed column is persisted) makes the query fall back to serial execution with a NonParallelPlanReason of TSQLUserDefinedFunctionsNotParallelizable.
    
    Reply
    1. Jeff Moden
      
      October 21, 2020 at 3:42 pm
      
      I’ve been in search of more info on why people think that Trace Flag 176 could be a magic bullet for this and found much more detail, including some of the pitfalls of using it as well as why it can work in the following article.
      
      https://sqlperformance.com/2017/05/sql-plan/properly-persisted-computed-columns
      
      I’m going to give it a whirl on my laptop and see what happens insofar as parallelism goes. Unfortunately, that’s not going to be a great test for the negative aspects that Paul White writes about in the referenced article.
      
      Reply
Jeff Moden

October 20, 2020 at 11:15 pm

Ok… this is one of those things that I’ven somehow never heard about before. High praise to you posting about it because I’ve got tables that have computed columns that use Scalar UDFs. After finding this good article, I also did the test with a properly PERSISTED Scalar UDF and no difference.

The funny part was that I originally couldn’t get your code to go parallel at all. It turns out that’s because I have the Cost Threshold for Parallelism set to 50 on my boxes. After I (temporarily) took it down to 25, then it would go parallel until the Scalar UDF was added as a computed column, persisted or otherwise.

You really had me worried about computed columns and so I also checked to make sure that a Scalar expression other than a Scalar UDF would still allow parallelism. Thankfully, it does.

The good part of all this is that now I know and I learned something new today. The bad part is, now I’ve got a shedload of computed columns to fix. Thank you Microsoft for continuing to give me things to do.

Lordy… to recoin and old prayer…
“Dear Lord, I’m a DBA. Please give me patience because, if you give me strength, I’m going to need bail money to go with it”. 😀

Reply
1. Brent Ozar
  
  October 21, 2020 at 2:19 am
  
  Woohoo! I taught Jeff Moden something today! I need to quit while I’m ahead.
  
  Reply
  1. Jeff Moden
    
    October 21, 2020 at 6:45 am
    
    BWAAAA-HAAAAA!!!! No. Please keep working and send me some money so I can retire. 😀
    
    Reply
Silviu Spataru

October 21, 2020 at 12:32 am

I replaced such a table with a view (with the same name), plus another table to hold the non-computed columns, plus an INSTEAD OF DELETE trigger.
But using triggers – I could’ve saved several hours of work. And SELECTs would go even faster. Thanks, Brent.

Reply
Wojciech Sawicki

October 21, 2020 at 1:31 am

My version:
CREATE OR ALTER TRIGGER [Users_Seniority] ON [dbo].[Users] AFTER INSERT, UPDATE AS
BEGIN
IF @@ROWCOUNT = 0 –prevent from update Users … where 1=0;
RETURN;
SET NOCOUNT ON; –rows affected, careful, this will reset the @@ROWCOUNT variable

IF UPDATE([Seniority]) –prevent from update Users set Seniority=’Unknown’
BEGIN
–THROW?
ROLLBACK;
END;
IF UPDATE([Reputation]) –prevent from modify [Reputation] if update Users set Age=Age+1
BEGIN
UPDATE [u]
SET
[Seniority] = [dbo].[fn_GetSeniority]([u].[Reputation])
FROM [dbo].[Users] [u]
INNER JOIN [inserted] [i] ON [u].[Id] = [i].[Id]
WHERE NOT EXISTS
(
SELECT *
FROM –prevent from update Users set Reputation=Reputation
[deleted] [d]
WHERE [d].[Id] = [i].[Id]
AND [d].[Reputation] = [i].[Reputation]
);
END;
END;
GO

Reply
1. Brent Ozar
  
  October 21, 2020 at 2:15 am
  
  Bad news – read the post carefully about the part “bug involving multi-record processing”. Your trigger has that bug.
  
  Reply
  1. Wojciech Sawicki
    
    October 21, 2020 at 2:32 am
    
    Bug with one record without any variable?
    
    Reply
    1. Brent Ozar
      
      October 21, 2020 at 2:34 am
      
      No, the IF UPDATE([Seniority]) won’t work the way you expect with multiple rows. For more details, go ahead and work through the code in the related post – I put a ton of work into that. Thanks!
      
      Reply
      1. Wojciech Sawicki
        
        October 21, 2020 at 4:57 am
        
        I did it and my trigger work correctly.
      2. Jeff Moden
        
        October 21, 2020 at 7:19 am
        
        I guess I don’t understand here, Brent (probably low caffeine levels for both of us). I understand the variable thing and totally agree with the RBAR thing in triggers being a problem but I’ve never had a problem with IF UPDATE() in a proper set based trigger. And, yeah, I read the article you’ve provided the link for.
        
        Also, I’m not seeing the “bug” in the code Wojciech posted above. Are you sure that you’re not mistaking the IF @@ROWCOUNT = 0 RETURN early out of the trigger as “the bug”? That’s definitely not “the bug” we all know and love and it’s actually a pretty good practice, especially if you have code in the trigger that does row detection in the INSERTED/DELETED logical tables to calculate which of the 3 basic triggering events caused the trigger to fire.
        
        The only thing that I don’t like about IF UPDATE() is that it’s just an indication that a column was part of an update (or insert/delete) rather than whether or not a value on a row actually changed to something different.
      3. Brent Ozar
        
        October 21, 2020 at 7:22 am
        
        No problemo! I’ve made a note to write a longer blog post about the use of “IF UPDATE(colname)” in the future.
      4. Jeff Moden
        
        October 21, 2020 at 7:37 am
        
        Ok. Looking forward to that. Thanks, Brent. I’m still confused because I didn’t see IF UPDATE() in the trigger article you provided the link to. Have I simply looked at the wrong article?
      5. Jeff Moden
        
        October 21, 2020 at 7:44 am
        
        To be sure, here’s the article that your “bug involving multi-record processing” link led to.
        https://www.brentozar.com/archive/2019/05/the-silent-bug-i-find-in-most-triggers/
        
        If there’s something in there about a bug with IF UPDATE() then I’m not finding it. And, no… despite appearances in the written word, I’m not trying to be argumentative. I really want to know because I’ve 1) never experienced a problem with IF UPDATE() and 2) we use it a whole lot at work and so if there is an actual bug with IF UPDATE() itself, then I really want to know what it is.
      6. Brent Ozar
        
        October 21, 2020 at 7:50 am
        
        Jeff – here you go: https://www.brentozar.com/archive/2020/10/if-your-trigger-uses-update-its-probably-broken/
      7. Jeff Moden
        
        October 21, 2020 at 8:26 am
        
        BWAAA-HAAA!!! Good Sir, you’re amazing. You just cranked out that article. I’m humbled.
        
        To be clear, from the discussion, I was thinking that IF UPDATE() itself had a problem. It doesn’t. I’m much relieved.
        
        I totally agree with your new article on the subject of how IF UPDATE() can be so severely and incorrectly used. Looking back at Wojciech’s trigger code, it definitely has the IF UPDATE() problem especially with the IF UPDATE([Seniority]), which looks like it will actually prevent inserts from occurring.
        
        The point that you’re trying to make is the IF UPDATE() doesn’t only apply to UPDATEs. It also applies to INSERTS/DELETES, both of which will qualify IF UPDATE(somecolumn) because both of those actions “update” the “somecolumn” being tested.
        
        That’s also why I’ve never run into a problem with IF UPDATE()… I made that good realization long ago and, especially on multi-triggers (handles more than one triggering event), I first calculate what the triggering event is and write the code so the fault you so very adequately describe in your good article never happens. I’m a bit ashamed that I missed what you were getting at in Wojciech’s trigger code… it turns out the apparent caffeine deprivation was all on my part.
        
        Thanks for your attention to detail and the article you cranked out so nasty fast. Again, truly amazing.
      8. Brent Ozar
        
        October 21, 2020 at 8:29 am
        
        Hahaha, thank YOU for sticking through it, sir. I wouldn’t have cranked it out so quickly if you hadn’t asked.
      9. Wojciech Sawicki
        
        October 21, 2020 at 10:19 am
        
        ok i understood the misunderstanding.
        update should not be understood as a column that changes value, but a column appearing in the update or insert list. In my case, I want to prevent any changes to the Seniority column. It just can’t follow a set or insert statement. Prevention from changes set Reputation = Reputation is in a condition where not exists
    2. Brian Boodman
      
      October 21, 2020 at 6:54 am
      
      “Bug with one record without any variable?”
      
      Admittedly, this bug probably won’t be triggered if you never modify more than one row at a time.
      However, I’m not a fan of leaving time bombs in your code.
      
      Reply
      1. Jeff Moden
        
        October 21, 2020 at 7:34 am
        
        Nah… From what I can see in the trigger he posted, his code doesn’t have “the bug”. He was asking about the IF UPDATE() thing, which isn’t a problem no matter how many or how few rows are processed by a trigger.
      2. Jeff Moden
        
        October 21, 2020 at 8:35 am
        
        Ok… correction to the above. I missed what Brent was getting at. IF UPDATE() itself doesn’t have a problem… people’s perception of what it does is the problem. Despite its name, it doesn’t only fire on UPDATEs of a given column. It also fires on all INSERTS and DELETEs because those also cause an “update” of EVERY column in a row or rows.
        
        It’s not good excuse but the reason why I missed the IF UPDATE() bug in Wojciech’s trigger code is I was concentrating on the wrong thing (I should have been drinking more coffee instead of posting). Although I’m still a bit timid this morning for the well deserved proverbial snap in the ass with a rolled up towel that Brent gave me, I’ll say that it looks like Wojciech’s trigger code will prevent inserts from actually happening.
        
        Ok… I’m going back to sucking my thumb and twiddling my hair while in search of more coffee now. 😀
Steve M

October 21, 2020 at 6:12 am

I’m going to wait with TF176

https://support.microsoft.com/en-us/help/4541096/kb4541096-fix-access-violation-occurs-when-you-run-query-on-computed-c

Reply
Joel Coehoorn

October 21, 2020 at 9:48 am

What about:

ALTER TABLE dbo.Users
ADD Seniority AS
CASE WHEN Reputation > 1000 THEN ‘Senior’
ELSE ‘Junior’ END

Reply
Kevin

October 21, 2020 at 10:40 am

I had to do a quick look through our many tables and only encountered a couple computed columns. Simple first last name concatenation stuff. Seems harmless. And I can see the benefit to moving it to a trigger, in these cases. However, the table is a temporal table. Which because I can’t use INSTEAD trigger, I would now have 2 updates going to the history table instead of one, if I implemented the trigger. So now, the trick is, weigh the parallelism performance hit versus the row count hit on the history table, and both are queried pretty evenly. This will be fun to test.

Reply
Faiza

October 24, 2020 at 4:32 am

Hi Brent,
Thanks for such a great post. What about adding a computed column as below:
[CreatedDate] AS (todatetimeoffset(dateadd(minute,[CreatedDateOFFSET],[CreatedDateUTC]),[CreatedDateOFFSET])) PERSISTED

Also if the table has an index will it go parallel against Index and not just against the table?

Reply
1. Brent Ozar
  
  October 24, 2020 at 4:34 am
  
  Faiza – feel free to grab the database and give it a shot yourself. That’s one of the best ways to learn so you can see how things work. Go for it!
  
  Reply
Faiza

October 24, 2020 at 4:43 am

Its going parallel in both cases, that’s why I was confused how todatetimeoffset is going in parallel.

Reply
1. Brent Ozar
  
  October 24, 2020 at 4:44 am
  
  It’s not a user-defined function. This post is about UDFs: user-defined functions, not system-defined functions.
  
  Reply
Faiza

October 24, 2020 at 4:51 am

Sorry and thanks.
I did tried both it also forces all Index’s to go single threaded too.

Reply
Ron VIP Student since 2017

November 16, 2020 at 4:06 pm

Wow, taught Jeff Moden something! I bow to the masters.

Reply
Charlieface

December 5, 2020 at 12:52 pm

In this particular case, you could write the computed column directly as a CASE expression. If a scalar function was really necessary you should have added WITH SCHEMABINDING.

But people tend to use functions to add info gleaned from other tables, for example a total of child items. In this case I would use an indexed view.

Reply
1. Brent Ozar
  
  December 5, 2020 at 2:01 pm
  
  Charlieface – I’m using a simplified version of the computed column in order to quickly teach the topic. Thanks!
  
  Reply
  1. Charlieface
    
    December 7, 2020 at 1:27 pm
    
    True, but I still think there is no almost no use case for scalar functions in computed columns, as the two options I mention are better.
    If you really must, defo use SCHEMABINDING.
    
    Reply

Using Triggers to Replace Scalar UDFs on Computed Columns

We can fix that with a trigger.

Get my new posts by email

Keep digging

46 comments

Leave a comment Cancel reply