Query Exercise Answer: Fixing a Slow Computed Column

In last week’s Query Exercise, we added a user-defined function to the Users table to check whether their WebsiteUrl was valid or not. I noted that even with an index on Reputation, SQL Server 2022 simply ignored the index, did a table scan, and spent 2 minutes of time calling the user-defined function on a row-by-row basis.

First off, a disclaimer: I didn’t write that exercise with a goal of showing the difference between old & new SQL Server versions, between old & new compatibility levels. However, several folks in the comments said, “Hey, I’m on an older version, on an older compatibility level, and the query is blazing fast already. What do I need to fix?” That’s a good reminder of what work you need to do before you go live on SQL Server 2022 (or any new version.) Moving on.

One Fix: Persisting the Computed Column

One way to fix it is to drop the computed column, then create it again with the PERSISTED keyword:

That way, SQL Server computes each row’s contents just once, and then persists it in the table itself, instead of running the function every time we query it. By executing the above code, we get:

Wait, what? As a reminder, here’s the contents of our function:

HOW THE CELKO IS THAT NOT DETERMINISTIC? What, does string comparison change from time to time? Books Online’s article on determinism says “All of the string built-in functions are deterministic,” but I’m guessing this has something to do with a possibly changing collation, and I couldn’t care less about digging further. That’s left as an exercise for the reader. Moving on.

An Actual Quick Fix: WITH SCHEMABINDING

I used to hear people say, “You should put WITH SCHEMABINDING on your scalar functions and they’ll go faster.” Every single time I tried it, it made no difference whatsoever.

Until…last week! In the comments, Ag suggested to add that hint to the function, and it ran rapidly! We’ll remove the existing column, tweak the function – the only change is WITH SCHEMABINDING – and then add it back in:

Then rerun our query without changes:

Even on a large Stack Overflow database, the query runs in milliseconds and produces a great execution plan:

SQL Server uses the index only only reads a few hundred rows before the work is done. Sure, there’s no parallelism in the query, and the plan properties report that “TSQLUserDefinedFunctionsNotParallelizable”, but who cares? The query’s so fast, even with millions of candidate rows, that it doesn’t matter. Good enough, ship it.

My Old Faithful Fix: A Trigger

I know. I know, the word makes you break out in itchy hives, but triggers are a great way to quickly replace computed columns.

Let’s drop the existing computed column, add a new “real” column, configure the trigger to populate it going forward, and then backfill it to set up the existing data:

The first few statements will run instantly (assuming nobody else is locking the Users table), but the last statement will take a minute since it’s actually adding a new value to every row in the Users table, AND it’s calling a scalar function on every row. You could make that faster with set-based operations, but more on that in a minute.

After that, run the query we’ve been trying to tune:

It runs in a couple milliseconds (down from a minute) and uses the Reputation index, as the execution plan shows:

I like the trigger solution for a few reasons:

  • It keeps the function’s logic as-is, which can be helpful in situations where the logic is really complex, time-tested, and labor-intensive to rewrite.
  • It can be implemented really quickly.
  • Queries against the table can now go parallel if they need to.

But there’s a drawback: the trigger’s still getting called on every insert and update. If the workload involves batch jobs to load or modify the table, that’s going to be problematic because they’re going to slow down. Those modifications didn’t call the computed column’s function before since they most likely weren’t reading that column. If our workload includes batch jobs, we’re gonna need a faster solution. We’ll clean up after ourselves before trying the next one:

Rewriting the Function

Most of the time when you hear database people complaining about functions, they’re complaining about user-defined functions. For example, in this case, we can get a dramatic performance improvement if we forklift the business logic out of the user-defined function and inline it directly into the table’s definition itself. Here’s the end result:

Note that the logic isn’t identical because I had to tweak it: instead of using IF, I used CASE statements, and there’s no @Url parameter anymore because we’re referencing the WebsiteUrl column directly. The more complex your user-defined function is, the harder work this is going to be. To learn how to do these kinds of rewrites, check out Microsoft’s PDF whitepaper on Froid, the inline function technology they added in SQL Server 2019. That paper gives several specific examples of language to use when rewriting functions to go inline, which is quite nice of them.

Now, when the SELECT runs, the plan looks big:

And while the estimates aren’t good, the plan shape is fine, it uses the index, and it runs in milliseconds.

Eagle-eyed readers will note that I did not use the PERSISTED keyword in the above example. If you add that, the resulting plan shape looks even better because the function doesn’t have to run at read time:

And the query finishes in even fewer milliseconds. However, executing the ALTER TABLE with the PERSISTED keyword will lock & rewrite the table, so it’s a question of whether your workload can tolerate that outage. You could always start with the non-persisted version to quickly make the pain go away, and then later if you still need more performance, take an outage to drop the non-persisted computed column (which will be instantaneous) and then add the persisted one (which will be the opposite of instantaneous.)

Hope you enjoyed the challenge! For more exercises, check out the prior Query Exercises posts and catch up on the ones you missed so far.

Previous Post
Announcing New Membership Tiers: Free & Mentoring
Next Post
[Video] Office Hours in a Mexican Hot Tub

5 Comments. Leave new

  • A UDF must be created with schemabinding in order to be deterministic. I think that if it isn’t deterministic, than any column that is based on this function will not have any statistics on that column. If you create the function with schemabinding and the function is deterministic, than a column that is based on the function can have statistics and a better query plan.
    By the way since I’m not a big fan of UDFs my solution was to use a case statement directly in create table statement instead of basing it on the UDF.

    Reply
  • Thomas Franz
    July 25, 2024 8:23 am

    I’m not sure, if it applies to the newer SQL versions, but using scalar functions on a computed column forced single threaded plans for this table (and every query, where this table is joind) *and* DBCC CHECKDB.

    So I would always prever either the trigger or the case in the computed column. Drawback of the CASE is, that you may have multiple tables with those columns and have to ensure that you change them all, when you make logic adjustments. On the other hand you would need to execute an UPDATE on all tables with this trigger after a logic changes too (but you could limit it to the edge cases you just fixed, so it would be faster)

    Reply
  • Dennis Post
    July 25, 2024 9:44 am

    Isn’t putting the code in the column definition the same as using the same code in the function with schemabinding?

    CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
    RETURNS BIT WITH SCHEMABINDING
    AS
    BEGIN
    RETURN CASE
    WHEN (WebsiteUrl LIKE ‘http://%’ OR WebsiteUrl LIKE ‘https://%’ OR WebsiteUrl LIKE ‘ftp://%’)
    AND (WebsiteUrl LIKE ‘%://[A-Za-z0-9.-]%.%’ — Check for domain/IP after scheme
    AND WebsiteUrl NOT LIKE ‘% %’ — No spaces allowed in URL
    AND WebsiteUrl LIKE ‘%.[A-Za-z]%’ — Ensure there’s a period in domain/IP part
    AND WebsiteUrl LIKE ‘%/%’ — Ensure there’s at least one slash after the domain
    AND WebsiteUrl LIKE ‘%[A-Za-z0-9/_-]%’) — Ensure there’s at least one valid character in the path
    THEN 1
    ELSE 0
    END AS Result;
    END
    GO

    https://dba.stackexchange.com/questions/140381/is-there-any-benefit-to-schemabinding-a-function-beyond-halloween-protection

    I think I learned of this SQL Kiwi / Paul White post from you.

    Reply
    • It isn’t the same. As long as you have a computed column that is based on a UDF, SQL Server won’t use parallelism when it reads the data from this table (even if the query doesn’t reference the computed column). While I agree that in the specific query that was used in this change it wasn’t a problem, in the real world you’ll might have other queries that read data from the same table, that should use parallelism, but won’t be able to.

      Reply
  • Douglas Osborne
    July 26, 2024 12:46 pm

    That was a great read and reflects why you always need to be hungry to learn more and take your game to the next level – thanks Brent!

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.