Query Exercise Answer: Fixing a Slow Computed Column

Last Updated July 18, 2024

In last week’s Query Exercise, we added a user-defined function to the Users table to check whether their WebsiteUrl was valid or not. I noted that even with an index on Reputation, SQL Server 2022 simply ignored the index, did a table scan, and spent 2 minutes of time calling the user-defined function on a row-by-row basis.

First off, a disclaimer: I didn’t write that exercise with a goal of showing the difference between old & new SQL Server versions, between old & new compatibility levels. However, several folks in the comments said, “Hey, I’m on an older version, on an older compatibility level, and the query is blazing fast already. What do I need to fix?” That’s a good reminder of what work you need to do before you go live on SQL Server 2022 (or any new version.) Moving on.

One Fix: Persisting the Computed Column

One way to fix it is to drop the computed column, then create it again with the PERSISTED keyword:

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
GO
ALTER TABLE dbo.Users ADD IsValidUrl
	AS dbo.IsValidUrl(WebsiteUrl) PERSISTED;
GO

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;

ALTER TABLE dbo.Users ADD IsValidUrl

AS dbo.IsValidUrl(WebsiteUrl) PERSISTED;

That way, SQL Server computes each row’s contents just once, and then persists it in the table itself, instead of running the function every time we query it. By executing the above code, we get:

Wait, what? As a reminder, here’s the contents of our function:

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
RETURNS BIT
AS
BEGIN
    DECLARE @Result BIT = 0

    -- Regex pattern for a valid URL
    -- This pattern covers:
    -- - Scheme (http, https, ftp)
    -- - Optional username:password
    -- - Domain name or IP address
    -- - Optional port
    -- - Optional path
    -- - Optional query string
    -- - Optional fragment

    IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%'
    BEGIN
        IF @Url LIKE '%://[A-Za-z0-9.-]%.%'  -- Check for domain/IP after scheme
           AND @Url NOT LIKE '% %'            -- No spaces allowed in URL
           AND @Url LIKE '%.[A-Za-z]%'        -- Ensure there's a period in domain/IP part
           AND @Url LIKE '%/%'                -- Ensure there's at least one slash after the domain
           AND @Url LIKE '%[A-Za-z0-9/_-]%'   -- Ensure there's at least one valid character in the path
        BEGIN
            SET @Result = 1
        END
    END

    RETURN @Result
END
GO

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))

RETURNS BIT

BEGIN

DECLARE @Result BIT = 0

-- Regex pattern for a valid URL

-- This pattern covers:

-- - Scheme (http, https, ftp)

-- - Optional username:password

-- - Domain name or IP address

-- - Optional port

-- - Optional path

-- - Optional query string

-- - Optional fragment

IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%'

BEGIN

IF @Url LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme

AND @Url NOT LIKE '% %' -- No spaces allowed in URL

AND @Url LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part

AND @Url LIKE '%/%' -- Ensure there's at least one slash after the domain

AND @Url LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path

BEGIN

SET @Result = 1

END

RETURN @Result

END

HOW THE CELKO IS THAT NOT DETERMINISTIC? What, does string comparison change from time to time? Books Online’s article on determinism says “All of the string built-in functions are deterministic,” but I’m guessing this has something to do with a possibly changing collation, and I couldn’t care less about digging further. That’s left as an exercise for the reader. Moving on.

An Actual Quick Fix: WITH SCHEMABINDING

I used to hear people say, “You should put WITH SCHEMABINDING on your scalar functions and they’ll go faster.” Every single time I tried it, it made no difference whatsoever.

Until…last week! In the comments, Ag suggested to add that hint to the function, and it ran rapidly! We’ll remove the existing column, tweak the function – the only change is WITH SCHEMABINDING – and then add it back in:

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
GO

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
RETURNS BIT WITH SCHEMABINDING
AS
BEGIN
    DECLARE @Result BIT = 0

    -- Regex pattern for a valid URL
    -- This pattern covers:
    -- - Scheme (http, https, ftp)
    -- - Optional username:password
    -- - Domain name or IP address
    -- - Optional port
    -- - Optional path
    -- - Optional query string
    -- - Optional fragment

    IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%'
    BEGIN
        IF @Url LIKE '%://[A-Za-z0-9.-]%.%'  -- Check for domain/IP after scheme
           AND @Url NOT LIKE '% %'            -- No spaces allowed in URL
           AND @Url LIKE '%.[A-Za-z]%'        -- Ensure there's a period in domain/IP part
           AND @Url LIKE '%/%'                -- Ensure there's at least one slash after the domain
           AND @Url LIKE '%[A-Za-z0-9/_-]%'   -- Ensure there's at least one valid character in the path
        BEGIN
            SET @Result = 1
        END
    END

    RETURN @Result
END
GO
ALTER TABLE dbo.Users ADD IsValidUrl
	AS dbo.IsValidUrl(WebsiteUrl);

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))

RETURNS BIT WITH SCHEMABINDING

BEGIN

DECLARE @Result BIT = 0

-- Regex pattern for a valid URL

-- This pattern covers:

-- - Scheme (http, https, ftp)

-- - Optional username:password

-- - Domain name or IP address

-- - Optional port

-- - Optional path

-- - Optional query string

-- - Optional fragment

IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%'

BEGIN

IF @Url LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme

AND @Url NOT LIKE '% %' -- No spaces allowed in URL

AND @Url LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part

AND @Url LIKE '%/%' -- Ensure there's at least one slash after the domain

AND @Url LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path

BEGIN

SET @Result = 1

END

RETURN @Result

END

ALTER TABLE dbo.Users ADD IsValidUrl

AS dbo.IsValidUrl(WebsiteUrl);

Then rerun our query without changes:

SELECT TOP 200 *
FROM dbo.Users
WHERE IsValidUrl = 1
ORDER BY Reputation DESC;

SELECT TOP 200 *

FROM dbo.Users

WHERE IsValidUrl = 1

ORDER BY Reputation DESC;

Even on a large Stack Overflow database, the query runs in milliseconds and produces a great execution plan:

SQL Server uses the index only only reads a few hundred rows before the work is done. Sure, there’s no parallelism in the query, and the plan properties report that “TSQLUserDefinedFunctionsNotParallelizable”, but who cares? The query’s so fast, even with millions of candidate rows, that it doesn’t matter. Good enough, ship it.

My Old Faithful Fix: A Trigger

I know. I know, the word makes you break out in itchy hives, but triggers are a great way to quickly replace computed columns.

Let’s drop the existing computed column, add a new “real” column, configure the trigger to populate it going forward, and then backfill it to set up the existing data:

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
GO
ALTER TABLE dbo.Users ADD IsValidUrl BIT;
GO
CREATE OR ALTER TRIGGER Users_IsValidUrl ON dbo.Users
AFTER INSERT, UPDATE
AS  
BEGIN
UPDATE u
	SET IsValidUrl = dbo.IsValidUrl(u.WebsiteUrl)
	FROM dbo.Users u
	INNER JOIN inserted i ON u.Id = i.Id;
END;  
GO  
UPDATE dbo.Users
	SET WebsiteUrl = WebsiteUrl;
GO

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;

ALTER TABLE dbo.Users ADD IsValidUrl BIT;

CREATE OR ALTER TRIGGER Users_IsValidUrl ON dbo.Users

AFTER INSERT, UPDATE

BEGIN

UPDATE u

SET IsValidUrl = dbo.IsValidUrl(u.WebsiteUrl)

FROM dbo.Users u

INNER JOIN inserted i ON u.Id = i.Id;

END;

UPDATE dbo.Users

SET WebsiteUrl = WebsiteUrl;

The first few statements will run instantly (assuming nobody else is locking the Users table), but the last statement will take a minute since it’s actually adding a new value to every row in the Users table, AND it’s calling a scalar function on every row. You could make that faster with set-based operations, but more on that in a minute.

After that, run the query we’ve been trying to tune:

SELECT TOP 200 *
FROM dbo.Users
WHERE IsValidUrl = 1
ORDER BY Reputation DESC;

SELECT TOP 200 *

FROM dbo.Users

WHERE IsValidUrl = 1

ORDER BY Reputation DESC;

It runs in a couple milliseconds (down from a minute) and uses the Reputation index, as the execution plan shows:

I like the trigger solution for a few reasons:

It keeps the function’s logic as-is, which can be helpful in situations where the logic is really complex, time-tested, and labor-intensive to rewrite.
It can be implemented really quickly.
Queries against the table can now go parallel if they need to.

But there’s a drawback: the trigger’s still getting called on every insert and update. If the workload involves batch jobs to load or modify the table, that’s going to be problematic because they’re going to slow down. Those modifications didn’t call the computed column’s function before since they most likely weren’t reading that column. If our workload includes batch jobs, we’re gonna need a faster solution. We’ll clean up after ourselves before trying the next one:

DROP TRIGGER [dbo].[Users_IsValidUrl]
GO
ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
GO

DROP TRIGGER [dbo].[Users_IsValidUrl]

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;

Rewriting the Function

Most of the time when you hear database people complaining about functions, they’re complaining about user-defined functions. For example, in this case, we can get a dramatic performance improvement if we forklift the business logic out of the user-defined function and inline it directly into the table’s definition itself. Here’s the end result:

ALTER TABLE dbo.Users ADD IsValidUrl
AS CASE WHEN (WebsiteUrl LIKE 'http://%' OR WebsiteUrl LIKE 'https://%' OR WebsiteUrl LIKE 'ftp://%')
	AND (WebsiteUrl LIKE '%://[A-Za-z0-9.-]%.%'  -- Check for domain/IP after scheme
           AND WebsiteUrl NOT LIKE '% %'            -- No spaces allowed in URL
           AND WebsiteUrl LIKE '%.[A-Za-z]%'        -- Ensure there's a period in domain/IP part
           AND WebsiteUrl LIKE '%/%'                -- Ensure there's at least one slash after the domain
           AND WebsiteUrl LIKE '%[A-Za-z0-9/_-]%'   -- Ensure there's at least one valid character in the path
     ) THEN 1 ELSE 0 END;
GO

ALTER TABLE dbo.Users ADD IsValidUrl

AS CASE WHEN (WebsiteUrl LIKE 'http://%' OR WebsiteUrl LIKE 'https://%' OR WebsiteUrl LIKE 'ftp://%')

AND (WebsiteUrl LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme

AND WebsiteUrl NOT LIKE '% %' -- No spaces allowed in URL

AND WebsiteUrl LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part

AND WebsiteUrl LIKE '%/%' -- Ensure there's at least one slash after the domain

AND WebsiteUrl LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path

) THEN 1 ELSE 0 END;

Note that the logic isn’t identical because I had to tweak it: instead of using IF, I used CASE statements, and there’s no @Url parameter anymore because we’re referencing the WebsiteUrl column directly. The more complex your user-defined function is, the harder work this is going to be. To learn how to do these kinds of rewrites, check out Microsoft’s PDF whitepaper on Froid, the inline function technology they added in SQL Server 2019. That paper gives several specific examples of language to use when rewriting functions to go inline, which is quite nice of them.

Now, when the SELECT runs, the plan looks big:

And while the estimates aren’t good, the plan shape is fine, it uses the index, and it runs in milliseconds.

Eagle-eyed readers will note that I did not use the PERSISTED keyword in the above example. If you add that, the resulting plan shape looks even better because the function doesn’t have to run at read time:

And the query finishes in even fewer milliseconds. However, executing the ALTER TABLE with the PERSISTED keyword will lock & rewrite the table, so it’s a question of whether your workload can tolerate that outage. You could always start with the non-persisted version to quickly make the pain go away, and then later if you still need more performance, take an outage to drop the non-persisted computed column (which will be instantaneous) and then add the persisted one (which will be the opposite of instantaneous.)

Hope you enjoyed the challenge! For more exercises, check out the prior Query Exercises posts and catch up on the ones you missed so far.

Announcing New Membership Tiers: Free & Mentoring

[Video] Office Hours in a Mexican Hot Tub

5 Comments. Leave new

Adi
July 25, 2024 7:20 am

A UDF must be created with schemabinding in order to be deterministic. I think that if it isn’t deterministic, than any column that is based on this function will not have any statistics on that column. If you create the function with schemabinding and the function is deterministic, than a column that is based on the function can have statistics and a better query plan.
By the way since I’m not a big fan of UDFs my solution was to use a case statement directly in create table statement instead of basing it on the UDF.

Reply
Thomas Franz
July 25, 2024 8:23 am

I’m not sure, if it applies to the newer SQL versions, but using scalar functions on a computed column forced single threaded plans for this table (and every query, where this table is joind) *and* DBCC CHECKDB.

So I would always prever either the trigger or the case in the computed column. Drawback of the CASE is, that you may have multiple tables with those columns and have to ensure that you change them all, when you make logic adjustments. On the other hand you would need to execute an UPDATE on all tables with this trigger after a logic changes too (but you could limit it to the edge cases you just fixed, so it would be faster)

Reply
Dennis Post
July 25, 2024 9:44 am

Isn’t putting the code in the column definition the same as using the same code in the function with schemabinding?

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
RETURNS BIT WITH SCHEMABINDING
AS
BEGIN
RETURN CASE
WHEN (WebsiteUrl LIKE ‘http://%’ OR WebsiteUrl LIKE ‘https://%’ OR WebsiteUrl LIKE ‘ftp://%’)
AND (WebsiteUrl LIKE ‘%://[A-Za-z0-9.-]%.%’ — Check for domain/IP after scheme
AND WebsiteUrl NOT LIKE ‘% %’ — No spaces allowed in URL
AND WebsiteUrl LIKE ‘%.[A-Za-z]%’ — Ensure there’s a period in domain/IP part
AND WebsiteUrl LIKE ‘%/%’ — Ensure there’s at least one slash after the domain
AND WebsiteUrl LIKE ‘%[A-Za-z0-9/_-]%’) — Ensure there’s at least one valid character in the path
THEN 1
ELSE 0
END AS Result;
END
GO

https://dba.stackexchange.com/questions/140381/is-there-any-benefit-to-schemabinding-a-function-beyond-halloween-protection

I think I learned of this SQL Kiwi / Paul White post from you.

Reply
- Adi
  July 25, 2024 10:38 am
  
  It isn’t the same. As long as you have a computed column that is based on a UDF, SQL Server won’t use parallelism when it reads the data from this table (even if the query doesn’t reference the computed column). While I agree that in the specific query that was used in this change it wasn’t a problem, in the real world you’ll might have other queries that read data from the same table, that should use parallelism, but won’t be able to.
  
  Reply
Douglas Osborne
July 26, 2024 12:46 pm

That was a great read and reflects why you always need to be hungry to learn more and take your game to the next level – thanks Brent!

Reply

Get Free SQL Stuff

"*" indicates required fields

X/Twitter

This field is for validation purposes and should be left unchanged.

Name*

First Last

Email*

Things I want*

First Responder Kit (Scripts and Guides)

Monday Links (SQL & tech news)

Blog posts (2-4 per week)

Query Exercise Answer: Fixing a Slow Computed Column

One Fix: Persisting the Computed Column

An Actual Quick Fix: WITH SCHEMABINDING

My Old Faithful Fix: A Trigger

Rewriting the Function

Related

5 Comments. Leave new

Leave a Reply Cancel reply

Hi! I’m Brent Ozar.

Get Free SQL Stuff