In last week’s Query Exercise, we added a user-defined function to the Users table to check whether their WebsiteUrl was valid or not. I noted that even with an index on Reputation, SQL Server 2022 simply ignored the index, did a table scan, and spent 2 minutes of time calling the user-defined function on a row-by-row basis.
First off, a disclaimer: I didn’t write that exercise with a goal of showing the difference between old & new SQL Server versions, between old & new compatibility levels. However, several folks in the comments said, “Hey, I’m on an older version, on an older compatibility level, and the query is blazing fast already. What do I need to fix?” That’s a good reminder of what work you need to do before you go live on SQL Server 2022 (or any new version.) Moving on.
One Fix: Persisting the Computed Column
One way to fix it is to drop the computed column, then create it again with the PERSISTED keyword:
|
1 2 3 4 5 |
ALTER TABLE dbo.Users DROP COLUMN IsValidUrl; GO ALTER TABLE dbo.Users ADD IsValidUrl AS dbo.IsValidUrl(WebsiteUrl) PERSISTED; GO |
That way, SQL Server computes each row’s contents just once, and then persists it in the table itself, instead of running the function every time we query it. By executing the above code, we get:
Wait, what? As a reminder, here’s the contents of our function:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX)) RETURNS BIT AS BEGIN DECLARE @Result BIT = 0 -- Regex pattern for a valid URL -- This pattern covers: -- - Scheme (http, https, ftp) -- - Optional username:password -- - Domain name or IP address -- - Optional port -- - Optional path -- - Optional query string -- - Optional fragment IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%' BEGIN IF @Url LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme AND @Url NOT LIKE '% %' -- No spaces allowed in URL AND @Url LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part AND @Url LIKE '%/%' -- Ensure there's at least one slash after the domain AND @Url LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path BEGIN SET @Result = 1 END END RETURN @Result END GO |
HOW THE CELKO IS THAT NOT DETERMINISTIC? What, does string comparison change from time to time? Books Online’s article on determinism says “All of the string built-in functions are deterministic,” but I’m guessing this has something to do with a possibly changing collation, and I couldn’t care less about digging further. That’s left as an exercise for the reader. Moving on.
An Actual Quick Fix: WITH SCHEMABINDING
I used to hear people say, “You should put WITH SCHEMABINDING on your scalar functions and they’ll go faster.” Every single time I tried it, it made no difference whatsoever.
Until…last week! In the comments, Ag suggested to add that hint to the function, and it ran rapidly! We’ll remove the existing column, tweak the function – the only change is WITH SCHEMABINDING – and then add it back in:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
ALTER TABLE dbo.Users DROP COLUMN IsValidUrl; GO CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX)) RETURNS BIT WITH SCHEMABINDING AS BEGIN DECLARE @Result BIT = 0 -- Regex pattern for a valid URL -- This pattern covers: -- - Scheme (http, https, ftp) -- - Optional username:password -- - Domain name or IP address -- - Optional port -- - Optional path -- - Optional query string -- - Optional fragment IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%' BEGIN IF @Url LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme AND @Url NOT LIKE '% %' -- No spaces allowed in URL AND @Url LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part AND @Url LIKE '%/%' -- Ensure there's at least one slash after the domain AND @Url LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path BEGIN SET @Result = 1 END END RETURN @Result END GO ALTER TABLE dbo.Users ADD IsValidUrl AS dbo.IsValidUrl(WebsiteUrl); |
Then rerun our query without changes:
|
1 2 3 4 |
SELECT TOP 200 * FROM dbo.Users WHERE IsValidUrl = 1 ORDER BY Reputation DESC; |
Even on a large Stack Overflow database, the query runs in milliseconds and produces a great execution plan:
SQL Server uses the index only only reads a few hundred rows before the work is done. Sure, there’s no parallelism in the query, and the plan properties report that “TSQLUserDefinedFunctionsNotParallelizable”, but who cares? The query’s so fast, even with millions of candidate rows, that it doesn’t matter. Good enough, ship it.
My Old Faithful Fix: A Trigger
I know. I know, the word makes you break out in itchy hives, but triggers are a great way to quickly replace computed columns.
Let’s drop the existing computed column, add a new “real” column, configure the trigger to populate it going forward, and then backfill it to set up the existing data:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
ALTER TABLE dbo.Users DROP COLUMN IsValidUrl; GO ALTER TABLE dbo.Users ADD IsValidUrl BIT; GO CREATE OR ALTER TRIGGER Users_IsValidUrl ON dbo.Users AFTER INSERT, UPDATE AS BEGIN UPDATE u SET IsValidUrl = dbo.IsValidUrl(u.WebsiteUrl) FROM dbo.Users u INNER JOIN inserted i ON u.Id = i.Id; END; GO UPDATE dbo.Users SET WebsiteUrl = WebsiteUrl; GO |
The first few statements will run instantly (assuming nobody else is locking the Users table), but the last statement will take a minute since it’s actually adding a new value to every row in the Users table, AND it’s calling a scalar function on every row. You could make that faster with set-based operations, but more on that in a minute.
After that, run the query we’ve been trying to tune:
|
1 2 3 4 |
SELECT TOP 200 * FROM dbo.Users WHERE IsValidUrl = 1 ORDER BY Reputation DESC; |
It runs in a couple milliseconds (down from a minute) and uses the Reputation index, as the execution plan shows:
I like the trigger solution for a few reasons:
- It keeps the function’s logic as-is, which can be helpful in situations where the logic is really complex, time-tested, and labor-intensive to rewrite.
- It can be implemented really quickly.
- Queries against the table can now go parallel if they need to.
But there’s a drawback: the trigger’s still getting called on every insert and update. If the workload involves batch jobs to load or modify the table, that’s going to be problematic because they’re going to slow down. Those modifications didn’t call the computed column’s function before since they most likely weren’t reading that column. If our workload includes batch jobs, we’re gonna need a faster solution. We’ll clean up after ourselves before trying the next one:
|
1 2 3 4 |
DROP TRIGGER [dbo].[Users_IsValidUrl] GO ALTER TABLE dbo.Users DROP COLUMN IsValidUrl; GO |
Rewriting the Function
Most of the time when you hear database people complaining about functions, they’re complaining about user-defined functions. For example, in this case, we can get a dramatic performance improvement if we forklift the business logic out of the user-defined function and inline it directly into the table’s definition itself. Here’s the end result:
|
1 2 3 4 5 6 7 8 9 |
ALTER TABLE dbo.Users ADD IsValidUrl AS CASE WHEN (WebsiteUrl LIKE 'http://%' OR WebsiteUrl LIKE 'https://%' OR WebsiteUrl LIKE 'ftp://%') AND (WebsiteUrl LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme AND WebsiteUrl NOT LIKE '% %' -- No spaces allowed in URL AND WebsiteUrl LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part AND WebsiteUrl LIKE '%/%' -- Ensure there's at least one slash after the domain AND WebsiteUrl LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path ) THEN 1 ELSE 0 END; GO |
Note that the logic isn’t identical because I had to tweak it: instead of using IF, I used CASE statements, and there’s no @Url parameter anymore because we’re referencing the WebsiteUrl column directly. The more complex your user-defined function is, the harder work this is going to be. To learn how to do these kinds of rewrites, check out Microsoft’s PDF whitepaper on Froid, the inline function technology they added in SQL Server 2019. That paper gives several specific examples of language to use when rewriting functions to go inline, which is quite nice of them.
Now, when the SELECT runs, the plan looks big:
And while the estimates aren’t good, the plan shape is fine, it uses the index, and it runs in milliseconds.
Eagle-eyed readers will note that I did not use the PERSISTED keyword in the above example. If you add that, the resulting plan shape looks even better because the function doesn’t have to run at read time:
And the query finishes in even fewer milliseconds. However, executing the ALTER TABLE with the PERSISTED keyword will lock & rewrite the table, so it’s a question of whether your workload can tolerate that outage. You could always start with the non-persisted version to quickly make the pain go away, and then later if you still need more performance, take an outage to drop the non-persisted computed column (which will be instantaneous) and then add the persisted one (which will be the opposite of instantaneous.)
Hope you enjoyed the challenge! For more exercises, check out the prior Query Exercises posts and catch up on the ones you missed so far.







5 Comments. Leave new
A UDF must be created with schemabinding in order to be deterministic. I think that if it isn’t deterministic, than any column that is based on this function will not have any statistics on that column. If you create the function with schemabinding and the function is deterministic, than a column that is based on the function can have statistics and a better query plan.
By the way since I’m not a big fan of UDFs my solution was to use a case statement directly in create table statement instead of basing it on the UDF.
I’m not sure, if it applies to the newer SQL versions, but using scalar functions on a computed column forced single threaded plans for this table (and every query, where this table is joind) *and* DBCC CHECKDB.
So I would always prever either the trigger or the case in the computed column. Drawback of the CASE is, that you may have multiple tables with those columns and have to ensure that you change them all, when you make logic adjustments. On the other hand you would need to execute an UPDATE on all tables with this trigger after a logic changes too (but you could limit it to the edge cases you just fixed, so it would be faster)
Isn’t putting the code in the column definition the same as using the same code in the function with schemabinding?
CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
RETURNS BIT WITH SCHEMABINDING
AS
BEGIN
RETURN CASE
WHEN (WebsiteUrl LIKE ‘http://%’ OR WebsiteUrl LIKE ‘https://%’ OR WebsiteUrl LIKE ‘ftp://%’)
AND (WebsiteUrl LIKE ‘%://[A-Za-z0-9.-]%.%’ — Check for domain/IP after scheme
AND WebsiteUrl NOT LIKE ‘% %’ — No spaces allowed in URL
AND WebsiteUrl LIKE ‘%.[A-Za-z]%’ — Ensure there’s a period in domain/IP part
AND WebsiteUrl LIKE ‘%/%’ — Ensure there’s at least one slash after the domain
AND WebsiteUrl LIKE ‘%[A-Za-z0-9/_-]%’) — Ensure there’s at least one valid character in the path
THEN 1
ELSE 0
END AS Result;
END
GO
https://dba.stackexchange.com/questions/140381/is-there-any-benefit-to-schemabinding-a-function-beyond-halloween-protection
I think I learned of this SQL Kiwi / Paul White post from you.
It isn’t the same. As long as you have a computed column that is based on a UDF, SQL Server won’t use parallelism when it reads the data from this table (even if the query doesn’t reference the computed column). While I agree that in the specific query that was used in this change it wasn’t a problem, in the real world you’ll might have other queries that read data from the same table, that should use parallelism, but won’t be able to.
That was a great read and reflects why you always need to be hungry to learn more and take your game to the next level – thanks Brent!