Query Exercise: Fix This Computed Column.
Take any size of the Stack Overflow database and check out the WebsiteUrl column of the Users table:
Sometimes it’s null, sometimes it’s an empty string, sometimes it’s populated but the URL isn’t valid.
Let’s say that along the way, someone decided to ask ChatGPT to build a function to check for valid website URLs, and then used that code to add a new IsValidUrl column to the Users table (and yes, this is inspired by a real-life client example, hahaha):
Transact-SQL
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX)) RETURNS BIT AS BEGIN DECLARE @Result BIT = 0 -- Regex pattern for a valid URL -- This pattern covers: -- - Scheme (http, https, ftp) -- - Optional username:password -- - Domain name or IP address -- - Optional port -- - Optional path -- - Optional query string -- - Optional fragment IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%' BEGIN IF @Url LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme AND @Url NOT LIKE '% %' -- No spaces allowed in URL AND @Url LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part AND @Url LIKE '%/%' -- Ensure there's at least one slash after the domain AND @Url LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path BEGIN SET @Result = 1 END END RETURN @Result END GO ALTER TABLE dbo.Users ADD IsValidUrl AS dbo.IsValidUrl(WebsiteUrl); |
The user-defined function isn’t accurate, for starters – it’s letting things through that aren’t valid URLs, and stopping things that are actually valid – but let’s set that aside for a second.
What happens when we try to get the top users by reputation? To give SQL Server the best shot, I’m using SQL Server 2022, with the database in 2022 compatibility level, with an index on Reputation:
Transact-SQL
|
1 2 3 4 5 6 7 8 |
CREATE INDEX Reputation ON dbo.Users(Reputation); SET STATISTICS TIME, IO ON; GO SELECT TOP 200 * FROM dbo.Users WHERE IsValidUrl = 1 ORDER BY Reputation DESC; |
The actual query plan is deceivingly simple, despite its terrible performance that takes about a minute to run:

WHERE IS YOUR FUNCTION INLINING GOD NOW? I could make movie jokes about this all day. Anyhoo, the plan ignored the Reputation index, did a 2-second table scan, and spent nearly a minute doing the scalar function and the filtering.
To add insult to injury, if you’re going to do 1 minute of CPU work, it sure would help to parallelize that query across multiple cores – but that query can’t get parallelism, as explained in the plan properties:

Your Query Exercise this week isn’t to fix the accuracy of the function – you can leave it as inaccurate if you like. Your challenge is to have the exact same query run in less than a second. Our goal is to avoid changing application code, and to get a very fast fix in place without blaming the developers. You’re the data professional: be professional.
Put your queries in a Github Gist, and include that link in your comments. Check out the solutions from other folks, and compare and contrast your work. After you’ve given it a try, read my thoughts in the following post. Have fun!
Update: please read the post in its entirety, and follow the instructions. Please don’t just throw ideas in there or half-formed T-SQL. For someone to test your work, they need to see your exact work. C’mon, folks – this isn’t a major project, just a single function. Be fair to people on the other side of the screen.
Related

Hi! I’m Brent Ozar.
I make Microsoft SQL Server go faster. I love teaching, travel, cars, and laughing. I’m based out of Las Vegas. He/him. I teach SQL Server training classes, or if you haven’t got time for the pain, I’m available for consulting too.
Get Free SQL Stuff
"*" indicates required fields


54 Comments. Leave new
Question — Just to make sure I am clear on the requirements/available options: Is persisting the computed column allowed?
This was also my first thought – seems like the easiest solution unless we’re missing something
Sure! Give it a shot and see what happens. 😉
SELECT TOP 200 *
FROM dbo.Users u
WHERE dbo.IsValidUrl(u.WebsiteURL) = 1 –IsValidUrl = 1
ORDER BY Reputation DESC;
Carlos, read the second to last paragraph of the post out loud to yourself, and see if anything jumps out at you.
no changes to the select statement. Got it!
I don’t have access to the Stack Overflow database to test the results, but it seems to me that you can simply change the computed column.
ALTER TABLE dbo.Users ADD IsValidUrl
AS CAST(CASE when (WebsiteUrl LIKE ‘http://%’
OR WebsiteUrl LIKE ‘https://%’
OR WebsiteUrl LIKE ‘ftp://%’
)
AND (WebsiteUrl LIKE ‘%://[A-Za-z0-9.-]%.%’ — Check for domain/IP after scheme
AND WebsiteUrl NOT LIKE ‘% %’ — No spaces allowed in URL
AND WebsiteUrl LIKE ‘%.[A-Za-z]%’ — Ensure there’s a period in domain/IP part
AND WebsiteUrl LIKE ‘%/%’ — Ensure there’s at least one slash after the domain
AND WebsiteUrl LIKE ‘%[A-Za-z0-9/_-]%’
)
THEN 1
ELSE 0
END as BIT) PERSISTED
Ed – dang, it’s a shame there’s not a link in the post to provide free access to the Stack Overflow database. WINK WINK
(And that way you’d find out that, like so often in life, the things you think will work… don’t.)
yeah, I just don’t have a good spot to put it, without someone whining about security, or disk space, or the issue du jour.
Add with schemabinding to the function?
Try it and see what happens! That’s the point of the exercise, to find out if your guesses work. (And no, saying “Brent please test my guess” isn’t how exercises work, hahaha.)
Icic
So, after adding “with schemabinding”, the query uses the reputation index and then key lookup. It runs in less than 1 sec.
That doesn’t seem to be the case for me. But again, like I mention in the blog post, feel free to post your exact code so other folks can see & test it.
Worked perfectly for me, assuming you also persist the column.
https://www.brentozar.com/pastetheplan/?id=SyrhwoB_A
Hmm, can you show the function? I hate to keep beating this drum, but the instructions in the post are really important – people need to see the code in order to understand what’s happening. Thanks!
ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
GO
CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
RETURNS BIT
WITH SCHEMABINDING
AS
BEGIN
DECLARE @Result BIT = 0
— Regex pattern for a valid URL
— This pattern covers:
— – Scheme (http, https, ftp)
— – Optional username:password
— – Domain name or IP address
— – Optional port
— – Optional path
— – Optional query string
— – Optional fragment
IF @Url LIKE ‘http://%’ OR @Url LIKE ‘https://%’ OR @Url LIKE ‘ftp://%’
BEGIN
IF @Url LIKE ‘%://[A-Za-z0-9.-]%.%’ — Check for domain/IP after scheme
AND @Url NOT LIKE ‘% %’ — No spaces allowed in URL
AND @Url LIKE ‘%.[A-Za-z]%’ — Ensure there’s a period in domain/IP part
AND @Url LIKE ‘%/%’ — Ensure there’s at least one slash after the domain
AND @Url LIKE ‘%[A-Za-z0-9/_-]%’ — Ensure there’s at least one valid character in the path
BEGIN
SET @Result = 1
END
END
RETURN @Result
END
GO
ALTER TABLE dbo.Users ADD IsValidUrl
AS dbo.IsValidUrl(WebsiteUrl);
GO
–CREATE INDEX Reputation ON dbo.Users(Reputation);
SET STATISTICS TIME, IO ON;
GO
SELECT TOP 200 *
FROM dbo.Users
WHERE IsValidUrl = 1
ORDER BY Reputation DESC;
Ag – I left instructions in the post on how to share your code via Github Gists. WordPress comments don’t work for code because it gets mangled. Sorry about that!
If we add “with schemabinding,” the function becomes deterministic so that SQL server can maintain statistics for this column. Therefore, the plan will be different.
The query still can’t run in parallel.
Here we go
https://gist.github.com/A-ops-star/2ff90da5ebf003925b5d0211dc184a6e
Perfect, that works! Here’s the execution plan: https://www.brentozar.com/pastetheplan/?id=rJMMWY8_A
Works great, runs in milliseconds. It’s totally okay that it can’t go parallel – there’s so little work involved. Good job!
It took some time, but I finally posted the answer. I appreciate your patience!
schemabind the function then you can persist it to the table.
CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
RETURNS BIT WITH schemabinding
Make sure to try it so you can test your hunch – you might be quite surprised. Cheers!
A lot of work to save those developers.
— Drop Column
ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
— Add Column Back as BIT
ALTER TABLE dbo.Users ADD IsValidUrl BIT;
— Create an Insert and Update Triggers
CREATE TRIGGER [dbo].[TRI_IsValidURL]
ON [dbo].[Users]
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON
UPDATE U
SET IsValidUrl = dbo.IsValidUrl(U.WebsiteUrl)
FROM Users U
JOIN inserted I ON I.id = U.id
END
CREATE TRIGGER [dbo].[TRU_IsValidURL]
ON [dbo].[Users]
AFTER UPDATE
AS
BEGIN
SET NOCOUNT ON
UPDATE U
SET IsValidUrl = dbo.IsValidUrl(U.WebsiteUrl)
FROM Users U
JOIN deleted D ON D.id = U.id
WHERE COALESCE(U.WebsiteUrl, ”) COALESCE(D.WebsiteUrl, ”)
END
— Update IsValidUrl Column
WHILE 1 = 1
BEGIN
UPDATE TOP (100000)
dbo.Users WITH (HOLDLOCK)
SET IsValidUrl = dbo.IsValidUrl(WebsiteUrl)
WHERE IsValidUrl is null
IF @@ROWCOUNT = 0
BREAK
END
—
SELECT TOP 200 *
FROM dbo.Users
WHERE IsValidUrl = 1
ORDER BY Reputation DESC;
Congratulations, and good work! You were the first to come up with a working solution! Good job.
Ah, I wasn’t the first to come up with the Trigger solution 🙂
I have a crusty old version of StackOverflow2010 that was still at compatibility level 100 (SQL Server 2008). The query ran almost instantly. Bumping up to 160 dragged it down to 3 seconds. Now I have something to work with.
I found exactly the same. My Compat Level 100 execution plan is just what Brent wants, see https://www.brentozar.com/pastetheplan/?id=HkhSSor_R but on 150, I get the same plan as Brent https://www.brentozar.com/pastetheplan/?id=H1mGLoSdA
How weird! I don’t know what quirk of the optimiser could cause this. My actual idea for fixing this was to go down the indexed view route (I was thinking nonclustered columnstore, just to see), but we’ve both stumbled on an alternative solution.
Here’s what I did:
1.Find the query id of the problematic query.
2.Apply query store hint ‘OPTION(USE HINT(”FORCE_LEGACY_CARDINALITY_ESTIMATION”))’ to get this running in less than 1 second.
3.Work on making code changes to the function for a permanent fix.
https://www.brentozar.com/pastetheplan/?id=BkxVs-UuA
SELECT query_hint_id,
query_id,
query_hint_text,
last_query_hint_failure_reason,
last_query_hint_failure_reason_desc,
query_hint_failure_count,
source,
source_desc
FROM sys.query_store_query_hints;
GO
EXEC sys.sp_query_store_set_hints @query_id= 3928, @query_hints = N’OPTION(USE HINT(”FORCE_LEGACY_CARDINALITY_ESTIMATION”))’;
AB – very nicely done! That works beautifully. Good work!
Rewrite the index:
CREATE INDEX [Reputation] ON [dbo].[Users]
(
[Reputation] DESC
)
INCLUDE([AboutMe],[Age],[CreationDate],[DisplayName],[DownVotes],[EmailHash],[LastAccessDate],[Location],[UpVotes],[Views],[WebsiteUrl],[AccountId]) WITH (DROP_EXISTING = ON)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 30 ms.
(200 rows affected)
Table ‘Users’. Scan count 1, logical reads 74, physical reads 0, page server reads 0, read-ahead reads 77, page server read-ahead reads 0, lob logical reads 8, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
(1 row affected)
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 568 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Anders – interesting, in the real world, would you double the size of the table with an index like this? And include extremely hot columns like Views and LastAccessDate, which would change every time every user updates a page?
Forgot the “You’re the data professional: be professional.” Sorry 🙂
Hahaha, no worries!
I tried following the steps on this post, but I somehow didn’t actually encounter these issues. Did I cheat by deciding not to give SQL “it’s best shot”? I used SQL 2019 with compatibility level 100, with the “Stack Overflow SQL Server Database – Mini 2010 Version.” I’d pulled from for some other blog post.
With that setup, running it as-is only took 165ms. My copy of the DB did NOT ignore the reputation column, thus it only actually read 291 rows of the reputation.
I don’t have anything to post a Gist, since I didn’t write any queries: I just ran the this post as-is on the copy of the DB I already had lying around. In leiu of a Gist, here’s my query plan: https://www.brentozar.com/pastetheplan/?id=rJ3icqIOA .
Try a modern compatibility level. 😉
Why run code when you don’t have to?
So my fix is to remove the computed column and make the IsValidUrl a true Bit columns and use INSERT and UPDATE TRIGGERs to see if the IsValidUrl needs updating.
You could leave the FUNCTION in or move the URL validation code to the TRIGGER.
I moved the code to trigger, again why make SQL Server’s harder then needed
https://gist.github.com/Montro1981/f09b611a0264ce434ec3b72f43b2d008
*why make SQL Server’s life harder then needed
Just for lulz and giggles:
“Case: Some developer named Brent Ozar asked CrapG… Uhm ChatGPT for a Function to valid an URL in SQL:
https://chatgpt.com/share/4c4ce280-4879-4df1-bb1d-ab0c3ec3f973
Query performance tanked after implementing this function and management asked me (Development DBA) to have a look.”
The query plans:
Updates with the trigger https://www.brentozar.com/pastetheplan/?id=HyypSqDO0
The nice an simple select https://www.brentozar.com/pastetheplan/?id=HydeLqP_A
Nicely done! That was the exact solution I started with too. Good job!
(Well, I should add – my updates were nowhere near as elegant as yours, honestly, heh.)
So am I right that we can change the code of the function, but not the select query?
You are exactly right!
create a filtered index where isvalidurl = 1
Read the post in its entirety, including the last paragraph. Thanks.
If the reason that Parallelism is not available for a function, it would be too obvious to move to a trigger which fires only in case WebsiteUrl is not null or an empty string…Brent is def hiding something precious from us…maybe we could have the computed column created and then execute a stored procedure that populates it? No, it couldn’t be this.
Got one: Hey, what if we use Edge’s Copilot instead of ChatGPT, and get this regex for validating the URLS:
((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w#-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)
(just kidding, I got it off stack overflow, oh the irony)
Got two: What if we create the column first as a BIT, then create the function, would that change anything?
Got three: Can we use a temp table inside of a function? That could better things up, or worsen them, who knows. Not me.
Got four: Create the computed column then create an index for it. Once the computation happens, with the select we should see a seek happening. Hope so!
Got zero: I suggest you read the last paragraph of Brent’s post.
Got three: Nope, no data manipulations in tables (temp or otherwise) in functions are allowed.
@Montro1981, does your “Got zero” relate to the intro of my comment? If that’s the case, I mention using a trigger, the same as your solution (which I found out about later), so what’s your problem with it?
Just so you know, people who start sentences with “I suggest” are never fun at parties.
@TY, I’m sorry if you feel offended by my comment on your post. My intention was to make a lighthearted joke.
I’ll explain further , because you started making a list of things with got one, two etc and you didn’t seemed to not follow the “rules” Brent laid down in the last paragraph of his post I thought it was funny to make a “got zero” and make a friendly suggestion, as I can’t tell you want you should and should not do.
As well as point out to you that your question in your “got three” is not possible.
@Montro1981, wow, that was too deep, thanks and good luck!
[…] last week’s Query Exercise, we added a user-defined function to the Users table to check whether their WebsiteUrl was valid or […]
[…] latest instance involved the recent Query Exercise where you were challenged to fix a computed column’s performance. In the comments, some folks noted that performance of the query was actually great on old compat […]
–Why was @Url NVARCHAR(max)
–Column WebsiteUrl is nvarchar(200)
–as result got cardinality estimate warning in plan
CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(200))
RETURNS BIT
———–IT makes different
with schemabinding
AS
BEGIN
DECLARE @Result BIT = 0
— Regex pattern for a valid URL
— This pattern covers:
— – Scheme (http, https, ftp)
— – Optional username:password
— – Domain name or IP address
— – Optional port
— – Optional path
— – Optional query string
— – Optional fragment
IF @Url LIKE ‘http://%’ OR @Url LIKE ‘https://%’ OR @Url LIKE ‘ftp://%’
BEGIN
IF @Url LIKE ‘%://[A-Za-z0-9.-]%.%’ — Check for domain/IP after scheme
AND @Url NOT LIKE ‘% %’ — No spaces allowed in URL
AND @Url LIKE ‘%.[A-Za-z]%’ — Ensure there’s a period in domain/IP part
AND @Url LIKE ‘%/%’ — Ensure there’s at least one slash after the domain
AND @Url LIKE ‘%[A-Za-z0-9/_-]%’ — Ensure there’s at least one valid character in the path
BEGIN
SET @Result = 1
END
END
RETURN @Result
END
GO
—And second changes
ALTER TABLE dbo.Users ADD IsValidUrl
AS dbo.IsValidUrl(WebsiteUrl) persisted not null
GO
–And thats all folks time=0ms