Query Exercise: Fix This Computed Column.

Query Exercises
54 Comments

Take any size of the Stack Overflow database and check out the WebsiteUrl column of the Users table:

Sometimes it’s null, sometimes it’s an empty string, sometimes it’s populated but the URL isn’t valid.

Let’s say that along the way, someone decided to ask ChatGPT to build a function to check for valid website URLs, and then used that code to add a new IsValidUrl column to the Users table (and yes, this is inspired by a real-life client example, hahaha):

The user-defined function isn’t accurate, for starters – it’s letting things through that aren’t valid URLs, and stopping things that are actually valid – but let’s set that aside for a second.

What happens when we try to get the top users by reputation? To give SQL Server the best shot, I’m using SQL Server 2022, with the database in 2022 compatibility level, with an index on Reputation:

The actual query plan is deceivingly simple, despite its terrible performance that takes about a minute to run:

Crouching Tiger, Hidden Scalar

WHERE IS YOUR FUNCTION INLINING GOD NOW? I could make movie jokes about this all day. Anyhoo, the plan ignored the Reputation index, did a 2-second table scan, and spent nearly a minute doing the scalar function and the filtering.

To add insult to injury, if you’re going to do 1 minute of CPU work, it sure would help to parallelize that query across multiple cores – but that query can’t get parallelism, as explained in the plan properties:

SpacesNotAvailableEither

Your Query Exercise this week isn’t to fix the accuracy of the function – you can leave it as inaccurate if you like. Your challenge is to have the exact same query run in less than a second. Our goal is to avoid changing application code, and to get a very fast fix in place without blaming the developers. You’re the data professional: be professional.

Put your queries in a Github Gist, and include that link in your comments. Check out the solutions from other folks, and compare and contrast your work. After you’ve given it a try, read my thoughts in the following post. Have fun!

Update: please read the post in its entirety, and follow the instructions. Please don’t just throw ideas in there or half-formed T-SQL. For someone to test your work, they need to see your exact work. C’mon, folks – this isn’t a major project, just a single function. Be fair to people on the other side of the screen.

Previous Post
Updated, Larger Stack Overflow Demo Database
Next Post
[Video] Office Hours in My Backyard

54 Comments. Leave new

  • Question — Just to make sure I am clear on the requirements/available options: Is persisting the computed column allowed?

    Reply
  • Carlos Benito
    July 17, 2024 5:46 pm

    SELECT TOP 200 *
    FROM dbo.Users u
    WHERE dbo.IsValidUrl(u.WebsiteURL) = 1 –IsValidUrl = 1
    ORDER BY Reputation DESC;

    Reply
  • I don’t have access to the Stack Overflow database to test the results, but it seems to me that you can simply change the computed column.
    ALTER TABLE dbo.Users ADD IsValidUrl
    AS CAST(CASE when (WebsiteUrl LIKE ‘http://%’
    OR WebsiteUrl LIKE ‘https://%’
    OR WebsiteUrl LIKE ‘ftp://%’
    )
    AND (WebsiteUrl LIKE ‘%://[A-Za-z0-9.-]%.%’ — Check for domain/IP after scheme
    AND WebsiteUrl NOT LIKE ‘% %’ — No spaces allowed in URL
    AND WebsiteUrl LIKE ‘%.[A-Za-z]%’ — Ensure there’s a period in domain/IP part
    AND WebsiteUrl LIKE ‘%/%’ — Ensure there’s at least one slash after the domain
    AND WebsiteUrl LIKE ‘%[A-Za-z0-9/_-]%’
    )
    THEN 1
    ELSE 0
    END as BIT) PERSISTED

    Reply
    • Ed – dang, it’s a shame there’s not a link in the post to provide free access to the Stack Overflow database. WINK WINK

      (And that way you’d find out that, like so often in life, the things you think will work… don’t.)

      Reply
      • yeah, I just don’t have a good spot to put it, without someone whining about security, or disk space, or the issue du jour.

        Reply
  • Add with schemabinding to the function?

    Reply
    • Try it and see what happens! That’s the point of the exercise, to find out if your guesses work. (And no, saying “Brent please test my guess” isn’t how exercises work, hahaha.)

      Reply
      • Icic
        So, after adding “with schemabinding”, the query uses the reputation index and then key lookup. It runs in less than 1 sec.

        Reply
        • That doesn’t seem to be the case for me. But again, like I mention in the blog post, feel free to post your exact code so other folks can see & test it.

          Reply
          • Worked perfectly for me, assuming you also persist the column.
            https://www.brentozar.com/pastetheplan/?id=SyrhwoB_A

          • Hmm, can you show the function? I hate to keep beating this drum, but the instructions in the post are really important – people need to see the code in order to understand what’s happening. Thanks!

          • ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
            GO

            CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
            RETURNS BIT
            WITH SCHEMABINDING
            AS
            BEGIN
            DECLARE @Result BIT = 0

            — Regex pattern for a valid URL
            — This pattern covers:
            — – Scheme (http, https, ftp)
            — – Optional username:password
            — – Domain name or IP address
            — – Optional port
            — – Optional path
            — – Optional query string
            — – Optional fragment

            IF @Url LIKE ‘http://%’ OR @Url LIKE ‘https://%’ OR @Url LIKE ‘ftp://%’
            BEGIN
            IF @Url LIKE ‘%://[A-Za-z0-9.-]%.%’ — Check for domain/IP after scheme
            AND @Url NOT LIKE ‘% %’ — No spaces allowed in URL
            AND @Url LIKE ‘%.[A-Za-z]%’ — Ensure there’s a period in domain/IP part
            AND @Url LIKE ‘%/%’ — Ensure there’s at least one slash after the domain
            AND @Url LIKE ‘%[A-Za-z0-9/_-]%’ — Ensure there’s at least one valid character in the path
            BEGIN
            SET @Result = 1
            END
            END

            RETURN @Result
            END
            GO

            ALTER TABLE dbo.Users ADD IsValidUrl
            AS dbo.IsValidUrl(WebsiteUrl);

            GO

            –CREATE INDEX Reputation ON dbo.Users(Reputation);

            SET STATISTICS TIME, IO ON;
            GO
            SELECT TOP 200 *
            FROM dbo.Users
            WHERE IsValidUrl = 1
            ORDER BY Reputation DESC;

          • Ag – I left instructions in the post on how to share your code via Github Gists. WordPress comments don’t work for code because it gets mangled. Sorry about that!

          • If we add “with schemabinding,” the function becomes deterministic so that SQL server can maintain statistics for this column. Therefore, the plan will be different.
            The query still can’t run in parallel.

          • Perfect, that works! Here’s the execution plan: https://www.brentozar.com/pastetheplan/?id=rJMMWY8_A

            Works great, runs in milliseconds. It’s totally okay that it can’t go parallel – there’s so little work involved. Good job!

          • It took some time, but I finally posted the answer. I appreciate your patience!

  • schemabind the function then you can persist it to the table.

    CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
    RETURNS BIT WITH schemabinding

    Reply
  • Carlos Benito
    July 17, 2024 7:46 pm

    A lot of work to save those developers.

    — Drop Column
    ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;

    — Add Column Back as BIT
    ALTER TABLE dbo.Users ADD IsValidUrl BIT;

    — Create an Insert and Update Triggers
    CREATE TRIGGER [dbo].[TRI_IsValidURL]
    ON [dbo].[Users]
    AFTER INSERT
    AS
    BEGIN
    SET NOCOUNT ON
    UPDATE U
    SET IsValidUrl = dbo.IsValidUrl(U.WebsiteUrl)
    FROM Users U
    JOIN inserted I ON I.id = U.id
    END

    CREATE TRIGGER [dbo].[TRU_IsValidURL]
    ON [dbo].[Users]
    AFTER UPDATE
    AS
    BEGIN
    SET NOCOUNT ON
    UPDATE U
    SET IsValidUrl = dbo.IsValidUrl(U.WebsiteUrl)
    FROM Users U
    JOIN deleted D ON D.id = U.id
    WHERE COALESCE(U.WebsiteUrl, ”) COALESCE(D.WebsiteUrl, ”)
    END

    — Update IsValidUrl Column
    WHILE 1 = 1
    BEGIN
    UPDATE TOP (100000)
    dbo.Users WITH (HOLDLOCK)
    SET IsValidUrl = dbo.IsValidUrl(WebsiteUrl)
    WHERE IsValidUrl is null
    IF @@ROWCOUNT = 0
    BREAK
    END


    SELECT TOP 200 *
    FROM dbo.Users
    WHERE IsValidUrl = 1
    ORDER BY Reputation DESC;

    Reply
  • Michael McCormick
    July 17, 2024 7:49 pm

    I have a crusty old version of StackOverflow2010 that was still at compatibility level 100 (SQL Server 2008). The query ran almost instantly. Bumping up to 160 dragged it down to 3 seconds. Now I have something to work with.

    Reply
  • Here’s what I did:

    1.Find the query id of the problematic query.
    2.Apply query store hint ‘OPTION(USE HINT(”FORCE_LEGACY_CARDINALITY_ESTIMATION”))’ to get this running in less than 1 second.
    3.Work on making code changes to the function for a permanent fix.

    https://www.brentozar.com/pastetheplan/?id=BkxVs-UuA

    SELECT query_hint_id,
    query_id,
    query_hint_text,
    last_query_hint_failure_reason,
    last_query_hint_failure_reason_desc,
    query_hint_failure_count,
    source,
    source_desc
    FROM sys.query_store_query_hints;
    GO

    EXEC sys.sp_query_store_set_hints @query_id= 3928, @query_hints = N’OPTION(USE HINT(”FORCE_LEGACY_CARDINALITY_ESTIMATION”))’;

    Reply
  • Anders Sorensen
    July 18, 2024 10:39 am

    Rewrite the index:

    CREATE INDEX [Reputation] ON [dbo].[Users]
    (
    [Reputation] DESC
    )
    INCLUDE([AboutMe],[Age],[CreationDate],[DisplayName],[DownVotes],[EmailHash],[LastAccessDate],[Location],[UpVotes],[Views],[WebsiteUrl],[AccountId]) WITH (DROP_EXISTING = ON)

    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 30 ms.

    (200 rows affected)
    Table ‘Users’. Scan count 1, logical reads 74, physical reads 0, page server reads 0, read-ahead reads 77, page server read-ahead reads 0, lob logical reads 8, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    (1 row affected)

    SQL Server Execution Times:
    CPU time = 31 ms, elapsed time = 568 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    Reply
  • Brian Boodman
    July 18, 2024 1:57 pm

    I tried following the steps on this post, but I somehow didn’t actually encounter these issues. Did I cheat by deciding not to give SQL “it’s best shot”? I used SQL 2019 with compatibility level 100, with the “Stack Overflow SQL Server Database – Mini 2010 Version.” I’d pulled from for some other blog post.

    With that setup, running it as-is only took 165ms. My copy of the DB did NOT ignore the reputation column, thus it only actually read 291 rows of the reputation.

    I don’t have anything to post a Gist, since I didn’t write any queries: I just ran the this post as-is on the copy of the DB I already had lying around. In leiu of a Gist, here’s my query plan: https://www.brentozar.com/pastetheplan/?id=rJ3icqIOA .

    Reply
  • Why run code when you don’t have to?
    So my fix is to remove the computed column and make the IsValidUrl a true Bit columns and use INSERT and UPDATE TRIGGERs to see if the IsValidUrl needs updating.

    You could leave the FUNCTION in or move the URL validation code to the TRIGGER.
    I moved the code to trigger, again why make SQL Server’s harder then needed
    https://gist.github.com/Montro1981/f09b611a0264ce434ec3b72f43b2d008

    Reply
  • So am I right that we can change the code of the function, but not the select query?

    Reply
  • Gary Martin
    July 20, 2024 5:52 am

    create a filtered index where isvalidurl = 1

    Reply
  • If the reason that Parallelism is not available for a function, it would be too obvious to move to a trigger which fires only in case WebsiteUrl is not null or an empty string…Brent is def hiding something precious from us…maybe we could have the computed column created and then execute a stored procedure that populates it? No, it couldn’t be this.

    Got one: Hey, what if we use Edge’s Copilot instead of ChatGPT, and get this regex for validating the URLS:
    ((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w#-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)

    (just kidding, I got it off stack overflow, oh the irony)

    Got two: What if we create the column first as a BIT, then create the function, would that change anything?

    Got three: Can we use a temp table inside of a function? That could better things up, or worsen them, who knows. Not me.

    Got four: Create the computed column then create an index for it. Once the computation happens, with the select we should see a seek happening. Hope so!

    Reply
    • Got zero: I suggest you read the last paragraph of Brent’s post.

      Got three: Nope, no data manipulations in tables (temp or otherwise) in functions are allowed.

      Reply
      • @Montro1981, does your “Got zero” relate to the intro of my comment? If that’s the case, I mention using a trigger, the same as your solution (which I found out about later), so what’s your problem with it?

        Just so you know, people who start sentences with “I suggest” are never fun at parties.

        Reply
      • @TY, I’m sorry if you feel offended by my comment on your post. My intention was to make a lighthearted joke.

        I’ll explain further , because you started making a list of things with got one, two etc and you didn’t seemed to not follow the “rules” Brent laid down in the last paragraph of his post I thought it was funny to make a “got zero” and make a friendly suggestion, as I can’t tell you want you should and should not do.
        As well as point out to you that your question in your “got three” is not possible.

        Reply
  • […] last week’s Query Exercise, we added a user-defined function to the Users table to check whether their WebsiteUrl was valid or […]

    Reply
  • […] latest instance involved the recent Query Exercise where you were challenged to fix a computed column’s performance. In the comments, some folks noted that performance of the query was actually great on old compat […]

    Reply
  • Wojciech Sawicki
    August 8, 2024 8:44 am

    –Why was @Url NVARCHAR(max)
    –Column WebsiteUrl is nvarchar(200)
    –as result got cardinality estimate warning in plan

    CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(200))
    RETURNS BIT
    ———–IT makes different
    with schemabinding
    AS
    BEGIN
    DECLARE @Result BIT = 0

    — Regex pattern for a valid URL
    — This pattern covers:
    — – Scheme (http, https, ftp)
    — – Optional username:password
    — – Domain name or IP address
    — – Optional port
    — – Optional path
    — – Optional query string
    — – Optional fragment

    IF @Url LIKE ‘http://%’ OR @Url LIKE ‘https://%’ OR @Url LIKE ‘ftp://%’
    BEGIN
    IF @Url LIKE ‘%://[A-Za-z0-9.-]%.%’ — Check for domain/IP after scheme
    AND @Url NOT LIKE ‘% %’ — No spaces allowed in URL
    AND @Url LIKE ‘%.[A-Za-z]%’ — Ensure there’s a period in domain/IP part
    AND @Url LIKE ‘%/%’ — Ensure there’s at least one slash after the domain
    AND @Url LIKE ‘%[A-Za-z0-9/_-]%’ — Ensure there’s at least one valid character in the path
    BEGIN
    SET @Result = 1
    END
    END

    RETURN @Result
    END
    GO

    —And second changes

    ALTER TABLE dbo.Users ADD IsValidUrl
    AS dbo.IsValidUrl(WebsiteUrl) persisted not null
    GO

    –And thats all folks time=0ms

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.