Can AI Rewrite Bad Queries in Seconds? Kinda.

Development
12 Comments

When I see horrific code, stuff that would take a lot of manual labor to fix, I like to lob it over to an LLM like ChatGPT just to see how it does.

For example, on a call with a client, I opened up one of their slowest pieces of code to find that it was running a cursor, processing data row by row rather than working in sets. To simulate it, I’ve written up a few stored procedures against the Stack Overflow database:

  • Posts_Update_AnswerCount takes a single PostId, and updates the AnswerCount of that row
  • Posts_Update_CommentCount does the same thing, but for a different column
  • Post_Update_AllCounts calls both of the above for a list of PostIds, looping through them one post at a time

When I look at that code, sure, conceptually I know what work has to be done – but it’s stupid manual labor. So I copied that code into a text file and asked ChatGPT, “Please rewrite this Microsoft SQL Server code into a more performant version.” That’s about as simple of a prompt as you can get! 

Here’s the code ChatGPT 4o came up with in the first pass, and you can read its full response to see its justifications:

Is it better than what we started with? Yes. Is it good enough that I wanna put it into production? No, because it’s still making two separate passes over the exact same rows in the Posts table, updating them one column at a time.

I replied to ChatGPT with, “Can you make it even faster and more efficient?” The result:

Now we’re getting somewhere! I love that it used CTEs with null protection, too, to handle the case where a post might have no answers or comments. I gave it one last shot by asking, “Could you make it faster, simpler, and easier to read?” but it reverted back to the two-update solution.

I then passed the same code and same prompts over to Google’s Gemini 2.0 Flash Thinking Experimental model. Its first result was similar to ChatGPT’s in that it had two separate update statements, but after prompting again to make it faster and more efficient, it came to this conclusion:

Instead of using a pair of CTEs like ChatGPT 4o, Gemini Flash 2.0 Thinking decided to use a pair of left join subqueries. I’m amused that both platforms chose to dump the contents of string_split into a table variable rather than using a temp table or using the string_split directly in the update’s where clause.

In both cases, the LLM (ChatGPT or Gemini) is able to change the query, but it may take multiple prompts to get to the best result – and the results don’t necessarily get better with every prompt. Today, these tools work best in the hands of someone who actually knows the results that they’re looking for, like you, dear reader.

That’s not to say you shouldn’t use them! This example here represents a really good starting point use case. In the client’s example (and this has happened several times), the LLM was able to give me a good-enough starting point in seconds, saving me a lot of time with manual labor on the rewrite. I still needed to do manual labor – I just had to do less, and that’s a win, because, uh, I’m lazy.

Previous Post
Pop Quiz, 2025 Edition: What Do These Things Cost Today?
Next Post
[Video] Office Hours in My Vegas Backyard

12 Comments. Leave new

  • Joshua Grippo
    January 22, 2025 5:11 pm

    “amused that both platforms chose to dump the contents of string_split into a table variable rather than using a temp table”

    Does amused mean it was cheeky and found a good solution that is against common advise or does it mean it should have used the temp table?

    Reply
  • Being lazy is, in my opinion, the most important attribute of a good DBA. If you’re running something by hand that you can automate, you’re not lazy enough. If there’s a clearly defined troubleshooting path for a common issue and you don’t have it in a script, you’re not lazy enough.

    Reply
  • I’ve used it for debugging silly syntax errors and for checking if a statement is portable to another database platform. It’s not bad. It’s kind of like the old 4GLs like Focus and other natural languages in the (distant) past. It’s perfectly fine of you don’t ask it to do too much.

    Reply
  • Cliff Williams
    January 22, 2025 5:50 pm

    I actually have been involved in doing this exact same thing. I’ve started a project to fix some of out ETL databases that have not been touched in over 12 years. They are a god awful mess, written by a Lotus Notes guy with no SQL experience at all.

    It just so happens that I started this project the day after I finished “Fundamentals of Query Tuning” and 2 weeks after completing “Fundamentals of Index Tuning”. Which were a great help and doing this and really enforced the training from these courses.

    I’ve been using AWS Claud and ChatCPT and I have to say Claud has given me the better results, not perfect all the time, because LLMs are never perfect, but results that were more accurate for the task. (IMO)

    Here is Clauds result after 2 tweeks of the 1st response.

    CREATE OR ALTER PROC dbo.Posts_Update_Counts
    @PostIds VARCHAR(MAX) = NULL,
    @SinglePostId INT = NULL
    AS
    BEGIN
    SET NOCOUNT ON;

    WITH PostIdsCTE AS (
    SELECT DISTINCT CAST(value AS INT) AS PostId
    FROM STRING_SPLIT(ISNULL(@PostIds, CAST(@SinglePostId AS VARCHAR(20))), ‘,’)
    WHERE value IS NOT NULL
    ),
    AnswerCounts AS (
    SELECT
    ParentId,
    COUNT(*) OVER(PARTITION BY ParentId) as answer_count
    FROM dbo.Posts WITH (NOLOCK)
    WHERE ParentId IN (SELECT PostId FROM PostIdsCTE)
    ),
    CommentCounts AS (
    SELECT
    PostId,
    COUNT(*) OVER(PARTITION BY PostId) as comment_count
    FROM dbo.Comments WITH (NOLOCK)
    WHERE PostId IN (SELECT PostId FROM PostIdsCTE)
    )
    UPDATE p WITH (ROWLOCK)
    SET
    AnswerCount = ISNULL(a.answer_count, 0),
    CommentCount = ISNULL(c.comment_count, 0)
    FROM dbo.Posts p
    INNER JOIN PostIdsCTE pid ON p.Id = pid.PostId
    LEFT JOIN (
    SELECT DISTINCT ParentId, answer_count
    FROM AnswerCounts
    ) a ON p.Id = a.ParentId
    LEFT JOIN (
    SELECT DISTINCT PostId, comment_count
    FROM CommentCounts
    ) c ON p.Id = c.PostId
    OPTION (RECOMPILE);
    END;
    GO

    Reply
  • Just for comparison sake, I asked Copilot to do the same thing and here is the results:

    CREATE OR ALTER PROC dbo.Posts_Update_AllCounts
    @PostIds VARCHAR(MAX)
    AS
    BEGIN
    — Convert the comma-separated string into a table
    DECLARE @PostIdTable TABLE (PostId INT);
    INSERT INTO @PostIdTable (PostId)
    SELECT CAST(value AS INT)
    FROM string_split(@PostIds, ‘,’);

    — Update AnswerCount and CommentCount for all specified posts
    MERGE dbo.Posts AS p
    USING (
    SELECT t.PostId,
    ISNULL(a.AnswerCount, 0) AS AnswerCount,
    ISNULL(c.CommentCount, 0) AS CommentCount
    FROM @PostIdTable t
    LEFT JOIN (
    SELECT pA.ParentId, COUNT(*) AS AnswerCount
    FROM dbo.Posts pA
    GROUP BY pA.ParentId
    ) a ON t.PostId = a.ParentId
    LEFT JOIN (
    SELECT c.PostId, COUNT(*) AS CommentCount
    FROM dbo.Comments c
    GROUP BY c.PostId
    ) c ON t.PostId = c.PostId
    ) AS src
    ON p.Id = src.PostId
    WHEN MATCHED THEN
    UPDATE SET
    p.AnswerCount = src.AnswerCount,
    p.CommentCount = src.CommentCount;
    END;
    GO

    First pass.

    Reply
  • Alvaro Costa-Neto
    January 23, 2025 12:06 am

    Hey, brent I just used the same code from your blog and asked Amazon Q, and here’s the version it produce in the first round. Looks like another good variation from the other LLMs, and it uses CROSS APPLY instead of LEFT JOINs:

    CREATE OR ALTER PROCEDURE dbo.Post_Update_AllCounts
    @PostIds VARCHAR(MAX)
    AS
    BEGIN
    UPDATE p
    SET p.AnswerCount = (
    SELECT COUNT(*)
    FROM dbo.Posts pA
    WHERE pA.ParentId = p.Id
    ),
    p.CommentCount = (
    SELECT COUNT(*)
    FROM dbo.Comments c
    WHERE c.PostId = p.Id
    )
    FROM dbo.Posts p
    CROSS APPLY STRING_SPLIT(@PostIds, ‘,’) AS ids
    WHERE p.Id = CAST(ids.value AS INT);
    END
    GO

    Reply
  • Hi Brent and thanks for the post!

    My personal experience is that AI can do a surprisingly good job, as long as you give it enough instructions. I’ve been testing various services quite a lot in the past months, and it has taught me a lot about being patient and clear with the AI “helpers”. In that sense, I guess they’re quite similar to people 🙂

    So, to me, it’s been a journey of finding the correct balance of lazy. Most of the time I would rather not re-write all of that T-SQL, but these days I put effort writing the prompt (which would still be less effort than re-writing all the T-SQL). If you specify what the query should look like, what parts you need optimized, and what it should use (temp tables, CTEs, etc.), you’ll get much better code out of IT.

    And that goes for follow-up prompts as well. If you just ask it to “make it faster”, or “easier to read”, and keep asking, it’ll eventually go completely crazy and give you just complete garbage. I’ve tested this, it goes bad surprisingly quickly! But if you are specific “Make this part of the code more efficient by doing X”, that’s where it can make your work much faster.

    Reply
  • […] Can AI Rewrite Bad Queries in Seconds? Kinda. (Brent Ozar) […]

    Reply
  • Robert Sterbal
    January 23, 2025 10:19 pm

    Just curious if paying for LLM service subscriptions is part of process here. Do you have a sense as to which ones make sense to pay or ask my employer to pay for?

    Reply
  • I find many of these not fully correct as when using set var1 to isnull(count1, 0), you are setting it to zero because currently there are no count. However, prior to your update there could have been an existing count, and you have now set it to zero because you currently have no count. If you used set var1 to isnull(value, var1), then you preserve the prior value when there is no current count. I suspect one cannot blame the LLM/AI as the issue is present in the initial query given to it.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.