Can AI Rewrite Bad Queries in Seconds? Kinda.

Last Updated January 14, 2025

When I see horrific code, stuff that would take a lot of manual labor to fix, I like to lob it over to an LLM like ChatGPT just to see how it does.

For example, on a call with a client, I opened up one of their slowest pieces of code to find that it was running a cursor, processing data row by row rather than working in sets. To simulate it, I’ve written up a few stored procedures against the Stack Overflow database:

Posts_Update_AnswerCount takes a single PostId, and updates the AnswerCount of that row
Posts_Update_CommentCount does the same thing, but for a different column
Post_Update_AllCounts calls both of the above for a list of PostIds, looping through them one post at a time

CREATE OR ALTER PROC dbo.Posts_Update_AnswerCount
    @PostId INT AS
BEGIN
    UPDATE dbo.Posts
        SET AnswerCount = 
            (SELECT SUM(1) FROM dbo.Posts pA
                WHERE pA.ParentId = @PostId)
        WHERE Id = @PostId;
END
GO

CREATE OR ALTER PROC dbo.Posts_Update_CommentCount
    @PostId INT AS
BEGIN
    UPDATE dbo.Posts
        SET CommentCount = 
            (SELECT SUM(1) FROM dbo.Comments c
                WHERE PostId = @PostId)
        WHERE Id = @PostId;
END
GO

CREATE OR ALTER PROC dbo.Post_Update_AllCounts
    @PostIds VARCHAR(MAX) AS
BEGIN
    DECLARE @Current_PostId INT;
    DECLARE postids_cursor CURSOR FOR
        SELECT value
        FROM string_split(@PostIds, ',');

    OPEN postids_cursor;

    FETCH NEXT FROM postids_cursor INTO @Current_PostId;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC dbo.Posts_Update_AnswerCount @Current_PostId;
        EXEC dbo.Posts_Update_CommentCount @Current_PostId;
        FETCH NEXT FROM postids_cursor INTO @Current_PostId;
    END
    CLOSE postids_cursor;
    DEALLOCATE postids_cursor;
END
GO

CREATE OR ALTER PROC dbo.Posts_Update_AnswerCount

@PostId INT AS

BEGIN

UPDATE dbo.Posts

SET AnswerCount =

(SELECT SUM(1) FROM dbo.Posts pA

WHERE pA.ParentId = @PostId)

WHERE Id = @PostId;

END

CREATE OR ALTER PROC dbo.Posts_Update_CommentCount

@PostId INT AS

BEGIN

UPDATE dbo.Posts

SET CommentCount =

(SELECT SUM(1) FROM dbo.Comments c

WHERE PostId = @PostId)

WHERE Id = @PostId;

END

CREATE OR ALTER PROC dbo.Post_Update_AllCounts

@PostIds VARCHAR(MAX) AS

BEGIN

DECLARE @Current_PostId INT;

DECLARE postids_cursor CURSOR FOR

SELECT value

FROM string_split(@PostIds, ',');

OPEN postids_cursor;

FETCH NEXT FROM postids_cursor INTO @Current_PostId;

WHILE @@FETCH_STATUS = 0

BEGIN

EXEC dbo.Posts_Update_AnswerCount @Current_PostId;

EXEC dbo.Posts_Update_CommentCount @Current_PostId;

FETCH NEXT FROM postids_cursor INTO @Current_PostId;

END

CLOSE postids_cursor;

DEALLOCATE postids_cursor;

END

When I look at that code, sure, conceptually I know what work has to be done – but it’s stupid manual labor. So I copied that code into a text file and asked ChatGPT, “Please rewrite this Microsoft SQL Server code into a more performant version.” That’s about as simple of a prompt as you can get!

Here’s the code ChatGPT 4o came up with in the first pass, and you can read its full response to see its justifications:

CREATE OR ALTER PROC dbo.Posts_Update_AllCounts
    @PostIds VARCHAR(MAX)
AS
BEGIN
    -- Convert the comma-separated string into a table
    DECLARE @PostIdTable TABLE (PostId INT);
    INSERT INTO @PostIdTable (PostId)
    SELECT CAST(value AS INT)
    FROM string_split(@PostIds, ',');

    -- Update AnswerCount for all specified posts
    UPDATE p
    SET AnswerCount = sub.AnswerCount
    FROM dbo.Posts p
    INNER JOIN (
        SELECT pA.ParentId, COUNT(*) AS AnswerCount
        FROM dbo.Posts pA
        INNER JOIN @PostIdTable t ON pA.ParentId = t.PostId
        GROUP BY pA.ParentId
    ) sub ON p.Id = sub.ParentId;

    -- Update CommentCount for all specified posts
    UPDATE p
    SET CommentCount = sub.CommentCount
    FROM dbo.Posts p
    INNER JOIN (
        SELECT c.PostId, COUNT(*) AS CommentCount
        FROM dbo.Comments c
        INNER JOIN @PostIdTable t ON c.PostId = t.PostId
        GROUP BY c.PostId
    ) sub ON p.Id = sub.PostId;
END;
GO

CREATE OR ALTER PROC dbo.Posts_Update_AllCounts

@PostIds VARCHAR(MAX)

BEGIN

-- Convert the comma-separated string into a table

DECLARE @PostIdTable TABLE (PostId INT);

INSERT INTO @PostIdTable (PostId)

SELECT CAST(value AS INT)

FROM string_split(@PostIds, ',');

-- Update AnswerCount for all specified posts

UPDATE p

SET AnswerCount = sub.AnswerCount

FROM dbo.Posts p

INNER JOIN (

SELECT pA.ParentId, COUNT(*) AS AnswerCount

FROM dbo.Posts pA

INNER JOIN @PostIdTable t ON pA.ParentId = t.PostId

GROUP BY pA.ParentId

) sub ON p.Id = sub.ParentId;

-- Update CommentCount for all specified posts

UPDATE p

SET CommentCount = sub.CommentCount

FROM dbo.Posts p

INNER JOIN (

SELECT c.PostId, COUNT(*) AS CommentCount

FROM dbo.Comments c

INNER JOIN @PostIdTable t ON c.PostId = t.PostId

GROUP BY c.PostId

) sub ON p.Id = sub.PostId;

END;

Is it better than what we started with? Yes. Is it good enough that I wanna put it into production? No, because it’s still making two separate passes over the exact same rows in the Posts table, updating them one column at a time.

I replied to ChatGPT with, “Can you make it even faster and more efficient?” The result:

CREATE OR ALTER PROC dbo.Posts_Update_AllCounts
    @PostIds VARCHAR(MAX)
AS
BEGIN
    -- Parse the comma-separated PostIds into a table variable
    DECLARE @PostIdTable TABLE (PostId INT PRIMARY KEY);
    INSERT INTO @PostIdTable (PostId)
    SELECT DISTINCT CAST(value AS INT)
    FROM string_split(@PostIds, ',');

    -- Use a common table expression (CTE) for answer and comment counts
    WITH AnswerCounts AS (
        SELECT pA.ParentId AS PostId, COUNT(*) AS AnswerCount
        FROM dbo.Posts pA
        WHERE pA.ParentId IN (SELECT PostId FROM @PostIdTable)
        GROUP BY pA.ParentId
    ),
    CommentCounts AS (
        SELECT c.PostId, COUNT(*) AS CommentCount
        FROM dbo.Comments c
        WHERE c.PostId IN (SELECT PostId FROM @PostIdTable)
        GROUP BY c.PostId
    )
    -- Perform updates in a single pass
    UPDATE p
    SET 
        AnswerCount = ISNULL(ac.AnswerCount, 0),
        CommentCount = ISNULL(cc.CommentCount, 0)
    FROM dbo.Posts p
    LEFT JOIN AnswerCounts ac ON p.Id = ac.PostId
    LEFT JOIN CommentCounts cc ON p.Id = cc.PostId
    WHERE p.Id IN (SELECT PostId FROM @PostIdTable);
END;
GO

CREATE OR ALTER PROC dbo.Posts_Update_AllCounts

@PostIds VARCHAR(MAX)

BEGIN

-- Parse the comma-separated PostIds into a table variable

DECLARE @PostIdTable TABLE (PostId INT PRIMARY KEY);

INSERT INTO @PostIdTable (PostId)

SELECT DISTINCT CAST(value AS INT)

FROM string_split(@PostIds, ',');

-- Use a common table expression (CTE) for answer and comment counts

WITH AnswerCounts AS (

SELECT pA.ParentId AS PostId, COUNT(*) AS AnswerCount

FROM dbo.Posts pA

WHERE pA.ParentId IN (SELECT PostId FROM @PostIdTable)

GROUP BY pA.ParentId

CommentCounts AS (

SELECT c.PostId, COUNT(*) AS CommentCount

FROM dbo.Comments c

WHERE c.PostId IN (SELECT PostId FROM @PostIdTable)

GROUP BY c.PostId

)

-- Perform updates in a single pass

UPDATE p

SET

AnswerCount = ISNULL(ac.AnswerCount, 0),

CommentCount = ISNULL(cc.CommentCount, 0)

FROM dbo.Posts p

LEFT JOIN AnswerCounts ac ON p.Id = ac.PostId

LEFT JOIN CommentCounts cc ON p.Id = cc.PostId

WHERE p.Id IN (SELECT PostId FROM @PostIdTable);

END;

Now we’re getting somewhere! I love that it used CTEs with null protection, too, to handle the case where a post might have no answers or comments. I gave it one last shot by asking, “Could you make it faster, simpler, and easier to read?” but it reverted back to the two-update solution.

I then passed the same code and same prompts over to Google’s Gemini 2.0 Flash Thinking Experimental model. Its first result was similar to ChatGPT’s in that it had two separate update statements, but after prompting again to make it faster and more efficient, it came to this conclusion:

CREATE OR ALTER PROC dbo.Post_Update_AllCounts
    @PostIds VARCHAR(MAX)
AS
BEGIN
    -- Using a table variable for better performance than a cursor
    DECLARE @PostIdTable TABLE (PostId INT PRIMARY KEY);

    INSERT INTO @PostIdTable (PostId)
    SELECT value
    FROM STRING_SPLIT(@PostIds, ',');

    -- Update AnswerCounts and CommentCounts in a single, highly efficient operation
    UPDATE p
    SET
        AnswerCount = ISNULL(ac.AnswerCount, 0),
        CommentCount = ISNULL(cc.CommentCount, 0)
    FROM dbo.Posts p
    INNER JOIN @PostIdTable pt ON p.Id = pt.PostId
    LEFT JOIN (
        SELECT ParentId, COUNT(*) AS AnswerCount
        FROM dbo.Posts
        WHERE ParentId IN (SELECT PostId FROM @PostIdTable)
        GROUP BY ParentId
    ) AS ac ON p.Id = ac.ParentId
    LEFT JOIN (
        SELECT PostId, COUNT(*) AS CommentCount
        FROM dbo.Comments
        WHERE PostId IN (SELECT PostId FROM @PostIdTable)
        GROUP BY PostId
    ) AS cc ON p.Id = cc.PostId;
END
GO

CREATE OR ALTER PROC dbo.Post_Update_AllCounts

@PostIds VARCHAR(MAX)

BEGIN

-- Using a table variable for better performance than a cursor

DECLARE @PostIdTable TABLE (PostId INT PRIMARY KEY);

INSERT INTO @PostIdTable (PostId)

SELECT value

FROM STRING_SPLIT(@PostIds, ',');

-- Update AnswerCounts and CommentCounts in a single, highly efficient operation

UPDATE p

SET

AnswerCount = ISNULL(ac.AnswerCount, 0),

CommentCount = ISNULL(cc.CommentCount, 0)

FROM dbo.Posts p

INNER JOIN @PostIdTable pt ON p.Id = pt.PostId

LEFT JOIN (

SELECT ParentId, COUNT(*) AS AnswerCount

FROM dbo.Posts

WHERE ParentId IN (SELECT PostId FROM @PostIdTable)

GROUP BY ParentId

) AS ac ON p.Id = ac.ParentId

LEFT JOIN (

SELECT PostId, COUNT(*) AS CommentCount

FROM dbo.Comments

WHERE PostId IN (SELECT PostId FROM @PostIdTable)

GROUP BY PostId

) AS cc ON p.Id = cc.PostId;

END

Instead of using a pair of CTEs like ChatGPT 4o, Gemini Flash 2.0 Thinking decided to use a pair of left join subqueries. I’m amused that both platforms chose to dump the contents of string_split into a table variable rather than using a temp table or using the string_split directly in the update’s where clause.

In both cases, the LLM (ChatGPT or Gemini) is able to change the query, but it may take multiple prompts to get to the best result – and the results don’t necessarily get better with every prompt. Today, these tools work best in the hands of someone who actually knows the results that they’re looking for, like you, dear reader.

That’s not to say you shouldn’t use them! This example here represents a really good starting point use case. In the client’s example (and this has happened several times), the LLM was able to give me a good-enough starting point in seconds, saving me a lot of time with manual labor on the rewrite. I still needed to do manual labor – I just had to do less, and that’s a win, because, uh, I’m lazy.

Pop Quiz, 2025 Edition: What Do These Things Cost Today?

[Video] Office Hours in My Vegas Backyard

12 Comments. Leave new

Joshua Grippo
January 22, 2025 5:11 pm

“amused that both platforms chose to dump the contents of string_split into a table variable rather than using a temp table”

Does amused mean it was cheeky and found a good solution that is against common advise or does it mean it should have used the temp table?

Reply
- Brent Ozar
  January 22, 2025 5:12 pm
  
  Check out my Fundamentals of Query Tuning class to find out! WINK WINK
  
  Reply
  - Joshua Grippo
    January 22, 2025 5:13 pm
    
    I knew that was coming 🙂
    
    Reply
TRF
January 22, 2025 5:22 pm

Being lazy is, in my opinion, the most important attribute of a good DBA. If you’re running something by hand that you can automate, you’re not lazy enough. If there’s a clearly defined troubleshooting path for a common issue and you don’t have it in a script, you’re not lazy enough.

Reply
Kevin Fries
January 22, 2025 5:33 pm

I’ve used it for debugging silly syntax errors and for checking if a statement is portable to another database platform. It’s not bad. It’s kind of like the old 4GLs like Focus and other natural languages in the (distant) past. It’s perfectly fine of you don’t ask it to do too much.

Reply
Cliff Williams
January 22, 2025 5:50 pm

I actually have been involved in doing this exact same thing. I’ve started a project to fix some of out ETL databases that have not been touched in over 12 years. They are a god awful mess, written by a Lotus Notes guy with no SQL experience at all.

It just so happens that I started this project the day after I finished “Fundamentals of Query Tuning” and 2 weeks after completing “Fundamentals of Index Tuning”. Which were a great help and doing this and really enforced the training from these courses.

I’ve been using AWS Claud and ChatCPT and I have to say Claud has given me the better results, not perfect all the time, because LLMs are never perfect, but results that were more accurate for the task. (IMO)

Here is Clauds result after 2 tweeks of the 1st response.

CREATE OR ALTER PROC dbo.Posts_Update_Counts
@PostIds VARCHAR(MAX) = NULL,
@SinglePostId INT = NULL
AS
BEGIN
SET NOCOUNT ON;

WITH PostIdsCTE AS (
SELECT DISTINCT CAST(value AS INT) AS PostId
FROM STRING_SPLIT(ISNULL(@PostIds, CAST(@SinglePostId AS VARCHAR(20))), ‘,’)
WHERE value IS NOT NULL
),
AnswerCounts AS (
SELECT
ParentId,
COUNT(*) OVER(PARTITION BY ParentId) as answer_count
FROM dbo.Posts WITH (NOLOCK)
WHERE ParentId IN (SELECT PostId FROM PostIdsCTE)
),
CommentCounts AS (
SELECT
PostId,
COUNT(*) OVER(PARTITION BY PostId) as comment_count
FROM dbo.Comments WITH (NOLOCK)
WHERE PostId IN (SELECT PostId FROM PostIdsCTE)
)
UPDATE p WITH (ROWLOCK)
SET
AnswerCount = ISNULL(a.answer_count, 0),
CommentCount = ISNULL(c.comment_count, 0)
FROM dbo.Posts p
INNER JOIN PostIdsCTE pid ON p.Id = pid.PostId
LEFT JOIN (
SELECT DISTINCT ParentId, answer_count
FROM AnswerCounts
) a ON p.Id = a.ParentId
LEFT JOIN (
SELECT DISTINCT PostId, comment_count
FROM CommentCounts
) c ON p.Id = c.PostId
OPTION (RECOMPILE);
END;
GO

Reply
Rich
January 22, 2025 8:32 pm

Just for comparison sake, I asked Copilot to do the same thing and here is the results:

CREATE OR ALTER PROC dbo.Posts_Update_AllCounts
@PostIds VARCHAR(MAX)
AS
BEGIN
— Convert the comma-separated string into a table
DECLARE @PostIdTable TABLE (PostId INT);
INSERT INTO @PostIdTable (PostId)
SELECT CAST(value AS INT)
FROM string_split(@PostIds, ‘,’);

— Update AnswerCount and CommentCount for all specified posts
MERGE dbo.Posts AS p
USING (
SELECT t.PostId,
ISNULL(a.AnswerCount, 0) AS AnswerCount,
ISNULL(c.CommentCount, 0) AS CommentCount
FROM @PostIdTable t
LEFT JOIN (
SELECT pA.ParentId, COUNT(*) AS AnswerCount
FROM dbo.Posts pA
GROUP BY pA.ParentId
) a ON t.PostId = a.ParentId
LEFT JOIN (
SELECT c.PostId, COUNT(*) AS CommentCount
FROM dbo.Comments c
GROUP BY c.PostId
) c ON t.PostId = c.PostId
) AS src
ON p.Id = src.PostId
WHEN MATCHED THEN
UPDATE SET
p.AnswerCount = src.AnswerCount,
p.CommentCount = src.CommentCount;
END;
GO

First pass.

Reply
Alvaro Costa-Neto
January 23, 2025 12:06 am

Hey, brent I just used the same code from your blog and asked Amazon Q, and here’s the version it produce in the first round. Looks like another good variation from the other LLMs, and it uses CROSS APPLY instead of LEFT JOINs:

CREATE OR ALTER PROCEDURE dbo.Post_Update_AllCounts
@PostIds VARCHAR(MAX)
AS
BEGIN
UPDATE p
SET p.AnswerCount = (
SELECT COUNT(*)
FROM dbo.Posts pA
WHERE pA.ParentId = p.Id
),
p.CommentCount = (
SELECT COUNT(*)
FROM dbo.Comments c
WHERE c.PostId = p.Id
)
FROM dbo.Posts p
CROSS APPLY STRING_SPLIT(@PostIds, ‘,’) AS ids
WHERE p.Id = CAST(ids.value AS INT);
END
GO

Reply
Mika Sutinen
January 23, 2025 7:58 am

Hi Brent and thanks for the post!

My personal experience is that AI can do a surprisingly good job, as long as you give it enough instructions. I’ve been testing various services quite a lot in the past months, and it has taught me a lot about being patient and clear with the AI “helpers”. In that sense, I guess they’re quite similar to people 🙂

So, to me, it’s been a journey of finding the correct balance of lazy. Most of the time I would rather not re-write all of that T-SQL, but these days I put effort writing the prompt (which would still be less effort than re-writing all the T-SQL). If you specify what the query should look like, what parts you need optimized, and what it should use (temp tables, CTEs, etc.), you’ll get much better code out of IT.

And that goes for follow-up prompts as well. If you just ask it to “make it faster”, or “easier to read”, and keep asking, it’ll eventually go completely crazy and give you just complete garbage. I’ve tested this, it goes bad surprisingly quickly! But if you are specific “Make this part of the code more efficient by doing X”, that’s where it can make your work much faster.

Reply
Dew Drop – January 23, 2025 (#4348) – Morning Dew by Alvin Ashcraft
January 23, 2025 12:02 pm

[…] Can AI Rewrite Bad Queries in Seconds? Kinda. (Brent Ozar) […]

Reply
Robert Sterbal
January 23, 2025 10:19 pm

Just curious if paying for LLM service subscriptions is part of process here. Do you have a sense as to which ones make sense to pay or ask my employer to pay for?

Reply
Rick4him
January 29, 2025 3:48 pm

I find many of these not fully correct as when using set var1 to isnull(count1, 0), you are setting it to zero because currently there are no count. However, prior to your update there could have been an existing count, and you have now set it to zero because you currently have no count. If you used set var1 to isnull(value, var1), then you preserve the prior value when there is no current count. I suspect one cannot blame the LLM/AI as the issue is present in the initial query given to it.

Reply