[Video] I Must Be an Idiot: Automatic Tuning Never Works for Me.

SQL Server 2022
20 Comments

I don’t get it. I’ve given this feature one chance after another, and every time, it takes a smoke break rather than showing up for work.

The latest instance involved the recent Query Exercise where you were challenged to fix a computed column’s performance. In the comments, some folks noted that performance of the query was actually great on old compat levels, like SQL Server 2008, and that it only sucked on newer compat levels like 2016 and later.

That would be the perfect use case for Automatic Tuning, I thought to myself! I’ve said this so many times over the last five years, but I’m an endlessly hopeful optimist, as anyone who knows me well would never say, so I gave it another chance.

We’ll set up the same user-defined function and computed column described in that blog post:
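A sketch of that setup, reconstructed from the reader-contributed script later in the comments (the URL-validation logic inside the function isn't shown on this page, so it stays a placeholder here):

```sql
CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
RETURNS BIT
AS
BEGIN
    DECLARE @Result BIT = 0;
    /* URL validation logic omitted; see the linked Query Exercise post */
    RETURN @Result;
END
GO

ALTER TABLE dbo.Users ADD IsValidUrl
    AS dbo.IsValidUrl(WebsiteUrl);
GO
```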

We’ll set up an index on the Reputation column, and a stored procedure that’ll benefit from using that index:
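Roughly, following the reader-contributed script later in the comments:

```sql
CREATE INDEX Reputation ON dbo.Users(Reputation);
GO

CREATE OR ALTER PROC dbo.GetTopUsers AS
BEGIN
    SELECT TOP 200 *
    FROM dbo.Users
    WHERE IsValidUrl = 1
    ORDER BY Reputation DESC;
END
GO
```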

We’ll put our database in old-school compat level, and turn on Query Store to start collecting data at a frantically quick pace:
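Something along these lines, where the one-minute collection interval is the "frantically quick pace" (exact settings are a sketch, not the post's verbatim script):

```sql
ALTER DATABASE CURRENT
    SET COMPATIBILITY_LEVEL = 110; /* SQL Server 2012 */
GO
ALTER DATABASE CURRENT SET QUERY_STORE = ON;
GO
ALTER DATABASE CURRENT SET QUERY_STORE
    (OPERATION_MODE = READ_WRITE,
     DATA_FLUSH_INTERVAL_SECONDS = 60,
     INTERVAL_LENGTH_MINUTES = 1);
GO
```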

We’ll run the query a few times, noting that it runs blazing fast, sub-second:
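For example (the exact number of executions here is arbitrary):

```sql
EXEC dbo.GetTopUsers;
GO 5
```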

And check Query Store’s Top Resource Consuming Queries report to show that the plan is getting captured:

Now, let’s “upgrade” our SQL Server to the latest and “greatest” compatibility level, and turn on Automatic Tuning so that it’ll “automatically” “fix” any query plans that have gotten worse:
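A sketch of that step:

```sql
ALTER DATABASE CURRENT
    SET COMPATIBILITY_LEVEL = 160; /* SQL Server 2022 */
GO
ALTER DATABASE CURRENT
    SET AUTOMATIC_TUNING ( FORCE_LAST_GOOD_PLAN = ON );
GO
```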

Now, when we go to run our query again, it takes >30 seconds to run, burning CPU the entire time – and zee Automatic Tuning, it does nothing. Query Store shows that there’s a new plan, and that the runtime is way, way worse:

But sys.dm_db_tuning_recommendations shows nothing, even though Query Store is on and so is “Automatic” “Tuning”:
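A minimal check along these lines comes back empty (the column picks are illustrative; `reason`, `score`, `state`, and `details` are all documented columns of the DMV):

```sql
SELECT reason, score,
       JSON_VALUE([state], '$.currentValue') AS [state],
       JSON_VALUE(details, '$.implementationDetails.script') AS script
FROM sys.dm_db_tuning_recommendations;
```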

To see it in inaction, here’s a live stream:

I don’t get it. I’ve given this feature so many chances, and it’s never kicked in for me at the right times. I’m guessing I’m missing some secret set of steps I need to take, but whatever it is, it’s beyond me. Maybe you can get it to work in this scenario, and share your magic? Now’s your chance to make me look like a fool.

Well, I mean, like even more of a fool.

Update Aug 7 – in the comments, Uri reminds us of Kendra Little’s post about problems with automatic plan correction. She’s frustrated with it as well.

Update October 16 – Erik Darling gave it a shot too and video’d his efforts. He got it working, but the query had to run poorly ~20 times for Automatic Tuning to wake up:

He’s using the much smaller StackOverflow2013 database, so he could actually get 20 runs to finish – whereas in the current StackOverflow databases, the video would have been a whoooole lot longer before Automatic Tuning would have kicked in. He also had some problems with SQL Server abandoning automatic tuning, reporting that the query was error-prone.


20 Comments

  • Maybe it is because SQL Server does not consider your query “Regressed”? When you ran the Regressed Queries report, your query did not show up.

    At least, that’s what seems to be implied here in the dm_db_tuning_recommendations view docs. All of the columns reference regressed queries.

    https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-tuning-recommendations-transact-sql?view=sql-server-ver16#remarks

    I did a bit of digging (with sp_blitzcache) and found the query that is used behind the Regressed Query report. It appears to do some analysis on this table: sys.query_store_runtime_stats and calculates some run time differences–which I assume are then used to populate the dm_db_tuning_recommendations view.

    I can post that full query for the Regressed Query report, but I didn’t want to take up too much page space here.

    I didn’t have time to test all this yet though. Just poking around on lunch break…

    tl;dr seems that auto tuning might only kick in when SQL identifies a regressed query based on calculations it does on the query store runtime stats page

    Reply
    • Yeah, which would be bad, given that’s not what the documentation says it does.

      Reply
      • Are you seeing something different in the documentation?

        It’s not the clearest, but from what I see in these docs and in the definitions for the columns in the sys.dm_db_tuning_recommendations DMV, they reference Regressed Queries or plan regressions.

        https://learn.microsoft.com/en-us/sql/relational-databases/automatic-tuning/automatic-tuning

        “Automatic tuning SQL Server identifies and fixes performance issues caused by query execution plan choice regressions.”

        My question is, how does MS define “query regression”? I found the query behind the regression report, if that’s how it works “behind the scenes”.

        Maybe the folks who write the docs are too close to the problem and assume that we’ll understand the terminology even though it isn’t defined anywhere…

        Reply
  • https://littlekendra.com/2024/01/17/automatic-plan-forcing-could-be-great-but-isnt-sql-server/

    EXECUTE sys.sp_configure_automatic_tuning
        @option = N'FORCE_LAST_GOOD_PLAN', /* Option */
        @type = N'QUERY', /* Target */
        @type_value = 1, /* query_id from Query Store */
        @option_value = N'OFF'; /* Disable */

    Reply
  • Wojciech Sawicki
    August 9, 2024 9:28 am

    The problem is completely different.
    After the compatibility level is upgraded, the function is treated as non-deterministic. Not via its attribute, but in the processing logic. Why, I don't know. The remaining behavior follows from that. The query demands WHERE IsValidUrl = 1, and SQL Server doesn't trust the stored value, so it must recalculate.
    Of course, it would be better to select 200 rows and recalculate only those, rather than recalculating all of them, filtering on the IsValidUrl = 1 condition, sorting, and then selecting 200. But it is what it is. Automatic Tuning can handle different plans, but here there is only one (plans from different compatibility levels are not compared).

    Reply
    • Wojciech – I’m not saying you’re wrong, but you’ll want to provide more proof for the guess that the same function is deterministic in specific versions, and non-deterministic in others.

      Reply
      • Wojciech Sawicki
        August 12, 2024 5:50 am

        I base my observations on the behavior described here:
        https://www.brentozar.com/archive/2024/07/query-exercise-fix-this-computed-column/
        When we set COMPATIBILITY_LEVEL = 110, the query uses the index and statistics on Reputation;
        when we set COMPATIBILITY_LEVEL = 160, the index is ignored and all the values are recalculated.

        But if we keep 160, add WITH SCHEMABINDING to the function, and add PERSISTED to the column, the query plan is similar to the plan created at level 110.
        Reply
        • That doesn’t have anything to do with deterministic vs non-deterministic. I think you might be misunderstanding what the words deterministic and non-deterministic mean.

          Reply
          • Wojciech Sawicki
            August 13, 2024 6:51 am

            Well, let's ask SQL Server.
            I want 160 to work the same as 110.
            To do this, I want to add the PERSISTED attribute to the IsValidUrl column.
            But that reports:
            Computed column 'IsValidUrl' in table 'Users' cannot be persisted because the column is non-deterministic.
            So I add WITH SCHEMABINDING to the function,
            and that change causes the query
            SELECT OBJECTPROPERTY(OBJECT_ID('[dbo].[IsValidUrl]'), 'IsDeterministic') to return 1.
            Side note: this is interesting in itself.
            Now you can set the PERSISTED attribute, and the procedure runs fast like it did at 110.

            None of this is needed at 110, hence my conclusion (perhaps unjustified) that the function is treated differently in 110 and 160. However, this is the only justification for the behavior that comes to my mind. If you have another one, please provide it.

          • You’re so close! Now, go back to 110, without the schemabinding in the function. Is the function deterministic? Use the technique you explained in your comment to find out.

  • This demo should work with three small changes.
    – Turn on Automatic Plan Correction (APC) before executing the procedure
    – Enable trace flag 12618, which tells APC to use the same plan regression detection model that is in use in Azure SQL Database. This trace flag only applies to SQL Server 2022 CU4 or later.
    The trace flag is not required for this demo, but it is something I always advise folks to use if applicable to their version of SQL Server.
    – Execute the procedure at least 15 times, since APC likes to see more executions of queries while it smokes its cigar, before it decides to “go to work”

    USE StackOverflow;
    GO

    CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
    RETURNS BIT
    AS
    BEGIN
        DECLARE @Result BIT = 0;
        /* URL validation logic omitted here; see the original Query Exercise post */
        RETURN @Result;
    END
    GO
    ALTER TABLE dbo.Users ADD IsValidUrl
    AS dbo.IsValidUrl(WebsiteUrl);
    GO

    CREATE INDEX Reputation ON dbo.Users(Reputation);
    GO
    CREATE OR ALTER PROC dbo.GetTopUsers AS
    BEGIN
    SELECT TOP 200 *
    FROM dbo.Users
    WHERE IsValidUrl = 1
    ORDER BY Reputation DESC;
    END
    GO

    /* 1 – Fast execution */
    ALTER DATABASE CURRENT
        SET COMPATIBILITY_LEVEL = 110; /* 2012 */
    GO
    ALTER DATABASE SCOPED CONFIGURATION
        SET LEGACY_CARDINALITY_ESTIMATION = OFF;
    GO

    ALTER DATABASE CURRENT
        SET QUERY_STORE = ON;
    GO
    ALTER DATABASE CURRENT SET QUERY_STORE
        (OPERATION_MODE = READ_WRITE,
         DATA_FLUSH_INTERVAL_SECONDS = 60,
         INTERVAL_LENGTH_MINUTES = 1);
    GO
    ALTER DATABASE CURRENT SET QUERY_STORE CLEAR;
    GO
    ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
    GO

    ALTER DATABASE CURRENT
    SET AUTOMATIC_TUNING ( FORCE_LAST_GOOD_PLAN = ON );
    GO

    /* Tell APC to use the same regression detection model that exists
       in Azure SQL Database. This requires CU4+ for SQL Server 2022. */
    DBCC TRACEON (12618, -1);
    DBCC TRACESTATUS(-1);
    GO

    --Check the state of APC
    SELECT name, actual_state_desc, reason_desc
    FROM sys.database_automatic_tuning_options
    WHERE name IN ('CREATE_INDEX', 'DROP_INDEX', 'FORCE_LAST_GOOD_PLAN');
    GO

    --Execute the procedure
    EXEC dbo.GetTopUsers;
    GO 16

    --There will be no recommendations yet, obviously
    SELECT
        JSON_VALUE([state], '$.currentValue') [state],
        script = JSON_VALUE(details, '$.implementationDetails.script'),
        planForceDetails.*
    FROM sys.dm_db_tuning_recommendations
    CROSS APPLY OPENJSON (Details, '$.planForceDetails')
    WITH ( [query_id] int '$.queryId',
           regressedPlanId int '$.regressedPlanId',
           recommendedPlanId int '$.recommendedPlanId',
           regressedPlanErrorCount int,
           recommendedPlanErrorCount int,
           regressedPlanExecutionCount int,
           regressedPlanCpuTimeAverage float,
           recommendedPlanExecutionCount int,
           recommendedPlanCpuTimeAverage float
         ) AS planForceDetails;
    GO

    /* 2 – Slow execution */
    ALTER DATABASE CURRENT
        SET COMPATIBILITY_LEVEL = 140; /* 2017, this can be set higher of course */

    --Execute the procedure, this will take longer this time under the default CE
    EXEC dbo.GetTopUsers;
    GO 16

    /* I like to execute the following queries together to see the recommendation
       and the >10 second CPU difference, as well as the 2 plans in Query Store */
    SELECT
        JSON_VALUE([state], '$.currentValue') [state],
        script = JSON_VALUE(details, '$.implementationDetails.script'),
        planForceDetails.*
    FROM sys.dm_db_tuning_recommendations
    CROSS APPLY OPENJSON (Details, '$.planForceDetails')
    WITH ( [query_id] int '$.queryId',
           regressedPlanId int '$.regressedPlanId',
           recommendedPlanId int '$.recommendedPlanId',
           regressedPlanErrorCount int,
           recommendedPlanErrorCount int,
           regressedPlanExecutionCount int,
           regressedPlanCpuTimeAverage float,
           recommendedPlanExecutionCount int,
           recommendedPlanCpuTimeAverage float
         ) AS planForceDetails;
    GO

    /* Just to see what's in the Query Store, or take a look at the
       Queries with Forced Plans Query Store report after persisting
       the latest data from the in-memory representation of Query Store
       to disk */
    EXEC sys.sp_query_store_flush_db;

    SELECT TOP 10
        qsq.query_id,
        SUM(qrs.count_executions) AS sum_executions,
        SUM(qrs.count_executions) * AVG(qrs.avg_logical_io_reads) AS total_logical_reads,
        SUM(qrs.count_executions * qrs.avg_cpu_time / 1000.0) AS total_cpu_ms,
        AVG(qrs.avg_logical_io_reads) AS avg_logical_io_reads,
        SUM(qrs.count_executions * qrs.avg_cpu_time / 1000.0) / SUM(qrs.count_executions) AS avg_cpu_ms,
        ROUND(CONVERT(float, SUM(qrs.avg_duration * qrs.count_executions)) / NULLIF(SUM(qrs.count_executions), 0) * 0.001, 2) AS avg_duration_ms,
        COUNT(DISTINCT qsp.plan_id) AS number_of_distinct_plans,
        (SELECT TOP 1 qsqt.query_sql_text FROM sys.query_store_query_text qsqt
         WHERE qsqt.query_text_id = MAX(qsq.query_text_id)) AS query_text,
        TRY_CONVERT(XML, (SELECT TOP 1 qsp2.query_plan FROM sys.query_store_plan qsp2
                          WHERE qsp2.query_id = qsq.query_id
                          ORDER BY qsp2.plan_id DESC)) AS query_plan
    FROM sys.query_store_query qsq
    JOIN sys.query_store_plan qsp ON qsq.query_id = qsp.query_id
    CROSS APPLY (SELECT TRY_CONVERT(XML, qsp.query_plan) AS query_plan_xml) AS qpx
    JOIN sys.query_store_runtime_stats qrs ON qsp.plan_id = qrs.plan_id
    JOIN sys.query_store_runtime_stats_interval qsrsi ON qrs.runtime_stats_interval_id = qsrsi.runtime_stats_interval_id
    WHERE qsrsi.start_time >= DATEADD(HH, -4, GETUTCDATE()) --Arbitrarily looking at the past 4 hours
    GROUP BY qsq.query_id, qsq.query_hash
    ORDER BY avg_cpu_ms DESC;
    GO

    Reply
  • As the sign says above, “Just because I have to get my two cents in”…

    I’ve seen a fair number of ‘tubes by “experts” on the subject. My final thought has always been, “great… it will automatically switch between 2 or more plans that are absolutely horrible”.

    For me, the real problem occurred when we switched from 2016 to 2022. Performance absolutely tanked. Only switching back to the 2016 compatibility level brought back some semblance of performance, but our large nightly batch runs still sometimes take hours longer than they used to. We were able to switch to the 2017 compatibility level to take advantage of some of the newer T-SQL functionality, but performance still tanks when we switch to 2022. I don’t know about 2019, but considering what everyone else wrote about performance issues there, it’s a “Why bother even trying?” type of thing for me.

    People have suggested that we’ve done something wrong with hardware… that’s not it, because I have 2017 and 2022 instances installed on my laptop, and the identical fundamental code runs between 15 and 30% slower on the 2022 version.

    Heh… and some of the stuff running in the background to supposedly help a human optimize code helps optimize the code better when you simply turn it off. 😉

    Anyway, glad to see others that are not impressed, as well.

    Reply
  • LOL, I know this is late to the party, but…
    One thing in the documentation is that the automatic tuning won’t switch plans unless it determines that the estimated CPU gain is higher than 10 seconds, or if the number of errors in the new plan is higher than the number of errors in the recommended plan.
    It’s buried about half way down on https://learn.microsoft.com/en-us/sql/relational-databases/automatic-tuning/automatic-tuning?view=sql-server-ver16
    I didn’t see that specifically called out in your post or any of the comments, so I wasn’t sure if you’d seen that gem or not.
    It’s possible that you’re not getting the automatic correction because you aren’t hitting that 10 second threshold. It appears to be cumulative, so if you run the regressed plan until the total difference between your good plan and your regressed plan hits that 10 second mark, it might switch.

    Reply
    • Watch the video and note the query runtimes. We talk about that in the video.

      Reply
        • Ah… that’s interesting. There’s nothing more frustrating than saying “Well, it works on my machine…” But to be fair, I do have this turned on in one of my databases, because we’re constantly getting regressions, and it does seem to kick in on a fairly regular basis for me. The forced query report shows more than 60 queries that have been automatically forced. I really wonder what the difference is…

        Reply
        • That’s why I use open source databases and literally shared my work above – so you can find out if indeed it works on your machine. Otherwise, you’re saying “well something else works on my machine.” Fair?

          Reply
  • Hey Brent,

    Thanks for the post! I just got to it over a year too late…
    Let me add one more nail in the auto tuning coffin.
    We had it enabled on a production environment, and after some time (a week or two; I’m not sure what the trigger was) the database went to 100% CPU usage, rendering the system unusable. Turning the feature off instantly solved the problem.
    We then learned that it may have something to do with the Service Broker service being disabled, as that is the first thing that comes to attention when analyzing the logs.
    Have you encountered similar scenarios where Automatic Tuning actually blew up the server instead of helping? Do you know anything about the relation to the Service Broker service?
    Anything changed in your attitude over the last year?

    Thanks!
    M.

    Reply
