PSPO: How SQL Server 2022 Tries to Fix Parameter Sniffing

Parameter sniffing is a notorious problem for Microsoft SQL Server because it tries to reuse execution plans, which doesn’t work out well for widely varying parameters. Here’s a primer for the basics about how it happens.

SQL Server 2022 introduces a new feature called Parameter Sensitive Plan optimization. I’m not really sure why Microsoft capitalized the first three letters sometimes and not others, or why they only abbreviate it as “PSP optimization” – as long as we’re abbreviating, why not call it PSPO? Optimization is the word in there with the most syllables – why the hell would we leave that one as the one that isn’t abbreviated?

Because I make things go faster for a living, we’re gonna call it PSPO. You’ll see how I pronounce it later, and why.

We’ll pick up from that primer blog post, but this time around we’ll put the database in SQL Server 2022 compatibility mode, which enables PSPO.

When I run it for @Reputation = 2, hardly any users match:

The actual execution plan has some new stuff in SQL Server 2022:

Up near the top of that screenshot, if you look closely at the query, the query itself has been modified. SQL Server has added an “option” hint. Here’s the full text of the modified query:

Let’s dig into how this works.

What option plan per value means

When SQL Server compiled the query plan for the first time, it noticed that we were doing an equality search on the Users.Reputation column, and noticed that different parameters for that value might produce dramatically different numbers of rows. That explains the “predicate_range” part.

SQL Server realized:

  • Some parameters might produce less than 100 rows
  • Some might produce 100 to 1,000,000 rows
  • Some might produce over 1,000,000 rows

So SQL Server will build a small, medium, and large query plan for this one query, using different plans depending on the parameter that gets passed in. That’s pretty spiffy because the plan isn’t hard-coding specific parameter values – instead, each time the plan gets executed, SQL Server will look up that parameter in the statistics histogram and choose the small, medium, or large plan based on the value it guesses from the histogram.

To see what goes into the plan cache, let’s clear the cache, run the query, and then check the plan cache’s contents with sp_BlitzCache:

sp_BlitzCache shows that the outer stored procedure has been executed one time, and the inner statement has been executed once:

That’s kinda slick because it means SQL Server only put the medium-sized plan into cache. It didn’t build the small or large plan because it didn’t need to yet – those haven’t been executed yet, and they might never be executed.

Let’s execute the large data version, Reputation = 1. That produces a ton of rows because everybody gets 1 point when they first open their account:

The actual plan is a parallel table scan:

And SQL Server rewrote the query text to include a different QueryVariantID:

And the plan cache shows that the outer proc has been executed twice, with both the medium and large query being executed once each:

If you only read this far, and only do a short demo like that, it seems like Microsoft made huge leaps in solving the parameter sniffing issue. PSPO enables us to cache up to 3 execution plans per query – a small, medium, and large plan – and chooses between them at runtime. That’s how it’s going to look from conference stages as Microsoft brags about the effectiveness of this feature.

But let’s dig just a little bit deeper, and there are small, medium, and large problems.

Small problem: we still have sniffing.

Try running it for, say, Reputation = 3 and review the actual query plan:

Reputation = 3 reuses the query plan we built for Reputation = 2 – scroll up to the earlier screenshots if you wanna double-check my work. Note that SQL Server only estimated that it’d find 9,149 rows – that’s because the medium plan sniffed the first value it was called with, Reputation = 2.

Reputation = 3 brings back 21x more rows than Reputation = 2 did, so it reads more logical pages than there are in the table, only goes single-threaded, and the sort spills to TempDB.

If we free the plan cache, and then run it for Reputation = 3 first:

Then the “medium” plan is built with a parallel scan plan with a big memory grant, which works beautifully for Reputation = 3. However, it doesn’t work as well for Reputation = 2:

Which leaves that giant memory grant on the floor and generates a ton of CX% waits because the estimate is now 21x off in the other direction.

SQL Server 2022’s PSPO implementation doesn’t fix parameter sniffing – it amplifies the problem, because now we have more possible execution plans in memory, each of which gets sniffed. In most scenarios, I think this is still going to be a net win, because the small and large plans will likely be way less vulnerable to extreme performance emergencies. The medium plans will still be just as vulnerable.

Medium problem: direct equality searches only.

The PSPO implementation is more of a down payment than a full payment. It only works for equality searches, not range searches. One of the most common parameter sniffing problems is the date range issue: queries with start & end date parameters, like this.

PSPO doesn’t add the option hint to the query here:

Because PSPO is coded for direct equality searches only.

By direct, I mean direct comparisons to a column with known huge variances in cardinality. For an example of an inequality search, let’s take a table that has a lookup table: the Posts and PostTypes table. Stack Overflow stores all kinds of things in the Posts table, and they’re identified by their PostTypeId:

Questions and Answers are by far the most popular PostTypes:

So given that I’ve got an index on PostTypeId, which means we also have statistics on PostTypeId, this query could get different plans for different parameters:

Sadly, it does not, because PSPO doesn’t trigger here either – note that PSPO didn’t add an option hint on the query:

Note the wiiiiildly incorrect estimates on the number of Posts that will match. SQL Server’s using the density vector there, optimizing for the average PostTypeId rather than any specific one. PSPO won’t go so far as to:

  1. Fetch the PostTypeId for ‘PrivilegeWiki’, then
  2. Look up that PostTypeId in the statistics on the Posts table, then
  3. Notice that there’s a large skew by PostTypeId, and build different plans

Nope – that’s too much work for the PSPO implementation, at least in v2022.

Large problem: monitoring software is doomed.

If you look closely at the sp_BlitzCache screenshots, there’s a huge, massive, giant, unbelievably big new problem:

In the “Query Type” column, the two PSPO-generated queries just say “Statement.” When you’re looking at queries and plans – whether you’re looking at the plan cache, or sp_WhoIsActive, or your monitoring tools, or Query Store as Erik Darling demos – SQL Server can no longer tell where the query came from. It’s like every running statement is suddenly dynamic SQL with no parent.

For example, if you look at the query plan for a stored procedure that’s been “optimized” by PSPO, you get:

If you think that’s bad, there’s no way to join to the underlying statements. That’s not just bad, it’s poor. Really, really poor.

Going into SQL Server 2022 compatibility level simply breaks query performance monitoring.

And it looks bad even in shades.
I call ’em like I see ’em.

When the first previews of 2022 dropped, I played around with this feature and thought, “There is absolutely no way they’d ship something this broken.” I just kinda shrugged and moved on. But now, today, it’s heartbreaking to think this is the way the feature’s going to ship.

That’s PSPO.

Pronounced pss-poh, as in piss-poor.

I love the idea of parameter-sensitive plan optimization. It’s a great idea. Done correctly, this would make database apps go faster and reduce the troubleshooting involved when they’re having performance issues.

But this, this is just a PSPO implementation.

Previous Post
What’s New in SQL Server 2022 Release Candidate 0: Undocumented Stuff
Next Post
Drawing Entity Relationship Diagrams with Stable Diffusion

27 Comments. Leave new

  • I can imagine the technical people and marketing folks sitting around the table figuring out what to name this “feature”, coming up with PSPO and marketing saying, “We can’t call it that. Brent Ozar is going to name it ‘piss poh'”. Let’s abbeviate it “PSP Optimization” and hope he doesn’t catch on.” Nothing gets by you!

    In your dreams, how should this type of problem really be solved? Are there any database engines that do this well?

    Reply
    • HAHAHA, there you go.

      I’d love a Cost Threshold for Recompile: https://feedback.azure.com/d365community/idea/87ae36c1-0acc-ec11-a81b-6045bd796569

      Reply
      • Love It!! +1

        Reply
      • 100% agree. +1

        Reply
      • Brian Boodman
        August 29, 2022 2:50 pm

        +1 from me. I have a few queries where I justified a recompile based on the query being expensive enough that recompile cost was comparatively minor. Having this be automatic would be a win.

        Reply
      • +1

        Reply
      • Isn’t this a chicken and egg problem somehow? I mean a parameter gets sniffed on the first run and the plan cost seems low, so it does not reach the “always recompile threshold”, but it only turns out for another parameter value, that the actual cost is huge. Or am I misunderstanding something?

        I was thinking a few days ago about some kind of a retroactive approach for SQL Server. That is let’s say we have cached plan with its estimated cost and cardinalities. When run with different parameter values, SQL detects that the actual performance (cost, cardinalities etc.) is very different. It admits making a mistake and then marks a statement as “always recompile due to cardinality variation”. It could be even as greedy as to simply cancel current execution and restart with a new plan when it seems it may be faster anyway. What do you think?

        Reply
        • No? That’s only true if the query dances on the edge of CTFP (sometimes above, sometimes below.) If that’s the problem you’re facing, just lower the number.

          I’m more concerned about big queries that get dramatically different plans.

          Reply
          • OK, so if I understand it correctly your assumption is that time needed for recompilation of a query with estimated cost over X is negligible when compared to the benefits of finding a very dedicated plan (for actual parameter values). And you want a database-wide (?) setting for X. I get it.

            But does this solve the same problem as PSPO is trying to solve? This is where I am lost.

          • I think you’re overthinking it. The problem is that resource-intensive queries often need different plans based on their incoming parameters. That’s all.

            Further discussion of semantics here, in a place where Microsoft isn’t listening to suggestions, isn’t really a great use of our time. Hope that’s fair.

    • Quick, close and reincarnarte the azure feedback site before this feedback gets any traction.

      Lol

      Reply
  • Geoff Langdon
    August 26, 2022 8:29 am

    Do you know if we will be able to simply turn the feature off, so that at least the monitoring software will allow us to manually tune as we do at the moment, and we will get the proc name instead of ‘Statement’?

    Reply
  • I wonder why they didn’t use a solution as for the adaptive joins to show the multiple plans in the execution plan (okay, maybe because they would have had to calculate all 3 plans in this case).

    What happens if you query the reputation = 3 multiple times (after “initializing” the plan with reputation = 2)? Shouldn’t the “old” 2019 memory grant feedback feature kick in and grands more memory to this plan after some executions?

    Reply
    • Yep, all you have to do is run queries with the same parameters a few times and then they’ll be fast

      And never run them with any other parameters

      I’m sure users will understand and obey that

      What could go wrong

      Reply
  • Stephen Morris
    August 26, 2022 9:08 am

    Trying to be constructive perhaps Microsoft could re-use the numbered routines feature in some way

    ie put this in SSMS & see what happens

    use tempdb

    go

    create procedure p_test;1 as

    select ‘test1’

    return(0)

    go

    create procedure p_test;2 as

    select ‘test2’

    return(0)

    go

    create procedure p_test;3 as

    select ‘test3’

    return(0)

    go

    select * from sys.objects where name like ‘p_test’

    Select * from sys.numbered_procedures

    drop procedure p_test

    go

    select * from sys.objects where name like ‘p_test’

    Select * from sys.numbered_procedures

    Reply
  • I loved your blog post about Parameter Sniffing.
    Thank you for providing detailed, actionable information!

    Reply
  • That’s how I try (and fail) to get my cat’s attention: psss-poh pssss-poh pssss-poh

    Meh, now I’m pretty disappointed with the feature overall.

    Reply
  • One thing that isn’t clear to me…..

    Outside of the monitoring issues this introduces….

    Does this feature actually give a benefit?

    I ask simply because I’m in a situation where we have heavily EF’d SQL generation, where sniffing is killing us.

    Is this a case of suck-it-and-see ?

    Reply
  • It seems that the 12/13/2022 update to sp_BlitzCache fixed the “everything is a Statement” issue, although the point stands that monitoring software needs to be updated to accommodate the changes. Or… are you asserting that there is something actually broken in the PSPO plan storage model that monitoring tools CAN’T accommodate? Please elaborate if so!

    Reply
  • Brett Shearer
    January 9, 2023 4:21 am

    I solved this problem client side. I retrieve statistics histograms, work out the density and mark each query.

    https://patents.google.com/patent/WO2017165914A1/en

    I’m using this in production every day on thousands of systems and it works well. SQL doesn’t have to do any more work than it has to (ie: it calculates plans when you give different parameter cardinalities) but all queries appear normal to monitoring tools. We take out the comments in our monitoring so all queries merge back together anyway (we ignore query hash/query plan hash and roll our own hash)

    Reply
    • Very cool! I can see how that’s a lot of work (building the proxy to recognize the queries, analyze the parameters, change the parameters on the fly, add your own query hash, build a monitoring tool, etc) though. It’s a shame you had to do that – sure would be nice if SQL Server wasn’t so sensitive to parameters.

      Congratulations on the patent!

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.