Is There a Bug in SQL Server’s MAXDOP Calculation? (Update: Yes!)

I think I’ve found a bug in SQL Server setup’s MAXDOP calculation, and I need you to take a second look. Setup is recommending MAXDOP 8:

Setup recommending MAXDOP 8

Which is odd, because this is running on an AWS i3.16xlarge with 2 sockets, 2 NUMA nodes, 32 logical processors per node, 64 logical processors altogether. In this screenshot, I’ve laid Task Manager alongside setup so you can see what I mean:

If you click on the “Configure the max degree of parallelism” link in setup, it says:

In this case, the SQL Server has multiple NUMA nodes, with greater than 16 logical processors per node – that’s the last line of the screenshot. In that line, Microsoft says MAXDOP should be half of the number of logical processors with a max of 16 – so 16.

But it’s recommending 8. Hmm.

Is Soft-NUMA the culprit? No.

After SQL Server finishes installation and starts up, the error log shows automatic soft-NUMA kicked in because NUMA nodes had more than 8 physical cores:

SQL Server errorlog

So it created 4 NUMA nodes, each with 16 logical processors. In that case, maybe the setup calculation was taking that configuration into account – 4 nodes, each with 16 cores. In that case, let’s revisit the guidance:

Now, the number in play is the 3rd line of the screenshot – “Less than or equal to 16 logical processors per NUMA node.” In that case, we’re supposed to keep MAXDOP at or below the # of logical processors per NUMA node.

Now, technically 8 is below 16 – but where the heck is 8 coming from? Why not, uh, 16, or 4, or 12, or for that matter, 2?

I’m guessing I’m missing something obvious, but I asked on Twitter, and nobody figured it out. Worst case, either the documentation is wrong, or setup is wrong, there’s some kind of other recommendation that isn’t shown in setup – like maybe there’s a hidden max of MAXDOP 8 during setup?

So, what’d I miss?

Update 1: one of the smartest SQL Server folks I know, Joe Obbish, writes that he thinks it may not be documented anywhere. It’d be cool if we did get it documented, though – whatever logic is good enough for setup should be good enough for KB 2806535, so users can make that same decision as their VM sizes grow.

Mystery Solved: Yep, There’s a Mismatch.

Microsoft employee Sean Gallardy answered that setup uses a different formula than the documentation recommends:

Step 1: Calculate Hardware NUMA and Soft NUMA
Step 2: Decide whether Hardware or Soft NUMA will be used
Step 3: Divide the total logical processors by the NUMA used
Step 4: If > 15 LPs/NUMA, MAXDop = (LPs/NUMA)/2, otherwise MAXDop = LPs/NUMA

Which means that if you have:

  • 14 logical processors per NUMA node: MAXDOP will be set to 14
  • 16 logical processors per NUMA node: MAXDOP will be set to 8

<sigh> That doesn’t make any sense, but it is what it is. It’s probably always been this way since SQL Server 2016 “fixed” MAXDOP by setting it during setup.

Previous Post
Who’s Hiring in the Microsoft Data Platform Community? November 2022 Edition
Next Post
[Video] Office Hours: Surprise Edition

19 Comments. Leave new

  • The documentation has this note: “NUMA node in the above table refers to soft-NUMA nodes automatically created by SQL Server 2016 (13.x) and higher versions, or hardware-based NUMA nodes if soft-NUMA has been disabled.”

    By that logic it’s not a bug. I’ve long felt that the published guidance is ambiguous and confusing. I suspect that parts of it say “soft-NUMA” when they mean “hard NUMA” or the other way around. Some complaints from 2019 here: https://www.erikdarlingdata.com/thoughts-on-maxdop/

    Reply
    • Right, but:

      If the calculation uses soft-NUMA, then the logical processors per node are 16. The 3rd line in guidance applies, where it says “Less than or equal to 16 logical processors per NUMA node” – in which case the setting would be to keep MAXDOP at the procs per NUMA node, or 16.

      If it doesn’t, then the logical processors per node are 32. The 4th line in guidance applies, which would be to max out at 16.

      Reply
    • The only way I can mentally get this calculation to be correct is if the documentation is wrong about “logical processors”, and they meant to say “Keep MAXDOP at or below # of PHYSICAL processors per NUMA node” instead of LOGICAL processors.

      Reply
      • Just reading your article I was thinkng, yep they mixed up 16. And yes, reading on, Microsoft employee confirmed.

        Reply
  • Just set it to 0. Problem solved. Easy peasy! lol

    Reply
  • Just reading your article I was thinkng, yep they mixed up 16. And yes, reading on, Microsoft employee confirmed.

    Reply
  • It wasn’t always that way no. SQL 2016 had a different calculation. Several complex customer cases involving exchange issues lead to that setup formula. It wasn’t thoroughly documented because in the real world you’ll likely not find soft numa nodes with 15 cpus (which the soft numa formula conceptually allows).
    So no bug, just not documented. Honestly, the documented formula is still my rule of thumb for manual setting. For mixed workloads of even operational DW the new DOP Feedback feature will likely lower DOP further. Starting at 8 or conceptually up to 14 isn’t really an issue, it’ll likely be excessive parallelism for most workloads out there. And for analytics, you’ll likely tune DOP to be even higher, and crossing numa boundaries (as in using majority of cpus within a set of nodes) isn’t really an issue. The issue would be when one is using a full numa node and 1 or 2 cpus from another and then adjusting DOP is paramount (I’ve seen this happen).

    Reply
  • Michael MacGregor
    November 5, 2022 4:23 am

    TBH So what? MAXDOP can be adjusted, and most likely will be, whether server wide or query specific.

    Reply
    • Anders Pedersen
      November 5, 2022 5:08 am

      I think you’d be surprised how many DBAs didn’t understand half of this article and comments if forced to read it. I.e. the majority of DBAs that probably never reads these kind of posts.

      Reply
  • MAXDOP > 8 is very rarely helpful. Usually only slightly decreases duration with a very big increase in CPU time. Of course there may be queries that benefit from it, but it shouldn’t be your first knob to turn.

    Reply
    • Sure, if you think 8 should be the max recommendation, you’re free to contact Microsoft or submit a pull request for the documentation to be changed. Right now, that’s not what the documentation says. Cheers!

      Reply
  • Francesco Mantovani
    November 7, 2022 8:11 am

    I’m not ashamed to say that I’m puzzled. From my understanding after reading the article the rule of thumb should be:

    “Till 16 processor /2. After 16 processor just put 8.”

    Am I wrong?

    Is that right?

    Reply
    • You’re wrong, and that’s okay – it’s a complex issue! If you don’t get it after the first reading or two, don’t feel bad. Folks at Microsoft missed it too. You’re in good company. 😉

      Reply
  • Or maybe because it’s an RC1 build, there are issues and it’s using the guidance from the 2008 – 2014 versions:

    From SQL Server 2008 through SQL Server 2014 (12.x), use the following guidelines when you configure the max degree of parallelism server configuration value:

    Server with multiple NUMA nodes Greater than eight logical processors per NUMA node Keep MAXDOP at 8

    Reply
  • One of the many reason to read this blog is to find out so many details of SQL Server that I’ve either oversimplified or just misunderstood in the past. In this case both.

    You’re like a reality-kick to the shin, reminding me not to be so confident. In a good way, mostly.

    Reply
  • Thanks all! DBA Candy… If I think I know allot… I find a new article or blog comments that tells me “I don’t know enough”. A good DBA has a never ending thirst for knowledge mixed in with ego and humility. Stay hungry, don’t rust and I’ve been doing this since 1987 – INGRES and SYBASE days on VAX/VMS.

    Reply
  • The “real problem” seems to be that the logic was designed to take the 4th line “at half the #of LP/NUMA with a MAX of 16” and use the lower value rather than the higher value.
    They could simply have used: if > 15 LP/NUMA then MAXDOP = 16 and keep it simple.
    Or covering both options it would be even simpler to say: MAXDOP = LP/NUMA to a MAX of 16 and have done with it.

    Reply
  • The comment about most DBA’s not understanding… This NUMA node calculation applies for those running a massive server with a big budget. How can the accuracy of MAXDOP be tested? Is it wait states? I suspect not many DBA’s are managing a server of this size.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.