MAXDOP Isn’t Really MAXDOP. It’s More Like DOP.

Execution Plans

Here’s how Books Online describes the Max Degree of Parallelism setting:

You can use the max degree of parallelism option to limit the number of processors to use in parallel plan execution.

And here’s what the SQL Server 2019 setup screen says:

When an instance of SQL Server runs on a computer that has more than one CPU logical core, it detects the best degree of parallelism, that is, the number of processors employed to run a single statement, for each parallel plan execution. MAXDOP specifies the maximum number of cores to utilize for this instance.

Uh, no, SQL Server doesn’t really detect that.

In most cases, if SQL Server decides to parallelize your query, it’s goin’ straight to MAXDOP.

I’ll demo it with a 64-core server with SQL Server 2019’s defaults.

I’m using an i3.16xlarge AWS EC2 instance with 2 sockets, 64 cores altogether:

During setup, SQL Server suggests that we set MAXDOP to 8:

Which I find kinda amusing, since on an i3.metal with 72 cores (just 8 more cores), SQL Server 2019 setup suggests a MAXDOP of 15, but whatever.

I’ll add an index, then run a simple query against the Users table in a large Stack Overflow database:

The query just barely clears the default Cost Threshold for Parallelism (5), so it goes parallel:

Well, lemme rephrase that – the query looks like it goes parallel in the sense that there’s a parallelism operator, and icons on the plan have what Erik Darling calls racing stripes.

However, if you right-click on each operator, like the sort, and look at the number of pages assigned to each thread, it isn’t what you would call balanced. One thread read all of the data and did all of the work while the rest of the threads went out for a smoke break.

This is why your queries produce CXPACKET waits: SQL Server doesn’t do a great job of balancing work across cores, and when that work isn’t evenly balanced, SQL Server has to account for the time the idle cores aren’t doing any work.

If we raise our MAXDOP higher, the problem becomes worse. I’ll set MAXDOP up to 64, and then run the query again:

The execution plan looks identical:

But right-click on that Sort operator, and poor Thread 4 is the only thing holding this thing together:

And the query burned up 64 worker threads and generated 620 milliseconds of CXPACKET wait – in a query that finished in under 100ms:

This is why it’s so important to set Cost Threshold for Parallelism correctly.

It’s not enough just to set MAXDOP – because MAXDOP simply isn’t a maximum. It’s more like a minimum. If your query crosses over the CTFP, buckle up: you’re going way parallel, and even just MAXDOP 8 has CXPACKET issues you can’t fix.

There are indeed some cases where the degree of parallelism will be lower than MAXDOP – for example, when you try to set MAXDOP 0 on a 72-core server – but it’s nowhere near like what the documentation describes, which makes it sound like a query-by-query decision based on the size of your workload.

Previous Post
Announcing a New Live Online Class: Fundamentals of TempDB
Next Post
Paul White Explains Temp Table Caching 3 Ways

11 Comments. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.