Blog

When you query sys.dm_os_wait_stats or use our wait stats sampling query, one of your biggest wait types is probably CXPACKET. Let’s talk about what’s really going on. Here’s a 15-minute video explaining it, or you can keep scrolling down to read:

Our SQL Server Stand-In: a Class.

Who wants to help me scan the sales table?

Who wants to help me scan the sales table?

Pretend that you’re a schoolteacher in a classroom with a couple dozen students. You’ve got a class project that you need to accomplish as quickly as possible. The project is to build a list of all of the distinct words in the book Moby Dick, along with the frequency of each word.

There’s three ways you could do it:

1. Hand the entire book to a single student and let them work on it until it’s done. This student is going to be miserable, and it’s going to take a heck of a long time for this project to finish. However, this leaves the rest of the students to hang out and have fun playing Words With Friends.

2. Divide the pages of the book out between all of the students. Tear the book into a bunch of stacks of pages, distribute that work, and let them start working simultaneously. As they turn in their work to you, they can each get freed up to do other stuff. You (the teacher) will combine their work together into one master list. Each student’s paperwork will include the word “whale”, for example, and you’ll need to add up all of those “whale” references on one master list.

3. Divide the pages between a group of students (but not all of them.) Instead of putting all 24 students to work on the same project, you might tear the book into 8 sections and assign that work to 8 of your students. The other 16 students would be free to veg out in case another assignment comes in.

How This Relates to SQL Server

You’ve probably already figured out that the work involved is a query. The teacher is the master thread running a query. The students are CPU cores. Microsoft SQL Server can break a single query out across multiple cores as long as the work can be parallelized. (Paul White explains what kinds of work isn’t parallelized.)

When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).

Read that sentence again, because it’s really important. We’re going to come back to that.

The data warehouse server

The data warehouse server

The more students we get involved, the more work we have to do to coordinate their efforts and combine their results. The students, being rowdy goons, get frustrated when they get done with their work quickly and have to sit around waiting for other students to finish. This isn’t really a bottleneck per se – the students could go off and do other work – but they like to complain about how they had to wait around for the slow kids.

That complaining is CXPACKET – Class eXchange Packets. The class is turning in their packets, and complaining about the slow kids.

Just like kids complaining, CXPACKET by itself isn’t a problem. However, it’s an indication that something might be going wrong in this important sentence. Let’s read it again:

When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).

How to Reduce CXPACKET by Setting MAXDOP

The default Maximum Degree of Parallelism – the number of students we’ll get involved in a project – is zero, which means unlimited. That made sense back in the days when our SQL Servers only had a couple of cores, but today, we’ve got huge, ginormous processors. (I’m not bragging, I’m just saying.)

Microsoft KB 2806535 gives MAXDOP recommendations, and the short story is that you should set it to the number of cores in one processor, up to 8. (I’m simplifying here.)

The virtual machine has a little less power.

The virtual machine has a little less power.

Microsoft is saying that in most cases, as you go beyond 8 students working on a single project, there’s a diminishing amount of returns. You’re better off leaving the rest of the students free to work on other projects.

There’s a ton of bad advice on the web that goes a lot further and says you should set MAXDOP to 1 for OLTP-oriented workloads. They’re saying that if your application’s database needs are mostly very small inserts, updates, and deletes, then you should constrain SQL Server to only use one student at a time for any project, no matter how big that project appears to be.

That advice is usually no good because it assumes your server will never, ever run a big query. Sure, in some environments like Microsoft SharePoint, MAXDOP = 1 does make sense – but the key is really knowing the query workloads well. In most of the OLTP environments I see, people still want to run big, ugly reports in production, and we can’t stop them. When they do hit the server hard with those queries, I want to throw a few cores at the problem to make it finish faster. The key is knowing that there’s another knob we can tweak.

Defining Big Queries: Cost Threshold for Parallelism

Remember that really important sentence? Let’s read it again:

When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).

What exactly is a big query? Turns out we can actually tell SQL Server by setting Cost Threshold for Parallelism – the minimum cost of a query before SQL Server will consider dividing the work across multiple cores. It’s set in exactly the same place as MAXDOP – in SSMS, right-click on the server name and click Properties, Advanced.

The default Cost Threshold for Parallelism is 5 – but 5 what? Books Online says:

The cost refers to an estimated elapsed time in seconds required to run the serial plan on a specific hardware configuration.

That’s technically correct, but the words “specific hardware configuration” are a little tricky. They refer to an imaginary machine with an imaginary amount of CPU power and storage throughput. It’s like how we measure cars in horsepower. If you put together a tug-of-war between a Jaguar XFR-S and 550 horses, it’s not like it would be a dead heat. These are just numbers.

The real thing to know is that a query cost of 5 is a lot like 5 horsepower – it’s pretty darned small. With the defaults, SQL Server will aggressively try to parallelize everything.

The Real Way to Reduce CXPACKET Waits

Set MAXDOP per Microsoft’s recommendations in KB 2806535.

Set Cost Threshold for Parallelism in a way that lets your small queries stay single-threaded, but still gets more students involved when there’s a really big project. To do that, you have to…

Learn your workload. Use SQL Server’s plan cache to find the most resource-intensive queries on your system, and then analyze the costs of those queries. On one of my servers, for example, I know that we get hit with a lot of OLTP queries with a cost of around 30-40, and I don’t want those to go parallel. However, we also get a lot of reporting queries that scan the entire sales detail table, and those always have a cost of around 200 or higher.

This is a big topic, and tuning is different on a server by server basis. To get trained on learning your workload and taking the right steps for it, check out our SQL Server Performance Troubleshooting class.

↑ Back to top
  1. How can I find CXPACKET waits in MySQL?

    You wanted a comment…

  2. Nice post, very easy way to understand a complicated topic. Thanks for recommending we read it during your webinar!
    :)

  3. Why is CXPACKET waiting bad? It doesn’t measure “wasted” time, after all. It measures sleep time.

    • Tobi – by itself, it’s not bad. It’s like seeing someone at the supermarket covered in blood. It just makes you wonder what they’re doing wrong, and in this case, it’s usually indicative of someone who’s been murdering their parallelism settings.

    • CXPACKET wait is not bad and it was not mentioned as bad anywhere in this post too.

      • I ask because you’d only invest labor to reduce a wait type that is harmful.

        I do not see yet how more CXPACKET waits indicate inefficiency by themselves.

        • Tobi – a high amount of CXPACKET waits are indicative of SQL Server breaking up the work across the entire schoolroom of students without a good reason. There’s work involved in recombining the results from all those students.

    • Tobi, in addition to Brent’s excellent (of course), but overly bloody reply, here’s another one.
      CXPacket waits are not bad. In fact, if I see a server with >1 core and no CXPacket waits at all, I’d be very concerned. (And yes, I’d also be very concerned on seeing any server with only one core).
      What is bad, is *high* CXPacket waits. They are a sign that the overhead of synchronizing the parallel workers might exceed the cost saving achieved by distributing the work among them.

      (To steal Brent’s example – for counting words in Moby Dick, putting the entire class to work is a good idea. But for counting words in my comment to his blog post, it would be more efficient to just do it by yourself).

      • Hugo – Hahaha, nice reply. And that’s a GREAT point – if you see no CXPACKET waits, that’s a cause for concern too!

      • *high* – if i have a server whose waitstats show CXPACKET as the highest wait in terms of overall percentage (54%), but the avgwait_s = 0.006, cause for concern or no?

        • Mark, I am sure Brent has tuned more served just last week than I have in my entire career so I’m looking forward to his response on your question.

          But in general, one thing I find useful to gauge waitstats (not just CXPACKET, all of them) is to divide them into the number of days or hours since they were last reset. That can give a great indication,

          (I recently had a server with 40% IO related waitstats. Reason for concern. But when I did the math and found it was actually spending 15 minutes per hour waiting for the IO subsystem, the figure sounded much more convincing to management)

        • HAHAHA, Hugo, flattery will get you everywhere.

          Mark – 54% of what? For example, if I tell you that 54% of my meal last night was steak, that doesn’t really tell you anything. Did I eat a small steak, or a large one? You have to examine how much time you’re spending overall waiting for it, and look at that in context with the rest of your waits. If your server is sitting idle and no one is complaining about performance, then having 54% CXPACKET waits isn’t a problem.

          However, if your system is creaking and groaning under load, and your users are gathering at the door with pitchforks, then those CXPACKET waits may need to you need to revisit the parallelism settings I discuss in the post.

  4. Question, how do you get the cost of queries (30-40, or 200 above) once you get the ones in the cache that are resource-intensive?

  5. Nicely written post, I often have to explain to project managers, sys admins, etc what CXPackets are. This is my new reference!

    Would you expect higher CXPacket waits on a virtual SQL Server, especially older non-NUMA versions of VMWare?

  6. In my recent troubleshooting, Missing of needed Index introduces high wait time on CXPACKET. Created right index removes parallelism completely.

    • Adding that index probably greatly reduced the cost of the query, pushing it under the parallelism threshhold. I see this a good bit when I tune queries for co-workers. They wrote the query in such a way that it caused an index scan, but when we cleaned it up and got down to seeks the parallelism went away as well.

  7. It can be useful to step back and look at the whole picture. For instance, we had a case recently where the top two waits (by a large margin) were CXPACKET and PAGEIOLATCH_SH. The question then became not “Why is CXPACKET wait so high?” but, rather, “Why are we doing so much table (index?) scanning?”.

  8. Pingback: (SFTW) SQL Server Links 30/08/13 • John Sansom

  9. I have a couple questions on this subject.

    First, don’t queries actually use up to the MaxDOP on each item in an execution plan, so a MaxDOP of 8 can actually have 4 (or more) different pieces of the plan executing at the same time using a total of 32 cores? Not an issue, get the big stuff out of the way, just making sure my view is accurate.

    Second, why is it so common to see posts where it’s “if it is high then…”, but rarely in blogs is it ever encouraged to keep an eye on these stats incrementally long term? I’d love to see more of “when it’s higher than normal then….”, although, admittedly, that would have to come after an initial assessment and tuning to get normal to look ok.

    • Steve – first, yep, absolutely. It’s fairly unusual to see all items in a plan fire off at once though – usually there’s some dependencies, like we have to do an index seek first to build a list of rows, then use that list for other lookups.

      Second, right, now you’re talking about long term personalized guidance. That’s where consulting and training comes in. That’s usually outside of the scope of something someone can write quickly in a blog post, but check out some of the books I recommend for performance tuning, and they cover this:

      http://www.brentozar.com/go/books

      • My problem with the second item is more of a personal issue. That’s a big part of what my blog focuses on doing, and I want to make sure there’s nothing that’s going to come back to haunt me with any of it. For example with this one, my post on Wait Stats does just that.

        However, hearing you talking here and Jonathan Kehayias and Ted Krueger talking in their Free Accidental DBA Book left me with the impression that there’s a very good reason not to get into these details in blogs.

        I’m perfectly fine with the big monitoring companies not liking me throwing this stuff out there in easy-to-use formats. Not everyone has to like me.

  10. Couldn’t changing these settings reduce wait times, but also degrade query performance?

    What is the best way to measure query performance before and after these changes? Just run Profiler? Is there some reasonable way to quantify the performance effects of these changes?

    • Matt – great question. There’s so many ways to measure query performance that I can’t really do justice to them in a blog post comment. Some folks like to use Perfmon, some use monitoring tools, and others check the dynamic management views (DMVs).

  11. And of course, my favorite way to reduce CX_Packet waits is to tune queries that need tuning!!! :)

    I hold the belief that SharePoint needs MaxDOP = 1 not because of the workload, but rather to hide their horrible database design that deadlocks if you sneeze too loudly. :)

  12. Hi,

    Fantastic article as ever which has really helped me understand why i see CXPACKET at the top of every server’s wait stats.

    However, having read your blog & Microsoft’s Recommendations and guidelines for the “max degree of parallelism” configuration option
    in SQL Server (http://support.microsoft.com/kb/2806535) I’m still not sure how to correctly set MAXDOP for the following server config
    and i was hoping that you could clear up my confusion.

    I have a server with Physical CPU Count = 2, Hyperthread Ratio = 12, Logical CPU Count = 24.

    From my reading i understand that MAXDOP is related to the number of physical servers and from the above link given that it states
    “For servers that have NUMA configured and hyperthreading enabled, the MAXDOP value should not exceed number of physical processors per NUMA
    node.” my understanding is that i should be setting MAXDOP to 2 i.e. the number of physical CPUS.

    The server is configured to have 4 NUMA nodes and there are 6 logical CPUs per NUMA node so i’d be very grateful if you could explain what
    my MAXDOP setting should be.

    Thanks,
    George

    • George – at first glance, it sounds like you’ve got AMD CPUs. Do you know the exact model number on those?

      • Hi Brent,

        Aha, Your spidey sense is spot on as ever!

        The server in question has 2 x AMD Opteron(TM) Processor 6234 which wikipedia tells me have 12 cores each.

        CXPACKET waits up in 40% range for a server that’s been up for 8 months, with MAXDOP and Cost Threshold for Parallelism at default values.

        Thanks,
        George

  13. So is Parallelism always bad ?? in which we should force maxdop per that query ?

    • Parallelism is great – if you have a query big enough that it will benefit from using multiple logical processors.

      You should only use maxdop hints per query when you thoroughly understand how parallelism works, when you’ve successfully managed it at the server level, and you can prove that there’s no way to fix the problem at the server level. Hints should be the rare exception, not the norm.

    • Brent,

      Understood and thanks, will do!

      George

  14. George,

    The highest you normally want to set MaxDOP to is the number of cores in a NUMA node so that all processing can take place in a single node and it’s also a good number to just keep us under control. There are cases in large BI settings where you want this to be higher, and the higher the transaction level of OLTP the lower you should typically have MaxDOP.

    The number 6 for your type of processor is a starting point, nothing more. 4 is also a good starting point, and you may want to drop it from there if you have any issues. Personally, I like to start at the higher of 4 or the cores in the NUMA node and tweak it from there.

    Also, Brent is right, this is about as far as you can go without looking at your systems. Some odd deadlock situations, high number of transactions, wait stats, etc. will change my answer in the long run on a server-by-server basis.

  15. Great post Brent !

    in many cases, CXPACKET can also be reduced with a faster storage or faster CPUs (but that should be done only when you are sure the queries are optimal and server setting are good for the workload).
    If 7 students are faster (say 1 minute) and 1 student is very slow (say 10 minutes) then 9 students are waiting for a very long time.
    If you can reduce the time it takes the slower student to complete his task (say SSD or faster CPUs) you obviously cut CXPACKET significantly.

    Eyal

    • Eyal – I’m going to challenge you on that. Don’t measure bottlenecks like CPU or storage using CXPACKET – instead, look at the waits of those slower students, and focus on improving those wait types.

  16. Thanks Brent. You don’t challenge me on this, we are in agreement.
    My point was that focusing on the slower students and improving their waits will also improve CXPACKET as a side affect.
    so, if you think of what can you do to reduce CXPACKET (not that CXPACKET is a problem by itself that needs to be improved but more on a theoretical question) ? there were many answers above – like MAXDOP, Cost Threshold for Parallelism, missing indexes, rewrite query text etc. My point was that fixing the slower students (query tuning, server setting or as I gave an example CPU/Storage) will have similar affect.

  17. Could a very high wait time ‘CXPACKET’ also be caused by a faulty core? Or maybe a faulty bank on that NUMA

    • Nils – can you explain a little more about what you mean? Can you give a scenario where a CPU core would be broken somehow, and it wouldn’t show up in the system error log (or that the machine would still boot correctly)? If you’ve got a defective processor, hopefully your sysadmins will catch that in the system event logs, Dell OpenManage, HP System Insight Manager, etc.

      • Just a general question if I have say a 6 core server what is the difference between having the MAXDOP set to 6 versus the default 0?

  18. Pingback: SQL Server – Determine MAXDOP settings | John Perry

  19. Pingback: Using a HD Camcorder as a Mac Webcam - Brent Ozar | Brent Ozar

  20. Pingback: Worry about CXPACKET | Simple SQL Server

  21. but, what do you do, when developers at your company are using horrible things like Entity Framework or other ORMs? you can’t even see the query… you can’t tune it like you would do with an SP.

    • Well, I don’t think EF is horrible, but let’s set that aside for a second. You can see the queries using tools like Profiler and sp_BlitzCache, and you can do index tuning too. I’d start with Jeremiah’s new training course on execution plans.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php