When you query sys.dm_os_wait_stats, or check your waits with sp_Blitz® or sp_BlitzFirst®, one of your biggest wait types is probably CXPACKET. Let’s talk about what’s really going on.
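If you want to eyeball those numbers yourself without the scripts, here's a minimal sketch of a raw wait stats query. Keep in mind the numbers are cumulative since the last restart (or manual clear), so interpret them with that in mind:

```sql
-- A quick sketch: top 10 waits by total wait time since startup.
SELECT TOP 10
    wait_type,
    wait_time_ms / 1000.0 AS wait_time_s,
    waiting_tasks_count,
    100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS pct_of_total
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```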
Our SQL Server Stand-In: a Class
Pretend that you’re a schoolteacher in a classroom with a couple dozen students. You’ve got a class project that you need to accomplish as quickly as possible. The project is to build a list of all of the distinct words in the book Moby Dick, along with the frequency of each word.
There are three ways you could do it:
1. Hand the entire book to a single student and let them work on it until it’s done. This student is going to be miserable, and it’s going to take a heck of a long time for this project to finish. However, this leaves the rest of the students to hang out and have fun playing Words With Friends.
2. Divide the pages of the book out between all of the students. Tear the book into a bunch of stacks of pages, distribute that work, and let them start working simultaneously. As they turn in their work to you, they can each get freed up to do other stuff. You (the teacher) will combine their work together into one master list. Each student’s paperwork will include the word “whale”, for example, and you’ll need to add up all of those “whale” references on one master list.
3. Divide the pages between a group of students (but not all of them.) Instead of putting all 24 students to work on the same project, you might tear the book into 8 sections and assign that work to 8 of your students. The other 16 students would be free to veg out in case another assignment comes in.
How This Relates to SQL Server
You’ve probably already figured out that the work involved is a query. The teacher is the master thread running a query. The students are CPU cores. Microsoft SQL Server can break a single query out across multiple cores as long as the work can be parallelized. (Paul White explains what kinds of work aren’t parallelized.)
When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).
Read that sentence again, because it’s really important. We’re going to come back to that.
The more students we get involved, the more work we have to do to coordinate their efforts and combine their results. The students, being rowdy goons, get frustrated when they get done with their work quickly and have to sit around waiting for other students to finish. This isn’t really a bottleneck per se – the students could go off and do other work – but they like to complain about how they had to wait around for the slow kids.
That complaining is CXPACKET – Class eXchange Packets. The class is turning in their packets, and complaining about the slow kids.
Just like kids complaining, CXPACKET by itself isn’t a problem. However, it’s an indication that something might be going wrong in this important sentence. Let’s read it again:
When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).
How to Reduce CXPACKET by Setting MAXDOP
The default Maximum Degree of Parallelism – the number of students we’ll get involved in a project – is zero, which means unlimited. That made sense back in the days when our SQL Servers only had a couple of cores, but today, we’ve got huge, ginormous processors. (I’m not bragging, I’m just saying.)
Microsoft KB 2806535 gives MAXDOP recommendations, and the short story is that you should set it to the number of cores in one processor, up to 8. (I’m simplifying here.)
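If you'd rather script the change than click through SSMS, here's a rough sketch using sp_configure. The 8 below is just an example value – substitute the cores-per-NUMA-node number that fits your hardware:

```sql
-- Example only: set MAXDOP to 8 server-wide.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 8;  -- substitute your own number
RECONFIGURE;
```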
Microsoft is saying that in most cases, as you go beyond 8 students working on a single project, there’s a diminishing amount of returns. You’re better off leaving the rest of the students free to work on other projects.
There’s a ton of bad advice on the web that goes a lot further and says you should set MAXDOP to 1 for OLTP-oriented workloads. They’re saying that if your application’s database needs are mostly very small inserts, updates, and deletes, then you should constrain SQL Server to only use one student at a time for any project, no matter how big that project appears to be.
That advice is usually no good because it assumes your server will never, ever run a big query. Sure, in some environments like Microsoft SharePoint, MAXDOP = 1 does make sense – but the key is really knowing the query workloads well. In most of the OLTP environments I see, people still want to run big, ugly reports in production, and we can’t stop them. When they do hit the server hard with those queries, I want to throw a few cores at the problem to make it finish faster. The key is knowing that there’s another knob we can tweak.
Defining Big Queries: Cost Threshold for Parallelism
Remember that really important sentence? Let’s read it again:
When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).
What exactly is a big query? Turns out we can actually tell SQL Server by setting Cost Threshold for Parallelism – the minimum cost of a query before SQL Server will consider dividing the work across multiple cores. It’s set in exactly the same place as MAXDOP – in SSMS, right-click on the server name and click Properties, Advanced.
The default Cost Threshold for Parallelism is 5 – but 5 what? Books Online says:
The cost refers to an estimated elapsed time in seconds required to run the serial plan on a specific hardware configuration.
That’s technically correct, but the words “specific hardware configuration” are a little tricky. They refer to an imaginary machine with an imaginary amount of CPU power and storage throughput. It’s like how we measure cars in horsepower. If you put together a tug-of-war between a Jaguar XFR-S and 550 horses, it’s not like it would be a dead heat. These are just numbers.
The real thing to know is that a query cost of 5 is a lot like 5 horsepower – it’s pretty darned small. With the defaults, SQL Server will aggressively try to parallelize everything.
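Here's the same kind of sp_configure sketch for Cost Threshold for Parallelism. The 50 below is just a common starting point, not a magic number – you'll tune it for your own workload as described in the next section:

```sql
-- Example only: raise Cost Threshold for Parallelism from the default of 5.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;  -- starting point, not gospel
RECONFIGURE;
```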
The Real Way to Reduce CXPACKET Waits
Set MAXDOP per Microsoft’s recommendations in KB 2806535.
Set Cost Threshold for Parallelism in a way that lets your small queries stay single-threaded, but still gets more students involved when there’s a really big project. To do that, you have to…
Learn your workload. Use SQL Server’s plan cache to find the most resource-intensive queries on your system, and then analyze the costs of those queries. On one of my servers, for example, I know that we get hit with a lot of OLTP queries with a cost of around 30-40, and I don’t want those to go parallel. However, we also get a lot of reporting queries that scan the entire sales detail table, and those always have a cost of around 200 or higher.
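One way to see those costs is to shred the estimated subtree cost out of the plan cache. This is only a sketch – plan cache contents churn constantly, and the XQuery below assumes the plans you care about are still cached:

```sql
-- Sketch: estimated subtree cost of the most CPU-hungry cached statements.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP 20
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    n.value('@StatementSubTreeCost', 'float') AS estimated_cost,
    n.value('@StatementText', 'nvarchar(4000)') AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//StmtSimple') AS q(n)
ORDER BY qs.total_worker_time DESC;
```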
This is a big topic, and tuning is different on a server by server basis. To get trained on learning your workload and taking the right steps for it, check out our SQL Server training videos.
What If Performance Gets Worse?
After setting Cost Threshold and MAXDOP, it’s possible that performance could suddenly get worse. You might have queries with a cost in the range of 5 to 50 that used to go parallel, and now suddenly go single-threaded.
Use sp_BlitzCache® to identify the top 10 most resource-intensive queries, making sure to run it with @SortOrder = ‘CPU’, then reads, then duration, looking at your SQL Server’s plan cache overall. Look for queries with a cost between 5 and 50, and do query tuning or index tuning on those queries.
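In other words, something along these lines – sp_BlitzCache® is our free script, so install it first:

```sql
-- Three passes through the plan cache, one per sort order.
EXEC sp_BlitzCache @SortOrder = 'cpu';
EXEC sp_BlitzCache @SortOrder = 'reads';
EXEC sp_BlitzCache @SortOrder = 'duration';
```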
It’s also possible that 50 might not be the right number: say you’ve got a query with an estimated cost of 40, and it’s actually a wildly underestimated cost. SQL Server is making a plan that doesn’t take into account how much data actually comes back. In that case, you can either tune the query, or lower Cost Threshold to 39 to let that query go parallel.
Wanna learn more?
In my Mastering Query Tuning class, I explain why queries go parallel, how SQL Server has a tough time balancing the work evenly across multiple cores, and how you can fix that by tuning the indexes and queries specifically for parallelism.
In my Mastering Server Tuning class, I dig into more details about Cost Threshold for Parallelism and MAXDOP, and I explain why those changes alone aren’t enough to rebalance work properly across multiple cores.
You can also get a deal on all of my Mastering classes with the Recorded Class Season Pass: Masters Classes.
How can I find CXPACKET waits in MySQL?
You wanted a comment…
You buy a MySQL Enterprise Edition license from Oracle.
Right-click on the SQL Server instance -> Properties -> Advanced -> Parallelism.
Nice post, very easy way to understand a complicated topic. Thanks for recommending we read it during your webinar!
Why is CXPACKET waiting bad? It doesn’t measure “wasted” time, after all. It measures sleep time.
Tobi – by itself, it’s not bad. It’s like seeing someone at the supermarket covered in blood. It just makes you wonder what they’re doing wrong, and in this case, it’s usually indicative of someone who’s been murdering their parallelism settings.
Excellent explanation Brent,loved it.
Brent, you are awesome! love your answer
Great reply. Btw, great job creating analogy for CXPacket and parallelism and MAXDOP.
Nice explanation. I am going to review and change it for my server.
CXPACKET wait is not bad and it was not mentioned as bad anywhere in this post too.
I ask because you’d only invest labor to reduce a wait type that is harmful.
I do not see yet how more CXPACKET waits indicate inefficiency by themselves.
Tobi – a high amount of CXPACKET waits are indicative of SQL Server breaking up the work across the entire schoolroom of students without a good reason. There’s work involved in recombining the results from all those students.
Tobi, in addition to Brent’s excellent (of course), but overly bloody reply, here’s another one.
CXPacket waits are not bad. In fact, if I see a server with >1 core and no CXPacket waits at all, I’d be very concerned. (And yes, I’d also be very concerned on seeing any server with only one core).
What is bad, is *high* CXPacket waits. They are a sign that the overhead of synchronizing the parallel workers might exceed the cost saving achieved by distributing the work among them.
(To steal Brent’s example – for counting words in Moby Dick, putting the entire class to work is a good idea. But for counting words in my comment to his blog post, it would be more efficient to just do it by yourself).
Hugo – Hahaha, nice reply. And that’s a GREAT point – if you see no CXPACKET waits, that’s a cause for concern too!
*high* – if i have a server whose waitstats show CXPACKET as the highest wait in terms of overall percentage (54%), but the avgwait_s = 0.006, cause for concern or no?
Mark, I am sure Brent has tuned more servers just last week than I have in my entire career, so I’m looking forward to his response to your question.
But in general, one thing I find useful to gauge waitstats (not just CXPACKET, all of them) is to divide them by the number of days or hours since they were last reset. That can give a great indication.
(I recently had a server with 40% IO related waitstats. Reason for concern. But when I did the math and found it was actually spending 15 minutes per hour waiting for the IO subsystem, the figure sounded much more convincing to management)
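The math I describe can be sketched like this (assuming the wait stats haven’t been cleared since startup, since sys.dm_os_wait_stats and sys.dm_os_sys_info both count from then):

```sql
-- Sketch: CXPACKET wait minutes per hour of uptime.
SELECT
    ws.wait_type,
    ws.wait_time_ms / 60000.0 AS total_wait_minutes,
    (ws.wait_time_ms / 60000.0)
        / NULLIF(DATEDIFF(MINUTE, si.sqlserver_start_time, GETDATE()) / 60.0, 0)
        AS wait_minutes_per_hour
FROM sys.dm_os_wait_stats AS ws
CROSS JOIN sys.dm_os_sys_info AS si
WHERE ws.wait_type = 'CXPACKET';
```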
HAHAHA, Hugo, flattery will get you everywhere.
Mark – 54% of what? For example, if I tell you that 54% of my meal last night was steak, that doesn’t really tell you anything. Did I eat a small steak, or a large one? You have to examine how much time you’re spending overall waiting for it, and look at that in context with the rest of your waits. If your server is sitting idle and no one is complaining about performance, then having 54% CXPACKET waits isn’t a problem.
However, if your system is creaking and groaning under load, and your users are gathering at the door with pitchforks, then those CXPACKET waits may mean you need to revisit the parallelism settings I discuss in the post.
Question, how do you get the cost of queries (30-40, or 200 above) once you get the ones in the cache that are resource-intensive?
Mark – unfortunately, that’s beyond the scope of this post. That’s something we cover in our in-person performance tuning classes, though:
No problemo, thanks for the great post though!
Nicely written post, I often have to explain to project managers, sys admins, etc what CXPackets are. This is my new reference!
Would you expect higher CXPacket waits on a virtual SQL Server, especially older non-NUMA versions of VMWare?
In my recent troubleshooting, a missing index introduced high CXPACKET wait time. Creating the right index removed parallelism completely.
Adding that index probably greatly reduced the cost of the query, pushing it under the parallelism threshold. I see this a good bit when I tune queries for co-workers. They wrote the query in such a way that it caused an index scan, but when we cleaned it up and got down to seeks, the parallelism went away as well.
Stuart – exactly, took the words right out of my mouth!
It can be useful to step back and look at the whole picture. For instance, we had a case recently where the top two waits (by a large margin) were CXPACKET and PAGEIOLATCH_SH. The question then became not “Why is CXPACKET wait so high?” but, rather, “Why are we doing so much table (index?) scanning?”.
I have a couple questions on this subject.
First, don’t queries actually use up to the MaxDOP on each item in an execution plan, so a MaxDOP of 8 can actually have 4 (or more) different pieces of the plan executing at the same time using a total of 32 cores? Not an issue, get the big stuff out of the way, just making sure my view is accurate.
Second, why is it so common to see posts where it’s “if it is high then…”, but rarely in blogs is it ever encouraged to keep an eye on these stats incrementally long term? I’d love to see more of “when it’s higher than normal then….”, although, admittedly, that would have to come after an initial assessment and tuning to get normal to look ok.
Steve – first, yep, absolutely. It’s fairly unusual to see all items in a plan fire off at once though – usually there’s some dependencies, like we have to do an index seek first to build a list of rows, then use that list for other lookups.
Second, right, now you’re talking about long term personalized guidance. That’s where consulting and training comes in. That’s usually outside of the scope of something someone can write quickly in a blog post, but check out some of the books I recommend for performance tuning, and they cover this:
My problem with the second item is more of a personal issue. That’s a big part of what my blog focuses on doing, and I want to make sure there’s nothing that’s going to come back to haunt me with any of it. For example with this one, my post on Wait Stats does just that.
However, hearing you talking here and Jonathan Kehayias and Ted Krueger talking in their Free Accidental DBA Book left me with the impression that there’s a very good reason not to get into these details in blogs.
I’m perfectly fine with the big monitoring companies not liking me throwing this stuff out there in easy-to-use formats. Not everyone has to like me.
Couldn’t changing these settings reduce wait times, but also degrade query performance?
What is the best way to measure query performance before and after these changes? Just run Profiler? Is there some reasonable way to quantify the performance effects of these changes?
Matt – great question. There’s so many ways to measure query performance that I can’t really do justice to them in a blog post comment. Some folks like to use Perfmon, some use monitoring tools, and others check the dynamic management views (DMVs).
And of course, my favorite way to reduce CX_Packet waits is to tune queries that need tuning!!! 🙂
I hold the belief that SharePoint needs MaxDOP = 1 not because of the workload, but rather to hide their horrible database design that deadlocks if you sneeze too loudly. 🙂
Fantastic article as ever which has really helped me understand why i see CXPACKET at the top of every server’s wait stats.
However, having read your blog and Microsoft’s “Recommendations and guidelines for the ‘max degree of parallelism’ configuration option in SQL Server” (http://support.microsoft.com/kb/2806535), I’m still not sure how to correctly set MAXDOP for the following server config, and I was hoping that you could clear up my confusion.
I have a server with Physical CPU Count = 2, Hyperthread Ratio = 12, Logical CPU Count = 24.
From my reading I understand that MAXDOP is related to the number of physical processors, and given that the above link states “For servers that have NUMA configured and hyperthreading enabled, the MAXDOP value should not exceed number of physical processors per NUMA node,” my understanding is that I should be setting MAXDOP to 2, i.e. the number of physical CPUs.
The server is configured to have 4 NUMA nodes and there are 6 logical CPUs per NUMA node, so I’d be very grateful if you could explain what my MAXDOP setting should be.
George – at first glance, it sounds like you’ve got AMD CPUs. Do you know the exact model number on those?
Aha, your spidey sense is spot on as ever!
The server in question has 2 x AMD Opteron(TM) Processor 6234 which wikipedia tells me have 12 cores each.
CXPACKET waits up in 40% range for a server that’s been up for 8 months, with MAXDOP and Cost Threshold for Parallelism at default values.
George – yeah, I was afraid of that. Things get a lot trickier with this family of processors – each individual socket has two NUMA nodes, each of which has 6 cores in it. Per Microsoft’s KB article, you would want to set MAXDOP to 6 (the number of cores in one NUMA node.)
That explanation has really helped my understanding of the area.
And thanks for taking the time to reply on a sunday.
Hello again Brent,
On review I left out the very important piece of information that the server is running SQL Server 2008 Standard Edition, and therefore from http://technet.microsoft.com/en-us/library/ms181007(v=SQL.105).aspx it seems that the max MAXDOP value I can use is 4 instead of 6, as per the number of cores in one NUMA node?
George – I’ve kinda hit the limit on what I can do here for free. For further questions, consider posting on a QA forum like http://DBA.StackExchange.com. Thanks!
From my experience, hyperthreading should be turned off for SQL Server, as it causes cache thrashing. I have always turned it off and have seen significant improvements afterwards – not on the order of percent, but of magnitude. Talking about CXPACKET should start after turning it off. Unfortunately, all servers that run on Intel processors and have the feature have hyperthreading enabled in the BIOS on delivery.
Peter – when you say “cache thrashing,” can you talk more about the symptoms you saw, and how you measured them?
So is parallelism always bad? And in which cases should we force MAXDOP for a query?
Parallelism is great – if you have a query big enough that it will benefit from using multiple logical processors.
You should only use maxdop hints per query when you thoroughly understand how parallelism works, when you’ve successfully managed it at the server level, and you can prove that there’s no way to fix the problem at the server level. Hints should be the rare exception, not the norm.
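For completeness, a per-query hint looks like this – the table name here is made up for illustration, and again, this should be the rare exception:

```sql
-- Hypothetical example: force one reporting query to stay single-threaded.
SELECT CustomerID, SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader   -- illustrative table name
GROUP BY CustomerID
OPTION (MAXDOP 1);
```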
Understood and thanks, will do!
The highest you normally want to set MaxDOP to is the number of cores in a NUMA node so that all processing can take place in a single node and it’s also a good number to just keep us under control. There are cases in large BI settings where you want this to be higher, and the higher the transaction level of OLTP the lower you should typically have MaxDOP.
The number 6 for your type of processor is a starting point, nothing more. 4 is also a good starting point, and you may want to drop it from there if you have any issues. Personally, I like to start at the higher of 4 or the cores in the NUMA node and tweak it from there.
Also, Brent is right, this is about as far as you can go without looking at your systems. Some odd deadlock situations, high number of transactions, wait stats, etc. will change my answer in the long run on a server-by-server basis.
Great post Brent !
In many cases, CXPACKET can also be reduced with faster storage or faster CPUs (but that should be done only when you are sure the queries are optimal and the server settings are good for the workload).
If 7 students are fast (say, 1 minute each) and 1 student is very slow (say, 10 minutes), then 7 students are waiting for a very long time.
If you can reduce the time it takes the slower student to complete his task (say SSD or faster CPUs) you obviously cut CXPACKET significantly.
Eyal – I’m going to challenge you on that. Don’t measure bottlenecks like CPU or storage using CXPACKET – instead, look at the waits of those slower students, and focus on improving those wait types.
Thanks Brent. You don’t challenge me on this, we are in agreement.
My point was that focusing on the slower students and improving their waits will also improve CXPACKET as a side effect.
So, if you think about what you can do to reduce CXPACKET (not that CXPACKET is a problem by itself that needs to be improved – this is more of a theoretical question), there were many answers above – like MAXDOP, Cost Threshold for Parallelism, missing indexes, rewriting query text, etc. My point was that fixing the slower students (query tuning, server settings, or, as in my example, CPU/storage) will have a similar effect.
Could a very high CXPACKET wait time also be caused by a faulty core? Or maybe a faulty memory bank on that NUMA node?
Nils – can you explain a little more about what you mean? Can you give a scenario where a CPU core would be broken somehow, and it wouldn’t show up in the system error log (or that the machine would still boot correctly)? If you’ve got a defective processor, hopefully your sysadmins will catch that in the system event logs, Dell OpenManage, HP System Insight Manager, etc.
Just a general question if I have say a 6 core server what is the difference between having the MAXDOP set to 6 versus the default 0?
None that I know of offhand unless you add more cores later down the road. Some folks will add processors and forget to revisit MAXDOP.
But what do you do when developers at your company are using horrible things like Entity Framework or other ORMs? You can’t even see the query… you can’t tune it like you would an SP.
Well, I don’t think EF is horrible, but let’s set that aside for a second. You can see the queries using tools like Profiler and sp_BlitzCache, and you can do index tuning too. I’d start with Jeremiah’s new training course on execution plans.
Thanks for the great post.
One simple/stupid question: if I understand it correctly, the default value for “Cost Threshold for Parallelism”, 5 seconds, should be enough (it seems large enough), as performance is pretty bad and unacceptable if a query takes more than 5 seconds to run, especially for a publicly facing website.
Why do you suggest to give it a large number?
Eric – the answer lies in your question. What makes you think Cost Threshold for Parallelism is measured in seconds?
How do you determine when CXPACKET waits are ‘high’? Is there a rule-of-thumb piece of guidance of what you would want your CXPACKET waits to be?
Per a B.O.U. script to check out and monitor my waits, I see that since my last stats reset, the time spent on CXPACKET waits accounts for about 50% of all waits. Also, per B.O.U. I took a few incremental samples of my waits, and I see that in a 60-second sample, my CXPACKET wait time is about 10 minutes, the count of waits is 247,362, and my average wait time is 2.3 ms. Is this high? This particular server is a beast and is our production machine. We have dual Intel 10-core CPUs with hyper-threading. I set MAXDOP to 10 (will likely drop this to 8 per rule-of-thumb guidelines and Brent’s suggestion), and left the cost threshold for parallelism at the default of 5 (I am currently trying to make the case that we need to bump this up to 50 per Brent’s video above).
One question that was posed to me when I introduced the idea of bumping the cost threshold for parallelism to 50 was “how do we know if it is a success?” and I couldn’t answer the question. I *thought* I recalled that when you are trying to get CXPACKET waits dialed in, you should know that you’ll never get rid of them as long as you’re running parallel queries, but in general, you should try to balance CXPACKET waits with (some other wait that I can’t remember). Does this ring a bell at all?
Michael – I’d step out a little farther and ask, “Are users complaining about query performance on this server?” If they are, then focus on it – but otherwise, I wouldn’t focus on them because you’re always going to have 100% waits. Don’t go solely based on percentages alone: https://www.brentozar.com/archive/2014/03/t-sql-tuesday-stop-tuning-wait-stats-percentages/
Thank you for the fast response! To answer your question, no, there are no user complaints about SQL performance right now. Also, I get that percentages alone are not a good indicator of actual wait time since they are just showing what portion of the total time waiting is spent on a particular wait type (also referenced in your steak dinner analogy above). So I was hoping that there was a good measure for knowing whether or not CXPACKET waits are or are becoming an issue. In our environment, I took a 1-minute sample and saw that the CXPACKET waits to real-time ratio was 10-to-1 which seemed pretty high.
Is there just not a measure that is as good as the knock on the office door asking about SQL performance problems to know whether or not I actually have an issue that should be addressed?
Michael – yeah, if users aren’t complaining, I’d focus on the servers that the users are complaining about first. There are good measures, but I’m all about getting you the easiest, fastest way to happy end users. Start by asking them what they’re unhappy about.
Suuuuuuuperb article and very well explained. Good job Brent! Two thumbs up for you!
Munshi – thanks!
Great Post Brent
I have question for you
I know it depends 🙂
but generally speaking from your experience what is the acceptable range of below settings in OLTP/OLAP environments?
Cost Threshold for Parallelism
Maximum Degree of Parallelism
Alex – thanks! Check out the post above, and the answers are cleverly hidden in the post text itself. Enjoy!
Just a great article.
Clear, but detailed, and well explained.
How to set MAXDOP in SQL Server that is hosted on an Amazon EC2 instance?
Al – same as you would a regular SQL Server.
In a DW/OLAP workload, do you recommend enabling or disabling hyper-threading?
Thanks in advance
Alex – it truly depends on each server’s workload. If you don’t have a load testing environment that you can do A/B testing with, I’d just default to having it on.
Is this related to “Waiting Tasks” or “Waiting Time” on my SQL Server? Or also to “Page Life”?
Sina – not directly, no.
We have a lot of CXPACKET waits showing up in DB but we never see CPU hit 100% utilization in Task Manager. How come?
Kim – reread that blog post. CXPACKET doesn’t mean you’re waiting for CPU time – it means you’re coordinating CPU work across multiple students.
We currently have a client whose EMR application we manage – they have been having poor performance issues for about 6 months now. The vendor blames it on the infrastructure and we blame it on the code, while the client is stuck in the middle. The vendor has set up a data warehousing DB, and I can pull data from it and see SQL Server wait times. The client complains of DB disconnects in the morning and very laggy/slow performance in the afternoon (2:30 pm) – just changing screens in the app. The highest wait time in the morning is PAGEIOLATCH_SH (completed waits – 490,160, wait time 182.849) and in the afternoon it is CPU (signal wait; completed waits 5,131,083, wait time 702.568) – the vendor will no longer help us. Do you have any suggestions on how to lower the wait time and get better performance?
As per your and Microsoft’s recommendations, I have created 8 secondary tempdb files with a fixed size [20 GB] and also changed the max degree of parallelism to 8. But I am still facing the CXPACKET issue; also, my server went slow and I am getting an ACTIVE TRANSACTION error. I have posted my question on Stack Exchange also. I request you to please assist me and let me know if I have to roll back my changes. 🙁
My other questions, related to that
I hope to hear from you.
Keith – CXPACKET issues don’t have anything to do with TempDB. Probably best for you to start by rereading this article really carefully, and then let’s go from there.
There should really be 2 different wait stats about parallelism: one, the innocent or normally occurring wait from the coordinator thread, and another, the really harmful wait from uneven distribution of work among the worker threads (when some threads are idle waiting for other threads to finish their work).
Any way to distinguish between those two ? XE maybe?
Gustavo – that would be nice, but I don’t think it’s possible.
Given the Moby Dick example in the main text, it is impossible for all students to finish at the exact same time. Some will finish early, and have to wait for the others to catch up – those students waiting is the equivalent of CXPACKET waits.
Problem is, if half the students finish in an hour and the others keep working the rest of the day, the wait time is too long, and the distribution should have been better. But apart from duration, there is no difference between waiting and waiting. And though we probably can all agree on the extreme examples of long and short waits, there is a grey area in the middle where it’s impossible to draw a line.
Is a 1 millisecond wait too long? Does the answer to that depend on the total execution time of the query? Is a single node waiting 5 ms better or worse than 17 nodes each waiting 0.8 ms?
Assessing whether CXPACKET waits are high because there is a lot of parallelism and we all rejoice, or *too* high because there is skew – that is, for now, still a task only a skilled DBA can accomplish.
As far as I see it, this is somewhat similar to the difference between good/normal page splits (at either end) and bad page splits (in between)
You are worried about one and not the other.
CXPACKET Waits in the coordinator thread should be accounted somehow separately.
Those don’t tell much (just that I have parallelism, which of course I know and I WANT for some queries, after tweaking cost threshold for parallelism to a more reasonable value for recent hardware).
What would be more useful is the worker threads waiting time, or some kind of ratio between that and the coordinator wait time.
Worker threads’ waiting time tells me there is imbalance in the work – that’s the real problem (well, assuming a reasonably good parallel plan).
Of course, later SQL versions have improved; nowadays there are Repartition Streams and Gather Streams operators trying to rebalance row counts between worker threads.
Excellent presentation as always.
In the video, I believe you indicated that it’s best that CXPacket waits do not go much beyond 50%. I did some waits checking on one of my servers and CXPacket was 80% – I haven’t heard any complaints about slowness, but I’m still checking.
When determining that 50%, I do not count the below waits. Would you agree that excluding these is correct when determining the percentage on the server for this specific scenario?
MG – check out https://www.brentozar.com/askbrent to get sp_AskBrent, which excludes the ignorable wait types.
Watching more of your videos before downloading sp_AskBrent. So, on your CXPacket video you mentioned that one really does not want CXPacket to be over 50%. When I watch your video on sp_AskBrent, you show the waitstat columns in the output – is the column I should look at being over 50% the Percent Signal Waits? Or is there some other column in sp_AskBrent that should help give me more information when tuning my Cost Threshold?
MG – I didn’t mention signal waits, so can you elaborate a little more on where you got that formula from?
Correct – to my knowledge, you did not mention Percent Signal Waits. It is one of the columns displayed in sp_AskBrent. My question is that in the CXPACKET video you mention you really do not want your CXPACKET waits to be over 50%. What measure would I use from sp_AskBrent to determine that? Percent Signal Waits is the only thing displayed in your proc with the word “percent” for the waits section.
I had a server with cost threshold at 30 and MAXDOP 8 (it has 1 CPU with 8 cores and HT).
The CXPACKET value was at 49% and SOS_SCHEDULER_YIELD at 12%.
I changed the threshold to 40 and CXPACKET dropped to 10% and SOS_SCHEDULER_YIELD is at 3%.
Looking at CXPACKET, I’d assume SQL Server isn’t waiting as long for threads to gather the information (parallelism), or that the threshold is now high enough that SQL Server doesn’t launch parallel operations as often…
The lower SOS_SCHEDULER_YIELD means the CPUs are “working” better – not as much pressure, probably because there are fewer parallel operations…
Are my assumptions right? And which is the best scenario?! The server responds very fast in both situations…
If the server isn’t having a problem, I’d leave it alone and focus on where users are complaining.
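For reference, the cost threshold change discussed in this thread can be sketched like this – the value 40 is just the number used above, not a universal recommendation:

```sql
-- Cost threshold for parallelism is an advanced option,
-- so expose those first, then set the new value.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'cost threshold for parallelism', 40;
RECONFIGURE;
```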
I’ll keep monitoring the server and see how it behaves in the next days.
I’ll check an integration process that runs every 30 minutes and generates an XML file with the data to integrate with other services. Probably I’ll run it under a specific user and control and limit its resources with Resource Governor.
Once again thanks
Using Glenn Berry’s query, I have:
Logical CPU – 16
Scheduler count – 16
Hyperthread ratio – 4
Physical CPU – 4
Virtual machine type – HYPERVISOR
On my machine:
MAXDOP = 0
Cost Threshold for Parallelism = 50
When I ran Glenn’s custom query:
Top wait type shows – 59%
Signal_S – 29342
Can you please suggest what MAXDOP should I have?
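Before picking a MAXDOP value, it helps to confirm what the instance is actually running with right now – a quick check:

```sql
-- Current instance-level parallelism settings.
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN (N'max degree of parallelism',
               N'cost threshold for parallelism');
```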
I have one SQL instance set up with MAXDOP 1, so I’m assuming parallelism will not be used. But I am seeing CXPACKET as the top wait type on this instance, and the developers said they are not using the MAXDOP query hint either. Please can you help me understand?
WaitType        Wait_Sec   Resource_Sec  Signal_Sec  Wait_Count   Wait_Percentage
CXPACKET        378412.5   188670.27     189742.2    178978957    71.69
IO_COMPLETION   71034.26   70756.82      277.45      71650181     13.46
_YIELD          53182.46   107.41        53075.05    91021289     10.08
LATCH_EX        9192.88    6848.03       2344.85     6394916      1.74
LCK_M_S         3401.76    3401.68       0.08        505          0.64
COMPLETION      2440.94    2440.89       0.05        488          0.46
BACKUPBUFFER    2250.48    2243.66       6.82        223375       0.43
PAGEIOLATCH_SH  2061.32    2058.37      2.95         98927        0.39
Thank you in advance!!
Rani – if the change was recent, you may be looking at a historical view of accumulated waits. Try using sp_AskBrent to get a 30-60 second sample and see if it’s a current wait, as opposed to a historical one.
My server had 60% CXPACKET waits, according to Glenn Berry’s diagnostic queries, and cost threshold was at 50.
The second-highest wait was at 8%… The average wait time was 0.068 ms.
I increased the threshold to 80, cleared the wait stats, and waited two hours (enough for a full load on the server…).
It was at 65%, but the average wait time was 0.035 ms…
Shouldn’t increasing the cost threshold lower the CXPACKET waits (it’s supposed to be the cost above which SQL Server starts parallelism…)?
MAXDOP is 8. The server has one socket with 8 cores and 16 threads (Xeon D-1540).
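The “clear the wait stats” step mentioned above can be done like this – note that it resets the cumulative numbers for the whole instance, so everyone reading sys.dm_os_wait_stats afterward starts from zero:

```sql
-- Reset the cumulative wait statistics before a measurement window.
DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);
```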
Pedro – unfortunately, measuring by percentages simply doesn’t work. Here’s how you measure waits:
Hope that helps!
Thanks, I’ll look deeper into that..
This also started to happen when the read IO latency went sky high… It was 5 ms and suddenly went to 30 ms…
It was even worse (60 ms), but that was because we installed Enterprise Edition and activated page compression on some tables and indexes (we analyzed Microsoft’s recommendations… comparing reads vs. writes and all the other things they recommend). Reads were high (60 ms), but then we restarted SQL Server and they went down to 30 ms – still higher than the 5 ms before compression was activated…
The next step is to see which queries have high reads, to try to determine where the problem might be…
We activated compression because CPU was always 3% or less active… Since compression means less IO but more CPU, and the CPU was “free,” we decided to go for it…
Why does “SELECT TOP x” force serial execution? Can that be changed with an option somehow? Just about every ad-hoc query that we write starts with SELECT TOP x, and we get CXPACKET waits for them so often.
SQL Server 2012
Why do you think SELECT TOP(x) forces serial execution? That is definitely not always the case!
It might happen. If X is sufficiently low, and the query is such that it can quickly retrieve the X requested rows without having to do a whole lot of work (as in, e.g., SELECT TOP(10) Col1, Col2 FROM SomeTable ORDER BY ClusteredIndexKey), then the plan will become so cheap that it no longer benefits from parallelism. But if the ORDER BY forces SQL Server to evaluate all rows before returning the TOP X rows, or if there is a lot of aggregation going on and it still needs to process millions of rows in order to return the requested TOP X, parallelism can and will still kick in.
The combination of parallelism and TOP does come with a higher cost overhead. To illustrate, let’s expand on Brent’s example of finding all distinct words in Moby Dick. Now you only want the alphabetically first 500 words, and you divide the work across 5 students. But it is possible that some students might happen to get only pages with words that are late in the alphabet. The worst case is that all 500 first words are in the pages of a single student, so you’ll use his/her word list and none of the words from the others. But you don’t know in advance who that student will be. So to make sure you get the alphabetically first 500 words from the entire book, you can give each student 20% of the pages, but each of them will still have to compile a list of 500 words that you can then combine. So overall, the number of processed words will be 2,500 instead of just 500, as would have been the case for a serial plan.
Thank You Hugo – that’s a good point.
CXPACKET is our top wait (60%). Is that high of a percentage normal? (Some CXPACKET is normal). I’ve observed that CXPACKET is notorious in our environment. I will follow some advice found on determining (per query) whether it is normal or not, but most likely our CTFP of default 5 is too low. Agree?
Data Warehouse environment
Never, ever go by percentages: https://www.brentozar.com/archive/2014/03/t-sql-tuesday-stop-tuning-wait-stats-percentages/
Who cares about the percentages? If your total wait time is 1 minute per day, then a wait that makes up 60% of it is still only 36 seconds per day. If your total wait time is 58 seconds per minute, then even a 2% wait stat is already 1.16 seconds per minute, which is high.
How is the overall performance of your server? Are users happy with performance? Is any relevant metric (e.g. CPU or disk) constantly or often spiking near 100%?
All that being said – the default cost threshold for parallelism of 5 is definitely too low. Experts give various recommendations on what it should be; I’d set it to 40 for OLTP or 30 for OLAP for starters, and then measure the effect and tweak as needed.
Thanks Brent and Hugo.
Not sure what the wait time per minute or day is – is there a query handy to check that? (sp_Blitz is on the way to my mailbox – unknown wait type.) Overall performance of queries on our server seems slow, and almost every time I check a slow query, it is CXPACKET – almost never PAGEIOLATCH or anything like that. Sometimes RUNNABLE. CPU only has occasional spikes; disk is quite fine. Will advocate for bringing CTFP up.
Oh, forgot to add – if you are seeing a lot of CXPACKET while looking at slow queries, then dig into the execution plan and look for skew between the threads. The more skewed the distribution is, the more CXPACKET you’ll see and the less perf benefit you’ll get from parallelism.
You can look at the total wait time (column WaitS in Paul’s query, reporting in seconds) and compare that to the time elapsed since the latest of the last instance restart or the last manual forced reset of wait stats. That gives you an indication.
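One way to get that elapsed time for the comparison is from sys.dm_os_sys_info – a sketch:

```sql
-- Seconds since the last instance restart, for comparison against
-- the cumulative wait seconds (WaitS) in Paul's query.
SELECT sqlserver_start_time,
       DATEDIFF(SECOND, sqlserver_start_time, SYSDATETIME()) AS seconds_up
FROM sys.dm_os_sys_info;
```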
I believe (Brent, can you comment plz?) that Brent’s script will automatically grab the wait stats twice with some delay in between and then subtract the numbers to find the wait times during that period. That is useful if you run it while a representative workload is running, but not otherwise. Looking at total seconds wait time since reset gives you a better idea of overall issues, but this can be misleading if there have been idle periods since the last reset, or if you have been running unusual workloads (e.g. end-of-year work for the finance department)
Yep! Check out sp_BlitzFirst @ExpertMode =1.
That number far exceeds the time since the server was restarted, so I’m assuming that I need to divide it by the number of cores. In that case, the CXPACKET total wait time is 55% of the total time.
“skew between the threads” – how to check that?
To check skew between threads, look at the actual execution plan. Right-click an operator that runs in parallel and select “Properties”. In the Properties window, find the “Actual Number of Rows” property. Click the triangle in front of it to expand it, and you will see the number of rows on each thread. Thread 0 always has 0 rows; do not worry about that. The other threads should be more or less equal. If there are large differences, then the work is not equally divided between the threads, which means the threads with less work have to do a lot of waiting (the CXPACKET waits) until the other threads catch up.
You will have to check this for every parallel operator in the plan.
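As a complement to reading the actual plan, per-thread row counts for a currently running query can also be sampled live on SQL Server 2014 and later (it requires an actual-plan request or lightweight profiling to be active; the session id here is just a placeholder):

```sql
-- Live per-thread row counts per plan operator; large differences
-- in row_count between threads of the same node indicate skew.
SELECT node_id, physical_operator_name, thread_id, row_count
FROM sys.dm_exec_query_profiles
WHERE session_id = 85
ORDER BY node_id, thread_id;
```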
Thanks. I will check the long running queries for that. CTFP is going up to 30 after Christmas, as we can’t touch the servers until then 🙂
Thanks! Changing settings, as you suggested, fixed my query wait times I was experiencing. My query would sometimes take over 30 sec. to complete. After the settings change it returns consistently under 3 sec.
Hi Brent, thank you for sharing your knowledge. The best site ever.
I hope you can settle my doubts, as I’ve been reading so much about this CXPACKET business that I think I’ve walked myself in a circle back to total confusion and doubt.
Following your recommendations about not measuring waits by percentages…
I picked a running query and then executed this twice with a 1-minute delay in between:
“SELECT * FROM sys.dm_exec_session_wait_stats WHERE session_id = 85”
Then I subtracted CXPACKET_Wait_1 from CXPACKET_Wait_2: 99930912 – 99133457 = 797455… (Approx. 2 hours running total, and CXCONSUMER is not included in this; DOP 16.)
Am I to understand that within 60 seconds, the query spent 797 seconds waiting on CXPACKET – 13 seconds for each second of execution?!!?
What is this madness? Or am I so, so, so wrong about measuring this thing?
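For what it’s worth, the numbers in sys.dm_exec_session_wait_stats are milliseconds accumulated across all of the session’s threads, so a DOP 16 query can rack up far more wait time than wall-clock time. The two-snapshot technique described above could be sketched as one batch (the session id 85 is taken from the example above):

```sql
-- Capture the session's waits, wait a minute, then diff the snapshots.
DECLARE @before TABLE (wait_type nvarchar(60) PRIMARY KEY, wait_time_ms bigint);

INSERT @before (wait_type, wait_time_ms)
SELECT wait_type, wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = 85;

WAITFOR DELAY '00:01:00';        -- one-minute sample window

SELECT n.wait_type,
       n.wait_time_ms - ISNULL(b.wait_time_ms, 0) AS delta_ms
FROM sys.dm_exec_session_wait_stats AS n
LEFT JOIN @before AS b ON b.wait_type = n.wait_type
WHERE n.session_id = 85
ORDER BY delta_ms DESC;
```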
Ed – sure, that’s exactly the kind of thing we cover in our Mastering Query Tuning and Mastering Server Tuning classes.
Hey Brent, what happened to the video, it’s unavailable. Is it outdated?
Yes, just the blog post is available.
you are awesome. Thanks a lot Brent!!
You made my day,
OMG! “CXPACKET” literally means, “Now class exchange your packets”!!
A rare reminder that SQL Server is built by parents.
What about VMware virtual machine vCPU considerations? In my case, one VM has 8 vCPUs in total and some VMs have 16 vCPUs, so how do I calculate the MAXDOP setting?
By clicking on the link in the article where Microsoft talks you through it. Cheers!