Max Worker Threads: Don’t Touch That

SQL Server

More isn’t faster

I’ve had people give me all sorts of janky reasons for changing Max Worker Threads. I’ve even had people tell me that someone from Microsoft told them to do it.

The thing is, all changing that setting does is help you not run out of worker threads. It doesn’t make queries run any better or faster. In fact, under load, performance will most likely be downgraded to Absolute Rubbish© either way.

What’s worse? Running out of worker threads and queries having to wait for them to free up, or having way more queries trying to run on the same CPUs? Six of one, half dozen of punching yourself squarely in the nose.
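
Before you touch anything, at least look at where you stand. A quick sanity check might look like this (a sketch; sys.configurations and sys.dm_os_sys_info are standard, but verify on your version):

-- What's configured (0 means SQL Server calculates the limit at startup)
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'max worker threads';

-- ...and what that calculation actually produced
SELECT max_workers_count
FROM sys.dm_os_sys_info;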

In the classroom

Brent is fond of teaching people about CXPACKET by putting them in a classroom and having the teacher hand out work to students. I’m going to stick with that basic concept.

Your CPU is the teacher, the threads are students. With 10 kids in a classroom, a teacher may have a pretty easy time answering questions, grading tests and homework, planning lessons, etc. You know, teacher stuff.

If you add 10 kids, the teacher will still have to do all that stuff, but now poor ol’ teach is bringing work home nights and weekends, and classes are going off-schedule because more kids have more questions, three of them wrote disruptive reports, one of them keeps asking the same question 1000 times a second, five of them started blocking the chalkboard from each other, and another peed their pants waiting for the teacher to call on them.

Add 10 more kids, and, well… Ever see that movie The Principal?

The lesson

Adding more kids to your classrooms doesn’t make your school any faster. At some point, you need more teachers, too.

Thanks for reading!


57 Comments

  • DB Mirroring uses 2 threads per DB, so anything above 250 DBs will exhaust the default size of the worker thread pool.

    AGs’ thread usage count is based on max worker threads, so no worries there.
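
    The mirroring math against the default pool is easy to sketch (using the 2-threads-per-database and 250-database figures above):

    DECLARE @dbs int = 250, @threads_per_db int = 2;
    SELECT @dbs * @threads_per_db AS mirroring_threads,
           max_workers_count AS worker_pool_size,
           max_workers_count - (@dbs * @threads_per_db) AS left_for_everything_else
    FROM sys.dm_os_sys_info;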

    • Yeah, until your AGs stop syncing because you run out of worker threads?

    • In that mirroring case, is it useful to change max worker threads?

      I have the same concern about creating a mirror with a lot of databases.

      AGs use threads better, but they can have the same problem.

      • No. Get enough CPUs to support your workload.

        • Nice.

          Thanks for the post. I had already searched about that mirroring problem and didn’t find anything about max worker threads.

        • Stack Overflow example: we run at about 10% CPU on a dual 12-core setup replicating 367 databases (many low-activity) to 2 replicas. We have to increase max worker threads to prevent exhaustion.

          Are you saying we should buy more cores and pay more licensing rather than changing the setting? 😉 I’m totally with you on the 99.99% of the time it’s a bad idea, but I very much disagree it’s absolute.

          • One gotcha, though: had a client in just about the same scenario. Replication broke right before an index rebuild job. Indexes rebuilt across hundreds of databases, and tons of activity queued up to send to the offline replicas. The DBAs came in, fixed replication, and whammo, instant worker thread exhaustion and the primary was unreachable, hahaha.

            I got lucky and happened to be onsite at the client when it happened, and they were a little disturbed at how excited I was to show them what was going on. “Check this out! Let’s open a connection to the DAC, and lemme show you how you’re hitting the limits via sys.dm_os_workers. We’ll temporarily increase max worker threads, and watch how this reacts – isn’t this cool?!? Nobody ever gets to hit this limit and recover quickly. Watch what happens next…”

            Client: “Uh, maybe a little less enthusiasm here. We’re down.”
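
            For the curious, the kind of check described above might look something like this from a DAC session (a sketch; sys.dm_os_workers is the DMV named above, and the comparison is illustrative):

            -- Run over the DAC (e.g. sqlcmd -A) when nothing else can get a thread:
            SELECT COUNT(*) AS workers_alive,
                   (SELECT max_workers_count FROM sys.dm_os_sys_info) AS worker_limit
            FROM sys.dm_os_workers;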

          • Since you wrote your own monitoring tool to figure stuff like this out, here’s what we’ll do: If anyone wants to change this setting, they should email you their resume. If you’d hire them, they can change it. I don’t care what for, particularly. Lawn mower or dog walker or fry cook should suffice.

          • Imagine a rickety old schoolhouse. The school marm hands out the assignments to 20 kids. Oops. 5 kids just fell through the floor and into the basement. It takes them 15 seconds or longer to get back up stairs, dust themselves off, and get back to work. This is happening constantly. The janitor is telling the principal that the kids are falling through the floor because there aren’t enough kids. So yes, the “principal” is bringing in 20 more kids just to prove a point. Stupid janitor….got his damn head in the cloud…

          • We have a similar setup and also ran into exhaustion: eight 12-core boxes with 300+ databases each, in a synchronous AG with 1 replica. We decided on bumping up to 200 worker threads per core, and our exhaustion issues went away. We did this understanding there could be performance hits from the overhead of managing a higher number of workers, but the performance impact has been minimal.

            We did do some testing using HammerDB and found a 3%-4% performance penalty when we got to extremely high max worker thread counts (7000 if I recall correctly).

            Like Nick, I agree it’s not a setting everyone should go changing on a whim, but there are cases where it can help.

          • Erik Darling
            March 29, 2017 6:35 am

            You guys are in a different spot than the people I see. We’re talking 2-4 core VMs with MWT cranked up to 3000.

  • In our case we have lots of Service Broker queues, and WAITFOR (RECEIVE) on those queues consumes worker threads even when just idling. As the queues and receivers gradually increased over time, they ate up the default max worker thread count of 960 for our setup. The symptom is that new queries have to wait, even when the server is not at all heavily loaded. The time spent waiting does not show up in Profiler traces, and we did not have monitoring of active threads, so it was kind of strange to see queries taking a long time from the application’s point of view before we identified the root cause.
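
    To illustrate the pattern (a sketch – the queue name is hypothetical), every idle receiver like this still occupies a worker for the whole wait:

    -- Holds a worker thread for up to 60 seconds even if no message ever arrives:
    WAITFOR (
        RECEIVE TOP (1) conversation_handle, message_type_name, message_body
        FROM dbo.SomeTargetQueue
    ), TIMEOUT 60000;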

  • If mirroring/AG + Service Broker threads are eating up a significant amount of threads, then it is smart to increase it. I see CPU load still under 40% and the worker queue almost empty during extremely heavy load, when more threads are in use than the default configuration allowed. It is not fun when mirroring in combination with Service Broker runs out of threads and the synchronous mirroring process is blocking, or blocked by, sa processes.

  • Fantastic analogy! Great post.

  • Hi Erik. What would you call one teacher and one student – MAXDOP 1?

  • Recently saw this in a health-care environment where an electronic-health-record system was running on an AG, and EHR clients were connecting to it via Citrix from other servers. We had hit the ceiling on max worker threads and had dozens of processes waiting to be assigned SPIDs. Of course this resulted in a very large group of people talking on the phone trying to find out what was happening. On the advice of the EHR vendor we raised the worker thread count by a fair amount, and this cleared things up within about 10 minutes. It allows all those momentary connections via Citrix that are part of our system to get in and get out again quickly.
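
    For context, “raising the worker thread count” means the usual sp_configure change, sketched below (the value here is made up, not the vendor’s actual number):

    EXEC sys.sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sys.sp_configure 'max worker threads', 2048; -- hypothetical value
    RECONFIGURE;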

  • What would be a reasonable method to monitor for worker thread exhaustion?

    I’ve had a few cases where we’ve had spikes in workload that resulted in hitting the upper limit, and have always had to deal with the symptoms without uncovering the cause. With these instances being so rare, I’m more interested in catching the bad actor than a knee-jerk increase of the limit.

  • I use this query, which probably came from one of the folks like Grant Fritchey or Jonathan Kehayias, so I am citing them as references to appease the plagiarism gods.

    -- Analyze all requests which are currently waiting for a free worker thread
    SELECT * FROM sys.dm_os_waiting_tasks
    WHERE wait_type = N'THREADPOOL';
    GO
    -- these will have no session_id because they are waiting for a thread to be available

    • If there are queries waiting to run because they can’t get a thread, how is this query going to run?

      • David Lathrop
        March 27, 2019 1:07 pm

        I’m not sure if this is how SQL Server really works, but I suspect one way that query can run and report THREADPOOL waits involves parallelized queries. I’ve seen some of these taking 35 threads on a MAXDOP=6 setting. It starts with the SQL session having a single thread, then deciding to execute the query in parallel with a certain DOP, then calculating the number of threads needed. Now the query execution start-up calls the SQLOS scheduler asking for 34 more threads to set up the query execution. If the thread pool only has 10 unused threads, it probably puts the original thread in a THREADPOOL wait until 34 unused threads are available. It may allow single (or small) thread requests to succeed while blocking other large thread-count requests. Eventually, enough activity should free up threads so our big query can get started.
        If I’m right about that, seeing THREADPOOL waits in sys.dm_os_waiting_tasks would indicate you need to look at some (possibly toxic) parallel queries.
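
        If you want to test that theory, counting tasks per session shows which parallel queries are holding the most threads (a sketch):

        -- Each task rides a worker; big counts point at thread-hungry parallel queries:
        SELECT session_id, COUNT(*) AS task_count
        FROM sys.dm_os_tasks
        WHERE session_id IS NOT NULL
        GROUP BY session_id
        ORDER BY task_count DESC;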

    • This is a bit too reactive for what I’m trying to achieve.

      Right now, I’m tempted to just poll worker count through a monitoring tool and send out alerts when it hits some arbitrary threshold like 95% of max. Then I could actually do some investigation while something’s happening.

      This seems like a rather kludgey solution, though, when what I’m really trying to figure out is what’s causing the workload spikes. But I’ve had a client request a solution after a recent production incident.
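
      Something like this could back that alert (a sketch; the 95% threshold is the arbitrary one mentioned above):

      -- Compare workers in the pool to the calculated maximum:
      SELECT SUM(s.current_workers_count) AS workers_in_pool,
             i.max_workers_count,
             CASE WHEN SUM(s.current_workers_count) > i.max_workers_count * 0.95
                  THEN 'ALERT' ELSE 'OK' END AS worker_status
      FROM sys.dm_os_schedulers AS s
      CROSS JOIN sys.dm_os_sys_info AS i
      WHERE s.status = N'VISIBLE ONLINE'
      GROUP BY i.max_workers_count;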

  • That is an interesting question, but it seems to run fine. I have seen this query return dozens of records with no assigned spid. If you have an alternative, I would love to hear about it, Erik.

    • That’s the fun thing about THREADPOOL. Your query could get lucky and hop on a thread, or it could sit and wait for a bit, and then run. If there are more queries waiting on THREADPOOL behind it, you could see those. If your query is at the end of an issue, you won’t see anything.

      The only way to get things reliably returned is to use the Remote DAC. Monitoring tools will generally show blank spots (or choppy graphs) during THREADPOOL issues, but you probably don’t want those connecting via the DAC — you might need that for a real problem.

      Gavin is probably on the right track to warn when threads get close to exhausted and collect running queries, but you could still miss some important information.

      • What would be the difference if we used the Remote DAC for THREADPOOL monitoring?

        • Assuming you’re referring to THREADPOOL monitoring: the Remote DAC will be available during worker thread exhaustion because it has a dedicated thread, so the monitoring will be reliable. Conversely, a standard connection may or may not succeed for the reasons Erik outlined, which would result in gaps in monitoring.

          There’s only a single DAC though, so you don’t want it occupied for monitoring when you may need it for something important, like an unresponsive server.

          • Michel Zehnder
            July 8, 2017 1:45 am

            Good point… I wonder if it is worth filing a Connect item to be able to configure more than 1 DAC connection? That would certainly be useful in a few scenarios… (and sacrificing, let’s say, 4 threads out of 900-something does not seem like a big deal?)

          • My fear with multiple DAC connections is that folks like monitoring vendors would start using it, and then we’d be right back where we started – unable to access the server when emergencies strike. (sigh)

  • Hi, I have a server (SQL Server 2012 on Windows 2012) with four 8-core processors (16 logical processors in use), and 4 reporting servers for load balancing. There is an issue: whenever we run a report for 100 branches, the report runs for around 40 branches and then halts (meaning no connection to the database) for around 15 to 30 minutes; then it reconnects, generates reports for another 30 to 50 branches, and halts again for 10 to 15 minutes. The procedure is being called from the reporting server, which generates a PDF file for each branch. I have never seen this issue before, and max worker threads is set to the default. We need to run the report for 6,000 branches.

  • Michel Zehnder
    July 8, 2017 1:38 am

    So, I’m wondering… how much does CPU usage actually count in this discussion?
    Let’s say worker thread pool exhaustion with CPU below 60% – would you increase worker threads? Is there any guideline here that you are comfortable sharing, or is it really “e-mail Stack Overflow’s Nick and see if he accepts your resume”? 🙂

    • Michel Zehnder
      July 8, 2017 1:39 am

      Oh, and I’m not talking about your 2-4 CPU VM… let’s start at 12 dedicated CPU cores, at least?

    • Erik Darling
      July 8, 2017 7:46 am

      That’s the cute thing about THREADPOOL. A lot of the time when it shows up, CPU usage looks low. A common scenario is a blocking chain with parallel queries. Nothing is actually doing any work, but everything is assigned threads.
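
      You can often see that shape directly (a sketch): tasks stuck waiting, a blocking chain, and barely any CPU.

      -- THREADPOOL waiters have no session_id; blocked tasks show who's holding them up:
      SELECT session_id, wait_type, wait_duration_ms, blocking_session_id
      FROM sys.dm_os_waiting_tasks
      WHERE wait_type = N'THREADPOOL'
         OR blocking_session_id IS NOT NULL
      ORDER BY wait_duration_ms DESC;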

  • jack whittaker
    August 1, 2019 4:37 am

    Reading this thread, I’m tempted to conclude that here is the place to throw hardware at the problem

    • Just keep in mind that Standard Edition licensing is $2K USD per core, and Enterprise Edition is $7K USD per core.

    • The only times I’ve ever had to increase worker threads is with an AG in a server that never saw its CPU above 10% and the AG was falling down due to worker thread exhaustion, and one other time when a bunch of processing workers had gone into a weird zombie-apocalypse state. I jacked the threads up about 5x normal to get the SQL Server to be responsive. Then we worked on killing all of the zombies. Then I put it back down where it belonged. I’ve actually never had to use thread exhaustion as a rationale to increase CPU core count (that I recall). I’ve either been able to fix what was causing the high cycles, or get more CPU because it was actually high and workload legitimately increased enough to warrant it.

  • Hi Brent,

    Found this post today while investigating a memory shortage issue on one of our SQL Servers. How would you address physical memory issues if max worker threads is left at the default of zero?

    Enjoyed your recent SQL Performance seminar in Sacramento last week. Hope you see my question.

    Cheers,
    Ben

    • Ben – not sure what you mean. What specific symptoms were you investigating?

      • Hi Brent – we have daily memory spikes that exhaust 96% of physical RAM on the server. Our max worker threads setting is at the default of zero, and I read on the Microsoft website that changing it to a higher value may help solve the RAM issue.

        https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-max-worker-threads-server-configuration-option?view=sql-server-2017

        It says that “The default setting is best for most systems. However, depending on your system configuration, setting max worker threads to a specific value sometimes improves performance.”

        So I ran DBCC MEMORYSTATUS on the actual SQL Server instance:

        Available Physical Memory 1058607104
        Available Virtual Memory 8778437885952
        Available Paging File 10194780160
        Working Set 13160472576
        Percent of Committed Memory in WS 98
        Page Faults 1666800691
        System physical memory high 1
        System physical memory low 0

        and some queries against sys.dm_os_process_memory show that sql_physical_memory_in_use_MB is 12563 (12.5 GB) on the host, which has total allocated memory of 16 GB.

        My question for you: how do I educate our sysadmin team that bumping up max worker threads is not going to solve the memory issue with SQL Server?

        Thoughts?
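
        For reference, the numbers above can be pulled straight from sys.dm_os_process_memory (a sketch; these are the DMV’s actual column names rather than the paraphrased one above):

        SELECT physical_memory_in_use_kb / 1024 AS physical_memory_in_use_MB,
               memory_utilization_percentage,
               process_physical_memory_low
        FROM sys.dm_os_process_memory;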

  • Stephen Zemlicka
    September 10, 2019 12:20 pm

    I know this is an old thread, but I came across it while working on an issue. We use SQL Server in conjunction with Autodesk Plant3D for multi-user project access. Each DB is about 2-10 MB and we have a maximum of 3 users (though mostly we only have 2 in there). The server averages about 1-5% CPU utilization, with a dozen or so spikes around 20% for a few seconds throughout the day. Our Veeam backups started failing at around 250 DBs (we’re now at 287 databases). The response from Veeam was that there were not enough worker threads for a snapshot, citing the following article:
    https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-max-worker-threads-server-configuration-option?view=sql-server-2017
    Each database being snapshotted requires 5 threads, so by the numbers we would need something like a 64-core server just to make the backups work (when 4 cores is more than enough for the actual workload). Is this a case where increasing the worker thread count is prudent, instead of dumping a bunch of money into replacing a mostly idle server with something huge? Microsoft does recommend potentially increasing the worker thread count in this situation:
    https://support.microsoft.com/en-us/help/943471/error-messages-when-you-create-a-snapshot-backup-of-many-databases-at
    Thoughts?

    • Hi! For free answers, hit a QA site like https://DBA.stackexchange.com.

      • Fair enough. I just figured this applies to not just one but both sentences in your first paragraph. I wasn’t necessarily looking for advice as much as a discussion on this relevant topic. I’m sorry you didn’t see it the same way. But thank you anyway for taking the time to link a good resource for others that come across this.

        • Yeah, it’s just tough to draw the line on where I can do free tech support, versus teaching folks to fish. I’m kinda damned if I do, damned if I don’t – if I do free tech support here and disagree with Veeam, then we’re opening a big ol’ can o’ worms that I won’t be able to follow up on for free. The volunteers over at https://dba.stackexchange.com might be able to, though. Hope that helps!

          • Stephen Zemlicka
            September 10, 2019 1:08 pm

            I can totally respect that and I hope anyone else reading this does too. Thank you for the article and for taking the time to respond.

  • You know what the funniest thing is? I opened a case with Microsoft about logon triggers that cause SQL Server to run out of worker threads (even a bogus logon trigger that just does SELECT 1). Their answer: go set max worker threads higher (as it is set to 0 right now).

    Sigh. I get the feeling they just want to get rid of the bug… I don’t really want to play with this in production.

  • this article is about worthless… CPU % is low, but SQL is running out of worker threads. How about a solution? Have one? If not, what are you posting for, to make yourself feel important?

    • GMan – sure, let’s step back a little. What queries are using up all those worker threads, and why aren’t they making progress (without using CPU)? What are they waiting on? Troubleshoot that, and you’ll be able to make queries go faster.

      After all, that’s what performance tuning is all about – faster queries, happier end users.

      Hope that helps!

  • Rajeev Chaudhary
    August 2, 2022 5:49 pm

    We have a scenario where more than 400 databases are part of an AG, but CPU utilization is less than 20% throughout the day. The server has 8 CPUs. It is just that the total number of worker threads available is not sufficient to handle AG sync-up between one primary and two secondary replicas. We keep running into the “out of worker threads” issue and DBs stop syncing. What is the harm in increasing worker threads in this scenario?
    By the way, I have increased worker threads in a couple of such cases; it has been a few months and I have not seen any issues yet.

    • For personal advice on your particular situation, click Consulting at the top of the site. Cheers!

    • I had a similar situation 5 years ago. There was basically nothing I could do to solve it except jack up the worker threads, and then run away from AGs as fast as we could. Your best bet is to not keep so many databases on an AG. Ultimately, if you get into a situation where something else has decided those worker threads are fair game, you risk possibly serious performance issues. You are running with scissors.

  • Hugo Van den Berg
    March 16, 2023 12:36 pm

    It actually depends very much on the type of workload you are running. If you have a lot of small SQL statements that spend their time on other types of waits, increasing worker threads does help. The signs you’re looking for are underutilized CPU and high waits for a worker. If you see these, determine whether you have a lot of SQL statements that are waiting on other things like locks, I/O, or the network.
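
    If you go down that road, at least confirm what the waits actually are first (a sketch; filter out the benign system waits in real use):

    -- Top waits since the last restart:
    SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;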

    • If I heard you right, you’re saying that when queries are already waiting on locks or I/O, adding more worker threads will improve performance?

      That has not been my experience.

      If you have a blocking or IO bottleneck, piling more queries into the mix – without fixing the blocking or IO – will likely make users angrier, not happier. That’s been my experience, anyway.

