Predicting When SQL Server Will Hit the CPU Wall

Wait Stats

Most of the time, we use wait stats to identify SQL Server’s bottleneck and focus on what to tune.

However, let’s say you’ve got a really well-tuned SQL Server where:

  • Queries finish quickly (think 5-20 milliseconds)
  • Everything your queries need is cached in memory
  • There are no blocking issues
  • The workload doesn’t do a lot of writes, and the few writes it does perform hit the transaction log extremely quickly because you’ve got great storage

In that situation, you may not see any significant wait stats, but as your workload grows, your CPU usage will gradually increase as well. You won’t see SOS_SCHEDULER_YIELD waits because queries finish so quickly, and you’ve got CPU headroom.

Right up until you don’t.

In this scenario, if your workload gradually increases, you’ll go from:

  1. Seeing nearly zero wait stats, even though CPU might be 50-60-70%
  2. Suddenly seeing SOS_SCHEDULER_YIELD waits as queries start to collide, waiting on CPU (even though CPU isn’t even 90% yet)
  3. To suddenly having the server go unresponsive, possibly even in a matter of seconds after hitting #2

There is an extremely fine line between #1, #2, and #3 because SQL Server’s CPU scheduling is nowhere near as glamorous as you might imagine. I explain how it works in the SOS_SCHEDULER_YIELD module of my Mastering Server Tuning class, and I won’t rehash that here due to its complexity. The short story is that as queries come in and execute, they’re scheduled on CPU cores regardless of that core’s workload, and they are not moved to other cores even if they’re on an overloaded core. You can watch some query tasks piled up waiting on CPU time even when there are other CPU cores sitting around nearly idle.

One way to see the problem as it approaches is to watch CPU % at the operating system level. That metric is really misleading, though: it gets skewed by noisy neighbors at the VM level and by other applications running on the same OS, and it’s hard to capture at the same granularity as your other SQL Server metrics. You can get periodically sampled data from the sys.dm_os_ring_buffers DMV, but that can be pretty inaccurate too.

My preferred way to see it coming is to watch total_cpu_usage_ms in sys.dm_os_schedulers. We expose that as “Total Thread Time (Seconds)” in sp_BlitzFirst, like this:

[Screenshot: sp_BlitzFirst output showing “Total Thread Time (Seconds)”]

In that screen shot, in a 60-second sample, SQL Server burned 1,109 seconds worth of CPU time, or about 18.5 minutes.

To keep things simple, let’s say our SQL Server has 40 CPU cores. In a 60-second time span, our server could offer up to 2,400 seconds worth of CPU time. (40 cores, each cranking out 60 seconds worth of CPU time, that’s 40 x 60 = 2,400.) If we’re currently burning 1,109 seconds of CPU time per minute, we’re doing fine – we’re not even using 50% of our available CPU time yet.
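Here’s that same capacity math as a minimal T-SQL sketch, using the hypothetical numbers from the example above (40 cores, 60-second sample, 1,109 seconds of thread time):

```sql
-- Hypothetical numbers from the example above, not a live measurement.
DECLARE @cores INT = 40,
        @sample_seconds INT = 60,
        @thread_time_seconds DECIMAL(10,1) = 1109;  -- "Total Thread Time" from sp_BlitzFirst

-- Theoretical maximum CPU time the server can hand out in the sample window
DECLARE @capacity_seconds INT = @cores * @sample_seconds;  -- 40 x 60 = 2,400

SELECT @capacity_seconds      AS capacity_seconds,
       @thread_time_seconds   AS thread_time_seconds,
       CAST(100.0 * @thread_time_seconds / @capacity_seconds
            AS DECIMAL(5,1))  AS cpu_time_used_percent;  -- about 46.2%
```

Swap in your own core count and sp_BlitzFirst sample to see where you stand.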

The higher that thread time number goes, though, the more danger we’re in – and the more we have to start watching sys.dm_os_schedulers at the individual core level. Once SQL Server starts assigning work to cores that are already maxed out, and a core’s total_cpu_usage_ms consistently approaches 1,000 milliseconds of CPU burned every second over the span of several seconds, we’re approaching danger. In a matter of seconds, your server will start to see SOS_SCHEDULER_YIELD waits, and if the workload doesn’t subside or take a break, the problem will get exponentially worse as more queries back up on that core (and others).
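If you want to eyeball the individual cores yourself, a query along these lines works – the columns come straight from sys.dm_os_schedulers (total_cpu_usage_ms requires SQL Server 2016 or newer), but the sort order and the idea of focusing on the runnable queue are just my suggestion:

```sql
-- Snapshot of per-core load. Schedulers with tasks stacked up in the
-- runnable queue are where SOS_SCHEDULER_YIELD pileups begin.
SELECT scheduler_id,
       cpu_id,
       current_tasks_count,
       runnable_tasks_count,   -- tasks waiting on this core for CPU time
       load_factor,            -- SQLOS's own estimate of this core's load
       total_cpu_usage_ms      -- cumulative; sample twice and diff for a rate
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'
ORDER BY runnable_tasks_count DESC, total_cpu_usage_ms DESC;
```

Remember that total_cpu_usage_ms is cumulative since startup, so to get the “milliseconds per second” rate described above, sample it twice and subtract.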

In my experience, when total thread time starts to approach 80% of the time that your server’s CPUs could theoretically provide, we’re going to hit the CPU wall, and zero waits will suddenly become SOS_SCHEDULER_YIELD pileups that lead to an unresponsive server. For big SQL Servers, 80% is the practical 100%. If you really wanna drive your server to 100% CPU, watch Thomas Kejser’s SQLBits session on designing high scale OLTP systems. I’ve never seen a company put in the kinds of efforts that Thomas describes in that video, although I have no doubt that some shops do it. I’m just saying you and I probably won’t – and for us, 80% is our practical 100%.
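One hedged way to see why 80% behaves like 100%: the textbook M/M/1 queueing approximation (a gross simplification of a multi-core server, not a SQL Server measurement) inflates response time by a factor of 1/(1 - utilization), and that curve goes vertical right around the 80-90% range:

```sql
-- M/M/1 approximation: response time = service time / (1 - utilization).
-- For an illustrative 10 ms query: ~14 ms at 30% busy, ~50 ms at 80%, ~100 ms at 90%.
SELECT utilization,
       CAST(10.0 / (1.0 - utilization) AS DECIMAL(6,1)) AS response_ms_for_10ms_query
FROM (VALUES (0.30), (0.50), (0.60), (0.80), (0.90)) AS u(utilization);
```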

I’ve only had a couple of times in my career where I’ve seen this happen (zero waits leading to an unresponsive server within a matter of minutes), but when it’s happened to me, sp_BlitzFirst’s display of thread time consumption has been the best warning tool in my portfolio. Now hopefully it’ll help some of y’all, too!


12 Comments

  • “You can watch some query tasks piled up waiting on CPU time even when there are other CPU cores sitting around nearly idle.”

    I wonder if this is due to NUMA memory mapping – a query task needs to touch data pages in buffer that are located in RAM that is “local” to a particular NUMA node, so it is placed on a core in that node, even if all the cores in that NUMA node are busy. If the query task was placed on an idle core, the RAM for the data pages may be “remote” to that core and take longer to access. That’s an optimization that would make sense at ~0-85% CPU load, but breaks down at very high CPU load for a given NUMA node.

    Reply
    • To be clear, I’m not saying that’s what’s actually happening. It’s just a guess that would explain that behavior.

      Reply
      • Sure, let’s think through that a little deeper: would that mean that before SQL Server could start executing a query, it would have to predict which pages the query would need to access, and then check to see which NUMA nodes have those pages cached, and then schedule the query to be executed on that NUMA node? And how might parallel queries handle that?

        Reply
        • I’m getting beyond my skis on internals, but I think the buffer access happens at the task level, below the query level. If so, it doesn’t affect the query plan, but can affect task thread-to-core assignment for the task that reads those buffer pages. It might even be a Windows thing instead of a SQL thing at that level.

          Okay, there is guidance from MS that says, “Keep MAXDOP at or under the # of logical processors per NUMA node.” Which could actually mean that there is NOT an affinity used to assign buffer page reader tasks to NUMA nodes, as the recommendation seems to be trying to use MAX DOP to prevent “far” memory reads for parallel tasks. https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-max-degree-of-parallelism-server-configuration-option?view=sql-server-ver16#considerations

          Thanks for humoring me on this. You got me curious to dig deeper on the internals; now I just need to find time to do it… (Still happy I bought the lifetime subscription during Black Friday Sale.)

          Reply
          • I think the buffer access happens at the task level, below the query level. If so, it doesn’t affect the query plan, but can affect task thread-to-core assignment for the task that reads those buffer pages. It might even be a Windows thing instead of a SQL thing at that level.

            Ah-ha! Sounds like someone needs to hit the SOS_SCHEDULER_YIELD module of the Mastering Server Tuning class. 😉 We discuss how this works in there, but the short story is that no, this kind of in-depth analysis doesn’t happen during query execution.

  • Charles Candale
    May 14, 2025 10:07 pm

    Hi Brent.
    What do you think of this approach if I encounter this:
    Query Tuning: Optimize expensive queries identified by sp_BlitzCache.
    Index Optimization: Ensure indexes are aiding query performance effectively.
    Parallelism Settings:
    Adjust MAXDOP (Maximum Degree of Parallelism) to prevent excessive CPU usage by parallel queries.
    Modify Cost Threshold for Parallelism to control when SQL Server uses parallel plans.
    Hardware Considerations: If optimization doesn’t suffice, consider scaling up CPU resources.

    Thanks.

    Charles.

    Reply
    • Charles – I wish I could do personalized training in the comments, but I hope it’s fair for me to suggest instead that you click Training at the top of the page. Fair?

      Reply
  • Hi Brent, how does hyper-threading/SMT affect this? Especially in VMs, where the guest OS/SQLOS just gets a bunch of logical cores that might be real cores or just SMT siblings – the second hardware thread sharing a core’s execution units?

    Reply
    • I don’t really focus on that because it’s not something I can control as a VM consumer, and not something SQL Server pays any attention to. For SQL Server in a VM guest, logical processors are all the same.

      Reply
  • Rubén Garrigós
    May 17, 2025 5:06 pm

    For OLTP workloads, there’s normally a strong sensitivity to latency.

    While 80% utilization is an absolute upper limit, queueing theory (and experience) shows that latency begins to rise significantly well before that point. In fact, once utilization exceeds around 60%, response times tend to already be too high and systems feel sluggish. This is why keeping headroom is essential for maintaining consistent performance… so for me, anything above 30% CPU utilization is already a concern for a pure latency-sensitive OLTP load.

    Reply
  • I needed to run this outside of sp_BlitzFirst, so I wrote this. Sharing in case it helps others:

    -- Create a temp table to store CPU time at the start
    IF OBJECT_ID('tempdb..#cpu_start') IS NOT NULL DROP TABLE #cpu_start;
    SELECT scheduler_id, cpu_id, total_cpu_usage_ms
    INTO #cpu_start
    FROM sys.dm_os_schedulers
    WHERE status = 'VISIBLE ONLINE';

    -- Wait for 30 seconds
    WAITFOR DELAY '00:00:30';

    -- Capture CPU time again
    IF OBJECT_ID('tempdb..#cpu_end') IS NOT NULL DROP TABLE #cpu_end;
    SELECT scheduler_id, cpu_id, total_cpu_usage_ms
    INTO #cpu_end
    FROM sys.dm_os_schedulers
    WHERE status = 'VISIBLE ONLINE';

    -- Calculate total CPU usage in seconds across all visible online schedulers
    SELECT
        SUM(CAST((e.total_cpu_usage_ms - s.total_cpu_usage_ms) AS BIGINT)) / 1000.0 AS total_cpu_seconds_used
    FROM #cpu_start s
    JOIN #cpu_end e ON s.scheduler_id = e.scheduler_id;

    Reply
