Why is SQL Server Slow Sometimes – But Only Sometimes?

Have you got a SQL Server that usually performs just fine, but every now and then, everything falls to pieces? Users complain about performance, and then minutes – or hours – later, the problem just mysteriously solves itself?

Here’s a few things to look for:

Are data or log files growing? When SQL Server needs to grow a file, it can bring all transactions in that database to a grinding halt. You can mitigate this issue by turning on Instant File Initialization, pre-growing out your data and log files, and using smaller autogrowth sizes. To figure out if this is your root cause and which cure makes the most sense, run sp_Blitz® and look for the warning about data and log file growths that have been taking more than 15 seconds.

Are queries being blocked? If one query takes out a lock that prevents other queries from getting their work done, it presents interesting symptoms. Your SQL Server’s CPU use might look low – but it’s because so many queries are sitting around idly waiting. To diagnose, start by running sp_BlitzFirst® with the @Seconds = 0, @ExpertMode = 1 parameters to check your wait stats since startup. If LCK* waits are significant, then start digging at the database level. Run sp_BlitzIndex® in each database, and look for the Aggressive Indexes warnings about indexes that are a frequent blocking sore spot. Consider dumping indexes you don’t need on those tables (to speed up deletes/updates/inserts), and adding the right indexes that you do need (to speed up other queries).

Sometimes I choose the wrong coffee tap, and I'm predictably slow.

Sometimes I choose the wrong coffee tap, and I’m predictably slow.

Are queries being rolled back? If a query fails – either due to being canceled, or running into an error, or hitting a deadlock – SQL Server has to undo their work. This may cause blocking, or it just may hammer storage. This one’s a little trickier to catch. If you run into it live, then you can see it with sp_BlitzFirst® – it warns about queries that are currently rolling back.

Did a bad plan get into cache? When you run parameterized SQL, SQL Server builds an execution plan based on the first set of parameters it sees. Normally, parameter sniffing is a good thing, but sometimes it goes awry. You can have a plan get pushed out of cache (due to memory pressure or statistics changes) and then get compiled based on the first – possibly unusual – set of parameters that come in. That plan might not work so well for other sets of parameters, and it can degrade performance on the whole box until a new plan is generated for it. You can learn more about solutions in this blog post about parameter sniffing, and to diagnose it, run sp_BlitzCache®. Look at the top resource-consuming queries, and see if an unusual one has suddenly jumped to the top of the charts.

Are shared resources under pressure? In virtualization environments, other guests may be using an increased amount of CPU, memory, or storage throughput, giving your guest less horsepower. In shared storage environments, other servers might be doing backups or disk-intensive workloads. In multi-instance servers, another instance on the same box might be the real culprit. To identify these, check out wait stats with sp_BlitzFirst®.

Looking back at what I’ve written, I’m sure someone’s going to say, “Brent, you’re just trying to pitch tools.” And sure, I am – they’re free. I want you to use the same powerful tools that we use every day in our consulting. Your job is hard enough without reinventing the wheel!

Previous Post
Is My SQL Server Too Big for Virtualization or the Cloud?
Next Post
Brent’s Bad Idea Jeans: Stop People From Changing Tables with This One Trick

10 Comments. Leave new

  • I get the following error when running your script on a SQL 2005 server:

    “Msg 319, Level 15, State 1, Procedure sp_AskBrent, Line 1345
    Incorrect syntax near the keyword ‘with’. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.”

    Unfortunately, I am not able to correct the error.

  • Billy Willoughby
    November 17, 2015 11:12 am

    While I don’t get to work on SQL Server so much in my current job as I have in the past, I just wanted to say THANK YOU for the tools you offer, and thank you for taking the time to tell us all about them!

    Billy

  • Thanks for this post, I do have some concerns about mitigating performance issues that is caused by file growth though. You are advising to set auto growth to smaller sizes I believe your reasoning there is so that SQL don’t spend too much time trying to grow the file especially knowing that IFI does not work with log files. however wouldn’t that cause fragmentation, and performance issues if SQL is always waiting to grow a file?

    Thanks

    • GrumpyDBA – when we say smaller sizes, we’re not talking 1MB. You have to find the balance between long file growth wait times, and high fragmentation. It’s up to you to find the right balance for your workload and hardware.

    • @GrumpyDba, You can used Extended Events or the default trace file to capture autogrowth events, and use them as a historical measure to determine file sizes and growth patterns. This will tell you not only the actual growth sizes by file, but how long the growth event took to complete. HTH!

  • @Brent, for VMs, I am not too sure how wait stats alone will provide resource contention at host level. For example one of our reporting server (SQL 2014 VM, shared CPU and IO and dedicated RAM) I sometimes saw Resource Semaphore and SOS Scheduler waits when facing slow queries and sometimes no waits (sys.dm_os_waiting_tasks didnt have any records for active session), task manager/perfmon also didnt report any contention at CPU, IO or RAM.
    I always complained about this with our IT and think the “automatic” V-motioning is the problem (probably logic around v-motioning), but to me beyond VM metrics are a blackbox. can you suggest something I can further do to find root cause?
    and btw… @GrumpyDBA should really not be a name, its one of our core attribute 🙂

    • Rizwan – root cause analysis is a little beyond what I can do in a blog post comment, but this is exactly the kind of thing we explore in our 5-day SQL Server Performance Tuning classes.

  • Hi ,
    Always on is configured in my Prod server
    When The users are running Select Query’s in the secondary server the querys taking more time when compared to primary. when they are running i can see i\o request taking more than 15 seconds in tempdb files and also in database data files. i have 2017 version sql. can you please suggest to enable any trace???

    Thanks In Advance..

Menu
{"cart_token":"","hash":"","cart_data":""}