Years ago, I came across an article about a gentleman who made a cat camera. Without boring you too much, this camera gave him a detailed view (including GPS info) about what his cats got up to all day. I got excited about the idea of finding out what my pets did all day. Not having the means to build a cat camera of my own and being far too cheap to buy one, I rigged up a web cam at home to watch the cats all day. It turns out that my pets do nothing all day. They slept and ate and slept and ate until I got home. At which point, they kept doing the exact same thing.
In a wonderful bit of quantum boringness, it turns out that I had no idea what kind of chaos my pets were causing until I directly observed them. I had no idea if they were actually moving things around in the house or if I was simply forgetting where I was putting things (turns out I’m very forgetful). TempDB is a lot like a basket of cats – you don’t know for sure that it’s causing your problems, you have some sneaking suspicions, but you’re not sure how to prove anything.
Watching Over TempDB
If TempDB is like a basket of cats, we need to watch what it’s doing; there’s no telling when it’s going to go from adorable to shredding the drapes. Knowing what to watch in TempDB is just as important as knowing that you should even be watching TempDB at all.
How Many Cats Do I Have? (Watching TempDB Free Space)
Excessive TempDB usage isn’t necessarily a sign that TempDB is a problem, but it is an indicator that you have problems worth looking into. When TempDB starts getting full, it’s an indicator that there’s a lot of temporary object creation as well as out of memory sorting and joining going on in the database. None of these things are bad, but they’re indicators that we should be taking a closer look at TempDB.
There’s no hard and fast metric for what you should do when your TempDB data file is large, but it’s a good indicator that you can stand to do one of a few things:
These changes won’t always cure the problem, but they are starting points. Waiting for TempDB to grow can be a cause of performance problems and enabling Instant File Initialization makes it possible to quickly grow TempDB data files. Using multiple TempDB files uses more storage bandwidth, reduces file contention, and adds magical pixie dust to your queries.
What Are My Cats Doing? (Monitoring TempDB Usage)
The next step, after you know how much TempDB you’re using, is to find out how TempDB is being used. TempDB is used for a few distinct things: joins, aggregations, sorting, the version store, temporary tables (and table variables), and table/index spooling. While these are different operations, they all consume TempDB space. Understanding how your applications use TempDB is critical to understanding if TempDB is causing performance problems.
This is where things get more complicated; there’s never a right or wrong answer, but TempDB usage varies heavily by application and workload. Sometimes even the same application, with different customer workloads, can have wildly different TempDB usage characteristics. By monitoring TempDB through a variety of DMO calls, server side traces, and performance counters it’s possible to get an accurate picture of the health and utilization of TempDB over time. Through some careful DMO/DMV scripting it’s even possible trace who the biggest consumers of TempDB are back to the stored procedure or query that’s using TempDB.
Just like trying to watch a basket of cats through a webcam, you can only catch quick glimpses of what’s going on. This process makes it possible to capture a sample of what’s going on inside TempDB at any moment, but it’s only for a quick moment. The DMOs to monitor TempDB only look at the currently running queries, there is no historical record. The best way to get an accurate picture of what’s happening is to sample these DMOs on a regular basis and sample aggressively during peak performance periods. You won’t catch every query this way but you should be able to catch most.
Benchmark, Rinse, Repeat
Whenever I talk about performance tuning or general SQL Server problems, I always advise people to benchmark everything that they can. Having a steady baseline is the only way to verify that changes are having a positive effect on performance. Without a performance baseline in place, all you have to go on is a feeling that things are faster. Unfortunately, feelings don’t translate into quantifiable numbers (unless you’re trying to quantify how you feel about a basket of cats).
Establishing a performance baseline is one of my favorite parts of working with clients. As we go through the health check, I work with our clients to figure out where it hurts and help them build a solution. I cover how they can use the baseline to keep monitoring their system. With these tools in place, it’s easy to monitor a system’s health over time.
Focusing on performance metrics makes it easy to see which parts of an application are causing performance problems. It TempDB usage spikes after a change to a few stored procedures, it’s easy to identify the problem when you have a baseline established.
Determining whether or not TempDB is a performance problem boils down to establishing a baseline, monitoring performance before and after changes, and carefully making changes until acceptable performance levels are reached. This may involve adding more TempDB data files, forcing memory grant allocations, or using solid state drives for TempDB.
Want to learn more? Take my class.
My Fundamentals of TempDB class is for curious folks who want to learn:
- What uses TempDB, and how that affects performance: not just temp tables and table variables, but also triggers, cursors, sorting indexes, workspace spills, the version store, and more
- How to host TempDB depending on how you use it: understanding what goes in the log file, the data files, why the number of data files matter, and whether your particular workload should be on SAN, local storage, or ephemeral storage, and why
- How to monitor it: after you understand what uses it and how to host it, you need to keep tabs on how its performance changes over time using DMVs, Perfmon counters, and the First Responder Kit scripts