You may have read that you need more than one data file in SQL Server's tempdb, even if you're using blazing fast storage. If you create a lot of tiny objects in tempdb, you may hit a bottleneck on the special pages SQL Server uses internally to allocate those objects. For certain workloads, adding more tempdb files speeds up SQL Server.
Most database administrators aren’t sure how to tell when they need to add more tempdb files. Here’s how you can find out for yourself.
Tempdb stress: my virtual machine setup
I stressed out my tempdb on a virtual machine with 6GB of memory, 4 virtual CPUs, Windows Server 2012 R2 and SQL Server 2014. The virtual machine is backed by solid state storage. My tempdb was configured with a single data file.
How I ran my stress test against tempdb
I used ostress.exe. This command line utility lets you easily run a query multiple times across many threads. You can get ostress.exe for free by downloading the RML utilities.
Here’s the ostress.exe command that I called:
ostress.exe -E -d"tempdb" -Q"exec dbo.tempdbstress" -n5 -r300 -b -q
The dbo.tempdbstress procedure looks like this:
CREATE PROCEDURE dbo.tempdbstress
AS
SET NOCOUNT ON;
SELECT TOP (5000) a.name, REPLICATE(a.status, 4000) AS col2
FROM master..spt_values AS a
CROSS JOIN master..spt_values AS b
OPTION (MAXDOP 1);
GO
Finding GAM and PFS contention in tempdb with sp_BlitzFirst® and Wait Stats
While this is running, look at waits in two ways. In one session, take a sample of waits with sp_BlitzFirst®:
exec sp_BlitzFirst @ExpertMode=1, @Seconds=10;
I see PAGELATCH_UP, PAGELATCH_SH, and PAGELATCH_EX waits.
I also have a lot of plain old CPU waits (SOS_SCHEDULER_YIELD), because I'm bashing away at only 4 virtual CPUs (and I'm running ostress.exe on the same VM as SQL Server).
Finding GAM and PFS contention in tempdb with sp_whoisactive
You can see this another way: using Adam Machanic's free sp_whoisactive tool.
This lets you see the contention if you catch it at the right instant. Bonus: you can confirm exactly what type of pages it's occurring on! I can see here that these are explicitly PFS waits. "PFS" means "page free space": poor tempdb is hitting a hot spot on one of those special pages just recording how much space is available and where.
And here I can see the GAM waits. “GAM” means “Global Allocation Map”. That’s another one of those special pages that can get hot and slow things down in tempdb. It’s tracking where objects are allocated. (There’s also “SGAM” or “Shared Global Allocation Map” pages that you might see.)
Both of these types of waits can be alleviated if you add more data files to tempdb, because then you’ll get more PFS pages and GAM/SGAM pages across the files.
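If you want to catch this in the act yourself, here's a sketch of a query against the documented sys.dm_os_waiting_tasks DMV that classifies in-flight tempdb PAGELATCH waits by page type. The page-type math uses the well-known layout of allocation pages: PFS pages repeat every 8,088 pages, GAM and SGAM pages every 511,232 pages.

```sql
-- Sketch: classify tempdb PAGELATCH waits by allocation page type
-- while contention is happening. resource_description is 'dbid:fileid:pageid';
-- PARSENAME splits it after swapping ':' for '.'.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description,
       CASE
           WHEN p.page_id = 1 OR p.page_id % 8088 = 0        THEN 'PFS'
           WHEN p.page_id = 2 OR p.page_id % 511232 = 0      THEN 'GAM'
           WHEN p.page_id = 3 OR (p.page_id - 1) % 511232 = 0 THEN 'SGAM'
           ELSE 'Not an allocation page'
       END AS page_type
FROM sys.dm_os_waiting_tasks AS wt
CROSS APPLY (SELECT CAST(PARSENAME(REPLACE(wt.resource_description, ':', '.'), 1) AS bigint)) AS p(page_id)
WHERE wt.wait_type LIKE 'PAGELATCH%'
  AND wt.resource_description LIKE '2:%';   -- database_id 2 = tempdb
```

Run it in a loop while the workload is active; you'll only see rows during the instant the latch waits are occurring.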
How to tell if you need more tempdb data files
First of all, if you have just one tempdb file, consider adding more files as preventive medicine. Don't go crazy with this: Microsoft's recommendation in KB 2154845 of one tempdb data file per logical processor, up to eight files, is a good place to start.
Look at waits that have occurred since startup. Do you have PAGELATCH_UP or PAGELATCH_EX waits? (Note that these don’t have an “IO” in them. PAGEIOLATCH waits are different.)
If you do, that’s not necessarily 100% proof that the issue is in tempdb, but it’s a good indicator. If you don’t, well, it’s probably not a big issue.
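A quick way to check is a sketch like this against sys.dm_os_wait_stats, which accumulates waits since the last restart. The `[_]` escape keeps the pattern from matching PAGEIOLATCH waits.

```sql
-- Sketch: cumulative PAGELATCH (not PAGEIOLATCH) waits since SQL Server started.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH[_]%'
ORDER BY wait_time_ms DESC;
```
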
If you do find these waits, dig deeper:
- Identify when the PAGELATCH_% waits are growing the most, and dig into those periods with sp_whoisactive. You can log sp_whoisactive results to a table.
- Look at your top queries. The queries that are part of the issue might be suitable for caching in an application tier.
Tempdb file configuration reminders
Remember: this only applies to tempdb data files. You only need one tempdb log file.
Also remember that tempdb data files are used proportionally to their size, which means you should keep the files equally sized. You can use trace flag 1117 to force all of a database's files to grow together when one grows, but it applies that behavior to every database on the instance, not just tempdb.
We recommend pre-growing your tempdb files when you configure your SQL Server instance.
You can add tempdb files without restarting the SQL Server instance. However, we've seen everything from antivirus software attacking the new files to other unexpected impacts from adding tempdb files. And if you need to shrink an existing file, it may not shrink gracefully when you run DBCC SHRINKFILE. For best results, deploy the configuration to your pre-production environments first, use a change request, and perform the change in a planned maintenance window.
Want to learn more? Take my class.
My Fundamentals of TempDB class is for curious folks who want to learn:
- What uses TempDB, and how that affects performance: not just temp tables and table variables, but also triggers, cursors, sorting indexes, workspace spills, the version store, and more
- How to host TempDB depending on how you use it: understanding what goes in the log file, the data files, why the number of data files matter, and whether your particular workload should be on SAN, local storage, or ephemeral storage, and why
- How to monitor it: after you understand what uses it and how to host it, you need to keep tabs on how its performance changes over time using DMVs, Perfmon counters, and the First Responder Kit scripts
Thanks Kendra – nice method you have put together here. I haven’t done a ton of research on -T1117 – gives me some reading to do today!
Say, what is the general consensus around -T1118 these days? It seems Paul Randal recommends it be turned on for every instance from 2000 forward, as it has no “down-side” (link below – really good info). This is one of my two startup trace flags on all my instances (-T3226 being the other).
I’m sorry Allen, I just have to ask. Are you commenting just to subscribe to comments and answer other questions that come in after this? It’s just a pattern I’ve seen on my posts. 🙂
I do like to interact, ya know! I often learn the most from the comments that follow a blog post, don't you? I see dialogs like the one we've had all the time in blog post comments. I've been researching this since I read your post this morning – that's a good thing, isn't it? I don't know why asking for feedback on -T1118 would be a bad thing – those fringe questions related to the topic of the post happen all the time.
Feel free to reach out to me directly if you have issues with my interactions – I like to participate, but if by chance my participation is causing issues let me know.
Oh, it’s not bad to ask a question about TF 1118, I just wasn’t sure it was a real question based on the pattern.
I don’t personally know of any problems with TF 1118 after SQL Server 7.0, and it sounds like you’ve researched and tested it in your environment and it works well for you.
Nice listing Kendra, but pre-growing the files on a VM with dynamic disks doesn't help. Any caveats or workarounds you know of here?
Yup – can you explain a little more about what you mean by pre-growing the files on a VM not helping?
Somehow I’m not receiving email so apology for the late response.
It all started from here…
Currently I’ve tested this on Gen-2 VM’s (OS Win 2012 r2).
What I'm trying to say is: even if I expand the LDF/MDF/NDF to a given size, the dynamic VHD/VHDX under the VM will not expand that much (unless real data starts to fill it in). If you can share an email address, I can send some pics / numbers.
More details on why I am doing this are here.
Having multiple data files in tempdb helps prevent contention that happens inside SQL Server on pages that are in *memory*: it's actually not about IO or anything that happens at the storage level. It's about the number of logical files SQL Server sees.
Pre-growing the files will keep them from auto-growing unevenly (which causes uneven round robin usage and might lead you back to contention on those pages in memory).
So the reasons to pre-grow in this case are about management and avoiding a bottleneck in memory. That’s the same even if you’re using something totally wacky like having tempdb on a RAMDISK, etc.
What I was trying to say is, pre-growing is the right way, no conflicts there. But pre-growing / expanding files on a dynamic VHD doesn't help. Likely if you set tempdb at 10 GB, you will not see the VHD growing that much… until tempdb, or for that matter any MDF/NDF/LDF, has real stuff in it.
For the purpose of everything in this post, pre-growing the files WOULD help: it ensures the tempdb data files stay an even size. That’s an important part of the configuration.
Most people don’t use the older dynamic VHD format for completely different performance reasons, but that’s a totally different story than what we’re talking about in this post.
That should work the same way as in a non-vm or with a basic style of formatting for the disk in terms of SQL Server. It doesn’t have any special logic for file management for VMs or dynamic disks.
Good One. Thanks for sharing!
Hey Kendra, Sorry if you felt this was an unwanted extension to your post. But where I picked it up from is this line of yours, “That should work the same way as in a non-vm or with a basic style of formatting for the disk in terms of SQL Server. It doesn’t have any special logic for file management for VMs or dynamic disks.”
Further when you said “Most people don’t use the older dynamic VHD format for completely different performance reasons”.
What I've seen is that pre-growing the files acts differently when there is a dynamic disk involved, be that VHD/VHDX (my statements are specific to Hyper-V up to version 3.0).
Oh, I see what you mean! Yes, my phrasing was confusing. I simply meant that from SQL Server's perspective, it sees the logical size and round-robins and grows (or doesn't grow) accordingly. Makes sense!
ps: it’s not unwanted at all, it’s just to do the topic justice it needs its own post with examples, etc.
I just encountered a very odd tempdb problem today.
It seemed like SQL Server tried to extend tempdb data files on a drive that did not have any space left. I had placed a log file in another location that has drive space. Is it normal behavior for it to try to extend data files in a location that has no more drive space instead of using a file that can still grow? That caused some sessions to stay hung up for about 2.5 hours. They eventually disappeared by themselves, but I never really understood how…
To top that off, the new data file added in the new location grew by 100 GB in one day… How can I find what caused that?
Also, I see a lot of pageiolatch waits on tempdb. What could this mean?
Yes– if tempdb fills up, it will grow if it can, and it isn’t smart enough to look at available drive space.
Your question references both tempdb data files and log files. Those are pretty different. Which was growing?
Thank you for your response. My question is about tempdb data files. The log files should just roll over, correct?
Both of them can grow, depending on what’s going on.
For tempdb data files, some people choose to not allow auto-growth if there’s no more room. This does mean that anything that fills up tempdb will fail and roll back, but there’s less cleanup.
To know what is using the space, a monitoring tool is very helpful. If you need to roll your own, you can set up an alert based on file growth and have it alert you, or perhaps also run a job that uses a tool like sp_WhoIsActive or sp_AskBrent to log what’s happening to a table for review.
I was able to run sp_AskBrent and sp_whoisactive.
sp_whoisactive showed a lot of pagelatch waits on the tempdb file that could not be extended. The sessions stayed that way for about 2 hours and then it disappeared by itself. I am not sure how to explain that. It also frequently showed pagelatch and pageiolatch waits for new sessions after that.
sp_AskBrent showed 2 top waits (CXPACKET and Buffer). The buffer would suggest slow I/O?
Using the default trace, I found that the tempdb file had grown by about 100GB in a span of one hour… Looking at the objects created in that hour, it seemed to be indexes on temp tables… Does that mean we have giant temp tables, or that we are indexing wide columns in our temp tables?
Tempdb use could be from temporary tables and indexes, or it could also be from sorts/joins from long running queries. So you do need to check for both– it can be a bit complicated.
Thanks for reminding us that sometimes we just have to clean up the code.
Thanks so much for getting back to me.
I am going to follow your suggestion to log the results of the tools to a table so we can learn more about this problem. Seems like it is the best next step for now =)
Thank you very much for all the info you have shared.
Does simply rolling a clustered SQL server cause the TempDB files for recreate if they were set to a new size, or is an SQL restart necessary to make this happen?
Could you please reword/paraphrase your question? I’m not sure what you mean, just having a hard time figuring out exactly what you’re asking.
Should I add new tempdb files on separate disks, or is it just as good (or less good) to add them on the same disk?
Thank you very much for your post.
Can round-robin allocation be deliberately avoided?
I have a scenario with SQL Server 2012 Standard running in a VM where a (single) tempdb data file is stored on a space-limited RAM drive which is adequately sized for normal operation, however I want to configure an additional tempdb file on a SAN to cater for transient, unanticipated increase in tempdb requirements.
The reason for this configuration is that the SAN is too slow for the demand the application places on tempdb, and local storage is not an option due to the configuration of the VMWare platform hosting the VM.
I obviously wish to ensure that the file on the RAM drive is used before spilling into the file located on the SAN. Do you know if it is possible to avoid round-robin allocation?
Just a bit of a reality check– you’re in a bad spot. While Microsoft technically supports these mapped memory drives for tempdb, you’re now reliant on a 3rd party driver and performance may be very spotty. If you run into performance issues, the 3rd party driver and Microsoft are likely to believe that the problem lies with the other party. You don’t want to get more creative here, if performance is important you want to fix the root cause of not having good enough storage for your SQL Server’s needs.
Thanks for the feedback. Of course we are planning to get out of this situation but the reality is, after much deliberation and discussion, that we consider this to be our best short-term approach to get things moving again and with minimal disruption and cost.
If further performance issues arise then we will deal with those in due course with the understanding that the problem may be down to the 3rd-party driver.
To be honest the root cause is actually poorly-written queries that make excessive use of tempdb, however this is a vendor-supplied product with code that we cannot change. If it was an internally-written application then I would be all over it, believe me…
I guess the question remains, though, as to whether the round-robin usage of the files can be avoided. If not then we’ll just have to monitor tempdb carefully whilst this configuration is active.
Thanks again for your comments and advice.
The only way I know of is just having one tempdb file– but I’ve never researched it.
I know this is an old article, but I have found it very helpful. I would like to know a little more about these "hot spots" in tempdb. We had an issue where we were getting blocking because of PFS-type waits on tempdb. We added more and more files, and eventually it stopped. But I believe there are still some issues. Our tempdb files are 99.9% empty, yet I am getting errors that temp tables cannot be created, and during certain times we see more of the PFS-type waits, even though, still, there is over 99% free space in tempdb.
How can one identify what the hot spot is?
Eduard — for Q&A, head on over to dba.stackexchange.com and post as much detail as possible.
Hi Kendra, great post, and I had a quick question. I have a scenario where every day around a particular time we see heavy blocking, and the wait resource points to a tempdb GAM page with PAGELATCH_UP waits. We have around 17 tempdb files. We have tried enabling trace flag 1118 and the issue has not gone away, so I'm not sure what else we can do to reduce it. The other thing is that every time this blocking happens, it only seems to happen on one file ID, so I'm not sure why adding additional files would help. Ex: 2:11:02 PAGELATCH_UP wait.
Santosh – for questions, head to a Q&A site like https://dba.stackexchange.com.
We have a terrible problem with a new machine and SQL Server 2019, and perhaps it is the new PFS contention-avoidance mechanism in tempdb that causes the disaster…
I posted the trouble in my own blog :
Do you know how to disable it ?
For general Q&A, head to a Q&A site like https://dba.stackexchange.com. Thanks!
done it : https://dba.stackexchange.com/questions/275487/2019-enterprise-nolock-without-nolock-page-latch-time-out-on-tempdb-and-final
Ah, I see by Paul White’s comment that you’ve also already tried publishing it to the MVP DL. You probably want to open a call to Microsoft – this is pretty far beyond what someone’s going to be able to answer for free.