I recently presented to the Salem Oregon SQL Server user group. During my visit I heard an interesting story: once upon a time in a land not so far way, a high severity production incident was caused by adding an additional data file to the tempdb database.
Multiple Tempdb Data Files: Commonly Recommended
Using multiple database files in tempdb is often recommended as a best practice in SQL Server. A single page can become a bottleneck in tempdb when many operations want to allocate space. Adding multiple tempdb files can help alleviate the problem. Adam Machanic gives a great summary of how to reproduce and identify this issue.
Many people suggest you should only add extra files to tempdb when you see specific metadata contention on that instance. However, most SQL Server DBAs who’ve been bit by this problem– and many who’ve read about it– configure multiple files when they’re building out a SQL Server instance as a preventive measure. This is because DBAs love uniform configuration and managers hate finding out a major incident could have been prevented. Once a team gets burned by a particular issue, if the fix does not have a negative impact they prefer to apply that fix everywhere. It keeps the egg off their faces.
What Could Possibly Go Wrong?
I usually classify adding extra tempdb data files as an easy, low risk change. As long as you don’t have a very large existing single file that you need to shrink, adding additional files and bringing the files to a uniform size is typically a simple operation. It’s important to use change control and the appropriate change windows, but this isn’t the type of change I worry much about.
But, as with all things, there’s a way this could cause surprising results.
We Added a Tempdb Data File, and Data Was Wrong
The story begins shortly after a new file was added to tempdb on an existing SQL Server instance. Suddenly, some new data was identified as incorrect. Jobs were running successfully. Some newly processed data was correct. But some was not.
Nobody likes incorrect data.
My first question was just to make sure I’d understood—- there were no code changes?
Nope, no code changes. Just an additional tempdb physical file.
The Plot Thickens
This type of story gets me really curious. I guess I’ve gotten myself into enough trouble to really love a good story of a change gone wrong. I had to know more.
Were temp tables being used? Yes, data was being bulk loaded into temp tables.
Were the rows in the temporary table itself wrong? Were values from different rows somehow being mixed together? (I didn’t know of a way for this to happen, but it seemed the first thing to clarify.)
The rows still held the expected data, I learned. But the rows were in the wrong order.
Aha! I yelped like a dog. That makes sense.
A Quick Example
It’s fun to reproduce this issue and see how tempdb behaves.
First, start with a test (not production) SQL Server instance with a single tempdb data file. To count the amount of data files in tempdb, you can run:
select file_id,name, physical_name
where type=0 — data files
If you need to remove some of the data files in tempdb for your test, you can do so with code like this. Just substitute in the actual file name.
alter database tempdb remove file tempdev2
Now we want to create a flat file with a bunch of rows in it. We want something that will be easy to tell if it’s in order or not, so for our test we’re going to create a file with rows of numbers from 1 to 50K.
First create a global temporary table holding the sorted list of numbers. In this example we’re using a clever method to create a numbers table using CTEs attributed to Itzik Ben-Gan.
if object_id(‘tempdb..##imReallyOrdered’) is not null
drop table ##imReallyOrdered
create table ##imReallyOrdered (N int not null);
WITH N1 AS ( SELECT 1 as N UNION SELECT 0 ),
N2 AS ( SELECT 1 as N FROM N1 CROSS JOIN N1 as N1B ),
N3 AS ( SELECT 1 as N FROM N2 CROSS JOIN N2 as N2B ),
N4 AS ( SELECT 1 as N FROM N3 CROSS JOIN N3 as N3B ),
N5 AS ( SELECT 1 as N FROM N4 CROSS JOIN N4 as N4B ),
Numbers as (SELECT ROW_NUMBER() OVER (ORDER BY N) as N FROM N5) –65536
SELECT TOP 50000 N
order by N –I’m really ordered! Really!
Now we can use this table to create a flat file.
You can do this from SQL Server Management Studio if you enable SQLCMD mode and then run the following (using a location of your choice).
–Export the data
!!bcp ##imReallyOrdered out "c:\test\Numbers.dat" -T -n
–Create a format file
!!bcp ##imReallyOrdered format nul -T -n -f "c:\test\Numbers.fmt"
Now we’re set up to test our import. We should have a single tempdb file and our files prepared to bulk load.
Run the following loop to test your bulk load. This loop creates a temporary table and then bulk inserts the data from our flat file without specifying a sort order. It then adds an identity column to the table and uses that identity column to verify the order of the data. It does this 50 times and tracks the results in a temp table. This may take a few minutes to run on a test system, depending how much power it’s got.
I’ve run this several times on SQL Server Denali CTP 3 and SQL Server 2008 R2, and each time I’ve found the data to be sorted in numeric order– as long as I’m using 50K rows with this file. Adjust the path to the file and format file you created if you placed them elsewhere.
USE tempdb ;
IF OBJECT_ID(‘tempdb..#results’) IS NOT NULL
DROP TABLE #results ;
CREATE TABLE #results (
NumberOutOfOrder INT NOT NULL ) ;
IF OBJECT_ID(‘tempdb..#temp’) IS NOT NULL
DROP TABLE #temp ;
CREATE TABLE #temp ( N INT ) ;
BULK INSERT #temp
FROM ‘C:\test\Numbers.dat’ WITH (FORMATFILE=’c:\test\Numbers.fmt’, FIRSTROW=1, TABLOCK) ;
ALTER TABLE #temp ADD i INT IDENTITY NOT NULL ;
INSERT #results( NumberOutOfOrder)
WHERE i <> N ;
SELECT SUM( CASE WHEN NumberOutOfOrder > 0 THEN 1 ELSE 0 END) AS OutOfOrder,
SUM( CASE WHEN NumberOutOfOrder = 0 THEN 1 ELSE 0 END) AS InOrder
FROM #results ;
Now add a second file to tempdb. Use code similar to this, but set the file location for your instance. Set the size, filegrowth, and maxsize properties to match your existing file.
alter database tempdb
add file (name=’tempdev2′, filename=’C:\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdev2.ndf’, size=64MB)
With the second tempdb file, re-run the loop above to drop, create, and bulk load data into the #temp table, then test the order of the rows. You should see that periodically the final query finds a significant amount of rows which are not in order.
If you don’t see the order changing, try restarting your test instance and running the loop again.
Why Aren’t the Rows Always in Order?
The rows aren’t always in order because we haven’t said they need to be in order. In SQL Server, bulk imports of all types assume that the data in a file is unordered.
Although we see see rows maintain the order in some cases when we insert them into our temporary table when SQL Server has one tempdb file, we shouldn’t rely on that behavior. It’s not guaranteed.
My test runs show that with a single tempdb data file, I will see order preserved for some sizes of data files, and that order will change when I add a tempdb file. For larger files, I see that the order is not preserved with either one or multiple tempdb files. (The level at which order changes on a single file varies by SQL Server version, which made writing the repro scripts an interesting challenge!)
But in short: since we haven’t asked for the data to be stored in a particular order, SQL Server gives the data back to us as quickly as possible— and it may or may not be in order.
What Does this All Mean? Should We Still Use Multiple Files for TempDB?
Sure, multiple files for tempdb is still fine. This story really just reminded me about the fragility of the custom applications we all work with every day. Every application is based on some assumptions, you see.
The story has two morals:
- We can never assume a production change is harmless.
- Configuration matters, and should be set identically in development, test, and production environments.
When you’re making a change to a system, unless you’ve written or recently reviewed all of the code, you never know what assumptions the code is using. It’s easy for bits of code to creep in that rely on behavior that isn’t guaranteed.
Even for changes like this one which seem like they’re production-only, the change should run through a testing cycle before making its way live. And for most accurate results, that testing cycle should use production-sized datasets– because size does matter.