Even Simple Changes Can Cause Trouble: A Tale of TempDb

I recently presented to the Salem Oregon SQL Server user group. During my visit I heard an interesting story: once upon a time in a land not so far way, a high severity production incident was caused by adding an additional data file to the tempdb database.

Multiple Tempdb Data Files: Commonly Recommended

Each quadrant of the bento is a TempDB data file

Using multiple database files in tempdb is often recommended as a best practice in SQL Server. A single page can become a bottleneck in tempdb when many operations want to allocate space. Adding multiple tempdb files can help alleviate the problem. Adam Machanic gives a great summary of how to reproduce and identify this issue.

Creating multiple files in tempdb is recommended over hill and dale in Books Online, on Technet, and in many blogs.

Many people suggest you should only add extra files to tempdb when you see specific metadata contention on that instance. However, most SQL Server DBAs who’ve been bit by this problem– and many who’ve read about it– configure multiple files when they’re building out a SQL Server instance as a preventive measure. This is because DBAs love uniform configuration and managers hate finding out a major incident could have been prevented. Once a team gets burned by a particular issue, if the fix does not have a negative impact they prefer to apply that fix everywhere. It keeps the egg off their faces.

What Could Possibly Go Wrong?

I usually classify adding extra tempdb data files as an easy, low risk change. As long as you don’t have a very large existing single file that you need to shrink, adding additional files and bringing the files to a uniform size is typically a simple operation. It’s important to use change control and the appropriate change windows, but this isn’t the type of change I worry much about.

But, as with all things, there’s a way this could cause surprising results.

We Added a Tempdb Data File, and Data Was Wrong

The story begins shortly after a new file was added to tempdb on an existing SQL Server instance. Suddenly, some new data was identified as incorrect. Jobs were running successfully. Some newly processed data was correct. But some was not.

Nobody likes incorrect data.

This is how bad data feels.

My first question was just to make sure I’d understood—- there were no code changes?

Nope, no code changes. Just an additional tempdb physical file.

The Plot Thickens

This type of story gets me really curious. I guess I’ve gotten myself into enough trouble to really love a good story of a change gone wrong. I had to know more.

Were temp tables being used? Yes, data was being bulk loaded into temp tables.

Were the rows in the temporary table itself wrong? Were values from different rows somehow being mixed together? (I didn’t know of a way for this to happen, but it seemed the first thing to clarify.)

The rows still held the expected data, I learned. But the rows were in the wrong order.

Aha! I yelped like a dog. That makes sense.

A Quick Example

It’s fun to reproduce this issue and see how tempdb behaves.

First, start with a test (not production) SQL Server instance with a single tempdb data file. To count the amount of data files in tempdb, you can run:

select file_id,name, physical_name
from tempdb.sys.database_files
where type=0 -- data files

If you need to remove some of the data files in tempdb for your test, you can do so with code like this. Just substitute in the actual file name.

dbcc shrinkfile(tempdev2,emptyfile)
go
alter database tempdb remove file tempdev2
go

Now we want to create a flat file with a bunch of rows in it. We want something that will be easy to tell if it’s in order or not, so for our test we’re going to create a file with rows of numbers from 1 to 50K.

First create a global temporary table holding the sorted list of numbers. In this example we’re using a clever method to create a numbers table using CTEs attributed to Itzik Ben-Gan.

if object_id('tempdb..##imReallyOrdered') is not null
	drop table ##imReallyOrdered
go

create table ##imReallyOrdered (N int not null);
go

WITH N1 AS ( SELECT 1 as N UNION SELECT 0 ),
	N2 AS ( SELECT 1 as N FROM N1 CROSS JOIN N1 as N1B ),
    N3 AS ( SELECT 1 as N FROM N2 CROSS JOIN N2 as N2B ),
    N4 AS ( SELECT 1 as N FROM N3 CROSS JOIN N3 as N3B ),
    N5 AS ( SELECT 1 as N FROM N4 CROSS JOIN N4 as N4B ),
    Numbers as (SELECT ROW_NUMBER() OVER (ORDER BY N) as N FROM N5) --65536
insert ##imReallyOrdered(N)
SELECT TOP 50000 N
FROM Numbers
order by N --I'm really ordered! Really!
GO

Now we can use this table to create a flat file.

You can do this from SQL Server Management Studio if you enable SQLCMD mode and then run the following (using a location of your choice).

--Export the data
!!bcp ##imReallyOrdered out "c:\test\Numbers.dat" -T -n

--Create a format file
!!bcp ##imReallyOrdered format nul -T -n  -f "c:\test\Numbers.fmt"

Now we’re set up to test our import. We should have a single tempdb file and our files prepared to bulk load.

Run the following loop to test your bulk load. This loop creates a temporary table and then bulk inserts the data from our flat file without specifying a sort order. It then adds an identity column to the table and uses that identity column to verify the order of the data. It does this 50 times and tracks the results in a temp table. This may take a few minutes to run on a test system, depending how much power it’s got.

I’ve run this several times on SQL Server Denali CTP 3 and SQL Server 2008 R2, and each time I’ve found the data to be sorted in numeric order– as long as I’m using 50K rows with this file. Adjust the path to the file and format file you created if you placed them elsewhere.

USE tempdb ;
go

IF OBJECT_ID('tempdb..#results') IS NOT NULL
    DROP TABLE #results ;
go

CREATE TABLE #results (
    NumberOutOfOrder INT NOT NULL ) ;
go

BEGIN
    IF OBJECT_ID('tempdb..#temp') IS NOT NULL
        DROP TABLE #temp ;

    CREATE TABLE #temp ( N INT ) ;

    BULK INSERT #temp
    FROM 'C:\test\Numbers.dat' WITH (FORMATFILE='c:\test\Numbers.fmt', FIRSTROW=1, TABLOCK) ;

    ALTER TABLE #temp ADD i INT IDENTITY NOT NULL ;

    INSERT  #results( NumberOutOfOrder)
            SELECT  COUNT(*)
            FROM    #temp
            WHERE   i <> N ;

END
GO 50

SELECT  SUM( CASE WHEN NumberOutOfOrder > 0 THEN 1 ELSE 0 END) AS OutOfOrder,
	 SUM( CASE WHEN NumberOutOfOrder = 0 THEN 1 ELSE 0 END) AS InOrder
FROM #results ;
go

Now add a second file to tempdb. Use code similar to this, but set the file location for your instance. Set the size, filegrowth, and maxsize properties to match your existing file.

alter database tempdb
add file (name='tempdev2', filename='C:\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdev2.ndf', size=64MB)
go

With the second tempdb file,  re-run the loop above to drop, create, and bulk load data into the #temp table, then test the order of the rows. You should see that periodically the final query finds a significant amount of rows which are not in order.

If you don’t see the order changing, try restarting your test instance and running the loop again.

Why Aren’t the Rows Always in Order?

The rows aren’t always in order because we haven’t said they need to be in order. In SQL Server, bulk imports of all types assume that the data in a file is unordered.

Although we see see rows maintain the order in some cases when we insert them into our temporary table when SQL Server has one tempdb file, we shouldn’t rely on that behavior. It’s not guaranteed.

My test runs show that with a single tempdb data file, I will see order preserved for some sizes of data files, and that order will change when I add a tempdb file. For larger files, I see that the order is not preserved with either one or multiple tempdb files. (The level at which order changes on a single file varies by SQL Server version, which made writing the repro scripts an interesting challenge!)

But in short: since we haven’t asked for the data to be stored in a particular order, SQL Server gives the data back to us as quickly as possible— and it may or may not be in order.

What Does this All Mean? Should We Still Use Multiple Files for TempDB?

I *think* it's safe...

Sure, multiple files for tempdb is still fine. This story really just reminded me about the fragility of the custom applications we all work with every day. Every application is based on some assumptions, you see.

The story has two morals:

  • We can never assume a production change is harmless.
  • Configuration matters, and should be set identically in development, test, and production environments.

When you’re making a change to a system, unless you’ve written or recently reviewed all of the code, you never know what assumptions the code is using. It’s easy for bits of code to creep in that rely on behavior that isn’t guaranteed.

Even for changes like this one which seem like they’re production-only, the change should run through a testing cycle before making its way live. And for most accurate results, that testing cycle should use production-sized datasets– because size does matter.

Kendra Little

Kendra specializes in high availability and performance tuning. She is a Microsoft Certified Master in SQL Server-- the highest technical SQL Server Certification available. Kendra loves databases and software development more than long walks on the beach. Those cartoons in her blog posts? She draws 'em all. Read more and contact Kendra.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

14 Responses to Even Simple Changes Can Cause Trouble: A Tale of TempDb
  1. Thirster42
    August 24, 2011 | 10:11 AM

    The Real WTF is relying on SQL Server to return data in a consistant order without specifying it. I learned that my first year. :-D

    • Kendra Little
      August 24, 2011 | 6:57 PM

      Heh– well to stick up for the coder out there somewhere, it can be more obvious in the case of doing inserts within TSQL rather than in bulk loading in some cases. Sometimes you have to load data files that don’t have an obvious good field to sort by, but have to be in order. When you get stuck in that situation and you test things and it looks like the order is working, you might be tempted to go with what (you think) works!

      So in terms of the pattern that happened to rely on what looked like the apparent behavior (although it wasn’t guaranteed), I’ve seen that play out in different environments a lot of times. It’s one of those hard parts of the SDLC to work with.

  2. Peter Maloof
    August 24, 2011 | 10:14 AM

    Kendra:

    Really entertaining article. Technically though, the ‘problem’ was an incorrect assumption by the client, right?

    • Kendra Little
      August 24, 2011 | 6:54 PM

      Yep, the root cause for this example was an incorrect assumption in an older piece of code.

  3. Rob Boek
    August 24, 2011 | 11:59 AM

    So, all you DBAs out there. Make sure you test every piece of code in your system before you add a file to TempDb.

    I’d argue that this is really something that should have been caught in the application code reviews and testing. It was a *bug* that could actually occur in other scenarios, besides adding a file to TempDb. I would never fault a DBA on this one.

    • Kendra Little
      August 24, 2011 | 12:06 PM

      Oh, I definitely wouldn’t fault the DBA on this.

      The problem you have when you’re a DBA who ran a change that had an unintended consequence is that you get a lot of overtime no matter what! You need to dig in and help identify the problematic data that was generated and help write and deploy the cleanup. And then you don’t really want to get into the finger pointing thing too much about the cause, because that won’t help you in the long run.

      So by routing even configuration changes through a testing cycle, as a DBA you just have a better chance of helping save a bunch of work. It also gives you the chance to talk about SQL Server configuration and its impact on an environment, which can help bridge the gap between Dev and IT.

  4. Adam Machanic
    August 24, 2011 | 7:51 PM

    Relying on the order of the rows in the temp table? Classic, and too funny! Thanks for sharing this tale of woe :-)

  5. Claire
    August 25, 2011 | 9:53 AM

    That is the nicest-looking bento box I have ever seen.
    I’ve learned through my own hastiness and general recalcitrance to use staging servers that one wee line of code and bring a whole front end down; this story reinforces that, albeit from a different angle.

  6. [...] Even Simple Changes Can Cause Trouble: A Tale of TempDb - A fascinating real world troubleshooting case involving the tempdb and having multiple data files from Kendra Little (Blog|Twitter). It’s not what you might expect. [...]

  7. question about the sample code
    August 28, 2011 | 4:17 AM

    Hi,
    I try to use your sample code to re-produce the issue, but I got a stranger error when I ran the following code –
    “Bulk insert failed due to a schema change of the target table.”

    actually, I just follow your code, did not make a change.
    so anything wrong, what should I do?
    thanks!

    declare @ct smallint=1;declare @results table (NumberOutOfOrder int); while @ct < 50begin if object_id('tempdb..#temp') is not null drop table #temp; create table #temp (N int); BULK INSERT #temp FROM 'C:\test\Numbers.dat' with (FORMATFILE='c:\test\Numbers.fmt', FIRSTROW=1, TABLOCK); alter table #temp add i int identity not null; insert @results(NumberOutOfOrder) SELECT count(*) FROM #temp where i N; set @ct += 1;end Select * from @results;go

    • Kendra Little
      August 28, 2011 | 1:10 PM

      Hi Jacky,

      I’m not sure which version of SQL Server you’re using, but I’m able to reproduce that error message in SQL 2008 R2. My initial version ran for me perfectly on Denali CTP3, so looks like this got fixed!

      There’s some weirdness in SQL 2008 R2 in recreating and bulk loading into a temp table within a loop. All the commands work just fine outside of the loop when run in batches, even if run multiple times. The quickest way I can see to get around this is to use the ‘GO ###’ shortcut. That’s not something I use in production code, but since this is just demoing the sort order issue, that’s OK by me. :)

      I edited the script in the post and think that should work for you— at least if you’re running 2008 R2.

      *Note*: I tweaked the number of rows in the file to import to get the order /unordered repro to work in 2008 R2 as well.

      • question about the sample code
        August 29, 2011 | 9:06 AM

        Thanks for your help, I was working on a SQL Server 2008 R2, so it may be a issue of SQL Server 2008 R2. whatever, the problem has been resloved by your new code.
        Thanks for your efforts again.

  8. Jeffry Altamar Thalliens
    August 29, 2011 | 4:30 PM

    Great Post,

    keep like this, you all are the best for people who are begining in the DataBase world like me, everyday learning from the best.

    thanks you all.

Leave a Reply

Wanting to leave an <em>phasis on your comment?

Notify me of followup comments via e-mail. You can also subscribe without commenting.