Even Simple Changes Can Cause Trouble: A Tale of TempDb


I recently presented to the Salem Oregon SQL Server user group. During my visit I heard an interesting story: once upon a time, in a land not so far away, a high severity production incident was caused by adding an additional data file to the tempdb database.

Multiple Tempdb Data Files: Commonly Recommended

Each quadrant of the bento is a TempDB data file

Using multiple database files in tempdb is often recommended as a best practice in SQL Server. A single page can become a bottleneck in tempdb when many operations want to allocate space. Adding multiple tempdb files can help alleviate the problem. Adam Machanic gives a great summary of how to reproduce and identify this issue.
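If you want to check whether an instance is actually hitting this, one quick way is to look for latch waits on tempdb allocation pages. Here's a rough sketch (PFS, GAM, and SGAM are pages 1, 2, and 3 of each tempdb data file, repeating at intervals through the file):

-- Look for PAGELATCH waits on pages in tempdb (database_id = 2).
-- Waits on resources like 2:1:1 (PFS), 2:1:2 (GAM), or 2:1:3 (SGAM)
-- suggest allocation contention.
SELECT session_id,
       wait_type,
       wait_duration_ms,
       resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE N'PAGELATCH%'
  AND resource_description LIKE N'2:%';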

Creating multiple files in tempdb is recommended over hill and dale in Books Online, on Technet, and in many blogs.

Many people suggest you should only add extra files to tempdb when you see specific metadata contention on that instance. However, most SQL Server DBAs who’ve been bit by this problem– and many who’ve read about it– configure multiple files when they’re building out a SQL Server instance as a preventive measure. This is because DBAs love uniform configuration and managers hate finding out a major incident could have been prevented. Once a team gets burned by a particular issue, if the fix does not have a negative impact they prefer to apply that fix everywhere. It keeps the egg off their faces.

What Could Possibly Go Wrong?

I usually classify adding extra tempdb data files as an easy, low risk change. As long as you don’t have a very large existing single file that you need to shrink, adding additional files and bringing the files to a uniform size is typically a simple operation. It’s important to use change control and the appropriate change windows, but this isn’t the type of change I worry much about.

But, as with all things, there’s a way this could cause surprising results.

We Added a Tempdb Data File, and Data Was Wrong

The story begins shortly after a new file was added to tempdb on an existing SQL Server instance. Suddenly, some new data was identified as incorrect. Jobs were running successfully. Some newly processed data was correct. But some was not.

Nobody likes incorrect data.

This is how bad data feels.

My first question was just to make sure I’d understood– there were no code changes?

Nope, no code changes. Just an additional tempdb physical file.

The Plot Thickens

This type of story gets me really curious. I guess I’ve gotten myself into enough trouble to really love a good story of a change gone wrong. I had to know more.

Were temp tables being used? Yes, data was being bulk loaded into temp tables.

Were the rows in the temporary table itself wrong? Were values from different rows somehow being mixed together? (I didn’t know of a way for this to happen, but it seemed the first thing to clarify.)

The rows still held the expected data, I learned. But the rows were in the wrong order.

Aha! I yelped like a dog. That makes sense.

A Quick Example

It’s fun to reproduce this issue and see how tempdb behaves.

First, start with a test (not production) SQL Server instance with a single tempdb data file. To count the number of data files in tempdb, you can run:
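A minimal version of that check looks like this (counting just the data files, not the log):

USE tempdb;
GO
SELECT COUNT(*) AS data_file_count
FROM sys.database_files
WHERE type_desc = 'ROWS';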

If you need to remove some of the data files in tempdb for your test, you can do so with code like this. Just substitute in the actual file name.
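Here's a sketch of that cleanup, using a made-up logical file name (pull the real one from sys.database_files first):

USE tempdb;
GO
-- 'tempdev2' is a placeholder logical name: substitute your own.
-- Empty the file so its pages move to the remaining files, then remove it.
DBCC SHRINKFILE ('tempdev2', EMPTYFILE);
GO
ALTER DATABASE tempdb REMOVE FILE tempdev2;
GO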

Now we want to create a flat file with a bunch of rows in it. We want something that makes it easy to tell whether it’s in order or not, so for our test we’re going to create a file with rows of numbers from 1 to 50K.

First create a global temporary table holding the sorted list of numbers. In this example we’re using a clever method to create a numbers table using CTEs attributed to Itzik Ben-Gan.
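Here's one sketch of that technique: stacked CTEs cross-joined to generate rows cheaply, trimmed to 50,000 with TOP. The table is a global temp table (##Numbers) so that other sessions, like the bcp export below, can read it:

IF OBJECT_ID('tempdb..##Numbers') IS NOT NULL DROP TABLE ##Numbers;
GO
WITH
L0 AS (SELECT 1 AS c UNION ALL SELECT 1),               -- 2 rows
L1 AS (SELECT 1 AS c FROM L0 AS a CROSS JOIN L0 AS b),  -- 4 rows
L2 AS (SELECT 1 AS c FROM L1 AS a CROSS JOIN L1 AS b),  -- 16 rows
L3 AS (SELECT 1 AS c FROM L2 AS a CROSS JOIN L2 AS b),  -- 256 rows
L4 AS (SELECT 1 AS c FROM L3 AS a CROSS JOIN L3 AS b),  -- 65,536 rows
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM L4)
SELECT TOP (50000) N
INTO ##Numbers
FROM Nums
ORDER BY N;
GO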

Now we can use this table to create a flat file.

You can do this from SQL Server Management Studio if you enable SQLCMD mode and then run the following (using a location of your choice).
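For example (in SQLCMD mode, !! shells out to the operating system; the C:\test paths here match the import script below, and -T assumes a trusted connection to the default local instance):

-- Keep the session that created ##Numbers open while these run.
!!bcp "SELECT N FROM ##Numbers ORDER BY N" queryout "C:\test\Numbers.dat" -c -T
!!bcp tempdb..##Numbers format nul -c -f "C:\test\Numbers.fmt" -T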

Now we’re set up to test our import. We should have a single tempdb file and our files prepared to bulk load.

Run the following loop to test your bulk load. This loop creates a temporary table and then bulk inserts the data from our flat file without specifying a sort order. It then adds an identity column to the table and uses that identity column to verify the order of the data. It does this 50 times and tracks the results in a temp table. This may take a few minutes to run on a test system, depending on how much power it’s got.
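Here's a sketch of that harness. It's modeled on the version discussed in the comments below, using the GO ### batch-repeat shortcut in SSMS (and a global temp table for the results, since a table variable wouldn't survive across batches):

-- One-time setup: a global temp table to collect results across batches.
IF OBJECT_ID('tempdb..##results') IS NOT NULL DROP TABLE ##results;
CREATE TABLE ##results (RunId INT IDENTITY NOT NULL, NumberOutOfOrder INT);
GO

IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp;
CREATE TABLE #temp (N INT);

-- Bulk load the flat file. No ORDER hint, so no order is promised.
BULK INSERT #temp FROM 'C:\test\Numbers.dat'
WITH (FORMATFILE = 'C:\test\Numbers.fmt', FIRSTROW = 1, TABLOCK);

-- Add an identity after the load, then count rows whose position
-- no longer matches their value.
ALTER TABLE #temp ADD i INT IDENTITY NOT NULL;

INSERT ##results (NumberOutOfOrder)
SELECT COUNT(*) FROM #temp WHERE i <> N;
GO 50

SELECT * FROM ##results;
GO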

I’ve run this several times on SQL Server Denali CTP 3 and SQL Server 2008 R2, and each time I’ve found the data to be sorted in numeric order– as long as I’m using 50K rows with this file. Adjust the path to the file and format file you created if you placed them elsewhere.

Now add a second file to tempdb. Use code similar to this, but set the file location for your instance. Set the size, filegrowth, and maxsize properties to match your existing file.
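Something along these lines (logical name, path, and sizes here are placeholders):

ALTER DATABASE tempdb
ADD FILE (
    NAME = tempdev2,                    -- placeholder logical name
    FILENAME = 'C:\test\tempdev2.ndf',  -- placeholder path
    SIZE = 256MB,
    FILEGROWTH = 64MB,
    MAXSIZE = UNLIMITED
);
GO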

With the second tempdb file, re-run the loop above to drop, create, and bulk load data into the #temp table, then test the order of the rows. You should see that periodically the final query finds a significant number of rows which are not in order.

If you don’t see the order changing, try restarting your test instance and running the loop again.

Why Aren’t the Rows Always in Order?

The rows aren’t always in order because we haven’t said they need to be in order. In SQL Server, bulk imports of all types assume that the data in a file is unordered.

Although we see rows maintain their order in some cases when we insert them into our temporary table while SQL Server has one tempdb file, we shouldn’t rely on that behavior. It’s not guaranteed.

My test runs show that with a single tempdb data file, I will see order preserved for some sizes of data files, and that order will change when I add a tempdb file. For larger files, I see that the order is not preserved with either one or multiple tempdb files. (The level at which order changes on a single file varies by SQL Server version, which made writing the repro scripts an interesting challenge!)

But in short: since we haven’t asked for the data to be stored in a particular order, SQL Server gives the data back to us as quickly as possible– and it may or may not be in order.
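If order matters downstream, the fix is to make it explicit rather than trusting an identity column added after the fact. A minimal sketch:

-- Order is only guaranteed when you ask for it.
SELECT N
FROM #temp
ORDER BY N;

-- If the file has no column that encodes its order, add one before loading
-- (for example, write a line number into each row of the file).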

What Does this All Mean? Should We Still Use Multiple Files for TempDB?

I *think* it's safe...

Sure, using multiple files for tempdb is still fine. This story really just reminded me about the fragility of the custom applications we all work with every day. Every application is based on some assumptions, you see.

The story has two morals:

  • We can never assume a production change is harmless.
  • Configuration matters, and should be set identically in development, test, and production environments.

When you’re making a change to a system, unless you’ve written or recently reviewed all of the code, you never know what assumptions the code is using. It’s easy for bits of code to creep in that rely on behavior that isn’t guaranteed.

Even for changes like this one which seem like they’re production-only, the change should run through a testing cycle before making its way live. And for most accurate results, that testing cycle should use production-sized datasets– because size does matter.


34 Comments

  • The Real WTF is relying on SQL Server to return data in a consistent order without specifying it. I learned that my first year. 😀

    • Heh– well, to stick up for the coder out there somewhere: the risk can be more obvious when you’re doing inserts within TSQL than when you’re bulk loading. Sometimes you have to load data files that don’t have an obvious good field to sort by, but have to be in order. When you get stuck in that situation and you test things and it looks like the order is working, you might be tempted to go with what (you think) works!

      So in terms of the pattern that happened to rely on what looked like the apparent behavior (although it wasn’t guaranteed), I’ve seen that play out in different environments a lot of times. It’s one of those hard parts of the SDLC to work with.

  • Kendra:

    Really entertaining article. Technically though, the ‘problem’ was an incorrect assumption by the client, right?

  • So, all you DBAs out there. Make sure you test every piece of code in your system before you add a file to TempDb.

    I’d argue that this is really something that should have been caught in the application code reviews and testing. It was a *bug* that could actually occur in other scenarios, besides adding a file to TempDb. I would never fault a DBA on this one.

    • Oh, I definitely wouldn’t fault the DBA on this.

      The problem you have when you’re a DBA who ran a change that had an unintended consequence is that you get a lot of overtime no matter what! You need to dig in and help identify the problematic data that was generated and help write and deploy the cleanup. And then you don’t really want to get into the finger pointing thing too much about the cause, because that won’t help you in the long run.

      So by routing even configuration changes through a testing cycle, as a DBA you just have a better chance of helping save a bunch of work. It also gives you the chance to talk about SQL Server configuration and its impact on an environment, which can help bridge the gap between Dev and IT.

  • Relying on the order of the rows in the temp table? Classic, and too funny! Thanks for sharing this tale of woe 🙂

  • That is the nicest-looking bento box I have ever seen.
    I’ve learned through my own hastiness and general recalcitrance to use staging servers that one wee line of code can bring a whole front end down; this story reinforces that, albeit from a different angle.

  • question about the sample code
    August 28, 2011 4:17 am

    Hi,
    I tried to use your sample code to reproduce the issue, but I got a strange error when I ran the following code –
    “Bulk insert failed due to a schema change of the target table.”

    Actually, I just followed your code and did not make a change.
    So is anything wrong? What should I do?
    Thanks!

    declare @ct smallint = 1;
    declare @results table (NumberOutOfOrder int);
    while @ct < 50
    begin
        if object_id('tempdb..#temp') is not null drop table #temp;
        create table #temp (N int);
        BULK INSERT #temp FROM 'C:\test\Numbers.dat'
            with (FORMATFILE = 'c:\test\Numbers.fmt', FIRSTROW = 1, TABLOCK);
        alter table #temp add i int identity not null;
        insert @results (NumberOutOfOrder)
            SELECT count(*) FROM #temp where i <> N;
        set @ct += 1;
    end
    Select * from @results;
    go

    • Hi Jacky,

      I’m not sure which version of SQL Server you’re using, but I’m able to reproduce that error message in SQL 2008 R2. My initial version ran for me perfectly on Denali CTP3, so looks like this got fixed!

      There’s some weirdness in SQL 2008 R2 in recreating and bulk loading into a temp table within a loop. All the commands work just fine outside of the loop when run in batches, even if run multiple times. The quickest way I can see to get around this is to use the ‘GO ###’ shortcut. That’s not something I use in production code, but since this is just demoing the sort order issue, that’s OK by me. 🙂

      I edited the script in the post and think that should work for you– at least if you’re running 2008 R2.

      *Note*: I tweaked the number of rows in the file to import to get the ordered/unordered repro to work in 2008 R2 as well.

      • question about the sample code
        August 29, 2011 9:06 am

        Thanks for your help. I was working on SQL Server 2008 R2, so it may be an issue with SQL Server 2008 R2. Whatever the case, the problem has been resolved by your new code.
        Thanks for your efforts again.

    • “Bulk insert failed due to a schema change of the target table.”

      Oh dear Lord! They’ve started to protect programmers with ANSI functionality. I really hope they don’t take this type of “protection” to the hilt like they did in Oracle because I count on being able to do things similar to this (changing the schema of a table while in-process).

  • Great Post,

    Keep it up like this. You all are the best for people who are beginning in the database world like me, every day learning from the best.

    Thank you all.

  • Thank you ………. 🙂

  • The order might get different or incorrect because of multiple files (multiple CPU cores) or parallelism. Just a thought.

  • @Kendra,

    This is a wonderful article and the two “morals” at the end of the article are absolutely spot on.

    This article also has a huge number of unintended and unstated lessons to be learned, not the least of which are…

    1. Expect the unexpected in your code. Take the time to make it bullet proof. There’s an easy fix for this “import sort order” problem but the developer who wrote the code didn’t take the time to make the code bullet proof.

    2. Before you (not “you Kendra”… I mean anyone) try something esoteric in code, make sure that you understand the tools that you’re using. For example, the only reason the “out of order” problem really happened is because adding an IDENTITY column is NOT guaranteed to number the rows in any given order. The addition of the extra TempDB file just surfaced this fact and demonstrated that it’s true.

    3. If you don’t have the time to do item 1 above, shame on your manager. If you don’t have the time to do item 2 above, shame on you. 😉

    The following has nothing to do with the quality (which is absolutely superb IMHO) of this article…

    According to some of the posts I’m beginning to see here and there, some folks have misinterpreted what is being taught because they haven’t done enough of Item #2 above. For example, here’s the starting quote from one such post.

    “In one of the tempdb blog Here is link {the link leads to this article}, it is stated that sometimes when we add a new file to tempdb to improve the performane {sic} it might lead to data incorrection {sic} (in context of data sorting or order) ”

    As previously stated, nothing could be further from the truth. The only way for me to prove that would be to post some code that proves that the problem is actually the misuse of adding the IDENTITY column after the fact. With that thought in mind, what is the best way to send you good folks some documented code that demonstrates this fact so you can preview it and verify it before it’s posted (and I don’t mind if you post it instead of me)?

    Shifting gears back to the article… Kendra, it’s not my job to judge but you did a wonderful job on this article to demonstrate and emphasize the extreme importance of the two “morals of the story”. Thank you so much for taking the time to write such an awesome article.

  • This was an excellent article. Thanks so much for the information.

  • I heard this is only an issue if your tempdb files are split across different drives, but if you have them all on the same drive it will be okay. Is this true?

    • Kendra Little
      May 14, 2013 4:44 pm

      That should not be true. But to be fair, I wouldn’t count on this working even if you have a *single* tempdb file.

      I don’t recall all the details of my tests, but I wrote “For larger files, I see that the order is not preserved with either one or multiple tempdb files.” So even with a single file, you could hit this issue, regardless of where the file is located. (But again, this isn’t a bug– if you don’t ask for the data to be sorted in a given order, it’s fair game for SQL Server to do whatever it needs to get you the data back in the fastest way possible, because you’ve said you don’t care about a sort order.)

      • Thanks for the response, Kendra. I understand what you’re saying; it makes sense. I just wanted to avoid reviewing and testing 100+ stored procs before creating multiple tempdb files. The returned data is always in the correct order today, but any new risk from going multi-file is a concern. Great blog posts, by the way; thanks for sharing your expertise and views.

  • A good article! We are contemplating adding files to tempdb, and reading this made my heart sink at the thought of all the extra testing that would be required.

    I tried the scripts, but I found (as Jeff Moden noted above) that it is adding an identity column to a table which is the problem. I ran some different tests (script shown below) with the data being BCPed into a table that already has an identity column. After running for some minutes, there were no occasions where Num <> Id. Which is very reassuring.

    Having said that, SELECT statements with no ORDER BY clause will still be a problem. I am working on getting a demonstration of this issue. It would be very nice if there was a tool out there which could parse a stored procedure and report whether it contains such SELECTs. If anyone knows one, please don’t hesitate to give details!

    /*
    tmp50000.txt is a text file with numbers 1 to 50000
    tmp50000.fmt is the BCP format file, as shown below.

    10.0
    1
    1 SQLCHAR 0 12 "\r\n" 1 Num ""
    */
    use tempdb
    go
    create table tmp50000 (Num int, Id int identity not null)

    WHILE (@@ROWCOUNT = 0)
    BEGIN
    truncate table tmp50000

    exec xp_cmdshell 'bcp tempdb.dbo.tmp50000 in tmp50000.txt -T -c -f tmp50000.fmt', no_output

    if exists(select 1 from tmp50000 where Id <> Num) select * from tmp50000

    END

    • Luis Patricio
      May 22, 2014 5:32 pm

      Hi

      First of all great article.

      Regarding the tool for detecting stored procedures without a forced order: that makes no sense in my opinion. As a basic rule of thumb, ordering should be done in the application when it’s needed (even for some big resultsets). That being said, an SP with no forced order isn’t a sign that ordering isn’t being done somewhere else.

      This is also a lesson from this article and the comments:
      Don’t assume you are not misassuming.

      Do you agree?

    • Steve,

      you may want to try this script (I am sorry, I cannot attribute it but my thanks to the author):

      select name from sys.procedures
      where OBJECT_DEFINITION(object_id) like '%order by%'

  • Cristian Casanova
    November 26, 2015 3:35 pm

    I really found this article very helpful, thanks again.

  • Hi Kendra,
    I have a query which gets called almost 5 times in the same second; 4 of the calls take 30-200 ms (which is good and fast enough) and one takes 6000 ms (which is slow). During the long slow call, the other web server runs the exact same query with the exact same parameters a few times, and those are fast again, taking only 50 ms, while the first call which took 6000 ms still hasn’t completed. I checked Profiler and the queries are all the same: same parameters, same reads, same CPU, and even the same execution plan. Since the queries have an order by (select top(1)… order by), do you think it might be tempdb? If so, how can I figure that out?

  • A few days back we moved from a 2012 server to 2014 and kept the same settings as earlier: we use 8 data files in tempdb for better performance, all placed on a single drive. As this is a data warehouse environment, batch loads are running continuously and the tempdb drive is getting full. Even though there is 98% free space in the database, it is not able to reuse the space and throws a disk-full error.

    Are there any specific settings for tempdb to avoid these issues in SQL Server 2014?

    Insufficient space in tempdb to hold row versions. Need to shrink the version store to free up some space in tempdb. Transaction (id=239368387 xsn=1273322 spid=126 elapsed_time=1590) has been marked as victim and it will be rolled back if it accesses the version store. If the problem persists, the likely cause is improperly sized tempdb or long running transactions. Please refer to BOL on how to configure tempdb for versioning.

    Any recommendations for tempdb performance improvement, like enabling trace flag -T1118?

    Our server specifications are below.

    SQL Server detected 4 sockets with 15 cores per socket and 30 logical processors per socket, 120 total logical processors; using 120 logical processors based on SQL Server licensing. 512 GB RAM.

