Performance Tuning

You Probably Shouldn’t Index Your Temp Tables.

By Brent Ozar · August 19, 2021 · 47 comments

When you’ve got a process that uses temp tables, and you want to speed it up, it can be tempting to index the temp table to help work get done more quickly. However, in most cases – not all, but most – that’s a bad idea.

To explain why, I’m going to take a large Stack Overflow database and write a stored procedure:

CREATE OR ALTER PROC dbo.usp_TempTable_Heap 
	@DisplayName NVARCHAR(40) AS
	BEGIN
	CREATE TABLE #MyUsers (Id INT, DisplayName NVARCHAR(40));
	INSERT INTO #MyUsers (Id, DisplayName)
		SELECT Id, DisplayName
		FROM dbo.Users;

	SELECT Id, DisplayName
	FROM #MyUsers
	WHERE DisplayName = @DisplayName;
	END
GO

CREATE OR ALTER PROC dbo.usp_TempTable_Heap

@DisplayName NVARCHAR(40) AS

BEGIN

CREATE TABLE #MyUsers (Id INT, DisplayName NVARCHAR(40));

INSERT INTO #MyUsers (Id, DisplayName)

SELECT Id, DisplayName

FROM dbo.Users;

SELECT Id, DisplayName

FROM #MyUsers

WHERE DisplayName = @DisplayName;

END

The first statement loads about 10 million rows into a temp table, and the second statement only pulls out the rows that match who we’re looking for. Now, that probably seems dumb: you’re over there yelling, “Brent, why would you put millions of rows into a temp table that you don’t need?” The answer is that I need to write a blog post quickly in order to explain a concept that I’ve had to teach clients a few times. I can’t copy/paste their real code here. That would be Bad™. Instead, let’s just keep going with this game for a minute.

I’ll turn on time statistics so I can get a rough idea of where SQL Server is spending its time – the first statement or the second:

SET STATISTICS TIME ON;
EXEC dbo.usp_TempTable_Heap N'Brent Ozar';

1 2	SET STATISTICS TIME ON; EXEC dbo.usp_TempTable_Heap N'Brent Ozar';

I don’t often use time statistics here on the blog, so a quick explanation: in the Messages tab in SSMS, you get a line for each statement in the batch, plus a total:

 SQL Server Execution Times:
   CPU time = 4453 ms,  elapsed time = 4458 ms.

 SQL Server Execution Times:
   CPU time = 1031 ms,  elapsed time = 1017 ms.

 SQL Server Execution Times:
   CPU time = 5484 ms,  elapsed time = 5477 ms.

SQL Server Execution Times:

CPU time = 4453 ms, elapsed time = 4458 ms.

SQL Server Execution Times:

CPU time = 1031 ms, elapsed time = 1017 ms.

SQL Server Execution Times:

CPU time = 5484 ms, elapsed time = 5477 ms.

The first statement – loading the temp table – took about 4.5 seconds.

The second statement – finding Brent – took about a second.

Could an index on DisplayName speed up the second query?

To find out, let’s add a new version of our stored procedure – this time one that creates an index on DisplayName before our data is loaded:

CREATE OR ALTER PROC dbo.usp_TempTable_IndexAddedBeforeLoad
	@DisplayName NVARCHAR(40) AS
	BEGIN
	CREATE TABLE #MyUsers (Id INT, DisplayName NVARCHAR(40));
	CREATE INDEX DisplayName ON #MyUsers(DisplayName); /* THIS IS NEW */
	INSERT INTO #MyUsers (Id, DisplayName)
		SELECT Id, DisplayName
		FROM dbo.Users;

	SELECT Id, DisplayName
	FROM #MyUsers
	WHERE DisplayName = @DisplayName;
	END
GO

CREATE OR ALTER PROC dbo.usp_TempTable_IndexAddedBeforeLoad

@DisplayName NVARCHAR(40) AS

BEGIN

CREATE TABLE #MyUsers (Id INT, DisplayName NVARCHAR(40));

CREATE INDEX DisplayName ON #MyUsers(DisplayName); /* THIS IS NEW */

INSERT INTO #MyUsers (Id, DisplayName)

SELECT Id, DisplayName

FROM dbo.Users;

SELECT Id, DisplayName

FROM #MyUsers

WHERE DisplayName = @DisplayName;

END

Now, when we run the new query:

EXEC dbo.usp_TempTable_IndexAddedBeforeLoad N'Brent Ozar';

1	EXEC dbo.usp_TempTable_IndexAddedBeforeLoad N'Brent Ozar';

The time statistics paint a horrifying picture:

 SQL Server Execution Times:
   CPU time = 56594 ms,  elapsed time = 56772 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 75 ms.

 SQL Server Execution Times:
   CPU time = 56594 ms,  elapsed time = 56850 ms.

SQL Server Execution Times:

CPU time = 56594 ms, elapsed time = 56772 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 75 ms.

SQL Server Execution Times:

CPU time = 56594 ms, elapsed time = 56850 ms.

Sure, the second statement drops from 1,017 milliseconds to 75, but…who cares?!?! The additional overhead of building the index is much, much higher, making the query take ten times longer overall.

Part of the problem is that our heap (table) has to be loaded first, and then the data has to be sorted by DisplayName, and then an index has to be created on DisplayName. We can’t do them both in parallel at the same time because the nonclustered index has to be able to point back to a specific row in the heap – and to do that, we need its physical location, like I talk about in How to Think Like the Engine.

Fine. What about a clustered index?

We can reduce the overhead of the process by only having one structure to store the data. Instead of a heap plus a nonclustered index on DisplayName, we can just define a single structure for the temp table: a clustered index on DisplayName. It can’t be a unique index because multiple users share the same DisplayName, but that’s okay. Here we go:

CREATE OR ALTER PROC dbo.usp_TempTable_ClusteredIndex
	@DisplayName NVARCHAR(40) AS
	BEGIN
	CREATE TABLE #MyUsers (Id INT, DisplayName NVARCHAR(40));
	CREATE CLUSTERED INDEX DisplayName ON #MyUsers(DisplayName); /* THIS IS NEW */
	INSERT INTO #MyUsers (Id, DisplayName)
		SELECT Id, DisplayName
		FROM dbo.Users;

	SELECT Id, DisplayName
	FROM #MyUsers
	WHERE DisplayName = @DisplayName;
	END
GO

CREATE OR ALTER PROC dbo.usp_TempTable_ClusteredIndex

@DisplayName NVARCHAR(40) AS

BEGIN

CREATE TABLE #MyUsers (Id INT, DisplayName NVARCHAR(40));

CREATE CLUSTERED INDEX DisplayName ON #MyUsers(DisplayName); /* THIS IS NEW */

INSERT INTO #MyUsers (Id, DisplayName)

SELECT Id, DisplayName

FROM dbo.Users;

SELECT Id, DisplayName

FROM #MyUsers

WHERE DisplayName = @DisplayName;

END

And when we execute it:

EXEC dbo.usp_TempTable_ClusteredIndex N'Brent Ozar';

1	EXEC dbo.usp_TempTable_ClusteredIndex N'Brent Ozar';

At first, it looks way faster than the last method:

 SQL Server Execution Times:
   CPU time = 42107 ms,  elapsed time = 18071 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 42107 ms,  elapsed time = 18145 ms.

SQL Server Execution Times:

CPU time = 42107 ms, elapsed time = 18071 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 42107 ms, elapsed time = 18145 ms.

Execution time dropped from 57 seconds down to 18 seconds, but there’s a catch. A big part of the reason why it’s faster is that now the query is going parallel. Note that CPU time is higher than elapsed time – that’s your clue that the query went parallel across more CPU cores. Now we’re going to have a CPU problem if a bunch of these queries run simultaneously.

And it’s still not as fast as our original solution, the heap.

So how do we make temp tables faster?

Only load them with the data you actually need. When loading temp tables – or any objects, really – be ruthless about filtering as early as possible.

If you’re only going to access the data once, leave it as a heap. Indexes make the most sense when the temp table is going to be reused repeatedly across lots of statements that all do filtering or joining or sorting using the same keys.

Try CTEs instead. If you have a multi-step process that involves a lot of filtering and joining, let SQL Server recalculate where the filtering logic should happen. I am by no means saying that CTEs are always better than temp tables – often it’s the reverse – but if you’re hitting a performance tuning wall on queries that use temp tables, try converting them to CTEs. SQL Server will reorder operations – it doesn’t have to execute the first CTE first, then the second CTE second, and so forth.

Free, 3× a week

Get my new posts by email

Three posts a week, plus a Monday roundup of the best database news from around the web.

47 comments

Nic Neufeld

August 19, 2021 at 3:32 pm

As is often the case with your advice, you deftly bait me into wanting to argue specific exceptions (never shrink a db! index frag doesn’t matter! and other greatest hits) when I actually agree with the principal, haha. I guess it depends how much control you have on the codebase…if your temp table is literally just going to be scanned and joined to another table, then yeah, indexes are useless overhead. Although sometimes we might have a large many-step proc with loads of temp tables…suboptimal design usually, but occasionally improved with careful indexing. Lipstick on a pig, maybe, but sometimes the index ends up improving things (when a major refactor isn’t practical).

I’ve also seen instances where we’ve had to break apart a monstrosity of a single statement (hundreds/thousands of lines with CTEs and subqueries) into more atomic steps with temp tables because the Optimizer just didn’t do great picking a “good enough” plan given the complexity. So there’s a balance to be struck with either artificially breaking apart a query into steps (not trusting the Optimizer and forcing the logic into separate stages) and collapsing it into one logical statement and trusting the Optimizer to do its thing. I’ve had good results moving things in either direction…depending on the query and the problems involved. But again, here we are in violent agreement as you acknowledge either might be superior, depending on the situation…and I agree with the principal of trying to let the Optimizer have the most freedom to do its thing (see also query hints) as a default, before trying to force its hand…it is usually better at it than us (until eventually it isn’t). As an aside, I wonder if some of the sequential temptable1-temptable2-temptable3 type coding is more a result of the developer trying to wrap their head around the problem logically, and that’s just how they work through it…even if it is suboptimal in performance (which likely they aren’t even thinking about at that point).

Reply
1. marcscirri VIP Student since 2019
  
  August 19, 2021 at 3:54 pm
  
  We have definitely had to refactor some larger T-SQL with multiple CTEs for performance. The rule that we follow is that if you are going to reference a CTE more than twice, or if you ever join a CTE to itself later down the chain, it’s time for a temp table!
  
  Reply
2. Brent Ozar
  
  August 19, 2021 at 4:00 pm
  
  Nic – I wonder if there’s anything in the title of the post itself that suggests that there might be exceptions.
  
  I mean, I know a lot of people won’t read the entire body of the blog post – but I wonder if we’ll someday get to the point where people will read the entire title?
  
  I guess we’ll never know. 😉
  
  Reply
  1. Jacob H
    
    August 19, 2021 at 4:07 pm
    
    Brent, you don’t understand.
    
    Someone is wrong on the internet.
    
    https://xkcd.com/386/
    
    Reply
  2. Nic
    
    August 19, 2021 at 5:13 pm
    
    Honestly I was making fun of my own fault there, not yours…as may not have been obvious, I suppose. Just that you have a funny gift for making things sound sweeping (like the examples I mentioned) which ignites the pedantic types (cough cough) into argument mode, when you actually did allow for exceptions…and I ended up realizing, yeah I agree. I did read your post…I was actually referring to your last paragraph where you say “often it’s the reverse” in my last paragraph. Ah well, tone can be hard to convey on the internet…I meant no criticism, was just commenting on the way it made me (incorrectly) start to think argumentatively…I will say, that isn’t a bad social media gift! (steak-umm bless) 😉
    
    Reply
    1. Brent Ozar
      
      August 19, 2021 at 5:15 pm
      
      AH! Hahaha, got it.
      
      I do love using the grab-’em-by-the-shoulders-and-shake-’em headline style because it makes people rethink their assumptions. If I titled it, “The Pros and Cons of Indexing Temp Tables,” it would sail through the night without anyone reading it.
      
      Like, uh…the product documentation, sadly. (sigh)
      
      Reply
      1. Nic
        
        August 19, 2021 at 5:35 pm
        
        Exactly…its like a benign sort of “motte and bailey”, the bailey being a slightly more controversial sounding title, the motte being its uncontroversial and nuanced contents. If you write “Never Ever Use DBCC Shrinkfile Again So Help Me God I Will Find You” or “DBAs Who Still Use Profiler Should Shampoo My …” [err nevermind, Jack Nicholson quote], it drives engagement with a bunch of us argumentative types that might have glossed over articles titled “Excessive Use of Shrinkfile is Costing You Fragmentation and IO” or “Time to Get On Board the Extended Events Train You Guys!”….even if the article itself is pretty nuanced and balanced. Anyway, it got me here…which made me laugh at myself, as I fell for the same thing…I’m just saying you’re good at it 🙂 Cheers…
Alex Zhuravel

August 19, 2021 at 3:41 pm

Thanks Brent,
brief and to the point.

Reply
marcscirri VIP Student since 2019

August 19, 2021 at 3:51 pm

Typically I only ever index temp tables if I am going to query those columns more than once. And when I do create an index, I do it AFTER the table is already loaded. It is much more efficient that way.

Reply
Paul Nielsen

August 19, 2021 at 4:17 pm

I’ve found the same thing. I tend to want to index my (rare) temp tables, but then once I’m in testing with significant data, I find that the index costs more than it benefits.

Reply
Bob Frasca

August 19, 2021 at 4:28 pm

I would index the temp table AFTER it is loaded.

Reply
1. Brent Ozar
  
  August 19, 2021 at 4:29 pm
  
  Great. Do the above demo with that technique and see how it goes.
  
  (Spoiler alert: it isn’t faster than the heap approach.)
  
  Reply
  1. Bob Frasca
    
    August 19, 2021 at 4:33 pm
    
    Chuckle. I’ll take your word for it. However, it is faster than indexing before you load.
    
    Reply
    1. Brent Ozar
      
      August 19, 2021 at 4:36 pm
      
      Wow, twice in a row you’ve made a statement as fact without testing it and showing documentation, despite someone building the entire demo for you and giving you a free database to test it with. Interesting strategy.
      
      Tell me, why do you bother reading blogs when you’ve clearly got all of the correct answers?
      
      Reply
      1. Glen Knickerbocker
        
        February 20, 2025 at 8:11 pm
        
        Sorry, but while I appreciate the content, it’s a gross oversight to omit post-population indexing in your “entire” demo. I have optimized procs by an order of magnitude using indexes on temp tables in SQL Server.
      2. Brent Ozar
        
        February 21, 2025 at 4:58 am
        
        Ah, you’re about to discover something amazing! Adding an index afterwards simply causes recompiles, which improves performance for a totally different reason: https://www.brentozar.com/archive/2020/11/paul-white-explains-temp-table-caching-3-ways/ I bet you’ll love discovering that! Enjoy.
Jesse

August 19, 2021 at 5:48 pm

I agree! Why index a temporary table when you should filter before it gets there? Surely the source has indexes already created that would speed up fetching applicable rows.

In regards to CTE’s: What is your view on joining CTEs within CTEs? I don’t know how to exactly describe this, almost like nesting them in a fall-through manner, where your first CTE is used as a join in the second to filter and onward, perhaps even including row_number() to select first instances, etc…

Reply
1. Brent Ozar
  
  August 19, 2021 at 5:50 pm
  
  CTEs are pretty far outside the scope of this blog post, but I do talk about ’em in my Mastering Query Tuning class. (Just got done with that module yesterday afternoon so it’s fresh in my mind.)
  
  Reply
  1. Jesse
    
    August 19, 2021 at 5:56 pm
    
    Ah, I asked because you mentioned using CTEs in place of indexed temporary tables. I have watched nearly all of your videos and just got signed up for the free classes that were appended to this blog notification email. Very excited for those! They aren’t the mastering classes, but I enjoy the fundamentals and look forward to the refresher. Perhaps next years training budget can be used to get my hands on some of those mastering classes.
    
    Reply
Nic

August 19, 2021 at 5:54 pm

Just for grins and giggles, I ran a quick audit of a couple of a custom code dbs (where in house developers put their stored procs):

select object_name(object_id),
SUBSTRING(definition,PATINDEX(‘% INDEX %’,UPPER(definition))-20,100)
from sys.sql_modules where UPPER(definition) like ‘% INDEX %’

Spotchecking it, looks like close to 300 procs have at least one temp table index creation! I ordered it by LEN(definition) ascending to pick a small, simple one, and I’ll play around with it, see where the index falls on the futility-usefulness scale. Fun exercise, thanks Brent!

Reply
1. Nic
  
  August 19, 2021 at 6:02 pm
  
  In my sample proc…the time difference wasn’t really noticeable and the logical reads diff was slight (112k vs 113k…index creation added only 1k reads), but the real painful thing: TABLE SCAN in the plan. Optimizer threw that index straight into the trash and went for the heap instead. Oh well, its the thought that counts right? LOL
  
  Reply
  1. Brent Ozar
    
    August 19, 2021 at 6:53 pm
    
    Nic – dang! Don’t you hate it when that happens?
    
    Reply
Bob Frasca

August 19, 2021 at 5:55 pm

I don’t mind being proven wrong and I would say so if I were; however, I don’t have that database on my machine(s) to test it. You, on the other hand could move your create index after the insert fairly quickly and run the test. I’m the first to admit that what I’ve learned over 25 years sometimes seems totally inadequate which is why I read your blog. Most of what you say confirms what I’ve learned through trial and error dating back to SQL Server 6.5 and usually I don’t have any argument with anything you say. I wasn’t really arguing with you here and I meant no disrespect. I just know that I have used the strategy that I described, successfully, with order of magnitude improvements in performance over the years. Sometimes it doesn’t work worth a damn in which case I just skip the index altogether as you suggested.

Thank you for continuing to share your knowledge, I have a great deal of respect for what you have to say.

Reply
1. Brent Ozar
  
  August 19, 2021 at 5:57 pm
  
  Bob – the database is freely available from the very first hyperlink in the post.
  
  I wrote the scripts, and gave them to you. I gave you the database. I packaged the whole thing up in a blog post that’s all completely free for you to read.
  
  I draw the line at running your own idea queries for you. Put some skin in the game.
  
  Reply
  1. Dan B
    
    August 23, 2021 at 1:30 pm
    
    I used the StackOverflow 50gb database, and added 2 stored procedures with index created AFTER for both the nonclustered and clustered approaches.
    
    Heap is definitely the fastest. But it looks like usp_TempTable_IndexAFTER was able to take advantage of parallelization way better.
    
    usp_TempTable_Heap
    CPU time = 1154 ms, elapsed time = 752 ms
    
    usp_TempTable_Index
    CPU time = 9126 ms, elapsed time = 8972 ms
    usp_TempTable_IndexAFTER
    CPU time = 7345 ms, elapsed time = 1681 ms
    
    usp_TempTable_ClusteredIndex
    CPU time = 6250 ms, elapsed time = 1764 ms
    usp_TempTable_ClusteredIndexAFTER
    CPU time = 7156 ms, elapsed time = 1679 ms
    
    —————–
    I sometimes put a PK on my temp tables to catch a data error, if I need to and if it doesn’t cost too much in performance… But that’s it.
    
    Reply
    1. Brent Ozar
      
      August 23, 2021 at 1:34 pm
      
      Very good! Just to recap: even in the best case scenario, the index still doubled the query runtime and used 3-4x more CPU.
      
      Reply
Jeff Student since 2017

August 19, 2021 at 6:07 pm

Well, I’ve just emerged from a rabbit hole this post sent me down. But it was a good one because I found an tuning opportunity because of this post. In our work (i.e. our Bad™) I see cases where we’re creating temp tables with ID as Primary Key with Identity! Here’s my test proc:

CREATE OR ALTER PROC dbo.usp_TempTable_Heap2
@DisplayName NVARCHAR(40) AS
BEGIN
CREATE TABLE #MyUsers2 (Id INT IDENTITY (1, 1) Primary key, DisplayName NVARCHAR(40));
INSERT INTO #MyUsers2 (DisplayName)
SELECT DisplayName
FROM dbo.Users;

SELECT Id, DisplayName
FROM #MyUsers2
WHERE DisplayName = @DisplayName;
END
GO
SET STATISTICS TIME ON;
EXEC dbo.usp_TempTable_Heap2 N’Brent Ozar’;
SQL Server Execution Times:
CPU time = 983 ms, elapsed time = 364 ms.

I found that if I compared that with yours I’d see

CPU time = 1062 ms, elapsed time = 299 ms.

so a 65ms difference. Not earth shattering but better, but maybe even more important I read “be ruthless about filtering as early as possible” a few times and booyah I started seeing opportunities to improve our (anti)pattern above. In fact, why generate an identity when the table you’re selecting from already has one! And I see other places where we put fields into a temp table and honestly, they’re not even used (maybe once they were, but not presently). Good times.

Thank you Brent! Off to refactor-town!

Reply
1. Brent Ozar
  
  August 19, 2021 at 6:53 pm
  
  Jeff – woohoo, glad I could help!
  
  Reply
Wojciech Sawicki

August 20, 2021 at 6:00 am

Thank you Brent
I have learned a lot.
I am just proposing to slightly change the rule.
If you’re only going to access the data once, leave it as a heap.
On
If mssql only going to access the data once, leave it as a heap.
If select will be

SELECT m.Id, m.DisplayName,p.AnswerCount
FROM #MyUsers m
inner join [dbo].[Posts] p on p.OwnerUserId=m.Id
WHERE DisplayName = @DisplayName;

and argument =John
The index version wins.

Reply
1. Brent Ozar
  
  August 20, 2021 at 6:32 am
  
  You’re welcome, Wojciech. Now, rather than just saying something wins, write out the full repro like I do above with the T-SQL and the statistics for each version. You wouldn’t just want to make unproven statements, right? Thanks.
  
  Reply
Varun Deshpande Student since 2021

August 20, 2021 at 8:47 am

The devs at my friend’s office use temp tables so much that he ended up telling them not to create any indexes on the base tables as no queries run on the actual tables, lol. Needless to say their tempdb was in terabytes.

Reply
Hila DG

August 22, 2021 at 1:56 am

A true story:
In a long proc, the optimiser recommended adding an index on a temp table. I vividly recall the results: exactly the same length of time to run the two versions, the one with the Index took even slightly longer. Thanks – not only for the explanation, but also for providing other possible solutions.

Reply
Andrew Hill

August 23, 2021 at 6:34 am

my main use of temp tables is to prevent multiple sorting or complex queries;

WITH myData as(select things)
SELECT t1 from myData d1 join myData d2 on d1.it = d2.that

and then note that the myData sub-query is always run twice 🙁
It is much worse when you have a second step

WITH myData as(select things),
myData2 as (select t1 from myData d1 join myData d2 on d1.it = d2.that)
SELECT allTheThings from MyData2 d1 join myData2 d2 on this=that

will run the MyData subquery 4 times.

Reply
Alex Friedman

August 23, 2021 at 2:45 pm

Ha, I was sure you were going to talk about recompilations. But yeah, totally. Show people how to do performance tuning on a specific case by breaking apart a huge query, materializing the most selective filtering first into a temp table — and next thing you know, they’re splitting up every query into temp tables stages because the DBAs said that’s the way to do performance tuning! Explain why that doesn’t work, and here come the monster queries again. Circle of life.

Reply
1. Brent Ozar
  
  August 23, 2021 at 2:49 pm
  
  I can’t do justice to that one in a single post, and I’ve covered it before by linking to Paul White’s excellent series on plan reuse. 😉
  
  Reply
  1. Alex Friedman
    
    August 23, 2021 at 2:53 pm
    
    Oh for sure, and if anything the case you described is the more common and more painful. I was actually pleasantly surprised 🙂
    
    Reply
    1. Brent Ozar
      
      August 23, 2021 at 2:55 pm
      
      Woohoo! Thanks man!
      
      Reply
The Midnight Crooner

August 22, 2022 at 9:23 pm

sp_BlitzFirst @ExpertMode = 1 may be leading us into temptation by asking us to put indexes on some of our temp tables with this finding:

Forwarded Fetches/Sec High: TempDB Object
https://www.brentozar.com/go/fetch/

Assuming we can find all the sprocs creating temp tables, I guess if we do that just need to measure sproc performance before and after applying the new NC index on temp table to see if it was worth it .

Danke Schoen

Reply
1. Brent Ozar
  
  August 22, 2022 at 9:25 pm
  
  Nonclustered indexes don’t fix forwarded fetches. Time to read the docs a little more closely, or report to my Mastering Index Tuning class. Cheers!
  
  Reply
Jeff

August 22, 2022 at 10:18 pm

I wonder if there is a big difference between MySQL vs SQL Server here. I’ve recently found a lot of success by moving data into temp tables and running updates on the temp tables rather than trying to do ad hoc reports with joins. Indexes improve this significantly. So instead of 2 or 3 subqueries I end up with a 3 or 4 step query with temp tables, including adding an index. This whole process seems to be several orders of magnitude faster. Like 2 seconds vs 2 hours.

An example would be something like: instead of:
SELECT fields from a left join b on a.email = b.email where b is null

instead do:
INSERT INTO #tmp (SELECT fields from a);
add index on email to #tmp;
delete from #tmp where email in (b.email);
SELECT * FROM #tmp;

Sorry for the butchered code there, trying to keep it abstract rather than get into too many specifics.

This is one of those things where clearly something about the way the join is working behind the scenes is not efficient. It takes way too long. So I rewrote it the way I want the engine to do the join and sure enough it’s way faster. It might just be the data I’m working with.

Am I misunderstanding what you are saying, or do you think this is a MySQL vs SQL server thing, or do you think it might just be the particular dataset I am dealing with? I know it’s hard to say without having the data in front of you I was just wondering why the lesson I’ve learned most recently is almost the exact opposite of the title here 🙂

Reply
1. Brent Ozar
  
  August 22, 2022 at 10:23 pm
  
  Just to make sure I understand: you’re asking me about a database I don’t blog about, on a dataset I don’t have?
  
  And you want me to work on that for free?
  
  Reply
Andrei

August 23, 2022 at 7:45 pm

I’ve noticed you rarely get higher performance out of indexing temp tables, however, I recently ran across an instance where I did. it was a view with about 400-500 columns, with about 60-70 other smaller views joined in (go figure they were having CPU issues compiling this monstrosity every time).

Due to archaic application design, they weren’t able to change the call to the view into a call to a stored proc, so the overall thing still had to be a view. I ended up making a temp table with the one key that would join most of those views/tables, and all of the smaller views I turned into CTEs. Joining the temp table into the CTEs further reduced the execution time for each CTE – but indexing the temp table did improve it further and the sort operators in each of those 60+ CTEs either went away or took a lot less time. Pretty much on par with well, the reverse of your recommendation – it was a temp table that was very filtered (1 int column with about 10k rows), reused about 60-70 times, and so on.

That being said, I never want to see one of those views in my life ever again.

Reply
1. Brent Ozar
  
  August 23, 2022 at 7:55 pm
  
  Nice! Yeah, “reused about 60-70 times” is a great sign where indexing makes a lot of sense. Good call. Most of the times when I see folks indexing temp tables, they’re only hitting the table once or twice.
  
  Reply
Gene B. Torres VIP Student since 2023

September 29, 2023 at 2:21 pm

Isn’t it the rule to always put TABLOCK when loading temp tables that are created explicitly. The performance is much better when you load the data.

Reply
1. Brent Ozar
  
  September 29, 2023 at 4:47 pm
  
  That’s a new one on me – after all, you’re the only one who could be accessing your temp table anyway – but I’d love to read about your experiments to prove that statement. That’d be really interesting.
  
  Reply
Thomas VIP Student since 2024

October 13, 2024 at 8:34 am

Would it be better to create the index after all the data has been loaded into the temp table?
Wouldn’t creating the index on the temp table will have the latest statistics being updated and thereby making the query run with an optimal plan which could be better than not having an index even for using the temp table once?

Reply
1. Brent Ozar
  
  October 13, 2024 at 12:34 pm
  
  Give it a shot and find out! That’s why I use open source databases and publish my scripts here on the blog, so you can run your own experiments to test your ideas. Go for it!
  
  Reply

You Probably Shouldn’t Index Your Temp Tables.

Could an index on DisplayName speed up the second query?

Fine. What about a clustered index?

So how do we make temp tables faster?

Get my new posts by email

Keep digging

47 comments

Leave a comment Cancel reply