Indexing for Deletes

Last Updated 7 years ago

If you’re only indexing to make reads faster, you need to think again. Indexes can make every operation in the database faster, even deletes.

The Problem: Deletes are Very Slow

I was lazily researching new development techniques one day when I received an email from a client asking why deletes could be slow. I rattled off a few quick possibilities and promised that I’d look into it as soon as I was able to. Due to the vagaries of travel, it took me more time than I’d expected to dig into the problem, but I found something interesting that I should have come to mind right from the start.

This database contains some hierarchical data. My initial thought was that there was a cascading delete taking place in the hierarchy. After some initial checks into cascading deletes, fragmentation, statistics, and triggers, I ran an actual delete on a test system and found something very interesting – almost all of the time spent deleting the row was spent performing a clustered index scan on a different table.

What Happens During a Delete?

When you try to delete a row, a few things happen. SQL Server says “OK, let’s make sure that we can actually delete this row, what else depends on it?” SQL Server will check for dependent rows by examining all foreign keys. It will then check any related tables for data. If there is an index, SQL Server will use that index to check for related data. If there isn’t an index, though, SQL Server will have to scan the table for data.

Deletes and Table Scans

Don’t believe me? Try this out yourself.

Make a new database. Copy data in from the Sales.SalesOrderHeader and Sales.SalesOrderDetail tables in AdventureWorks. I use the Import Data wizard to quickly copy data from one database to another.

ALTER TABLE Sales.SalesOrderHeader
ADD CONSTRAINT PK_SalesOrderHeader PRIMARY KEY (SalesOrderID);

ALTER TABLE Sales.SalesOrderDetail
ADD CONSTRAINT PK_SalesOrderDetail PRIMARY KEY (SalesOrderDetailID);

ALTER TABLE Sales.SalesOrderDetail
ADD CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader
FOREIGN KEY (SalesOrderID) REFERENCES Sales.SalesOrderHeader(SalesOrderID)
ON DELETE CASCADE;

ALTER TABLE Sales.SalesOrderHeader

ADD CONSTRAINT PK_SalesOrderHeader PRIMARY KEY (SalesOrderID);

ALTER TABLE Sales.SalesOrderDetail

ADD CONSTRAINT PK_SalesOrderDetail PRIMARY KEY (SalesOrderDetailID);

ALTER TABLE Sales.SalesOrderDetail

ADD CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader

FOREIGN KEY (SalesOrderID) REFERENCES Sales.SalesOrderHeader(SalesOrderID)

ON DELETE CASCADE;

With these three statements in place, we’re able to create a situation where SQL Server has to perform a full table scan just to delete a single row. Make sure you’ve told SQL Server to include the actual execution plan and run this:

DELETE FROM Sales.SalesOrderHeader
WHERE SalesOrderID = 51721;

1 2	DELETE FROM Sales.SalesOrderHeader WHERE SalesOrderID = 51721;

Once that query runs, the execution plan is going to look a bit like the execution plan below. If you add it up, 99% of the work comes from finding the rows to delete in the SalesOrderDetail table and then actually deleting them.

A Delete — Who knew that deleting a row was so much work?

Making Deletes Faster

How would we go about making deletes like this faster? By adding an index, of course. Astute readers will have noticed the missing index information in that execution plan I took a screenshot of. In this case, the missing index looks something like this:

CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_SalesOrderID
ON Sales.SalesOrderDetail(SalesOrderID);

1 2	CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_SalesOrderID ON Sales.SalesOrderDetail(SalesOrderID);

Before adding the index, the query had a cost of 2.35678. After adding the index, the delete has a cost of 0.0373635. To put it another way: adding one index made the delete operation 63 times faster. When you have a busy environment, even tiny changes like this one can make it faster to find and delete records in the database.

What’s All of This Mean?

When you’re looking into database performance problems, remember that you aren’t always reading just to return data to the user, sometimes you need to find data in order to delete it. Even when we’re trying to get rid of data, it can be helpful to have an index to make deletes go faster.

Update: Also Use Fast Ordered Deletes – the Microsoft SQL Customer Advisory Team wrote about the Fast Ordered Delete technique of using a view or a CTE.

The Dedicated Admin Connection: Why You Want It, When You Need It, and How To Tell Who’s Using It

How Do You Mask Data for Secure Testing?

21 Comments. Leave new

Jack Corbett
September 1, 2011 8:27 am

Dealing with this exact issue on a project right now. I have a table that has relations to 48 other tables. Correct indexes on the related tables makes the delete orders of magnitude faster.

Reply
Ally
September 1, 2011 10:14 am

Not tested it but I would I be right in thinking a filtered index could improve things even more?

Reply
- Jeremiah Peschka
  September 1, 2011 10:18 am
  
  As long as you can identify a good filtered index, absolutely. If you’re deleting every incomplete order that’s more than 30 days old, for instance, you could create a filtered index like so: CREATE INDEX IX_OrdersThatNeedToGo ON dbo.Orders (OrderId) WHERE Complete = 0 AND OrderDate < '2011-04-31' There are all kinds of fun things that you can do to make things run faster. The Rube Goldberg machine is yours to create 😉
  
  Reply
Michael J Swart
September 1, 2011 12:18 pm

This is awesome Jeremiah, I tell people about indexing for deletes all the time. I’m glad to see others pushing it too.

If you’ll let me plug a script I wrote a while ago: http://michaeljswart.com/2009/04/indexing-foreign-keys/ It identifies tables with foreign keys that are not indexed and measures (roughly) how expensive a delete statement for a row on that table would be.

Reply
- Ruud van de Beeten
  August 25, 2012 5:51 am
  
  Michael J Swart, I like your article a lot. It’s a clear and simple post to read and understand. I think your query that uses DMV’s to check for ‘missing indexes’ on foreign keys could be of a great help.
  
  Reply
  - Jeremiah Peschka
    August 25, 2012 9:10 am
    
    Something tells me that you meant to comment on Finding Your Missing Keys.
    
    Reply
    - Michael J Swart
      August 28, 2012 3:12 pm
      
      That’s a new one. Don’t you usually go by Brent when posting on this site?
      
      Reply
Garry Bargsley
September 1, 2011 3:20 pm

I am working through your example and had an issue creating the foreign key. I modified your query to get it to work.

ALTER TABLE Sales.SalesOrderDetail
ADD CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader
FOREIGN KEY (SalesOrderID) REFERENCES Sales.SalesOrderHeader(SalesOrderID)
ON DELETE CASCADE;

Reply
- Jeremiah Peschka
  September 1, 2011 4:45 pm
  
  Thanks for catching that. I’ve updated the post with the change.
  
  Reply
Mark Freeman
September 1, 2011 4:53 pm

It is common wisdom that you should create an index for every foreign key. It is not obvious why this is a good idea if you run a missing indexes query and don’t find any. However, like you showed, these FK indexes make a huge difference for deletes. If it is a table where very few deletes are ever done and you don’t mind them being slow, don’t worry about it. Otherwise, this is a big win.

Reply
Mark Freeman
September 2, 2011 12:46 pm

A reasonable next step would be to find the foreign keys that don’t have indexes, and to then create ones that seem useful. I put together a brief blog post (my very first!) around a script I’ve been using to help with that. What do you think of this?

http://m60freeman.blogspot.com/2011/09/finding-foreign-keys-without-indexes.html

Reply
- Jeremiah Peschka
  September 2, 2011 4:00 pm
  
  That’s some good stuff, Mr Freeman! Keep it up.
  
  Reply
Nakul Vachhrajani
September 11, 2011 11:42 am

Great, simple article – but, a lot of value!

I myself had encountered a need to delete large amounts of data from a heap, and used a non-clustered index to speed things up. Later, I wrote about my entire research on SSC here – http://www.sqlservercentral.com/articles/T-SQL/72606/

Reply
Pavel Nefyodov
September 26, 2011 3:58 am

Hi Jeremiah,

I am just wondering if the same principle discussed in your atricle applies to table UPDATEs? Is non-clustered index still a good idea for performance improvement or heap will do? It would be interesting to know it in context performance vs data storage.

Reply
- Jeremiah Peschka
  September 26, 2011 8:12 am
  
  Foreign keys will have to be checked on an update, too. Indexes on other tables will definitely help update performance. Indexes on a heap may not help you as much as you’d like due to row forwarding pointers that are created whenever you update a row in a heap. It’s best (for OLTP applications at least) to have a clustered index on all tables that you’re using in regular querying.
  
  Reply
Shane Thompson
October 17, 2012 1:46 pm

Hey Jeremiah,

This was a great article. We had an incident today where a single delete which should have been quick and efficient was taking almost a minute to execute. Turns out it was because there were about 12 other tables that had an ID column in this table as a FK relationship.

Anyway, as I was thinking about this it brought up a question. Do you know how the locking works when this is going on? Does SQL take out a X lock on the row it wants to delete WHILE it’s traversing the other tables to determine if the delete can be done? Or does it take out something less restrictive while it figures out if the delete is valid?

Reply
- Jeremiah Peschka
  October 19, 2012 8:31 am
  
  By default, SQL Server is going to take out an X lock when it begins a delete. There may be a brief IX lock to tell everyone else to get out of the way before the delete goes through, but that’s about it. You may see key range locks on the child rows of the foreign key as SQL Server decides this is the best locking mechanism (as opposed to row locks). This is all done to make sure that nobody inserts rows into child tables that are about to be deleted.
  
  Reply
Kent McConnell
August 22, 2014 11:22 am

I also was able to reproduce this issue in SQL 2008 even if you don’t have the on Delete Cascade property set on the foriegn key. It still checks the other tables on a deletion and can be helped by adding the appropriate indexes.

Reply
Ned Bakelman
April 30, 2019 8:21 am

This is excellent. Thank you for a clear and concise explanation and example.

Reply
Tooraj Azizi
September 29, 2019 3:50 am

for a very big table I tried to delete some data , it took long time to complete while it has some indexes on it,
after REBUILDING indexes the delete goes faster, tell me WHY?

Reply
- Brent Ozar
  September 29, 2019 11:26 pm
  
  Tooraj – for questions, head to a Q&A site like https://dba.stackexchange.com.
  
  Reply