Blog

If you’re only indexing to make reads faster, you need to think again. Indexes can make every operation in the database faster, even deletes.

The Problem: Deletes are Very Slow

I was lazily researching new development techniques one day when I received an email from a client asking why deletes could be slow. I rattled off a few quick possibilities and promised that I’d look into it as soon as I was able to. Due to the vagaries of travel, it took me more time than I’d expected to dig into the problem, but I found something interesting that I should have come to mind right from the start.

This database contains some hierarchical data. My initial thought was that there was a cascading delete taking place in the hierarchy. After some initial checks into cascading deletes, fragmentation, statistics, and triggers, I ran an actual delete on a test system and found something very interesting – almost all of the time spent deleting the row was spent performing a clustered index scan on a different table.

What Happens During a Delete?

When you try to delete a row, a few things happen. SQL Server says “OK, let’s make sure that we can actually delete this row, what else depends on it?” SQL Server will check for dependent rows by examining all foreign keys. It will then check any related tables for data. If there is an index, SQL Server will use that index to check for related data. If there isn’t an index, though, SQL Server will have to scan the table for data.

Deletes and Table Scans

Don’t believe me? Try this out yourself.

Make a new database. Copy data in from the Sales.SalesOrderHeader and Sales.SalesOrderDetail tables in AdventureWorks. I use the Import Data wizard to quickly copy data from one database to another.

ALTER TABLE Sales.SalesOrderHeader
ADD CONSTRAINT PK_SalesOrderHeader PRIMARY KEY (SalesOrderID);

ALTER TABLE Sales.SalesOrderDetail
ADD CONSTRAINT PK_SalesOrderDetail PRIMARY KEY (SalesOrderDetailID);

ALTER TABLE Sales.SalesOrderDetail
ADD CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader
FOREIGN KEY (SalesOrderID) REFERENCES Sales.SalesOrderHeader(SalesOrderID)
ON DELETE CASCADE;

With these three statements in place, we’re able to create a situation where SQL Server has to perform a full table scan just to delete a single row. Make sure you’ve told SQL Server to include the actual execution plan and run this:

DELETE FROM Sales.SalesOrderHeader
WHERE SalesOrderID = 51721;

Once that query runs, the execution plan is going to look a bit like the execution plan below. If you add it up, 99% of the work comes from finding the rows to delete in the SalesOrderDetail table and then actually deleting them.

Who knew that deleting a row was so much work?

Making Deletes Faster

How would we go about making deletes like this faster? By adding an index, of course. Astute readers will have noticed the missing index information in that execution plan I took a screenshot of. In this case, the missing index looks something like this:

CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_SalesOrderID
ON Sales.SalesOrderDetail(SalesOrderID);

Before adding the index, the query had a cost of 2.35678. After adding the index, the delete has a cost of 0.0373635. To put it another way: adding one index made the delete operation 63 times faster. When you have a busy environment, even tiny changes like this one can make it faster to find and delete records in the database.

What’s All of This Mean?

When you’re looking into database performance problems, remember that you aren’t always reading just to return data to the user, sometimes you need to find data in order to delete it. Even when we’re trying to get rid of data, it can be helpful to have an index to make deletes go faster.

Update: Also Use Fast Ordered Deletes – the Microsoft SQL Customer Advisory Team wrote about the Fast Ordered Delete technique of using a view or a CTE.

↑ Back to top
  1. Dealing with this exact issue on a project right now. I have a table that has relations to 48 other tables. Correct indexes on the related tables makes the delete orders of magnitude faster.

  2. Not tested it but I would I be right in thinking a filtered index could improve things even more?

    • As long as you can identify a good filtered index, absolutely. If you’re deleting every incomplete order that’s more than 30 days old, for instance, you could create a filtered index like so: CREATE INDEX IX_OrdersThatNeedToGo ON dbo.Orders (OrderId) WHERE Complete = 0 AND OrderDate < ’2011-04-31′

      There are all kinds of fun things that you can do to make things run faster. The Rube Goldberg machine is yours to create ;)

  3. This is awesome Jeremiah, I tell people about indexing for deletes all the time. I’m glad to see others pushing it too.

    If you’ll let me plug a script I wrote a while ago: http://michaeljswart.com/2009/04/indexing-foreign-keys/ It identifies tables with foreign keys that are not indexed and measures (roughly) how expensive a delete statement for a row on that table would be.

  4. I am working through your example and had an issue creating the foreign key. I modified your query to get it to work.

    ALTER TABLE Sales.SalesOrderDetail
    ADD CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader
    FOREIGN KEY (SalesOrderID) REFERENCES Sales.SalesOrderHeader(SalesOrderID)
    ON DELETE CASCADE;

  5. It is common wisdom that you should create an index for every foreign key. It is not obvious why this is a good idea if you run a missing indexes query and don’t find any. However, like you showed, these FK indexes make a huge difference for deletes. If it is a table where very few deletes are ever done and you don’t mind them being slow, don’t worry about it. Otherwise, this is a big win.

  6. Pingback: Something for the Weekend – SQL Server Links 02/09/11

  7. A reasonable next step would be to find the foreign keys that don’t have indexes, and to then create ones that seem useful. I put together a brief blog post (my very first!) around a script I’ve been using to help with that. What do you think of this?

    http://m60freeman.blogspot.com/2011/09/finding-foreign-keys-without-indexes.html

  8. Great, simple article – but, a lot of value!

    I myself had encountered a need to delete large amounts of data from a heap, and used a non-clustered index to speed things up. Later, I wrote about my entire research on SSC here – http://www.sqlservercentral.com/articles/T-SQL/72606/

  9. Hi Jeremiah,

    I am just wondering if the same principle discussed in your atricle applies to table UPDATEs? Is non-clustered index still a good idea for performance improvement or heap will do? It would be interesting to know it in context performance vs data storage.

    • Foreign keys will have to be checked on an update, too. Indexes on other tables will definitely help update performance. Indexes on a heap may not help you as much as you’d like due to row forwarding pointers that are created whenever you update a row in a heap. It’s best (for OLTP applications at least) to have a clustered index on all tables that you’re using in regular querying.

  10. Pingback: Webcast Recording – DBA Darwin Awards: Index Edition | Brent Ozar PLF | Brent Ozar PLF

  11. Hey Jeremiah,

    This was a great article. We had an incident today where a single delete which should have been quick and efficient was taking almost a minute to execute. Turns out it was because there were about 12 other tables that had an ID column in this table as a FK relationship.

    Anyway, as I was thinking about this it brought up a question. Do you know how the locking works when this is going on? Does SQL take out a X lock on the row it wants to delete WHILE it’s traversing the other tables to determine if the delete can be done? Or does it take out something less restrictive while it figures out if the delete is valid?

    • By default, SQL Server is going to take out an X lock when it begins a delete. There may be a brief IX lock to tell everyone else to get out of the way before the delete goes through, but that’s about it. You may see key range locks on the child rows of the foreign key as SQL Server decides this is the best locking mechanism (as opposed to row locks). This is all done to make sure that nobody inserts rows into child tables that are about to be deleted.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php