Normally when you run a DELETE statement in a database, the data’s gone.
With the soft delete design pattern, you add a bit column like IsDeleted, IsActive, or IsArchived to the table, and instead of deleting rows, you flip the bit column. This can buy you a few advantages:
- Easier/faster undeletes
- History tracking (keeping deleted rows around for auditing purposes, although the method I’m going to show here doesn’t do anything fancy like track who did the delete)
- Easier reconciliation during disaster recovery failovers (refer to the Senior DBA class module on recovering from failovers for more info)
- Lower workload for Availability Groups secondaries (neat scenario from Aaron Bertrand)
I’ll illustrate it by implementing soft deletes on the Stack Overflow Users table. (Any version/size will do, but if you want to replicate my exact query metrics, I’m using the 2018-06 version.)
First, we need to add the IsDeleted column:
ALTER TABLE dbo.Users
ADD IsDeleted BIT NOT NULL DEFAULT 0;
No matter how big the Users table is, as long as no one else is in the table at the moment, that change takes effect instantly – it’s implemented as a metadata-only change. Hot dog, I really like SQL Server. Makes these kinds of things easy.
Stop deletes with a trigger.
We can use an “instead of” trigger that will fire when someone tries to delete a row, and we’ll do something different. I’m using the “deleted” virtual table to grab the rows that are supposed to be deleted.
CREATE OR ALTER TRIGGER SoftDelete_Users ON dbo.Users
INSTEAD OF DELETE AS
BEGIN
SET NOCOUNT ON;
UPDATE dbo.Users
  SET IsDeleted = 1
  WHERE Id IN (SELECT Id FROM deleted);
END
GO
When someone tries to delete a row, we can see that it wasn’t actually deleted – the row is still in the table, just flagged.
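To see it in action – a quick sketch, where the Id value is just an example (any Id in your copy of the table will do):

```sql
/* Try to delete a row: */
DELETE dbo.Users WHERE Id = 26837;

/* The row is still there - the trigger just flagged it: */
SELECT Id, DisplayName, IsDeleted
  FROM dbo.Users
  WHERE Id = 26837;
/* IsDeleted now reports 1 for this row */
```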
The slick part of this approach is that it doesn’t require changing your existing DELETE code – your app just thinks it’s successfully deleting rows. However, we still have a pretty big problem: apps are still querying the table, and they’re not expecting to see deleted rows.
Filtering selects is a little harder.
Ideally, if you can change the queries, you add “WHERE IsDeleted = 0” everywhere. The hard part there is changing every query, obviously, so a workaround is to rename the Users table to something new, and then put a view behind in its place:
EXEC sp_rename 'dbo.Users', 'Users_tbl';
CREATE OR ALTER VIEW dbo.Users AS
SELECT *
  FROM dbo.Users_tbl
  WHERE IsDeleted = 0;
GO
(Quick note: you probably don’t want to use SELECT * in a view like this. Under the covers, SQL Server builds the code for the view once, and then if you add more columns to dbo.Users_tbl, the new columns won’t show up in queries until you call sp_refreshview. I’m just using the * here for simplicity.)
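For example – and the new column name here is purely hypothetical – a later schema change would look like this:

```sql
/* Add a hypothetical new column to the underlying table: */
ALTER TABLE dbo.Users_tbl
  ADD FavoriteColor VARCHAR(50) NULL;

/* Queries against the view won't see FavoriteColor
   until the view's metadata is refreshed: */
EXEC sp_refreshview 'dbo.Users';
```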
Doing the table-to-view switcheroo means the soft-deleted rows are instantly filtered out from my queries.
Index design is a little harder, too.
Whether you use the view approach or add IsDeleted = 0 to all your queries, your execution plans are going to have something new: a filter on IsDeleted = 0. That also means that your missing index recommendations are going to ask for IsDeleted in the key of your index.
When you first implement soft deletes, this isn’t a huge concern for most queries because none of your data can be excluded (because no rows have IsDeleted = 1 yet). However, the worst case for performance is a query that used to have a covering index, but now has to do a key lookup for every single row in order to double-check that the row hasn’t been deleted yet.
One workaround is to add IsDeleted to every index. In the beginning, it doesn’t matter whether IsDeleted goes in the key or the includes – because it’s not filtering anything out yet. Over time, it starts to become more important. You might think that someday, if the majority of your rows are deleted, you’d even want IsDeleted to be the leading key in all of your indexes. That would be incorrect – you’d be better off with the next solution.
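As a sketch of that first workaround – assuming a hypothetical existing index on Location – you could rebuild it with IsDeleted tacked onto the includes:

```sql
/* Hypothetical existing covering index: */
CREATE INDEX IX_Location
  ON dbo.Users_tbl (Location);

/* Rebuild it with IsDeleted in the includes, so queries
   filtering on IsDeleted = 0 stay covered and avoid key lookups: */
CREATE INDEX IX_Location
  ON dbo.Users_tbl (Location)
  INCLUDE (IsDeleted)
  WITH (DROP_EXISTING = ON);
```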
Another workaround is to use filtered indexes. A filtered index is just an index with a WHERE clause, like IsDeleted = 0:
CREATE INDEX IX_DisplayName
  ON dbo.Users_tbl (DisplayName)
  WHERE IsDeleted = 0;
When a majority of your rows have IsDeleted = 1, this filtered index has a nifty benefit: it’s physically smaller because it doesn’t contain the deleted rows. It takes up less space, meaning shorter backups/restores/CHECKDB, and it also has less concurrency overhead if some kind of bulk operation needs to be done to the deleted rows.
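If you want to see that size difference for yourself, you can compare the filtered index against the clustered index – a sketch using the DMVs (the index names in your results will depend on what you’ve created):

```sql
/* Compare row counts and on-disk size of each index
   on the underlying table: */
SELECT i.name AS index_name,
       ps.row_count,
       ps.used_page_count * 8 AS used_kb
  FROM sys.dm_db_partition_stats ps
  JOIN sys.indexes i
    ON ps.object_id = i.object_id
   AND ps.index_id  = i.index_id
  WHERE ps.object_id = OBJECT_ID('dbo.Users_tbl');
```

Once soft-deleted rows pile up, the filtered index’s row_count and used_kb fall well below the clustered index’s.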
Yet another workaround is to split the table up. If you swap a view in, there’s no reason you can’t have that view union across a couple of tables: one with active rows, and one with deleted rows. Well, there’s one reason you can’t do it: this style of storage makes inserts, updates, and deletes more complex – and whaddya know, would you look at the time? I’ve already spent entirely too long writing this blog post, so I’ll leave that split scenario for another post.
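Just to sketch the rough shape of that split approach – the table names (Users_Active, Users_Deleted) and the extra view are all names I’m making up here, and the row-moving logic is the hard part I’m waving away:

```sql
/* The app keeps querying dbo.Users, which only sees active rows: */
CREATE OR ALTER VIEW dbo.Users AS
SELECT * FROM dbo.Users_Active;
GO

/* A separate view unions both tables for auditing and undeletes: */
CREATE OR ALTER VIEW dbo.Users_All AS
SELECT * FROM dbo.Users_Active
UNION ALL
SELECT * FROM dbo.Users_Deleted;
GO
```

The delete trigger would then move rows from the active table to the deleted table instead of flipping a bit – which is exactly the extra complexity I’m deferring to that future post.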