Say you’ve got a table with millions or billions of rows, and you need to delete some rows. Deleting ALL of them is fast and easy – just do TRUNCATE TABLE – but things get much harder when you need to delete a small percentage of them, say 5%.
It’s especially painful if you need to do regular archiving jobs, like deleting the oldest 30 days of data from a table with 10 years of data in it.
The trick is making a view that contains the top, say, 1,000 rows that you want to delete:
CREATE VIEW dbo.Comments_ToBeDeleted AS
SELECT TOP 1000 *
FROM dbo.Comments
ORDER BY CreationDate;
GO
Make sure that there’s an index to support your view:
CREATE INDEX IX_CreationDate ON dbo.Comments(CreationDate);
And then delete from the view, not the table:
DELETE dbo.Comments_ToBeDeleted
WHERE CreationDate < '2010-01-01';
This lets you nibble off deletes in faster, smaller chunks, all while avoiding ugly table locks. Just keep running the DELETE statement until no rows are left that match. It won’t necessarily be faster overall than just taking one lock and calling it a day, but it’ll be much more concurrency-friendly.
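Here's a rough sketch of that keep-running-it loop – the @rows variable is just for illustration, and the view is the one created above:

DECLARE @rows INT = 1;             -- primes the loop
WHILE @rows > 0
BEGIN
    DELETE dbo.Comments_ToBeDeleted
    WHERE CreationDate < '2010-01-01';
    SET @rows = @@ROWCOUNT;        -- 0 when nothing matched, so the loop ends
END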
Wanna see it in action? No? Then just copy/paste my code, put it straight into production like you always do, and get back to work. For the rest of you, keep reading.
Demoing Fast Ordered Deletes
To demo this technique, I’m going to use the cloud setup for our Mastering Query Tuning classes:
- An 8-core, 60GB RAM VM with the data & log files on ephemeral (fast) SSD
- The Stack Overflow public database as of 2017-Aug
- The dbo.Comments table – which has 60M rows, 20GB in the clustered index
- I’ve created 5 nonclustered indexes that total about 5GB of space (to make the deletes a little tougher and more like real-world tables)
The Comments table has a CreationDate field, and let’s say I need to delete the oldest comments – we’re going to delete all the ones from 2008 and 2009:

2008 & 2009 had a total of 1,387,218 comments – but that’s only about 2.3% of the table’s overall rows.
First, the plain ol’ DELETE.
I could try just deleting them outright:
DELETE dbo.Comments
WHERE CreationDate < '2010-01-01';
It takes 39 seconds. Here’s what the actual execution plan (PasteThePlan) looks like:

It’s what we call a “wide” execution plan, something I first heard from Bart Duncan’s post and then later Paul White explained in much more detail. Because we’re deleting so many rows, SQL Server does a bunch of sorting, and those sorts even end up spilling to TempDB.
Plus, it’s taking a big table lock as it works. That’s no good, especially on big tables.
If you can get away with a 39-second table lock and activity in TempDB, the plain ol’ DELETE technique is fine. But let’s pretend you’re working in a mission-critical environment where a 39-second table lock is out of the question, and you need a faster background technique.
Next, the fast ordered delete technique.
Like we talked about at the start of this odyssey, create a view:
CREATE VIEW dbo.Comments_ToBeDeleted AS
SELECT TOP 1000 *
FROM dbo.Comments
ORDER BY CreationDate;
GO
Make sure that there’s an index to support your view:
CREATE INDEX IX_CreationDate ON dbo.Comments(CreationDate);
And then delete from the view, not the table:
DELETE dbo.Comments_ToBeDeleted
WHERE CreationDate < '2010-01-01';
It runs nearly instantly (because we’ve got an index to support it), and here’s the plan:

At first, it looks the same as the plain DELETE plan, but look closer, and there’s something missing:

There are no yellow bangs because there are fewer sort operators, and they're not spilling to disk. Similarly, the memory grant on this query is way lower:
- Plain DELETE memory grant: 118MB (only 64MB of which gets used, but it spills to disk anyway because not every operator can leverage the full grant – you can learn more about grant fractions from Joe Obbish)
- Fast Ordered Delete memory grant: 1.8MB (only 472KB of which got used)
The grants are lower because we’re handling less data, which is also evidenced by the STATISTICS IO output:
- Plain DELETE logical reads: 25,022,799 on the Comments table (plus another 4.1M on the worktables)
- Fast Ordered Delete logical reads: 24,732 on the Comments table, plus 2K on the worktables – but that’s with me using TOP 1,000 in the view. If I change it to TOP 10,000, then the reads jump to 209,163. Still way better than 25,022,799 though, but it brings up a good point….
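If you want to reproduce those logical read numbers yourself, a minimal way to capture them is to wrap the delete in STATISTICS IO (the counts show up on the Messages tab in SSMS):

SET STATISTICS IO ON;
DELETE dbo.Comments_ToBeDeleted
WHERE CreationDate < '2010-01-01';
SET STATISTICS IO OFF;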
If you need to do this regularly, tune it.
You can play around with:
- The number of rows in the view (say, 1K, 5K, 10K, etc., keeping in mind the lock escalation threshold)
- The delay time between deletions
That way you can find the sweet spot for your own deletes based on your server’s horsepower, concurrency demands from other queries (some of which might be trying to take table locks themselves), the amount of data you need to delete, etc. Use the techniques Michael J. Swart describes in Take Care When Scripting Batches.
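As a rough sketch of what a tuned background job might look like – the batch size lives in the view's TOP, and the one-second WAITFOR is just a placeholder you'd adjust for your own environment:

DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE dbo.Comments_ToBeDeleted
    WHERE CreationDate < '2010-01-01';
    SET @rows = @@ROWCOUNT;
    WAITFOR DELAY '00:00:01';      -- breathing room for other queries, mirroring, and AGs
END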
For more learning on this topic, read Microsoft SQLCat on Fast Ordered Deletes – Wayback machine copy because Microsoft deleted a lot of pages during one of their annual corporate shuffles. You can tell it’s old because…MySpace, yeah.
75 Comments
This is a pretty interesting strategy. When do you prefer it over creating partitions and dropping the oldest partition as it expires?
Brendan – thanks! Partitioning often makes queries slower instead of faster, sadly. Here’s a good starting point for that:
https://www.brentozar.com/archive/2015/09/why-is-this-partitioned-query-slower/
What about set rowcount? Find some number where you can delete data without blocking apps. Set it up as a job and run it every few minutes.
Alen – sure, try that with the Stack Overflow databases and compare how the query plans, duration, and logical reads look. That’s one of the things I love about using that database for demos – if there’s a technique you think will work well, you can totally try it out! Give it a shot and see if performance matches what you expect. (Hint: it doesn’t, heh.)
The other thing you’ll run into with SET ROWCOUNT is that it’s deprecated as a method for limiting the number of rows affected by DELETE, INSERT, and UPDATE statements in SQL Server 2014 and above (https://go.microsoft.com/fwlink/?LinkId=798563) — functionally, DELETE TOP (1000) FROM dbo.table is the replacement for the older SET ROWCOUNT syntax.
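For anyone who hasn't seen it, the old pattern looked roughly like this – shown only for contrast, not a recommendation:

-- Legacy approach: SET ROWCOUNT is deprecated for limiting DML
SET ROWCOUNT 1000;
DELETE dbo.Comments WHERE CreationDate < '2010-01-01';
SET ROWCOUNT 0;    -- always reset, or it silently limits later statements too

-- Modern equivalent (note: TOP here deletes an arbitrary 1,000 matching rows, not the oldest ones)
DELETE TOP (1000) FROM dbo.Comments WHERE CreationDate < '2010-01-01';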
What about the overall added cost/time on the front end for updating the index for all new inserts into the base table? Those are one at a time and add a fractional bit more cost/time overall. That’s probably a lot more survivable/less of a problem than the locking/blocking on the delete side though.
Steve – yep, that’s why I put this in big print at the end of the post, “If you need to do this regularly, tune it.”
Thank you for this demo. I’d been seeing those “wide” execution plans on deletes but hadn’t looked at it in detail. Would the big table lock be less of the problem when using read committed snapshot isolation?
RussellH – you’re welcome! It would be less of a problem for selects, but when folks run into this problem, it’s usually because they’re also trying to do deletes/updates/inserts at the same time. RCSI doesn’t help you as much when 2 people are trying to write to the same row(s).
When you say keep deleting records until there are no more left, do you mean something like:
select 'prime the row count'
while @@ROWCOUNT <> 0
DELETE dbo.Comments_ToBeDeleted WHERE CreationDate < '2010-01-01'
I would guess so. There’s an example in the T-SQL Querying book here: https://books.google.co.uk/books?id=rgq3BgAAQBAJ&pg=PA362&lpg=PA362#v=onepage&q&f=true
I’m sure Brent will reply if he had something different in mind.
I figured I’d stop as the blog post finished. I can only write so much in one post. 😀 The rest is left as an exercise for the reader.
That sounds more practical than making one blog post that covers everything.
Roger that, I was extending the post so you didn’t have to do all the work yourself 🙂
I use this technique currently for nightly deletes. The piece of your post that spoke about delays made me start thinking about server load with such a tight loop. It would be easy enough to add a begin/end with a waitfor to give it some breathing room.
I loooove the waitfor technique too because it lets database mirroring and AGs have a better chance of keeping up.
Very clever, Brent. Thanks!
Rebecca – thanks! I can’t claim credit for it by any means – Microsoft’s posting of it was the first time I’d seen it, but just wanted to get a fresh, solid version out there since their version disappeared in the continuous SQLCAT blog post reorgs.
I know this may not work for everybody – but we copy the rows we want to keep into a temporary or work table, truncate the original table and then copy the rows back
Robin – right, the key is at the start of this post when I say, “things get much harder when you need to delete a small percentage of them, say 5%.” You wouldn’t really want to copy 95% of a table out, then copy it back in – your transaction log would explode, major blocking, etc.
Your technique totally makes sense in the opposite scenario – when you only want to KEEP 5% of the rows, and you wanna delete the other 95%. I love it then.
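A rough sketch of that keep-the-5% pattern might look like this – assuming a maintenance window, no foreign keys referencing the table, and a hypothetical dbo.Comments_Keep staging table:

SELECT *
INTO dbo.Comments_Keep
FROM dbo.Comments
WHERE CreationDate >= '2010-01-01';   -- copy out only the rows you want to keep

TRUNCATE TABLE dbo.Comments;          -- fast, but blocked if foreign keys reference the table

INSERT dbo.Comments WITH (TABLOCK)    -- TABLOCK helps get minimal logging on the reload
SELECT * FROM dbo.Comments_Keep;      -- an IDENTITY column would also need SET IDENTITY_INSERT and a column list

DROP TABLE dbo.Comments_Keep;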
Oops – I’m going to blame my inability to read properly on it being Friday evening.
Is this any more efficient than just doing DELETE TOP (1000) FROM dbo.Comments WHERE CreationDate < '2010-01-01' until it doesn't delete anything? I see how it's better than deleting everything in one batch, but is the view doing anything magical that a TOP clause in the delete wouldn't do?
Wyatt – go ahead and try it with the Stack Overflow database and take a look. The view indeed does something magical. That’s why I work with publicly available databases so y’all can do this stuff yourself instead of asking me to do it. (I’m not mad – just want y’all to get up and do some investigation yourself so you can see this stuff and learn.)
Or rather, to be more specific – if you think you’ve got a better way of doing it, try – instead of asking someone else to do it for you.
You can’t create a view with ORDER BY; it violates the relational model. Also, it is nondeterministic, because there is no tie breaker in the ORDER BY.
This should be specified in the blog, I guess
Calin – if you read the post, I *did* create a view with order by. 😉 You’re absolutely right about it not being deterministic, but for the purposes of this blog post (remember – read the title first, then work down) it works great.
I really do wish I could cover every topic in every blog post, but there’s only so much I can cover per post.
You can also use a CTE.
WITH Comments_ToBeDeleted AS (
SELECT TOP 1000 *
FROM dbo.Comments
ORDER BY CreationDate
)
DELETE FROM Comments_ToBeDeleted
WHERE CreationDate < '2010-01-01';
I don't have the Stack Overflow database, but in my tests, if you move the WHERE clause inside the CTE, it changes the nonclustered index scan to a seek. But that didn't seem to affect the performance.
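Moving the WHERE inside the CTE looks roughly like this:

WITH Comments_ToBeDeleted AS (
    SELECT TOP 1000 *
    FROM dbo.Comments
    WHERE CreationDate < '2010-01-01'
    ORDER BY CreationDate
)
DELETE FROM Comments_ToBeDeleted;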
Nicholas – I just tested that in Stack, and it’s a nonclustered index scan (not seek), but it’s a good scan in the sense that it doesn’t read the entire table – only enough rows to achieve the goal. Estimated number of rows to be read is in the tens of millions, but the actual number of rows is only 1000 – so in this case, the scan is fine. I’d be totally fine with the CTE. Nice work!
Yup we use a CTE too, works like a charm.
Do not forget you can do groups with Kenneth Fisher’s GO “batches”
DELETE dbo.Comments_ToBeDeleted
WHERE CreationDate < '2011-01-01';
GO 401
(https://sqlstudies.com/2018/04/12/deleting-a-lot-of-data/)
Ricky – yes, but with a few big exceptions. That only works in SSMS, it will repeatedly retry the query even when there’s no rows left to be deleted – not a great fit for something you want to automate – and there’s no error handling. (I like Kenneth’s post, though.)
To be fair I was dealing with a very specific set of circumstances. Limited log space, no index, one time delete and an environment that wasn’t in use yet. I also had an exact count of the number of rows I’d be deleting so could calculate how many iterations I needed ahead of time 🙂
We do something similar but without the view (and constant clicking). This gets it done in one swoop without taking huge locks. You can stop the query any time you need to and continue it until it’s done. I’ll test it on the SO but would imagine it uses the exact same query plan.
declare @rowCount int = -1;
while (@rowCount <> 0) begin
delete top (1000) dbo.Comments
where CreationDate < '2010-01-01';
set @rowCount = @@rowCount;
end
Robert – yeah, the problem I’ve run into with that is that someone tweaks that TOP number, goes past 5000 thinking they’re going to get ‘er done faster, and whammo, you get table locks. If you put it in a view, you make it less likely that someone’s going to change the object (assuming it’s locked down for permissions) and it forces them to keep their locks small. I like the idea though!
Appreciate it and yes, maybe it’s time to stop reading blog posts diagonally at 2 AM 🙂
Brent,
If I have the disk space for an extra copy of the table, then I find that it is sometimes better for concurrency to copy all the rows you want to keep, and then do two sp_rename operations. The caveat is that it requires two schema modification locks (which are quick once they are allowed to run).
Henrik – yes, but not usually when we’re talking only deleting 5% of rows from a really big table. If you’re copying 95% of a really big table to the transaction log, that presents its own challenges.
hi Brent, yes, the transaction log will be big _if_ you do not run it as smallish batches with the database in SIMPLE mode.
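A rough sketch of that rename-swap approach (hypothetical object names; you'd recreate indexes, constraints, and permissions on the new table before the swap):

SELECT *
INTO dbo.Comments_New
FROM dbo.Comments
WHERE CreationDate >= '2010-01-01';   -- copy out the rows to keep (batch this if log space is tight)

-- recreate indexes, constraints, and permissions on dbo.Comments_New here

BEGIN TRAN;
    EXEC sp_rename 'dbo.Comments', 'Comments_Old';      -- brief schema modification locks
    EXEC sp_rename 'dbo.Comments_New', 'Comments';
COMMIT;

DROP TABLE dbo.Comments_Old;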
Brent,
I hate the IX_CreationDate index.
But here is a case where “Know your data” applies.
Often there is a correlation between CommentsID and CreationDate.
This can be used if you just want to trim down your table.
Something like this:
DECLARE @lower BIGINT
SELECT TOP (1) @lower = CommentsId FROM dbo.Comments ORDER BY CommentsId
DELETE TOP (1000) FROM dbo.Comments WITH (ROWLOCK)
WHERE CommentsId >= @lower AND CommentsId < @lower + 1000
AND CreationDate < DATEADD(YEAR, -3, GETUTCDATE())
Then you do not need that extra index on CreationDate, which I find is an important save of I/Os.
Henrik – yep, that’s true too! We cover that in our Mastering Query Tuning classes. (Just only so deep I can go in one blog post – holy cow, y’all want me to write a book on this, apparently, hahaha.)
I’m being thick – why does the view help over just a good index? Thanks, Geoff
Geoff – may wanna reread the post. The first example had an index on it. You can also download the database yourself to experiment.
[…] Brent Ozar has a tip for deleting a relatively small percentage of data from a very large table: […]
“Wanna see it in action? No? Then just copy/paste my code, put it straight into production like you always do, and get back to work. For the rest of you, keep reading.”
To be fair, I always cite you as the source when it blows up in production. 😉
I expect nothing less. DBA – Default Blame Acceptor, hahaha.
The best bit is in the comments: “if there’s a technique you think will work well, you can totally try it out”.
I’ve got a “consulting line” that I’m beginning to use more and more. “I don’t know. I can’t tell by inspection whether X is true, measure it and tell me how it goes”.
I mean, I can sometimes guess how SQL Server will behave, But the best one to tell how SQL Server will behave is SQL Server.
(And thanks for the plug again!)
Hahaha, thanks sir. Yeah, it gets a little frustrating when I get dozens of comments that say, “You should try ___ and write about how it works.” What the hell, people, go start your own blog and pay things forward, hahaha.
But at the same time, I gotta be thankful that I have such a vocal audience. That’s awesome – at least I’m not in silence heh.
Nice post Brent, like always!
One concern though: if we need to delete from multiple tables, which don’t happen to be related whatsoever, then we need to create multiple views and indexes for them (if they don’t already exist). Then, probably create a stored procedure to delete records off of those tables at once?
[…] Concurrency Week: How to Delete Just Some Rows from a Really Big Table Deleting a batch of rows might be painful. Check this Brent Ozar’s post how to relieve this pain. […]
Nice post Brent,
I tried the solution but found some issues:
– the view must be schema bound
– indexed views do not allow use of the TOP keyword
– you cannot create a nonclustered index on a view if there is no unique clustered index
Am I missing something that could be the cause of all the issues above?
Daniel – this post isn’t about indexed views. Reread the demo syntax carefully – we’re not creating indexes on the views, just the table.
(Let’s see if this posts the code properly) Here’s the version we use. The idea was someone else’s (I wish I knew who, so I could cite), but works overall quite well.
It uses the CTE to do the delete as mentioned by Nicholas above, but with the added advantage that it’s constantly narrowing the window it queries.
IF OBJECT_ID('tempdb..#mydeleted') IS NOT NULL
DROP TABLE #mydeleted
CREATE TABLE #mydeleted
(insert_datetime DATETIME)
CREATE CLUSTERED INDEX cidx_mydeleted_insert_datetime ON #mydeleted (insert_datetime)
DECLARE @msg NVARCHAR(50),
@insert_datetime DATETIME
SELECT 'Starting'
–sets @@ROWCOUNT
WHILE (
@@ROWCOUNT <> 0
AND GETDATE() = @insert_datetime
ORDER BY insert_datetime
)
DELETE FROM cte
OUTPUT deleted.Insert_Datetime
INTO #mydeleted
END
(Nope, that didn’t post properly; Brent, feel free to delete that one)
Here’s the version, based on a couple of concepts (CTE, narrowing the window) that others, whom I can’t find, pointed out.
IF OBJECT_ID('tempdb..#mydeleted') IS NOT NULL
DROP TABLE #mydeleted
CREATE TABLE #mydeleted
(insert_datetime DATETIME)
CREATE CLUSTERED INDEX cidx_mydeleted_insert_datetime ON #mydeleted (insert_datetime)
DECLARE @msg NVARCHAR(50),
@insert_datetime DATETIME
SELECT 'Starting'
WHILE (
@@ROWCOUNT <> 0
AND GETDATE() = @insert_datetime
ORDER BY insert_datetime
)
DELETE FROM cte
OUTPUT deleted.Insert_Datetime
INTO #mydeleted
END
(gods, neither time did the code post correctly – Brent, please save us from my crappy posts)
Here’s a way I came up with, based off other smarter peoples’ ideas. I can’t post the code here, so instead you get a link. http://thebakingdba.blogspot.com/2015/01/t-sql-more-efficient-delete-using-top.html
Basically it uses the CTE like Nicholas’ example, but it also takes the results (via #DELETED) and uses that on the next loop to narrow the window, since you know you’ve deleted up to a certain point. So it won’t slow down over time as it runs, since it would find fewer and fewer matches.
Hope this helps!
Hi! What about the transaction log with this technique, if I need to delete several million records? How do you apply the commits using the view (TOP 1000 or n)? Would I need to use a cursor?
Pablo – give ‘er a shot and you can see based on your own workloads. Or, if you’d like personalized help, click Consulting at the top of the screen.
Hi, need your help guys
I need to keep only the top 50 rows from each group after sorting in an unsorted table and delete the rest. Can someone help me with this?
Unfortunately, we don’t do free consulting here in the comments.
This is an amazing and super simple way to do deletes, I really like it.
One thing I wanted to ask, or would like you to expand on in your post (you briefly mentioned it), is the delay between calling the deletes. How does one implement this?
Let’s say I don’t know how many rows are older than date X, and I want to delete all that data. I could have TOP 1000 in my view, but surely I should have a delay between each 1,000 deletes, just to let other work on the database carry on, as well as spreading out the workload over time? After all, deletes are not time sensitive – I don’t mind if they take 5 hours in the background. They just must not affect the workings of the live table.
Waitfor delay '00:00:01'
Second Question,
We already have an index on the table; however, it is a compound index of time and SensorId. How do I pass the SensorId into the view so it can filter by it? Adding a whole new index just on the DateTime field when there is already an existing one doesn’t seem to make sense to me. Filtering on the SensorId will also help filter rows faster.
Dylan – thanks, glad you liked the post. For personalized help with specific database problems, that’s where my consulting comes in. You can click Consulting at the top of the site to see how to get started.
Thanks for sharing Brent, another option could be to create a staging table with the schema structure matching the source table, insert the records we want to keep into the staging table, and use an ALTER TABLE staging SWITCH TO source statement.
Antonio – right, but that’s a logged operation that increases the size of the database and requires a lock the whole time you’re working. Other rows in the table can’t be changing while you do that.
The staging table is only going to be used as a temp table in the insert statement; a select statement will be run on the source (you can use TABLOCK).
Errr, so in a post called “Concurrency Week”, you’re going to suggest using a tablock hint? Interesting.
I just wanted to share an option to delete records from a very large table that worked for me, delete 100 millions out of 400 millions without locking and minimal logging
Thanks
Oh definitely, feel free to share that method that didn’t use locking because the one you shared here required an exclusive table level lock hint. 😉
You are correct, thanks for pointing this out. I should say a schema modification lock is acquired on both the source and target tables to make sure that no other connections reference the tables during the change.
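A toy illustration of the switch idea (hypothetical demo tables – the source and target schemas and indexes have to match exactly, and the switch target must be empty):

CREATE TABLE dbo.Comments_Demo         (Id INT NOT NULL PRIMARY KEY, CreationDate DATETIME NOT NULL);
CREATE TABLE dbo.Comments_Demo_Staging (Id INT NOT NULL PRIMARY KEY, CreationDate DATETIME NOT NULL);

INSERT dbo.Comments_Demo_Staging (Id, CreationDate)   -- copy out the keepers
SELECT Id, CreationDate FROM dbo.Comments_Demo
WHERE CreationDate >= '2010-01-01';

TRUNCATE TABLE dbo.Comments_Demo;                      -- empty the original

ALTER TABLE dbo.Comments_Demo_Staging SWITCH TO dbo.Comments_Demo;   -- metadata-only move back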
Sorry if I missed this, but is it possible to add a filtered index that only exists on rows that are over a certain age, then just loop through that in blocks until none are left? (like deleting anything over 1 year old)
Feel free to give ‘er a shot and find out. That’s why I use open source databases and give you all the demo scripts – so you can test your ideas out.
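If you do try it, here's a sketch of the filtered index idea – note the filter has to be a deterministic literal (no GETDATE()), so you'd have to rebuild it as the cutoff date moves:

CREATE INDEX IX_Comments_OldRows
ON dbo.Comments (CreationDate)
WHERE CreationDate < '2010-01-01';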
Hi Brent, thanks for sharing this nice way to manage big deletes. However, as per this MSDN link: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-view-transact-sql?redirectedfrom=MSDN&view=sql-server-ver16
the TOP operator in a view is no longer valid for a DELETE operation. I just tried this on Azure MI and can't delete records from the view if the TOP operator is used in the view definition.
I instead manage it by removing TOP from the view and using TOP in the DELETE statement.
Thanks,
Ash – the Books Online page doesn’t say that. If you’re having problems with a query, by all means, feel free to post the query, the error messages you’re getting, and the plan at a Q&A site like https://dba.stackexchange.com. Hope that helps!
I can understand how this works for situations where the total duration of the delete is less important than the unavailability of the table. But what about when you need to delete somewhere from 20% to 80% of a table (dynamically) and you do not care about the table being locked, but finishing the delete as fast as possible is critical.
For personalized help on things that aren’t covered in the post, click Consulting at the top of the site.
Excellent post, Brent. We’ve got a ridiculously large table that must remain accessible at all times, and about 80% must be deleted. There are no other options. HUGE table – the index itself is 25GB in size. The table is accessed regularly and table locks are not an option. The view is remarkable. 1k, 2k, 5k – no locking, no impact to concurrent activity, extremely efficient. This thing has been running since yesterday and I have seen zero contention. Once the historical data is gone, this will be used for an automated daily routine. Really, really good stuff. Many thanks!