If you’ve got a regular rowstore table and you need to delete thousands of rows, you can use the fast ordered delete technique to delete them in batches without hitting the lock escalation threshold. That works great for rowstore indexes, but… columnstore indexes are different.
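As a refresher, here’s roughly what that rowstore technique looks like. This is just a sketch from memory rather than a copy from that earlier post: it assumes the plain rowstore dbo.Votes table and a throwaway view name of my own invention, deleting 1,000 rows per pass.

/* Sketch of the rowstore fast ordered delete technique (illustrative only):
   the TOP 1000 in the view keeps each pass small enough to avoid lock escalation. */
CREATE OR ALTER VIEW dbo.Votes_ToBeDeleted_Rowstore
AS
SELECT TOP 1000 *
  FROM dbo.Votes
  WHERE VoteTypeId = 12
  ORDER BY Id;
GO

/* Run this repeatedly (or in a loop) until it affects 0 rows. */
DELETE FROM dbo.Votes_ToBeDeleted_Rowstore;
GO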
To demo this technique, I’m going to use the setup from my Fundamentals of Columnstore class:
- A 4-core, 30GB RAM VM with the data & log files on ephemeral (fast) SSD
- SQL Server 2019 Cumulative Update 11
- The Stack Overflow public database as of 2018-06 (about 180GB)
- The dbo.Votes table – which has 151M rows
I’ll create a new copy of the Votes table with a clustered columnstore index:
DROP TABLE IF EXISTS [dbo].[Votes_columnstore];
GO
CREATE TABLE [dbo].[Votes_columnstore](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[PostId] [int] NOT NULL,
	[UserId] [int] NULL,
	[BountyAmount] [int] NULL,
	[VoteTypeId] [int] NOT NULL,
	[CreationDate] [datetime] NOT NULL,
	INDEX CCI CLUSTERED COLUMNSTORE
);
GO
SET IDENTITY_INSERT dbo.Votes_columnstore ON;
GO
INSERT INTO dbo.Votes_columnstore (Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate)
SELECT Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate
  FROM dbo.Votes;
GO
SET IDENTITY_INSERT dbo.Votes_columnstore OFF;
GO
Let’s say we’ve decided to no longer use VoteTypeId 12, Spam, and we need to delete those 29,852 rows.
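If you want to double-check that number on your own copy of the database, a quick sanity check (my addition, not something from the original walkthrough) looks like this:

/* How many Spam votes (VoteTypeId 12) are we about to delete? */
SELECT COUNT(*)
  FROM dbo.Votes_columnstore
  WHERE VoteTypeId = 12;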
The bad way: plain ol’ delete
I’m doing this inside a transaction only so we can slow it down to see the locks that are held open:
BEGIN TRAN
DELETE dbo.Votes_columnstore
  WHERE VoteTypeId = 12;
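While that transaction sits open, I check the locks from a second session. If you’re playing along at home and have the procedure installed, the call is just sp_WhoIsActive with the @get_locks parameter turned on:

/* Run from a second session while the DELETE's transaction is still open. */
EXEC sp_WhoIsActive @get_locks = 1;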
The locks output from sp_WhoIsActive @get_locks = 1 looks long, but only because there are so many rowgroups involved. I’ve trimmed it here to just show a few rowgroups:
<Database name="StackOverflow"> <Locks> <Lock request_mode="S" request_status="GRANT" request_count="1" /> </Locks> <Objects> <Object name="(null)"> <Locks> <Lock resource_type="HOBT" request_mode="X" request_status="GRANT" request_count="1" /> <Lock resource_type="ROWGROUP" resource_description="ROWGROUP: 7:100000000750000:1" request_mode="UIX" request_status="GRANT" request_count="1" /> <Lock resource_type="ROWGROUP" resource_description="ROWGROUP: 7:100000000750000:2" request_mode="UIX" request_status="GRANT" request_count="1" /> <Lock resource_type="ROWGROUP" resource_description="ROWGROUP: 7:100000000750000:3" request_mode="UIX" request_status="GRANT" request_count="1" /> </Locks> </Object> <Object name="Votes_columnstore" schema_name="dbo"> <Locks> <Lock resource_type="OBJECT" request_mode="IX" request_status="GRANT" request_count="1" /> <Lock resource_type="OBJECT.INDEX_OPERATION" request_mode="S" request_status="GRANT" request_count="1" /> </Locks> </Object> </Objects> </Database> |
The important lock to be aware of is the HOBT X: that’s an exclusive lock on the table. That’s the part I need to avoid: in high-concurrency situations, it may not be easy to get that kind of lock on the table.
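If you don’t have sp_WhoIsActive handy, you can spot the same HOBT-level exclusive lock straight from the DMVs. This is a rough equivalent, my addition rather than something from the post:

/* Rough DMV equivalent: summarize locks held in the StackOverflow database.
   Look for the resource_type = 'HOBT' row with request_mode = 'X'. */
SELECT request_session_id, resource_type, request_mode, request_status, COUNT(*) AS lock_count
  FROM sys.dm_tran_locks
  WHERE resource_database_id = DB_ID('StackOverflow')
  GROUP BY request_session_id, resource_type, request_mode, request_status
  ORDER BY request_session_id, resource_type, request_mode;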
The good way: Fast Ordered Deletes
After rolling back that transaction, let’s try again with the fast ordered deletes technique. This time around, I’m using a CTE rather than the view technique I used in that last blog post:
BEGIN TRAN;

WITH Votes_ToBeDeleted AS (
  SELECT TOP 1000 *
    FROM dbo.Votes_columnstore
    WHERE VoteTypeId = 12
)
DELETE FROM Votes_ToBeDeleted
  WHERE VoteTypeId = 12;
This time, I don’t even have to trim sp_WhoIsActive’s output. It’s way shorter because there are fewer rowgroups involved:
<Database name="StackOverflow"> <Locks> <Lock request_mode="S" request_status="GRANT" request_count="1" /> </Locks> <Objects> <Object name="(null)"> <Locks> <Lock resource_type="HOBT" request_mode="IX" request_status="GRANT" request_count="1" /> <Lock resource_type="KEY" request_mode="X" request_status="GRANT" request_count="1000" /> <Lock resource_type="PAGE" page_type="*" request_mode="IX" request_status="GRANT" request_count="2" /> <Lock resource_type="ROWGROUP" resource_description="ROWGROUP: 7:100000000750000:140" request_mode="UIX" request_status="GRANT" request_count="1" /> <Lock resource_type="ROWGROUP" resource_description="ROWGROUP: 7:100000000750000:141" request_mode="UIX" request_status="GRANT" request_count="1" /> <Lock resource_type="ROWGROUP" resource_description="ROWGROUP: 7:100000000750000:142" request_mode="UIX" request_status="GRANT" request_count="1" /> <Lock resource_type="ROWGROUP" resource_description="ROWGROUP: 7:100000000750000:143" request_mode="UIX" request_status="GRANT" request_count="1" /> <Lock resource_type="ROWGROUP" resource_description="ROWGROUP: 7:100000000750000:144" request_mode="UIX" request_status="GRANT" request_count="1" /> </Locks> </Object> <Object name="Votes_columnstore" schema_name="dbo"> <Locks> <Lock resource_type="OBJECT" request_mode="IX" request_status="GRANT" request_count="1" /> <Lock resource_type="OBJECT.INDEX_OPERATION" request_mode="S" request_status="GRANT" request_count="1" /> </Locks> </Object> </Objects> </Database> |
Note these lines in particular:
<Lock resource_type="HOBT" request_mode="IX" request_status="GRANT" request_count="1" /> <Lock resource_type="KEY" request_mode="X" request_status="GRANT" request_count="1000" /> |
Awww yeah: we only have intent-exclusive locks on the object, and the exclusive locks only involve 1,000 of the keys. Yay! We’ve avoided table-level locking, so now we get better concurrency on the object.
But…it doesn’t really work.
The real test of the fast ordered delete technique is the ability to run queries on the other 150,783,380 rows that we’re not deleting. (Remember, we’re only deleting 1,000 rows.) And here’s the bummer: all of these queries are blocked:
SELECT TOP 10 * FROM dbo.Votes_columnstore ORDER BY Id;
SELECT TOP 10 * FROM dbo.Votes_columnstore ORDER BY Id DESC;
SELECT TOP 10 * FROM dbo.Votes_columnstore ORDER BY PostId;
SELECT TOP 10 * FROM dbo.Votes_columnstore ORDER BY PostId DESC;
SELECT TOP 10 * FROM dbo.Votes_columnstore ORDER BY CreationDate;
SELECT TOP 10 * FROM dbo.Votes_columnstore ORDER BY CreationDate DESC;
SELECT COUNT(*) FROM dbo.Votes_columnstore WHERE VoteTypeId = 1;
SELECT COUNT(*) FROM dbo.Votes_columnstore WHERE VoteTypeId <> 12;
I can get single rows if I specify the Id, and the row isn’t in a locked rowgroup:
SELECT * FROM dbo.Votes_columnstore WHERE Id = 1;
SELECT * FROM dbo.Votes_columnstore WHERE Id = 10000000;
SELECT * FROM dbo.Votes_columnstore WHERE Id = 20000000;
SELECT * FROM dbo.Votes_columnstore WHERE Id = 30000000;
SELECT * FROM dbo.Votes_columnstore WHERE Id = 40000000;
SELECT * FROM dbo.Votes_columnstore WHERE Id = 50000000;
SELECT * FROM dbo.Votes_columnstore WHERE Id = 60000000;
SELECT * FROM dbo.Votes_columnstore WHERE Id = 70000000;
But if I specify a row in a locked rowgroup, that row is locked even if its VoteTypeId isn’t 12. This is blocked:
SELECT * FROM dbo.Votes_columnstore WHERE Id = 160000000; |
I can prove the blocking isn’t about that row’s VoteTypeId by running the same query with NOLOCK:
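Here’s what that looks like: the same query with a NOLOCK hint, which reads past the locks and comes back immediately.

/* Same query, but dirty reads: it returns right away, so the blocking
   isn't about this row's VoteTypeId, it's the rowgroup-level locks. */
SELECT * FROM dbo.Votes_columnstore WITH (NOLOCK) WHERE Id = 160000000;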
Columnstore lock escalation gets ugly.
As far as our users are concerned, even with just 1,000 rows involved in a transaction, the entire table is practically locked for most kinds of queries. Queries can only succeed if SQL Server can do rowgroup elimination to bypass the locked rowgroups altogether – but even that is practically impossible because even the simplest queries, like top 10 by Id ascending, won’t eliminate rowgroups here.
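If you’re curious what SQL Server has to work with for rowgroup elimination, you can peek at the segment metadata. A rough sketch, my addition rather than something from the post: the sys.column_store_segments catalog view exposes the min/max data IDs stored for each segment of a column (column_id = 1 is Id in this table), which is what elimination relies on when a query has a usable predicate.

/* What range does each Id segment cover? (min/max are raw values or
   dictionary IDs depending on encoding.) Rowgroup elimination can only skip
   a rowgroup when a query's predicate falls outside these ranges, and a
   plain TOP 10 ... ORDER BY has no such predicate. */
SELECT p.partition_number, s.segment_id, s.row_count, s.min_data_id, s.max_data_id
  FROM sys.column_store_segments s
  JOIN sys.partitions p ON s.hobt_id = p.hobt_id
  WHERE p.object_id = OBJECT_ID('dbo.Votes_columnstore')
    AND s.column_id = 1
  ORDER BY p.partition_number, s.segment_id;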
This is one of the many reasons columnstore isn’t really a good fit for most transactional workloads. Don’t get me wrong – I looove columnstore indexes – but if your workloads involve concurrency, and even more so if they involve modifying thousands of rows at a time, you’re going to need to do special table design like partitioning and sliding window loads in order to get the performance you’re already used to with rowstore indexes.
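To make that last point a little more concrete, here’s the general shape of the sliding-window idea. This is purely illustrative and assumes a hypothetical version of the table partitioned by CreationDate, plus an empty staging table with an identical structure on the same filegroup; neither exists in this demo.

/* Hypothetical sliding-window purge: if Votes_columnstore were partitioned by
   CreationDate, the oldest partition could be switched out as a metadata-only
   operation instead of deleting millions of rows in place. */
ALTER TABLE dbo.Votes_columnstore
    SWITCH PARTITION 1 TO dbo.Votes_columnstore_stage;

TRUNCATE TABLE dbo.Votes_columnstore_stage;   /* the old rows vanish almost instantly */

To dig deeper into why columnstore locking behaves this way, the links below are a good next stop.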
- Locking in Columnstore by Niko Neugebauer
- Clustered Columnstore Isolation Levels by Niko Neugebauer
- How Snapshot Isolation Slows Down Columnstore by 8x by sql.sasquatch – because if you think you’re going to work around this locking limitation with different isolation levels or offloading reads to a readable replica, you’re in for another big surprise.
- And of course, my Fundamentals of Columnstore class.
1 Comment
Yep Brent,
I tried out NonCLCStore when it came out back in 2012 on a few “highly contentious” tables with a bunch of regular rowstore NCCIX already on the table “just to speed things up”…….mmmmm…..bang
Columnstore:
Reporting,
Cubes,
Tabular,
Star Schemas
Power BI
Synapse
not good ole OLTP