Data compression has many uses and advantages, but it also has drawbacks; it's definitely not a one-size-fits-all strategy. One thing to be aware of is that the initial compression of a table or index can take quite some time and will be resource-intensive. It's also, by default, an offline operation, so the object can't be accessed while it's being compressed or uncompressed. (Clarification: by default, an ALTER TABLE statement is an offline operation. You can declare it ONLINE, but, as David notes in the comments, "Although the operation is ONLINE, it's not completely idiot-proof.")
So what would happen if your SQL Server service or server restarted while you were in the middle of (or, as it usually goes, 90% of the way through) compressing an index? Let’s investigate.
I have a table that is 1.1 GB in size, with a 1.0 GB nonclustered index.
| Table name | Index name | total_pages | PagesMB |
|---|---|---|---|
| bigTransactionHistory | pk_bigTransactionHistory | 143667 | 1122 |
| bigTransactionHistory | IX_ProductId_TransactionDate | 131836 | 1029 |
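(The post doesn't show the query behind these numbers; a sketch along these lines, joining the allocation DMVs, produces the same columns. The aliases are my own.)

```sql
-- Sketch: total pages and size in MB per index (8 KB pages)
SELECT  t.name                         AS [Table name],
        i.name                         AS [Index name],
        SUM(au.total_pages)            AS total_pages,
        SUM(au.total_pages) * 8 / 1024 AS PagesMB
FROM sys.tables t
JOIN sys.indexes i          ON i.object_id = t.object_id
JOIN sys.partitions p       ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN sys.allocation_units au ON au.container_id = p.partition_id
WHERE t.name = 'bigTransactionHistory'
GROUP BY t.name, i.name;
```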
I need to reduce the size of the nonclustered index, so I decide to compress it. After using sp_estimate_data_compression_savings, I determine that I will benefit more from page compression.
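The estimate itself comes from sp_estimate_data_compression_savings; a call like the following returns the current size alongside the estimated compressed size (the index_id of 2 is an assumption here; check sys.indexes for the nonclustered index's actual id):

```sql
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'bigTransactionHistory',
    @index_id         = 2,      -- assumed; look it up in sys.indexes
    @partition_number = NULL,
    @data_compression = 'PAGE';
```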
I apply page compression, and allow my server to work.
```sql
ALTER INDEX IX_ProductId_TransactionDate ON dbo.bigTransactionHistoryPage
REBUILD WITH (DATA_COMPRESSION = PAGE);
```
Now, let’s say there is an emergency during this operation. Perhaps a component in the server breaks; maybe the data center loses power. (Or I restart the SQL Server service.) Uh-oh! What happened?
Once the service is running again, I check SSMS and see the following error.
```
Msg 109, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server.
(provider: Shared Memory Provider, error: 0 - The pipe has been ended.)
```
What happened to the data? Is compression all or nothing, or is it possible that some pages are compressed and others aren’t? I first check the index size.
| Table name | Index name | total_pages | PagesMB |
|---|---|---|---|
| bigTransactionHistory | pk_bigTransactionHistory | 143667 | 1122 |
| bigTransactionHistory | IX_ProductId_TransactionDate | 131836 | 1029 |
Nothing has changed – the index is not one byte smaller. This tells me the operation was not successful.
I also check the error log to see what information it can provide.
Two transactions were rolled back. As I am the only person in this instance right now (one of the benefits of a test environment), I know those were my transactions.
SQL Server has treated my data compression operation as a transaction. If there is a restart at any point, the operation will be rolled back to maintain data integrity.
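One way to find those rollback messages is to search the error log directly. This sketch uses xp_readerrorlog (undocumented but widely used); the parameters are log number (0 = current), log type (1 = SQL Server error log), and a search string:

```sql
-- Filter the current error log to rollback-related entries
EXEC sys.xp_readerrorlog 0, 1, N'rolled back';
```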
10 Comments
I believe your object’s data is ONLINE if your ALTER is properly declared ONLINE, follows the rules for what can be rebuilt online, and you are running Enterprise Edition.
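(Editor's note: declaring the rebuild online is one option on the same statement; a sketch, with the caveats about log growth from the next comment still applying:)

```sql
-- Enterprise Edition only: the index stays readable and writable
-- while it is rebuilt with compression
ALTER INDEX IX_ProductId_TransactionDate ON dbo.bigTransactionHistory
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```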
We’ve run into problems during online index rebuilds where the data was online but other areas of the system were effectively “blocked”. We had an ONLINE index rebuild run for 40 hours before the process was killed by a DBA, because the tran log was so large that it effectively shut down the log reader agent due to blocking. Same with CDC, and likely anything else that is tranlog-intensive. In one case we ran out of disk space because the tran log grew and consumed the entire LUN. Although the operation is ONLINE, it’s not completely idiot-proof.
Anecdotally, with really HUGE tables it seems like the best strategy is to REORG the index before applying compression if the index you are compressing shows even a modicum of fragmentation. Since a REORG is “fully online” (and is killable without rolling back all of its work), the more pages you can release from the structure before you compress, the better.
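(Editor's note: that strategy can be sketched as two statements. REORGANIZE is always online and keeps the work it has already done if killed; note that REORGANIZE itself cannot change the compression setting, so the REBUILD still does that part:)

```sql
-- Step 1: release pages from the fragmented structure first (fully online)
ALTER INDEX IX_ProductId_TransactionDate ON dbo.bigTransactionHistory REORGANIZE;

-- Step 2: then rebuild with compression over the smaller structure
ALTER INDEX IX_ProductId_TransactionDate ON dbo.bigTransactionHistory
REBUILD WITH (DATA_COMPRESSION = PAGE);
```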
Great blog post…keep up the good work…thank you.
David, you are correct – by default, the operation is OFFLINE, but I could make it ONLINE. Also, “Although the operation is ONLINE, it’s not completely idiot-proof” may be my new favorite sentence! Thanks for reading!
According to http://technet.microsoft.com/en-us/library/cc280449.aspx (and my experience), data compression can be performed as an online operation.
Yep, thanks for clarifying. It should say, “By default, it’s an offline operation.”
PAGE compression is a superset of ROW compression, hence it is expected that PAGE compression will give more compression than ROW compression.
I would rather follow this white paper on how to decide whether to use row or page compression.
http://msdn.microsoft.com/en-us/library/dd894051.aspx
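(Editor's note: to act on the white paper's advice, you can estimate both settings side by side and compare the returned sizes; the index_id of 2 is assumed here:)

```sql
-- Estimate ROW and PAGE savings for the same index, then compare
EXEC sp_estimate_data_compression_savings 'dbo', 'bigTransactionHistory', 2, NULL, 'ROW';
EXEC sp_estimate_data_compression_savings 'dbo', 'bigTransactionHistory', 2, NULL, 'PAGE';
```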
Great link! I’ve seen cases where row compression saves 40% and page compression only saves another 10%; I’ve seen other cases where row gives 10% and page gives 40% over that. It’s very dependent on data types and what that data looks like.
I’m scratching my head on how Row compression would result in better compression than Page, since Page compression essentially does Row compression first. I never use Row personally, but then again none of my servers are CPU bound. But the whole concept of “CPU intensity” is rather misleading because the fewer IOs and more data stored in RAM relieves pressure on the CPU.
Allen, totally dependent on data types. Here’s a blog I wrote a while back that gives more in-depth information on it: http://blogs.lessthandot.com/index.php/datamgmt/dbprogramming/how-sql-server-data-compression/.
Interesting – I haven’t found a use case for Row thus far, but it’s good to fully understand the differences “under the hood” – nice post!