SQL Server Data Compression: It’s a Party!


When I was in high school, Dad and I lived with his mom, my Grandma Ozar, for a couple of years.  We took care of things around the house and made sure her coffee pot was always full.  She could really down that coffee – at least two pots a day.  (Looking back, if we could have reduced her caffeine consumption, she probably wouldn’t have needed so much Valium.)

I Can Quit Anytime I Want*

Grandma and some friends took a road trip to Las Vegas, and while they were gone, I threw an epic party.  We’re talking 30 gallon garbage cans filled with homemade Jungle Juice.  (I’m one of those reasons she couldn’t completely eliminate the Valium.) A couple hundred of my closest friends had a good old time.

A couple of my less-than-closest friends had a little too good of a time at my expense and started trashing the house.  They walked up the staircase smashing the picture frames of every family photo, then started to throw a couch off the second floor balcony.  My security guys (I’m telling you, it was that good of a party) carried them out before they got too carried away.

Cleaning Up After The Party

The next morning, the very-closest-friends did a fantastic job of getting things back to normal.  We replaced all the picture frame glass, got the dirt out of the sofa, vacuumed the place top to bottom, and finished the Jungle Juice.  I thought we’d hidden all our tracks, but we got busted by the tiniest of clues.

Someone had left beer bottle caps on top of door jambs all around the house.

Amazing!  Who thinks of this stuff?  You know they did it on purpose, too – they were just itching to get me into trouble.  One beer cap on one door jamb, I could understand, but all over the house?  Damn.

I was disinherited for that particular shindig.

I learned a valuable lesson: if you’re not absolutely sure you can clean up every trace of everybody else’s messes, you shouldn’t throw parties.  Sooner or later, somebody you can’t trust is going to show up at your party, and they’re going to do something that’ll get you in trouble.  Come to think of it, it’s just like being a DBA.

SQL Server 2008 Data Compression: No Inheritance Either

Microsoft SQL Server 2008’s Data Compression feature lets you compress objects – not just tables, but individual indexes.  Compression does incur a little extra CPU overhead, but that overhead is usually more than offset by reduced IO.  Generally speaking, the database server is sitting around waiting on the disk subsystem, so trading a little CPU work for dramatically less IO results in faster query return times.  You still need to test compression to see if it works well in your environment, because it may not work well in heavy-insert databases.
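
To give you a feel for what that looks like, here’s a rough sketch of enabling page compression on a made-up dbo.Sales table and one of its indexes – swap in your own object names:

-- Sketch: enable page compression on a hypothetical table and one of its
-- nonclustered indexes. Object names here are made up for illustration.
ALTER TABLE dbo.Sales
    REBUILD WITH (DATA_COMPRESSION = PAGE);

ALTER INDEX IX_Sales_OrderDate ON dbo.Sales
    REBUILD WITH (DATA_COMPRESSION = PAGE);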

Today, though, I’m going to focus on the dark side of compression: a complete lack of inheritance.

When you compress tables and indexes, it’s a one-time action.  You’re only taking care of what exists today.  If someone (or even you) turns around and creates an index on that same table tomorrow, it won’t be compressed by default.  Whoever creates the index has to make sure that it’s compressed, and there’s nothing in SQL Server Management Studio that will hint to them that other parts of that same object are compressed.
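
If you’re the one creating the index, you can spell out compression right in the CREATE INDEX statement – another quick sketch with made-up names:

-- Sketch: new indexes don't inherit compression, so declare it explicitly.
-- Table, column, and index names are hypothetical.
CREATE NONCLUSTERED INDEX IX_Sales_CustomerID
    ON dbo.Sales (CustomerID)
    WITH (DATA_COMPRESSION = PAGE);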

To make matters worse, your development, QA, and production environments might all have different compression settings, and you’d never notice it at a glance.  Compression is transparent to applications, so your developers won’t know why one environment performs so differently from another even though they have the same hardware, the same indexes, and the same statistics.

Implementing compression is a multi-step process:

  • Figure out what objects you should compress
  • Plan to handle all of your environments (dev, QA, production)
  • Compress them during a low-activity window
  • Regularly patrol your environments checking for added objects that weren’t compressed (see the query sketch after this list)
  • Keep your environments in sync
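
For the patrol step, there’s nothing built into SQL Server, but a query against the catalog views will show you which indexes and partitions aren’t compressed – a rough sketch you’d want to adapt to your own standards:

-- Sketch: list every user index/partition that isn't compressed, so
-- uncompressed newcomers stand out. Adjust the WHERE clause to taste.
SELECT  s.name AS schema_name,
        o.name AS table_name,
        i.name AS index_name,
        p.partition_number,
        p.data_compression_desc
FROM    sys.partitions AS p
JOIN    sys.indexes AS i ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN    sys.objects AS o ON i.object_id = o.object_id
JOIN    sys.schemas AS s ON o.schema_id = s.schema_id
WHERE   o.is_ms_shipped = 0
  AND   p.data_compression_desc = 'NONE'
ORDER BY s.name, o.name, i.name, p.partition_number;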

If you don’t stay on top of all of these, you’ll need Valium too.

* – Yes, about that picture.  No, it’s not mine.  Yes, it’s licensed with Creative Commons.  No, it wasn’t even the worst jungle juice picture I could find licensed with Creative Commons.  Yes, I too am amazed that people upload their party pictures to Flickr, let alone license them with Creative Commons.


58 Comments

  • Would it be feasible to schedule semi-regular decompression and recompression times to “clean up” these stragglers?

    Is there a basic DB size range where you would recommend that a DBA look at compression?

    • About the decompression/recompression times – no, actually, you just need to run queries regularly to see what objects aren’t compressed, and then decide whether or not to compress ’em. Unfortunately, there are no built-in tools to do that – the DBA has to roll their own.

      I would recommend that anybody using SQL Server 2008 Enterprise look at compression, regardless of size. If you care enough to spend extra for Enterprise Edition, you should check out that feature, because it can be a free performance boost.
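
      If you want a quick sanity check before compressing anything, SQL Server also ships a stored procedure that estimates the savings – here’s a rough sketch against a hypothetical dbo.Sales table that you’d swap for your own:

      -- Sketch: estimate how much space PAGE compression would save on a
      -- hypothetical table, before actually rebuilding anything.
      EXEC sp_estimate_data_compression_savings
           @schema_name      = 'dbo',
           @object_name      = 'Sales',
           @index_id         = NULL,   -- all indexes
           @partition_number = NULL,   -- all partitions
           @data_compression = 'PAGE';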

  • What a great way to start the blog post! Well done, my friend, well done. Who said data compression had to be a dry subject? Hahahaha…

  • Too bad you have to buy the Enterprise edition to get compression. You know, I had a few parties like that many years ago. I remember it’s hard to hide holes in the wall, too.

  • Great points! I didn’t think about inheritance. I wrote on this subject with several code examples on SSC last year, and even gave a presentation in Vermont last spring.

    I posted in early 2009, after a great project the previous spring of 2008 using SQL 2005 Enterprise Edition’s VarDecimal conversion – highly effective if you are still on SQL 2005 and your databases are heavily dependent on financial decimal data types.

    If you are using 2005/2008 Developer Edition, you have all the features of Enterprise, so you can compress with ease – you just cannot use Standard Edition for this function.

    More code examples for SQL Server 2008.

    • Hugo – yep, and you also have to make sure you don’t have Standard Edition in your disaster recovery environment. That makes for an ugly surprise when you try to recover from a production outage only to find you don’t have the right edition available, and you can’t restore your databases. Whoops!

  • At what cost is compression?

    Being in this field for a number of years, there is one thing I learned: “You get nuttin’ for free, there is always a cost.” I am ready to migrate one of our risk systems from 2005 to 2008, and compression looks like gold (gold, Jerry, gold – Banya), especially for some of the huge, flattened-out fact tables in the system; I’m talking a 7K row. I did not realize that you can use BOTH row and page compression on a table.

    • Paul – howdy, sir. The cost is usually heaviest on heavily written tables. If something’s being constantly inserted/updated/deleted, the CPU cost might outweigh the IO benefits. The best scenario is large tables that rarely get updated, like data warehouse fact tables.
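
      If you’re not sure how write-heavy a table is, something like this rough sketch against the index usage DMV gives a feel for reads versus writes (the counters reset when the instance restarts, so treat the numbers as a rough guide):

      -- Sketch: compare reads vs. writes per index in the current database,
      -- to spot write-heavy tables where compression's CPU cost may hurt.
      SELECT  OBJECT_NAME(us.object_id) AS table_name,
              i.name AS index_name,
              us.user_seeks + us.user_scans + us.user_lookups AS reads,
              us.user_updates AS writes
      FROM    sys.dm_db_index_usage_stats AS us
      JOIN    sys.indexes AS i ON us.object_id = i.object_id AND us.index_id = i.index_id
      WHERE   us.database_id = DB_ID()
      ORDER BY writes DESC;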

  • Rudy Panigas
    April 19, 2012 2:34 pm

    Nice article, thanks for the information!
    Do you know where I can find more information on compression and decompression for SQL 2005? Our vendor is recommending compression and I’ve not used this feature before.

    Here are some of the questions I need to understand first.
    Does compression work when you are doing a select from one compressed table into another compressed table, or from a production server to a development server?
    Can you revert to an uncompressed database?

    Thanks again for the great website! Always lots to read and discuss here 🙂

    • Rudy – compression is totally invisible to queries. It only affects how the database engine writes stuff to disk. You can indeed revert by taking away the compression, but keep in mind that databases with compressed objects can only be attached to SQL Server instances running Developer or Enterprise Edition.
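
      If you do decide to revert, it’s just another rebuild – a quick sketch with a made-up table name:

      -- Sketch: remove compression from a hypothetical table.
      ALTER TABLE dbo.Sales
          REBUILD WITH (DATA_COMPRESSION = NONE);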

  • Rudy Panigas
    April 20, 2012 6:59 am

    Thank you, Brent! One last question: is there any reason why you wouldn’t use compression on a database?

    • Yes, there are several. If my disaster recovery server is Standard Edition, if I’m dealing with very frequently updated data, if I’m dealing with data that doesn’t compress well, etc. You want to test it out in each case and monitor it closely to make sure it’s solving a problem – like any other feature, don’t implement it unless it’s the best solution to a problem you’re trying to solve.

  • Rudy Panigas
    April 20, 2012 7:13 am

    Once again, Brent (Bacon Man) Ozar saves the day!

    Thank you for all your great information and help!

  • Gilbert Gamache
    October 15, 2013 1:45 pm

    In our production environment, our databases are not compressed, but we are looking to compress them in our (multiple, virtual) development environments. Has anyone ever tried that configuration, and would they recommend it?
    I see problems with Database Projects and also with performance comparisons (although that’s not an issue in development).

  • Thanks for the entertaining article, Brent! A bit different from reading BOL. One question: after applying compression to a table, are newly inserted rows compressed like the existing data, or will compression need to be reapplied at some stage in the future to pick up these new rows?

    • Matt – thanks! You’ll want to dig into BOL for details – it’s in the When Compression Occurs section here: http://msdn.microsoft.com/en-us/library/cc280464.aspx

      • Brent,
        thanks for the informative articles. One question came up the other day: I keep reading that compression simply compresses the data in the index and table, but I don’t see anything anywhere saying it also rebuilds/reorganizes the index in the same process. Is it misinformation to think that compression will also incur a rebuild of your indexes in the process?

        • Francisco – if you’re enabling compression in order to perform an index rebuild, I’d humbly suggest that you just rebuild an index if that’s what you need.

          • Brent,
            I did find a line of text in one of the MS articles outlining that applying compression will also rebuild the indexes on the table. However, the reason the current DBA did this was because there are thousands of tables in this vendor DB. Reading your other articles on performance makes me think there are several tables he may wish to pull out of compression for speed alone. By the way, this database was over 2 TB before the compression.

  • How are columnstore indexes affected in SQL 2012? I see that they added extra compression features for 2014+ versions, but I’m not seeing any mention of 2012. Does the columnstore just use its own compression? Thanks!

    • Finally, I see the light, lol… Since there is no inheritance, it’s not like the compression would “trickle down” to a nonclustered columnstore anyway. On the other hand, a clustered columnstore in 2014+ would get compressed with the table, perhaps? I’m running up the learning curve on Enterprise features… 😉

  • Hi Brent.

    If I may… If you have a big table or a big index with only integers in it, would it make sense to apply the page compression option to it? Or would row compression be more sensible?

    Thanks.

    Best,
    Darek

  • Donald R. Cavin
    February 19, 2016 10:18 am

    Ok, got a weird one, Brent. @@version (10.0.5520.0, yeah I know, old as the hills, not the most recent SP, let alone CU, mea culpa)…

    …but…

    …have you ever seen page compression (clustered PK as the only index) cause duplicate key violations on insert?

    …I started page compressing to speed things up with little RAM… all of a sudden, random compressed tables started having dup PK errors… I removed compression… the errors went away… so far in my searches, I have found doodly squat… I’m still working on it, though… it makes no sense…

    • Donald – no, sorry. I’d probably start with troubleshooting the basics – making sure it’s not actual duplicate primary key errors, or race conditions in the code. Beyond what I can troubleshoot in a blog post comment though.

      • Donald R. Cavin
        November 9, 2016 8:10 pm

        Brent, follow-up… it was bad code. Page splits using SELECT DISTINCT (NOLOCK) with a GUID as the PK from a very active OLTP system… we got duplicates from the source. The developer had to recode, and I got to reapply data_compression=page and have FABULOUS performance again. Thanks again.

  • Hi Brent-

    Why does sp_BlitzIndex categorize compressed indexes as “Abnormal Psychology”? Is it simply alerting to their presence, or has it somehow determined that those indexes shouldn’t be compressed?

  • Brent, do you see any problems migrating from 2008 to 2012 with tables that are compressed?

    Thanks

  • Flávio Germano
    August 9, 2016 6:13 pm

    Is it possible to compress a partitioned table?

  • Jeremy Bennett
    August 16, 2016 2:36 pm

    When you run the compression wizard or an ALTER command on a table’s only partition (not the indexes), does the object compress before the statement completes? If the statement comes back with “command completed,” is it done? I tried it and ran some T-SQL to check the space the rows take up, and nothing changed. Thanks in advance.

  • Hi Brent. Thank you for the brilliant article as always. My question regarding index compression is: how do you monitor/measure how the index gets decompressed in memory? I know that the whole compressed page/row is read into memory and only the data being queried gets decompressed, but I am not sure what to look for to show this behavior.
    Please…?

  • Hi Brent,

    I know it is an old post, but I’m wondering what your thoughts are on SAN storage compression vs. SQL Server data compression.

  • Yes, here is the story. We have a SQL 2012 database where the app team applied mostly page-level compression. They ran their performance test and noticed a CPU hike, so they requested more CPU. Our infrastructure team came back saying that we are already doing compression at the SAN level, so there is no need to do compression at the SQL level.
    My question is: does SAN-level compression really compress the mdf file? I know it does for file servers.
    Thanks Brent

    • Vanrani – let’s take a step back. What problem was the app team trying to solve by enabling page level compression?

      If their primary goal was to reduce space, I’d start by using sp_BlitzIndex to identify which indexes are consuming the most space, make sure they’re actually being used, dedupe nearly redundant indexes, etc.

      If their primary goal was to improve performance, and then their performance tests noted that their performance went down, I would tell them to roll the compression back, go back to their original settings, and measure their primary wait type.

  • Thanks Brent, we are going to do exactly what you said. Roll back the compression and run the test again. I will try sp_BlitzIndex. Thanks!

  • Marvel Mayfield
    October 29, 2018 1:25 pm

    Reviving an older post to ask a new question. I’ve seen very little posted on which type of compression is the right type of compression. The only clear statements I’ve seen have been by Thomas LaRock, who says PAGE level is best for tables with lots of writes while ROW is better for tables with lots of scans. I’d be interested in hearing your thoughts on this approach.

  • Brent – thanks for the post!
    The discussion is even more relevant today, since Page / Row compression is available in Standard Edition starting with SQL 2016 SP1.
    With regard to the questions about SAN compression vs. SQL compression: both will reduce disk I/O when reading from disk, but SQL compression will also increase the amount of data that remains in the server’s RAM buffer pool, since the table/index data remains compressed in memory.
    If the DBA team can collaborate with the SAN team, I’d suggest they disable SAN compression and let the DBAs manage it in a more granular fashion. They should also test performance for compressed SQL objects on compressed SAN volumes vs. compressed SQL objects on uncompressed SAN volumes. It may be worth the complexity for the SAN team to provide compressed and uncompressed volumes to the DBA team, and the DBAs could create file groups on either (compressed SQL objects on uncompressed SAN volumes, and vice-versa).

  • Hi Brent.

    Does this mean that incorrect datatypes are no longer an issue? Choosing bigint over int (when int is the appropriate one) is not going to end up occupying more space if we have data_compression, which would compress it according to its value?

  • Hi Brent,

    I am going to implement page-level data compression on all database tables. Could you please confirm: is it a one-time activity, or do I need to schedule it on a weekly basis?
    After applying data compression, will it also apply to upcoming or new data?

  • If I was at the party, then I know who tried to nick your house. Perhaps if I said vasalu, you may know them.

  • Generally, data compression is only used to save memory. Or did you need some more beer?

