
Has your SAN admin or CIO been telling you not to compress your SQL Server backups because they’re doing it for you with a dedupe tool like EMC’s Data Domain?  I’ve been hearing from a lot of DBAs who’ve been getting this bad advice, and it’s time to set the record straight.

The Basics of Storage Deduplication

Dedupe appliances basically sit between a server and its storage, and shrink the data that gets written to disk.  Sometimes they do it by identifying duplicate files, and sometimes they do it by identifying duplicate blocks inside files.  It isn’t like traditional compression because the process is totally transparent to anything that stores files on a deduped file share – you can save an Excel file to one of these things without knowing anything about dedupe or installing any special drivers.  In theory, this makes dedupe great because it works with everything.

The key thing to know about dedupe, though, is that the magic doesn’t happen until the files are written to disk.  If you want to store a 100 megabyte file on a dedupe appliance, you have to store 100 megabytes – and then after you’re done, the dedupe tool will shrink it.  But by that point, you’ve already pushed 100 megabytes over the network, and that’s where the problem comes in for SQL Server.

Dedupe Slows Down SQL Server Backups

In almost every scenario I’ve ever seen, the SQL Server backup bottleneck is the network interface or the drives we’re writing to.  We DBAs purposely set up our SQL Servers so that they can read an awful lot of data very fast, but our backup drives have trouble keeping up.

That’s why Quest LiteSpeed (and SQL 2008’s upcoming backup compression) uses CPU cycles to compress the data before it leaves the server, and that’s why in the vast majority of scenarios, compressed backups are faster than uncompressed backups.  People who haven’t used LiteSpeed before think, “Oh, it must be slower because it has to compress the data first,” but that’s almost never the case.  Backups run faster because the CPUs were sitting around idle anyway, waiting for the backup drive to be ready to accept the next write.  (This will really ring true for folks who sat through Dr. DeWitt’s excellent keynote at PASS about CPU performance versus storage performance.)
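If you want to see what that looks like in plain T-SQL once you’re on SQL Server 2008, here’s a minimal sketch – the database name and UNC path are placeholders, so swap in your own (LiteSpeed has its own commands, which I’m not reproducing here):

-- Native compressed backup (SQL Server 2008+): the CPU squeezes the data
-- before it ever hits the network card.
BACKUP DATABASE MyBigDatabase
TO DISK = N'\\backupserver\sqlbackups\MyBigDatabase_Full.bak'
WITH COMPRESSION, STATS = 10;  -- STATS prints progress so you can watch the throughput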

With dedupe, you have to write the full-size, uncompressed backup over the network.  This takes longer – plain and simple.

Dedupe Slows Down Restores Too

The same problem happens again when we need to restore a database.  At the worst possible time, just when you’re under pressure to do a restore as fast as possible, you have to wait for that full-size file to be streamed across the network.  It’s not unusual for LiteSpeed customers to see 80-90% compression rates, meaning they can pull restores across the network 5-10x faster when they’re compressed – or put another way, deduped restores take 5-10 times longer to copy across the network.  Ouch.

It gets worse if you verify your backups after you finish.  You’re incurring the speed penalty both ways every time you do a backup!
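Here’s roughly what that double round trip looks like with native commands – the share path is made up, but the point stands: an uncompressed backup streams the full-size file to the appliance, and the verify streams the whole thing right back.

-- Uncompressed backup to a deduped share: the full-size file crosses the wire once...
BACKUP DATABASE MyBigDatabase
TO DISK = N'\\dedupe-appliance\sqlbackups\MyBigDatabase_Full.bak'
WITH INIT;

-- ...and the verify reads every byte of it back across the network a second time.
RESTORE VERIFYONLY
FROM DISK = N'\\dedupe-appliance\sqlbackups\MyBigDatabase_Full.bak';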

And heaven help you if you’re doing log shipping.  That’s the worst dedupe candidate of all: log shipping does restores across one or more SQL servers, all of which are hammering the network to copy these full size backups back and forth.

So Why Do SAN Admins Keep Pushing Dedupe?

Dedupe makes great sense for applications that don’t compress their own data, like file servers.  Dedupe can shrink those files dramatically, saving a ton of expensive SAN backup space.

SAN admins see these incoming SQL Server backups and get frustrated because they don’t compress.  Everybody else’s backups shrink by a lot, but not our databases.  As a result, they complain to us and say, “Whatever you’re doing with your backups, you’re doing it wrong, and you need to do it the other way so my dedupe works.”  When we turn off our backup compression, suddenly they see 80-90% compression rates on the dedupe reports, and they think everything’s great.

They’re wrong, and you can prove it.

They don’t notice that we’re now writing 5-10x more data across the network than we did before, and our backups are taking 5-10x longer.  Do an uncompressed backup to deduped storage, then do a compressed backup to regular storage, and record the time difference.  Show the results to your SAN administrator – and perhaps their manager – and you’ll be able to explain why your SQL Server backups shouldn’t go to dedupe storage.
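Here’s a sketch of that test – again, the database name and share paths are placeholders, and the COMPRESSION option assumes SQL Server 2008 or your third-party tool’s equivalent:

-- Test 1: uncompressed backup to the deduped share
BACKUP DATABASE MyBigDatabase
TO DISK = N'\\dedupe-appliance\sqlbackups\MyBigDatabase_nocompress.bak'
WITH INIT, STATS = 10;

-- Test 2: compressed backup to a plain file share
BACKUP DATABASE MyBigDatabase
TO DISK = N'\\fileserver\sqlbackups\MyBigDatabase_compressed.bak'
WITH INIT, COMPRESSION, STATS = 10;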

In a nutshell, DBAs should use SQL Server backup compression because it makes for 80-90% faster backups and restores.  When faced with backing up to a dedupe appliance, back up to a plain file share instead.  Save the deduped storage space for servers that really need it – especially since dedupe storage is so expensive.
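If you need numbers for that conversation, msdb already keeps them – here’s a quick sketch, assuming SQL Server 2008 or later since that’s when the compressed_backup_size column showed up:

-- Recent backups with duration and size, straight from msdb
SELECT TOP (20)
    bs.database_name,
    bs.backup_start_date,
    DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) AS duration_seconds,
    bs.backup_size / 1048576.0            AS backup_size_mb,
    bs.compressed_backup_size / 1048576.0 AS compressed_size_mb  -- compare these two for your compression ratio
FROM msdb.dbo.backupset AS bs
ORDER BY bs.backup_start_date DESC;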

  1. Brent,

    I completely agree that backup compression is a huge win in nearly every scenario (unless you are under heavy CPU pressure). It really makes initializing a database mirror much quicker and easier. SQL Server 2008 Enterprise Edition already has native backup compression, and SQL Server 2008 R2 Standard Edition will also get it. Of course, SQL Server native backup compression does not have the flexibility of LiteSpeed; it is either on or off.

    • Glenn – yep, I covered R2’s new inclusion of backup compression in Std Edition last week here, and I think that’ll make the dedupe conversation even more common. It’s gonna get ugly with dedupe vendors over this one!

  2. Good stuff, I just sat through the Data Domain dedupe seminar. It really sounds good for file shares, etc., as you mention. The DBAs in the room were all thinking that this makes little sense in the database world. I can point to your article if the SAN guys get ornery!

  3. Thanks for writing this, Brent. When new and expensive technology comes out, there always seems to be a push to use it. The “I just bought this 5 million dollar deduped storage system and you better use it” attitude can wreak havoc on a backup infrastructure. Isn’t the reason we back up our databases so that we can restore them in the event of a disaster? I would think that any technology that dramatically increases our time to recovery would be a negative, but I’m finding more and more DBAs struggling to fight that battle. Both deduped hardware and compression may end up reducing storage utilization, but compression gets to that end result far more efficiently (and cost-effectively) on both backup and restore.

    Just my two cents…

  4. To be fair to the dedup appliances, you shouldn’t just time the first full backup to it. You need to get a few backups to the dedup appliance completed and then start timing how long regular backups take. In the case of at least the DataDomain appliance, they should become more efficient once they have backups to de-duplicate against.

    I do also have to say that some of the snapshotting technology DD can do is pretty cool. I will note that I have not tested restoring from those snapshots, but it sure sounds good. ;)

    • Kendra – the dedupe happens as SQL pushes the backup into Data Domain. The full size of every backup has to be copied across the wire, no matter whether it’s the first backup or the tenth. Dedupe makes SQL backups smaller, but not faster. If you have a system that works otherwise, I’d love to see it.

  5. Hey Brent – I just went through this exact issue this year with my team. I would qualify this a little, by phrasing it this way: “the common advice about NOT using compression for SQL backups that go into a dedupe store is probably bad advice.” We do use *both* backup compression and a dedupe archive (+ remote mirror) together, and it works well. Here’s the thing: compressed data fed into the dedupe process generally doesn’t deduplicate as well as other data, because it’s very unlikely to match up against other unrelated bits that are already in there (which is how dedupe works). BUT, it can dedupe against prior versions of the compressed backup files that haven’t changed entirely. Example: take a full, compressed SQL backup on Monday, then another on Tuesday where only a small portion of the database has changed – lots of blocks in the file do match up the second time. So you take a big hit for the first file, but maybe not quite so bad on the second and later iterations. We had to actually prove this out by testing it with real data from real compressed SQL backups over a couple of weeks when we first got the dedupe system.

    Also, related to the speed issue: you are exactly correct. We always back up to a disk available directly to Windows (local or SAN attached), with compression, to get a fast backup. The resulting files are then archived in a dedupe store, by backup software, before being deleted from the local disk at some later time. If we need to perform a restore, we have recent files sitting right at the server. Only older backups would need to come out of the archive and take the restore speed hit.

    • +1 to Merrill, I back up databases with native compression and they are written to a Data Domain. No, the deduping isn’t as efficient as if they were uncompressed, but it still does an amazingly impressive job.

  6. Brent,

    Great read!

    Just one quick correction and a comment. DataDomain actually does dedupe on the fly (“inline deduplication”), so it only writes deduped data to disk, while other products like Avamar do a post-process dedupe after the data is written to disk. Even though DataDomain is an inline dedupe, it has been throttled to only transfer data as fast as it can dedupe or cache it in memory. I’m also in 100% agreement with you regarding backing up directly to dedupe targets. My initial backup is to a large RAID 10 (RDM) attached to a VM, then CommVault Simpana 8 runs nightly to back up the RAID 10 (RDM), off-site it, and dedupe it a bit.

    • Nick – when you say inline dedupe, are you saying SQL won’t have to push 100 megs of data across the wire? That’s not how I understand Data Domain to work. That would require running a driver or app on the SQL Server and deduping the data on the SQL Server, correct?

  7. With inline or most post-process deduplication you would still need to transfer all the data across the line. DataDomain dedupes the data as it enters the unit and only writes deduped data to disk, while most post-process dedupe sends all the data to disk and then dedupes it after the data is living on the target disk. A big pitfall with post-process is that it requires a lot more disk to keep the full copy while it creates the deduped copy, and restore times are terrible. The nice thing about DataDomain is that their fixed bandwidth rate includes dedupe time up and down, but it is limited. We use CommVault Simpana 8 here with their new dedupe feature. I’m not going to say it’s the cat’s meow but I would strongly recommend it over Avamar.

    With Avamar you would not need to push the whole 100 megs over the wire, but it also requires an agent on the box constantly talking to the master dedupe database. It also has some of the worst restore times. If restore times are not important to you then this is a wonderful product.

    Nick

    • Okay, great. Just to close the loop on this so future readers understand, you’re saying you agree with my statements that backing up to Data Domain is not faster, right? And in fact, if the DBA is required to send uncompressed data to Data Domain, it’ll take even longer than if they were doing compressed backups to a regular file share, right? Just wanna make it absolutely clear to readers – I understand what you’re saying about Data Domain deduping “inline”, but that means after the full size is already sent over the network, and at that point it’s too late for the DBA.

  8. I’m in 100% full agreement with you!

  9. I ran deduped SQL backups for almost a year in parallel with tape backups. In the end we dumped the dedupe system and spent $$$ to upgrade to LTO-4 tape. The dedupe had some amazing compression and it was a lot better than tape, but LTO-4 tapes are $55 for 1.6TB tapes, which in the real world come out to almost 3TB of storage per tape.

  10. Forgot to mention, we used a post-process dedupe system and it actually used SQL 2005 Express as its backend. It even had replication functionality built in. And my favorite was when the dedupe tasks ran, they would create a storm of blocking and a lot of the dedupe tasks would fail, resulting in more space used. I tried to schedule it and it kind of worked, but never completely.

  11. Cost was part of it. Over 3 years I think the estimate was 100TB – 200TB of disk at two locations. And we DBAs never trusted the disk – we trusted tapes a lot more.

    With NetBackup a restore to a different server was a one-step process. With i365 it was two steps and several hours longer than with NetBackup.

  12. Pingback: Data Deduplication Technology – New Article on DBTA | Kevin E. Kline

  13. Same here…

    We were offered de-dupe technologies… about a year back… and when I went into the technicals… I thought to myself… that it is not de-dupe… but dupe…

    I then went ahead with an open source version of quest litespeed… called lzop compression….

    It basically reduces a 25 GB full / transactional backup to 6 GB within a matter of 3-4 minutes…

    http://www.lzop.org/

    There are windows binary links from those sites…

    • Anand – so that we’re clear, LZOP is in no way, shape, or form an “open source version of Quest LiteSpeed.” All it is is a file compression engine, and LiteSpeed has a lot more than that.

      Who do you call for support?

      Does the author keep it up to date? I read the “News” section of the site, which says:

      As of Oct 2006 there has not been any problem report, and 1.02rc1 will get released as official version 1.02 whenever I find some spare time.

      That doesn’t sound like the kind of product I want to trust my mission-critical SQL Server backups with, and frankly, if I was hiring a DBA, I would avoid candidates who made choices like that. Something to think about…

  14. Brent – I do agree that lzop is only a compression engine and is not an exact copy of quest litespeed… but if you take the concept of fast compression (less than 1/2 the time taken by 7z) for multiple databases… it is much better to do that… and then send it over the wire to a secondary location, than to send a whole database or transaction logs…

    I wonder if anybody calls the creators of zip software for basic support… all of these compression technologies are open source… so if you are not happy with a particular feature… go ahead… make it better…

    There might be many people in the market… who may not be able to purchase quest LiteSpeed… or similar such paid software… for them this is a very good choice…

    Frankly, I wonder if you can judge a DBA based on the compression software they use… whether it be zip, arc, etc. I am just providing my opinion based on what I did when a choice was available for me to invest anywhere between 10-20 grand on a de-dupe…

    As much as sql servers are based on set theory… there is a set of non-DBAs out there in the world, who are part-time DBAs by choice… I do not think they will ever look for a DBA employer… because they are happily doing other things…

    A DBA is not on the required list of requirements to use SQL server…

    • Anand – as someone who works for a software vendor, I can assure you that yes, people do indeed need support. I wish there was such a thing as bug-free software, but I haven’t seen it yet.

      You said: “all of these compression technologies are open source… so if you are not happy with a particular feature… go ahead… make it better…” That one argument alone shows that I can’t convince you of why open source isn’t the right answer for everybody. I wish you the best of luck with that software, though.

      Out of curiosity – if you’re such a big fan of open source, why use SQL Server? Why aren’t you using MySQL? After all, if it doesn’t have a feature you need, “go ahead… make it better…” ;-)

  15. This is a pretty decent article, and sums up nicely some of the frustrations I come into contact with on a day-to-day basis, but it totally ignores systems with deduplication on client AND server. Systems that do this (to minimize both storage requirements AND network traffic) are more efficient than either traditional compression (with a small window size: e.g. LZ usually has a working range of about 32KB) or non-deduped data going out across the network.

    It’s hard to get some people to understand detail. This article does it pretty well.

    C

  16. While I agree completely, we are currently testing de-duplication devices (DataDomain’s) with SQL Server backups – uncompressed and compressed via LiteSpeed. While doing some investigation, I discovered Quest doesn’t support LiteSpeed when used with de-duplication technologies. It’s listed on Quest’s support under solution SOL49562. Here’s the direct link – https://support.quest.com/SUPPORT/index?page=solution&id=SOL49562. You’ll have to log in using your support ID to view it (which you should have if you use LiteSpeed). For your convenience, here’s the case…

    Solution SOL49562
    Title
    Using de-duping technology (EMC Avamar, DataDomain, NetApp) with LiteSpeed
    Problem Description

    De-duping technology like EMC Avamar, DataDomain, or NetApp is not supported for LiteSpeed backup files regardless of whether the file is encrypted or not.

    Cause
    The data is not always in the same place from backup to backup.

    Resolution
    Unsupported software/platform.

    Environment
    Product: LiteSpeed for SQL Server
    Attachments:
    Server OS: Windows – All
    Database: SQL Server – All

    • Hi, Doug! I sent this over to the support department and they’re trying to clarify that article. The person who wrote that article is no longer with Quest, and we’re pretty sure it’s incorrect. We’re tracking it down to make sure.

      We can’t “support” dedupe appliances in the sense that we don’t have any in-house, and we don’t test with them. We’re not aware of any issues with them, though.

  17. TSM 6.2 (just announced) does client and server-side dedupe.

    Best of both worlds.

    C

    • Craig – just to be clear, TSM 6.2 was just announced for first delivery next month. I look forward to reading how it performs with SQL Server. I can’t seem to find any documentation on that, though.

  18. Hi Brent,
    TSM infocenters (that is, the online TSM manuals) are usually only enabled on the day of general availability.

    I am not in the loop on it, but I’ve no reason to expect anything different for the 6.2 release.

    For info, the TSM v6.1 infocenter is over here http://publib.boulder.ibm.com/infocenter/tsminfo/v6/index.jsp – usually a new URL would be used for each new infocenter.

    The TSM v6.1 server-side deduplication arrangement is described in a redbook at http://www.redbooks.ibm.com/abstracts/sg247718.html – this is likely to be one half of the setup in the 6.2 design.

    C

  19. Hi Brent,
    I’m a bits-and-pieces IT manager. We have SAP ECC 6 instances running on vSphere 4 on HP blades and an EMC AX-4 iSCSI array. The DBs are IBM DB2 and Oracle 11i on RHEL 5.2. As we are in the initial stages, we tend to take a snapshot at every stage.
    My pain points are:
    1) VM snapshots are occupying humongous disk space.
    2) I’m planning a DR based on VMware SRM.

    My questions are:
    1) Can dedupe help me reduce the size of VMs at the DR site (I’m talking about a production VM replicated to DR)?

    My storage vendor is proposing a NetApp 2040 at production and a 2020 at DR. He says that this solution will dedupe data on the production storage, compress it, and send it across to DR. The result is much lower space requirements on the DR storage, reduced WAN bandwidth consumption, and zero downtime.

    Can you shed some light on this and help me build the right DR plan?

    Thanks in Advance.

    Regards,
    Syed.

    • Syed – you’ve got a lot of good questions in there, but it’s beyond what I could accomplish in a blog comment. Your best bet is to get a trusted consultant involved. Ask your storage vendor for local references that you can talk to. Hope that helps!

  20. You are not accurate about dedupe basics. For example, in IBM / Diligent ProtecTier the dedupe is inline. This means backing up 100MB does NOT mean you need 100MB of storage!!! The dedupe is done on the fly. Please get the facts in order…

    • Hi, Udi. If you reread my article carefully, it’s not about the storage space – it’s about pushing the 100MB over the network. Here’s the quote:

      “But by that point, you’ve already pushed 100 megabytes over the network, and that’s where the problem comes in for SQL Server.”

      Yes, ProtecTier does the dedupe “inline”, but that inline is happening in ProtecTier, not in SQL Server. SQL Server has to send all 100MB over to ProtecTier first, and that’s what takes so long. Granted, it doesn’t take long for 100MB, but most of my clients have databases over a terabyte, and it does indeed take quite a while to shuffle that much data over to ProtecTier. Other solutions handle the compression on the SQL Server before writing them out to storage (or a dedupe appliance.)

      I do appreciate your enthusiasm though! Let me know if I’m missing something there. Thanks for your help and clarification.

      • Thank you. You are absolutely right.
        In any event, my customers are backing up MS SQL to VTL/dedupe, which means they use the ProtecTier not only for dedupe but as a VTL. They stream all data types and cannot use a different method just for MS SQL…
        It’s not in my power to influence their IT decision making – I have to deliver a good dedupe ratio.

        Do you have any good recommendation?

        • Well, I’ve got bad news – the entire point of this article was to explain why dedupe doesn’t make sense for MSSQL backups (or Oracle, for that matter). You’re asking me for good recommendations for your employer (IBM, according to your LinkedIn profile) because your customers already bought something that doesn’t work for their needs. (I’m guessing that’s why you’re here on my blog.)

          Unfortunately, it’s too late for us to help them. Your best bet is to push IBM to put software on the server itself to reduce data sent over the wire to your appliances. Sorry, and I wish I had better answers there.

  21. Just bought some LTO-4 tapes a month ago for $35 each. Every time I mention this to the dedupe/d2d software sales people, they don’t bother to try to convince me and hang up.

    I’ve only used post-process dedupe, which had a SQL 2005 Express backend, and one thing I noticed is that the dedupe process was a major PITA. This was due to plain old low-tech database locks. I had to schedule the dedupe jobs for servers to not run at the same time because they would block each other.

    The way I found this was that the dedupe server ran out of storage space a few months before it was scheduled to do so. With tapes it wouldn’t be a big deal – $700 for 20 LTO-4 tapes. With disk it meant a $5000 minimum expense, plus telling the facilities people we need more power. It took me a week or two of work to get the space back.

  22. I am a DBA for a company that is “forced” to use Symantec NetBackup as my SQL backup solution. And I am not the Backup Admin, so I have no control over my own backups. I would much rather use another 3rd party solution that includes compression (because we are using SQL 2005), but their “big” argument is deduplication of files. I think NetBackup does great for file servers, but I have not been impressed with it as a SQL backup solution.

    Do you have any experience with Symantec NetBackup? If so would this information pertain to NetBackup as well?

    • Howdy! Yes, I’ve got experience with NetBackup. The NetBackup agent may enable you to use compression in conjunction with SQL Server 2008 & 2008 R2, both of which offer compressed backup streams now. If you’re not on 2008, though, you’ll suffer these same problems.

  23. I use NetBackup as well. The type of compression you get depends on the tapes you use. We use LTO-4 tapes. The official numbers are 800GB uncompressed and 1.6TB of compressed data per tape. I have tapes with close to 3TB of data on them. I asked about it and was told that the compression ratio varies and the advertised numbers are lower than a lot of people see in the real world.

    We also do some server/disk backups to a file share like most DBAs, for DR and to transfer data to QA. Our tape backups are 2 to 3 times faster than native SQL backups, and I know for a fact that we haven’t pushed our tape robot to its fastest speed, since the NIC and disk system are our limiting factors so far.

  24. Hi all, cool note. I have Avamar 5, and for file systems it rocks, but we have an incident with Avamar that they cannot resolve: a 456 GB SQL 2005 database takes over 24 hours to restore, and after 24 hours the Avamar server cancels it with a timeout. Could it be a dedupe issue?

  25. When we tested SQL backups with i365 to disk we had the same restore issue. It takes a lot longer to restore from disk when it’s deduped than from tape.

  26. This is a great thread. I’d be curious what your thoughts are around leaving LiteSpeed compression on but turning LiteSpeed encryption off and still using Data Domain as the backup target. This way the amount of data traversing the network is minimized, you still get some additional compression/dedupe in Data Domain on top of the LiteSpeed compression, and your backups will be hardware encrypted and replicated to a remote site.

    • Ray – since space is at a very expensive premium when using Data Domain, it doesn’t usually make sense to land SQL Server backups on Data Domain to get a very small percentage of compression. It’s not a good ROI on storage expenses, and it’s easier/cheaper to use conventional SAN replication. Hope that helps!

  27. So much concern about network bandwidth limiting the backup speed… but in a 10Gb/s LAN or an upcoming FCoE network, the read capacity of the SQL Server spindles becomes the bottleneck. Dedupe does have real cost savings, and you should have your network fixed first :-)

    • Generally, people who are trying to save a small amount on backup space don’t spend a large amount on 10Gb Ethernet. 10Gb is still staggeringly expensive compared to gig. If you’ve already put in the massive investment for 10G, though, then dedupe makes sense – but that’s an awfully small percentage of companies, wouldn’t you agree?

  28. Great post. Learnt a lot from a DBA perspective. I am a backup guy so I guess I am auto-fitting into your mentioned Storage Admin role : )

    One point to consider: while your view on the first or single backup is very valid (regarding the trade-off between bandwidth and CPU cycles), what about when we do multiple backups of the same database and need to retain them online? Won’t subsequent backups benefit from the majority of the database contents already being stored?

    Plus, EMC Data Domain already supports a “Boost” protocol – in effect, all data leaving the DB server is already in deduplicated format (requirement: the DB server has to run as a Media Server). Would this neutralize the bandwidth concern as well?

    • Patrick – about multiple backups of the same database, no, there’s no reduction of network traffic just because a bunch of duplicate data already exists on the Data Domain appliance.

      The Boost protocol is something else, though – by doing that, it may reduce the bandwidth concern as long as wherever you restore the database is also a Media Server. That’s rarely the case in QA/dev/DR environments because licensing ain’t free. Cha-ching…

  29. I too would hate to lose our SQL backup compression, as we saw backup times drop to 40% of pre-2008 times. Our SAN admins are pushing hard toward dedupe, but more so to Riverbed WAN accelerator appliances.

    These accelerators work on the same dedupe ideas (and thus will not work well with compressed SQL backups). I assume this is similar to the aforementioned Boost protocol, but Media Server not required.

    So my question is… would the benefits of both a dedupe device and Riverbed accelerators trump the cons of uncompressed backups?

  30. A lot changes in our world over 3+ years, and I wonder if you still have the same opinion as you did in 2009? My background is that of a storage architect, but I’ve been fortunate to fill the shoes of a server admin, storage admin, and even DBA over the past couple decades. (OK, I admit, I’m a rank amateur on the DBA front.) Disk and network continue to drop in price (e.g., increase in availability), and yet our businesses continue to saturate whatever we’ve bought.

    Most enterprises I’ve consulted in are more worried about the cost of disk than the cost of network, and where I see the benefit of dedupe in your example is in long-term storage cost. Acknowledging that client-side compression will result in a much smaller initial backup set, you also have to pay attention to the long-term and incremental storage cost savings of dedupe. The longer you retain data, the greater your savings will be.

    Specific to the performance, “Confused” already mentioned DD Boost (specific to Data Domain) that dedupes before sending over the wire. Licensing cost may be a concern, as you mentioned, but that just brings us back to dollars and cents–and hopefully the business is willing to spend what is required to get business done. Maybe it’s not feasible to license non-production environments to take advantage of available technology, but then maybe it’s not feasible to expect non-production environments to have production performance.

    In the end, I think it’s a matter of the business deciding where it wants to pay for its ability to conduct business.

    • Just so we’re clear – so you’re saying you would spend more for DD Boost in order to get the same performance that you get out of the box with SQL Server for free?

      Why pay at all? That’s the question I’ve got. If you want dedupe, back up your SQL Server to a deduped file share. Done. You don’t need expensive agents to do that.

      • I think we’re on different pages. Your original post identified the network as the bottleneck for backing up to a dedupe device. DD Boost is a plugin that dedupes the backup data before the data hits the network (and then it’s deduped again, against all data on the device once it arrives for increased effectiveness). The benefit could be compared to what you get from server-side compression but also having dedupe on the back end.

        But you’re right, if you don’t need to reduce the cost of disk, there’s no point to pay for the cost of deduping to begin with. I could have avoided bringing up cost altogether, because cost is like that proverbial balloon: when you squeeze it at one end it’s going to affect the other end to some degree.

        My intended question could be restated as such: If the problem you identified with MSSQL + dedupe is that 100% of the data needs to traverse the network, if that problem is removed then do you believe that dedupe is still a bad idea for SQL Server backups?

        • Absolutely – why spend more for something you already get for free in the SQL Server box? Plus, I haven’t seen restore times for dedupe appliances be the same as restoring from regular file shares. Your mileage may vary, though.

  31. Hi,

    We just recently started using Avamar to back up everything, including an hourly SQL Server transaction log backup. The issue we’re having is that the transaction log (LDF) file keeps growing, so it seems like the Avamar backup is not committing the backup. Do you think this issue is related to the dedupe issue on SQL Server? If not, any idea?

  32. Hi Brent,
    I really enjoy your articles.
    Is this article still applicable to Data Domain and deduping? My SAN people think technology has evolved and it is no longer valid. I would appreciate your input.

    Thanks,
    Najm

    • What parts of the article do they believe have changed?

      • Sorry Brent, you were (in 2009) and still are wrong on many levels. A reasonable DBA will test/evaluate this to see how the benefits of dedupe, with say Data Domain, apply in his/her environment.

        1. It doesn’t slow down backups/restores: the DB can still read in parallel and the optional backup agent can compress the streams during transfer. The backup node can write it to DD uncompressed to get the benefit of dedupe and compression on the appliance. You can also transfer the backup over Fibre Channel. DD has optional modules to perform part of the dedupe on the client.

        2. Client application compression: in some limited scenarios this could beat a dedupe appliance when using a very short backup retention. Note that the appliance will not only compress your backups but will also remove redundant blocks from the backups before compressing them. This obviously means that the dedupe appliance will yield the best ratio in most DB backup scenarios. Yes, there are exceptions – where a large portion of DB blocks changes regularly, or where you need to use TDE – and there dedupe won’t provide any benefit.

        Of course if you don’t pick the right configuration the dedupe appliance/software won’t provide the features your company paid for and you’re wasting resources.

        Akos

        • Akos – by all means, I’d encourage you to test it in your own environment and see how it works out for you. Your comments still talk about compressing on the dedupe appliance side, which as I discussed in the post, means that you’re sending more data to the appliance, and that’s slowing down the network transfer rates. If you take away fiber bandwidth for reads in exchange for writing to the appliance, then you’re robbing Peter to pay Paul.

  33. As someone who works for a dedupe storage vendor, our position is simple. We encourage all of our prospective customers to conduct an eval, and go from there.

    A couple of folks mentioned the low cost of tapes. If you already have a tape solution in place that fits your needs, by all means keep using it. You’re not the type of customer we are trying to sell to.

    On the other hand, when that giant tape library of yours gets EOL’d by the vendor, or your hardware lease is up for a 20yr minimum renewal, or managing the tapes/retention/disaster recovery across X number of branch offices gets painful, or you get any other reason to be in the market for 21st century technology, call us and we’ll go from there.

    And yes, more often than not, transitioning to dedupe disk storage will require doing things differently than they were done in the past, and may well run into other infrastructure bottlenecks that went unnoticed before. Hence try before you buy.

  34. The problem with the Try-before-you-buy approach is that it assumes you will be able to compare apples to apples with the dedupe storage vs traditional storage. We went with EMC Avamar but the problem I have with it is it doesn’t tell me the size of the deduped object on the Avamar grid but instead provides a dedupe rate.

    Truthfully I believe “dedupe rates” to be a very misleading metric. It’s like going to the grocery store and using the customer card to get the sale prices, where they say “We saved you $12!!!!” I’m not as interested in how much I saved as I am in how much I paid.

    Backing up an uncompressed file to Avamar is just going to have a better dedupe rate than a compressed one. Did you actually save any expensive Avamar grid space? How would we know?

    Given that Avamar is matching blocks with previous backup files, it might be more honest to say the backup file’s size is a combination of one or more previous backups. Then again, I can see that being confusing, and a better way to report it would be the size of the new data, and then also the size of the new data combined with whatever previous data is required to restore it.

    I guess maybe it depends on why you want the numbers. If you want them to prove how much Avamar is saving you then the dedupe rate is an excellent metric. If you are looking in order to know how much Avamar grid space is being consumed by your SQL backups then it’s useless.

  35. Nice article. We are considering switching from CommVault to Networker with DD Boost to our Data Domain. Can you comment on trying to send a 750 GB (compressed) .BAK file across a WAN to a hot site for DR? Our issue is that our DBAs don’t have any replication and we rely 100% on tape for recovery. The average speed of a tape restore is about 30 GB an hour. I was hoping to use either VDI or VSS to improve the situation but I am getting much resistance from the DBAs. Have you considered the fact that DD Boost with Data Domain allows the SQL box to transfer directly to the Data Domain over fibre? If you consider the significant improvement in bandwidth transferring over 10G fibre vs Cat6 (network), the loss of time from the dedupe may be a wash. Thoughts?

    • Jason – there’s a lot of different questions in here. I’m not sure what you mean by 10G fibre – there’s 10Gb Ethernet, and that might be what you’re thinking of.

  36. Exactly the same as why using Fast Compression on Norton Ghost was quicker than uncompressed. The (otherwise idle) CPU does the compression quicker than the network AND remote disk can.

  37. Pingback: SQL Server Backup Infrastructure | nujakcities

  38. Good read.

    1. I’ve used CommVault and I loved that product. I used a large SAN target. We wrote the backups from agents on the servers with the data to the media agents connected to the SAN target. In the case of some large servers, they had access to the SAN over fiber themselves – essentially they were media agents themselves. Then dedupe was done afterwards, so we could write all the data to the cheap SAN at night (yes, the backup SAN was cheap compared to the prod SANs). The backup servers would then do a dedupe job during the day. The backup servers were doing nothing during the day, so letting them run the post-process dedupe was a good use of resources.

    2. Currently I use Avamar. The agents are running dedupe in software as I understand it, so they only send changed data to the grid. So it’s not taking long for the backup to run. Now maybe for SQL that’s not true… I’ll look into that. But our prod is on a 10gig network, so we never expect the backup job to saturate the network. So even if it was pushing all the data, I don’t see a negative impact. Restore takes what it takes; in the case of prod it’s not slowed down by the network. I don’t see a restore from a big SAN drive being much faster, actually. Our tests so far have been about the same. But we have not done restores of our multi-terabyte DBs, just smaller ones.

    So again, my take on your comments is that dedupe is bad because you have to send data over the wire. OK, a restore is going to take what it takes. Now if you’re using a Data Domain in the middle to do the backup/restore, maybe that has an impact. Some claim it helps, but for SQL I gather it may hurt more than help. However, a straight-up Avamar/CommVault restore is not going to be slower because of dedupe. The data is on the grid/SAN and it’s being sent back to your agent doing the restore. These are expensive items and they push their data pretty darn fast – maybe faster than one server can accept, to be honest. You should be able to do more than one restore at a time. How much space you’re eating to store four months of daily full backups is a real concern. Sorry, but a unique copy of each full backup sucks.

    Then it seems you’re saying that using native compression is better than dedupe. Umm, the agent is doing its own compression when it sends the data to the grid/media agent. Maybe native compression is better… I don’t know, but compression is still happening, and the agent should be pretty good.

    As to backing up to a file share and then backing that up with Avamar/CommVault: that doesn’t work well because of your compression. Frankly, the new backup sees it as a new file and you’re stuck with a lot of copies of files that will not compress much. That will eat lots of backup disk.

    The real point is how much money you want to spend on backup disk, and the usual answer is as little as possible. Let’s spend money on prod SAN disk and not need to keep buying more disk for backup.

    Lastly, to anyone still using tape, I hope it’s just for long-term storage or an aux copy to use if the disk target dies. A restore from tape is truly slow. That’s not where I want to be. LTO-3 and LTO-4 is all I’ve used in the last few years, but it’s still slower than a restore from disk. Tape is for restoring something really OLD, or if the disk target died, or, God help you, your site died and this is the off-site tape you are reloading from. Tape is reliable but it should be the last resort for restore.

    In summary, I’m not sure I agree with your overall premise. My understanding is you want to use third-party compression, or maybe the new native compression, over dedupe, and I’m not thinking that is the right answer in an enterprise setup. I’m a little shaky on Avamar SQL backups and the black magic in the background, I admit. But with CommVault, backups were very quick and so were restores. I’m not a Data Domain user and not a big believer, given their expense. It’s another layer of complexity that has certain uses (multi-site) but isn’t really beneficial for a single-site backup/restore job.

    Maybe a rewrite with all features the current generations of backup solutions can bring to bear is needed?

    • Sean – I read a lot of “maybe” stuff in here. Have you done any A/B testing between the solutions?

      • No, different environments = new solutions. I haven’t been in one with both CommVault and Avamar in the same environment.

        What I have done is test in each to make sure we can recover from a failure.

        As far as the maybes go, I can’t argue with that. I’m not the person posting a blog, either. That’s why I am raising some points and suggesting you revisit the issue with the current solutions available.

        Heck, for Data Domain customers, the DD software running on the systems wasn’t available in ’09. There are so many folks that keep touting its capabilities. I haven’t used it, so I’m staying out of that. But I have used both pre- and post-process dedupe and have had zero issues with either. I think I like post-process a little better, but I’m fine with software pre/inline.

        • Ah, gotcha. As you noted, the blog post was published in 2009. As much as I’d love to drop my work and do A/B testing with million-dollar-hardware I don’t own, I don’t really have the time or hardware to do that testing. ;-)

          If you do have the hardware (either Commvault or Avamar), it sounds like you’d be a great person to do A/B testing versus native. I’d love to read how that turns out.

          Thanks!
          Brent

        • I’ll second your “Sorry but a unique copy of each full backup sucks” when DBAs want to work well with storage and backup admins. Compressed backups don’t dedupe at all, so that approach doesn’t scale.

          However, I tried to point out in my earlier post that with DD software we had the option to compress on the client side so the DB backup pieces are sent over to the backup server or appliance compressed. That option has been around for years. Alternatively you can do some of the dedupe on the client side using the backup software.

          Based on simulated testing of a month long backup cycle of a 450GB DB I’ve found that “dedupe slows down SQL Server backups” is really “it depends on your environment and picking the right configuration”. I found that it is better than the native backup’s throughput rate and certainly uses a fraction of the backup storage space.

          Akos

  39. Here are my concerns with Avamar in particular. I assume that there are other shops with similar issues.

    1. Avamar does not integrate with T-SQL. All the good work in creating hundreds of jobs and dozens of scripts and stored procedures to automate our tasks is going down the drain. Furthermore, we are losing flexibility our users are used to.
    Desc.: We have 7 tier-one applications, each having several databases, and around a hundred more lower-tier production databases. For each we have an End User Support copy (helpdesk), a QA copy, and one or more Dev copies. They all have to have the full data set, because the support people usually troubleshoot data corruption, data cleansing, or other issues that require the full data set. All those backups and restores have to happen between late night and early morning hours. (Yes, we do use differential backups.) Restoring is a BIG DEAL for us and happens daily. Those jobs have many more steps regarding security, AD, and sometimes data manipulation. The disk layout is different than in production so the MOVE clause is always used, and all this is fully automated.
    I still don’t know how we are going to do that with Avamar, and EMC is not really helping. They keep referring us to the GUI, and the GUI isn’t good enough for us.

    2. Transaction logs contain transaction IDs and LSNs that are unique by design. I am skeptical about how well dedupe will work for them. Could be wrong, though. Compressing transaction logs saves a ton of space and makes the backups really fast.

    3. I feel uncomfortable with having these clients running on my production servers, analyzing data changes on the fly, writing to the file cache, reading from the file cache… Even if everything is working well in the beginning, how is it going to work in the future with all the patching, service packs, etc.?
    I understand that the client is there to do the dedupe on the client side, thus saving network bandwidth and storage space at the same time. That’s a lot of work to be done, and at the expense of the database server’s resources. Mind you, those resources include SQL Server licensing costs as well.

    4. EMC is a storage company. Buying a backup solution from a storage company is like buying a car from Exxon.

    If anyone went down this path, I’d like to hear how they managed.

    Thanks.

  40. I’m beginning to think a lot of people are being de-duped by the dedupe vendors ;-). Hi Brent, great seeing and chatting with you at SQLBits last weekend. One of the things you mentioned was not using SnapManager backups for NetApp (we use NetApp dedupe as well), but instead using the standard SQL Server transaction log backups on a more regular basis and snapping those, to stop the IO freeze caused by the SnapManager backups. Do you have any further info or links on the best way to achieve this?

    Regards
    Dave

    • Howdy Dave! Yes, if you get the documentation from NetApp on SnapManager, they go into minute detail on exactly how to set this up. It’s the default method of setup for replication with their snaps, too. They have hundreds of pages of documentation on it.

  41. Small question Brent. Regardless whether CPU affinity is good or evil, does SQL take CPU affinity into account when compressing backups?

    • Thierry – I have no idea, hahaha. That certainly isn’t something I’d bother digging into. ;-) But if affinity masking is important to you, that would be a great test to run in your environment. Plus keep in mind that if you use any third party backup software (Litespeed, BackupExec, etc) all bets are off.

  42. Ok, got it :) Thanks Brent for taking the time to reply
