Why Dedupe is a Bad Idea for SQL Server Backups

Last Updated February 13, 2017

Has your SAN admin or CIO been telling you not to compress your SQL Server backups because they’re doing it for you with a dedupe tool like EMC’s Data Domain? I’ve been hearing from a lot of DBAs who’ve been getting this bad advice, and it’s time to set some records straight.

The Basics of Storage Deduplication

Dedupe appliances basically sit between a server and storage, and compress the storage. Sometimes they do it by identifying duplicate files, and sometimes they do it by identifying duplicate blocks inside files. It isn’t like traditional compression because the process is totally transparent to anything that stores files on a deduped file share – you can save an Excel file to one of these things without knowing anything about dedupe or installing any special drivers. In theory, this makes dedupe great because it works with everything.

The key thing to know about dedupe, though, is that the magic doesn’t happen until the files are written to disk. If you want to store a 100 megabyte file on a dedupe appliance, you have to store 100 megabytes – and then after you’re done, the dedupe tool will shrink it. But by that point, you’ve already pushed 100 megabytes over the network, and that’s where the problem comes in for SQL Server.
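
To make that concrete, here’s a minimal Python sketch of block-level dedupe. It’s an illustration only, not how Data Domain or any other product works internally: the fixed 4 KB block size, the SHA-256 fingerprints, and the in-memory dictionary standing in for the appliance are all assumptions. The point is that the appliance receives every byte, even when it ends up storing very few of them.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real appliances may chunk data differently


def dedupe_store(data, store):
    """Split data into fixed-size blocks and keep only blocks not already in store.

    Returns (bytes_received, bytes_newly_stored).
    """
    newly_stored = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block
            newly_stored += len(block)
    return len(data), newly_stored


store = {}
# A hypothetical "file" of 100 blocks with only 2 distinct block contents
payload = (b"A" * BLOCK_SIZE) * 50 + (b"B" * BLOCK_SIZE) * 50
print(dedupe_store(payload, store))  # (409600, 8192)
print(dedupe_store(payload, store))  # (409600, 0)
```

Sending the same payload a second time stores nothing new, but all 409,600 bytes still had to arrive before the sketch could figure that out.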

Dedupe Slows Down SQL Server Backups

In almost every scenario I’ve ever seen, the SQL Server backup bottleneck is the network interface or the drives we’re writing to. We DBAs purposely set up our SQL Servers so that they can read an awful lot of data very fast, but our backup drives have trouble keeping up.

That’s why Quest LiteSpeed (and SQL 2008’s upcoming backup compression) uses CPU cycles to compress the data before it leaves the server, and that’s why in the vast majority of scenarios, compressed backups are faster than uncompressed backups. People who haven’t used LiteSpeed before think, “Oh, it must be slower because it has to compress the data first,” but that’s almost never the case. Backups run faster because the CPUs were sitting around idle anyway, waiting for the backup drive to be ready to accept the next write. (This will really ring true for folks who sat through Dr. DeWitt’s excellent keynote at PASS about CPU performance versus storage performance.)

With dedupe, you have to write the full-size, uncompressed backup over the network. This takes longer – plain and simple.
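
Here’s a rough back-of-the-envelope model of that bottleneck in Python. It assumes the backup behaves like a pipeline that runs only as fast as its slowest stage, and every number in it (database size, read speed, network speed, compression speed, compression ratio) is made up for illustration, not measured.

```python
def backup_seconds(size_mb, read_mb_s, network_mb_s, compress_mb_s=None, ratio=1.0):
    """Estimate backup time for a pipeline limited by its slowest stage.

    ratio is compressed size / original size (0.2 means 80% compression).
    read_mb_s and compress_mb_s are MB of source data per second;
    network_mb_s is MB actually sent over the wire per second.
    """
    # Each source MB shrinks to `ratio` MB on the wire, so the network
    # keeps up with source data at network_mb_s / ratio.
    stages = [read_mb_s, network_mb_s / ratio]
    if compress_mb_s is not None:
        stages.append(compress_mb_s)
    return size_mb / min(stages)


size_mb = 1_000_000  # hypothetical 1 TB database
uncompressed = backup_seconds(size_mb, read_mb_s=800, network_mb_s=100)
compressed = backup_seconds(size_mb, read_mb_s=800, network_mb_s=100,
                            compress_mb_s=400, ratio=0.2)
print(uncompressed, compressed)  # 10000.0 2500.0
```

With these assumed numbers the uncompressed backup is stuck at network speed, while the compressed one is limited by how fast the CPUs can compress. If read speed were the slowest stage instead, compression wouldn’t change the result much.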

Dedupe Slows Down Restores Too

The same problem happens again when we need to restore a database. At the worst possible time, just when you’re under pressure to do a restore as fast as possible, you have to wait for that full-size file to be streamed across the network. It’s not unusual for LiteSpeed customers to see 80-90% compression rates, meaning they can pull restores 5-10 times faster across the network when they’re compressed – or in comparison, deduped restores will take 5-10 times longer to copy across the network. Ouch.
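
The arithmetic behind that 5-10 times figure is simple enough to sketch in Python. This assumes the network is the only bottleneck, so transfer time scales directly with the number of bytes sent; the function name is just for illustration.

```python
def speedup_from_compression(percent_saved):
    """Return how many times less data crosses the wire at a given compression rate."""
    return 100 / (100 - percent_saved)


for pct in (80, 90):
    print(pct, speedup_from_compression(pct))
# 80 5.0
# 90 10.0
```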

It gets worse if you verify your backups after you finish. You’re incurring the speed penalty both ways every time you do a backup!

And heaven help you if you’re doing log shipping. That’s the worst dedupe candidate of all: log shipping does restores across one or more SQL servers, all of which are hammering the network to copy these full size backups back and forth.

So Why Do SAN Admins Keep Pushing Dedupe?

Dedupe makes great sense for applications that don’t compress their own data, like file servers. Dedupe can save a ton of backup space by compressing those files, saving expensive SAN space.

SAN admins see these incoming SQL Server backups and get frustrated because they don’t compress. Everybody else’s backups shrink by a lot, but not our databases. As a result, they complain to us and say, “Whatever you’re doing with your backups, you’re doing it wrong, and you need to do it the other way so my dedupe works.” When we turn off our backup compression, suddenly they see 80-90% compression rates on the dedupe reports, and they think everything’s great.

They’re wrong, and you can prove it.

They don’t notice the fact that we’re storing 5-10x more data than we stored before, and our backups are taking 5-10x longer. Do an uncompressed backup to deduped storage, then do a compressed backup to regular storage, and record the time differences. Show the results to your SAN administrator, and perhaps their manager, and you’ll be able to explain why your SQL Server backups shouldn’t go to dedupe storage.
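
If you want to script that comparison, here’s one possible Python sketch. It assumes native backup compression is available on your SQL Server version, and it leaves the actual database call up to you: run_sql is a hypothetical function you supply, and the database name and UNC paths are placeholders.

```python
import time


def backup_statement(database, path, compressed):
    """Build a T-SQL BACKUP DATABASE statement (database and path are placeholders)."""
    option = "COMPRESSION" if compressed else "NO_COMPRESSION"
    return f"BACKUP DATABASE [{database}] TO DISK = N'{path}' WITH {option}, INIT;"


def timed_run(run_sql, statement):
    """Run one statement through the caller-supplied run_sql and return elapsed seconds."""
    start = time.perf_counter()
    run_sql(statement)
    return time.perf_counter() - start


def compare(run_sql, database, dedupe_path, plain_path):
    """Time an uncompressed backup to dedupe storage vs. a compressed one to a plain share."""
    return {
        "uncompressed_to_dedupe": timed_run(
            run_sql, backup_statement(database, dedupe_path, compressed=False)),
        "compressed_to_plain": timed_run(
            run_sql, backup_statement(database, plain_path, compressed=True)),
    }


if __name__ == "__main__":
    # Stand-in runner that only prints each statement; swap in your own client call.
    results = compare(print, "SalesDB",
                      r"\\dedupe-appliance\backups\SalesDB.bak",
                      r"\\fileshare\backups\SalesDB.bak")
    print(results)
```

Run it a few times at a similar time of day, and time the restores the same way, before you take the numbers to anyone.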

In a nutshell, DBAs should use SQL Server backup compression because shrinking backups by 80-90% can make backups and restores 5-10 times faster. When faced with backing up to a dedupe appliance, back up to a plain file share instead. Save the deduped storage space for servers that really need it – especially since dedupe storage is so expensive.

184 Comments

Glenn Berry
November 16, 2009 7:35 am

Brent,

I completely agree that backup compression is a huge win in nearly every scenario (unless you are under heavy CPU pressure). It really makes initializing a database mirror much quicker and easier. SQL Server 2008 Enterprise Edition already has native backup compression, while SQL Server 2008 R2 Standard Edition will also get it. Of course, SQL Server native backup compression does not have the flexibility of LiteSpeed, it is either on or off.

- Brent Ozar
  November 16, 2009 3:39 pm
  
  Glenn – yep, I covered R2’s new inclusion of backup compression in Std Edition last week here, and I think that’ll make the dedupe conversation even more common. It’s gonna get ugly with dedupe vendors over this one!
  
WIDBA
November 16, 2009 8:25 am

Good stuff, I just sat through the Data Domain dedupe seminar. It really sounds good for file shares, etc., as you mention. The DBAs in the room were all thinking that this makes little sense in the database world. I can point to your article if the SAN guys get ornery!

Jason Hall
November 16, 2009 9:04 am

Thanks for writing this, Brent. When new and expensive technology comes out, there always seems to be a push to use it. The “I just bought this 5 million dollar deduped storage system and you better use it” attitude can wreak havoc on a backup infrastructure. Isn’t the reason we back up our databases so that we can restore them in the event of a disaster? I would think that any technology that dramatically increases our time to recovery would be a negative, but I’m finding more and more DBAs struggling to fight that battle. The end result of deduped hardware vs. compression may be a reduction in storage utilization either way, but compression gets to that end result significantly more efficiently (and more cost-effectively) on both backup and restore.

Just my two cents…

Kendra Little
November 16, 2009 11:47 am

To be fair to the dedup appliances, you shouldn’t just time the first full backup to it. You need to get a few backups to the dedup appliance completed and then start timing what time regular backups take. In the case of at least the DataDomain appliance, they should become more efficient once they have backups to de-duplicate against.

I do also have to say that some of the snapshotting technology DD can do is pretty cool. I will note that I have not tested restoring from those snapshots, but it sure sounds good. 😉

- Brent Ozar
  November 16, 2009 3:38 pm
  
  Kendra – the dedupe happens as SQL pushes the backup into Data Domain. The full size of every backup has to be copied across the wire, no matter whether it’s the first backup or the tenth. Dedupe makes SQL backups smaller, but not faster. If you have a system that works otherwise, I’d love to see it.
  
  - David
    December 22, 2015 4:21 pm
    
    Posting to a 6 year old comment. I can’t help it. Sorry. Solution is DataDomain DD Boost…Compress and dedupe before packets are shipped over the wire.
    
    - Brent Ozar
      December 23, 2015 9:36 am
      
      I’ve heard that – do you have any benchmarks comparing it to native SQL Server compressed backups? I’ve asked that of a few folks and no one seems to be willing to share any.
      
      - Fred Shope
        January 14, 2016 9:02 am
        
        If you can be more specific about what kinds of benchmark data you’d like to see, I can try to get it. We moved from tape backups with NetBackup to Data Domain last year, and aside from latency (which we minimize by geographically collocating server and DD storage where possible) it has gone very well. EMC claims DD Boost performs the deduplication before transmitting over the network and I have no doubts that it does; but I’d be curious to see actual real-world results too.
        Our tape backups were SQL-compressed and saved to a local drive, then offloaded to tape on a schedule, so it’s hard to compare the 2.
      - Brent Ozar
        January 14, 2016 3:30 pm
        
        Fred – sure, I’d start with the basic metrics whenever I’m comparing two things: how long do they take? If you do your backup to a fast file share, vs doing it with NetBackup, vs doing it with Data Domain, how long does each one take? Then the same for a restore.
      - Bruce
        January 26, 2016 12:51 pm
        
        I don’t have the benchmarks anymore from when we tested this out using EMC Networker last year, but it did not perform as well as using Litespeed and it was abandoned. The backups were a bit slower but the restores were where the difference was very noticeable. You also had to use the Networker command-line to automate backups/restores, so that required more permissions for the engine service account in Networker. My coworker customized Ola’s backup script so that it was easier to manage and deploy to several systems for testing. We let them (storage team) keep it on some dev machines but not any of our QA/UAT/Prod instances.
      - Alex Bransky
        September 12, 2017 2:50 pm
        
        I just started using DD Boost and my backups are taking 50% of the time and restores are taking 25% of the time (though some of the latter is due to the data being replicated to a local Data Domain now).
      - Brent Ozar
        September 13, 2017 8:15 am
        
        Alex – yeah, that’s why we’ve repeatedly mentioned in comments that we welcome an exact study from someone, but this doesn’t sound like a good comparison, eh?
      - Alex Bransky
        September 13, 2017 12:39 pm
        
        I’ll do some good formal testing and report back when I get a chance, but I can say with certainty that DD Boost is faster than SQL Server’s handling of compressed backups.
      - Brent Ozar
        September 14, 2017 6:21 am
        
        It’s interesting that you can say things with certainty without testing first. 😉
      - Ken
        September 14, 2017 8:50 am
        
        Let’s Start Small
        DataDomain w/DDBoost 27G database
        
        1. First Time Backup of Source DB with DataDomain w/DDBoost 00:04:10:02
        2. DataDomain Restore Source DB to (testdb) New Location 00:02:30:10
        3. First Backup of Testdb with DD and DDBoost 00:01:50:06
        Note: you might be asking yourself at this point “why did the first backup of testdb take less than half the time of the first backup of the source db?” The answer is that the DataDomain has “seen” this string of 1’s and 0’s already and therefore did not need to back it up again.
        4. SQL backup to disk uncompressed 03:19:25
        5. SQL Backup to disk Compressed 00:03:25:45
        6. SQL restore from Compressed Disk Backup 00:01:52:59
        7. Restore from DataDomain 00:03:00:00
        8. INSERTED 1.3G of data into TestDb
        9. DD Backup with DDBoost 00:01:49:11
        10. SQL Backup to Disk Uncompressed 00:03:32:54
        11. Restore from DD 00:03:16:14
        12. SQL restore uncompressed from disk 00:03:18:56
        
        Real World examples
        2.5 TB Database
        Change rate of approx 1G per day
        DD Full Backups average (7 day average) 02:45:00:00 (2 hrs 45 min)
        DD Diff Backups average (7 day average) 00:20:00:00 (20 minutes)
        Last time I had to restore this DB (duplicated to another server for testing) took a little less than 3 hrs.
      - Brent Ozar
        September 14, 2017 9:06 am
        
        Ken – great, thanks for taking the time! Got a few questions:
        
        4. SQL backup to disk uncompressed 03:19:25
        5. SQL Backup to disk Compressed 00:03:25:45
        
        Do those above numbers ring any odd bells for you?
        
        6. SQL restore from Compressed Disk Backup 00:01:52:59
        7. Restore from DataDomain 00:03:00:00
        
        Hmmm, interesting, so what do you suppose those above numbers mean?
      - Ken
        September 14, 2017 8:53 am
        
        Correction: Change rate on real world example is 100G per day, not 1G as stated in post.
      - Ken
        September 14, 2017 1:11 pm
        
        Just re-ran compressed vs. uncompressed (twice) and got similar results. Uncompressed 3:19 and 3:20, Compressed 3:01 and 3:15 – one could assume there is some overhead with compression. These backups are being run to local disks to avoid any network issues.
        I’ve observed the restore of the compressed backup running as low as 59 secs to 2 mins.
        
        On the restore numbers you asked about, keep in mind that during a restore from DD, the backup pieces need to be found/discovered. In this case, that piece of the restore was taking almost 1 minute. Given the small size of the data tested here, that was approx 1/3 of the total recovery time. Given a much larger data set with approx 1 minute to discover the backup pieces, that same one minute is a much smaller fraction of the total restore time. And one other thing that should not go unmentioned here, if you are retaining a large number of backups on your DD for a given DB, that initial discovery time will be longer.
      - Brent Ozar
        September 14, 2017 1:16 pm
        
        Ken – I try to teach via asking questions, but that’s not working, so I’ll cut straight to the point.
        
        If your compressed and uncompressed backups take the same time, that typically means you’re bottlenecked by read speed or by CPU. That means this isn’t a great testbed – you’ll want to spend some time reading Microsoft SQLCAT’s whitepaper on performance tuning your backups.
        
        Second, if restores are slower, then that proves exactly what I discuss in the post.
        
        Thanks for trying though!
      - Ken
        September 14, 2017 2:29 pm
        
        Well, this is certainly not the best “lab” environment for testing, but whatever it is, it is the same for all the scenarios tested. And this box may represent 90% of the SQL servers out in the wild; 2008 R2 enterprise, 16G ram, 64b, un-optimized disks, etc. I did try to stay consistent so whatever bottlenecks are present here, they were present for both the backups and restores, regardless of whether they were compressed or not.
        Secondly, the backup efficiency of the DD with DDBoost vs straight up SQL backups, in my opinion, is undeniable. Regardless of your teaching method, you will have to go a long way to convince me otherwise. Thankfully, I spend a lot more time backing up in our production environment than I do restoring, and if I’m doing my job, that will always be the case. If there is a hit on a DD restore, I’ll gladly accept that over the efficiencies gained with the DD infrastructure.
        And for me, I prefer frank uninhibited exchanges of viewpoints over veiled obfuscated lessons – but then again, we all have individual styles.
      - Brent Ozar
        September 14, 2017 3:29 pm
        
        Ken – sure, check out this post about the importance of restore times:
        
        https://www.brentozar.com/archive/2011/12/letters-that-get-dbas-fired/
        
        While you may not restore often, I’d argue that fast backups are worthless if you can’t restore in time for the business.
      - Ken
        September 14, 2017 7:08 pm
        
        I’ve read the article. You’ll have to trust me, we’re covered 3 ways to Sunday.
      - Brent Ozar
        September 15, 2017 11:19 am
        
        I don’t have to trust you. 😉
      - Ken
        September 15, 2017 11:55 am
        
        True, nor should you want to.
Merrill Aldrich
November 16, 2009 12:56 pm

Hey Brent – I just went through this exact issue this year with my team. I would qualify this a little, by phrasing it this way: “the common advice about NOT using compression for SQL backups that go into a dedupe store is probably bad advice.” We do use *both* backup compression and a dedupe archive (+ remote mirror) together, and it works well. Here’s the thing: compressed data fed into the dedupe process generally doesn’t deduplicate as well as other data, because it’s very unlikely to match up against other unrelated bits that are already in there (which is how dedupe works). BUT, it can dedupe against prior versions of the compressed backup files that haven’t changed entirely. Example: take a full, compressed SQL backup on Monday, then another on Tuesday where only a small portion of the database has changed – lots of blocks in the file do match up the second time. So you take a big hit for the first file, but maybe not quite so bad the second and later iterations. We had to actually prove this out by testing it with real data from real compressed SQL backups over a couple of weeks when we first got the dedupe system.

Also, related to the speed issue: you are exactly correct. We always back up to a disk available directly to Windows (local or SAN attached), with compression, to get a fast backup. The resulting files are then archived in a dedupe store, by backup software, before being deleted from the local disk at some later time. If we need to perform a restore, we have recent files sitting right at the server. Only older backups would need to come out of the archive and take the restore speed hit.

- Doug
  July 4, 2014 12:41 am
  
  +1 to Merrill, I back up databases with native compression and they are written to a Data Domain. No, the deduping isn’t as efficient as if they were uncompressed, but it still does an amazingly impressive job.
  
Nick Weber
November 16, 2009 3:30 pm

Brent,

Great read!

Just one quick correction and a comment. DataDomain actually does Dedupe on the fly (“inline Deduplication”), so it only writes Dedupe data to disk, while other products like Avamar do a post-base Dedupe after the data is written to disk. Even though DataDomain is an inline Dedupe, it has been throttled to only transfer data as fast as it can Dedupe or cache it in memory. I’m also in 100% agreement with you regarding backing up directly to Dedupe targets. My initial backup is to a large RAID 10 (RDM) attached to a VM, then CommVault Simpana 8 runs nightly to back up the RAID 10 (RDM), off-site it, and Dedupe it a bit.

- Brent Ozar
  November 16, 2009 3:35 pm
  
  Nick – when you say online dedupe, are you saying SQL won’t have to push 100 megs of data across the wire? That’s not how I understand Data Domain to work. That would require running a driver or app on the SQL Server and deduping the data on the SQL Server, correct?
  
  - none
    March 16, 2015 2:14 pm
    
    Inline (not online) dedupe (as in dedupe while it’s receiving data, not on a scheduled basis afterwards).
    
Nick Weber
November 16, 2009 3:55 pm

With inline or most post-base Deduplication, you would still need to transfer all the data across the line. DataDomain Dedupes the data as it enters the unit and only writes Dedupe data to disk, while most post-base Dedupe sends all the data to disk and then Dedupes it after the data is living on the target disk. A big pitfall with post-base is that it requires a lot more disk to keep the full copy while it creates the Dedupe copy; restore times are also terrible. The nice thing about DataDomain is that their fixed bandwidth rate includes Dedupe time up and down, but it is limited. We use CommVault Simpana 8 here with their new Dedupe feature. I’m not going to say it’s the cat’s meow, but I would strongly recommend it over Avamar.

With Avamar you would not need to push the whole 100 megs over the wire, but it also requires an agent on the box constantly talking to the master Dedupe database. It also has some of the worst restore times. If restore times are not important to you, then this is a wonderful product.

Nick

- Brent Ozar
  November 16, 2009 4:04 pm
  
  Okay, great. Just to close the loop on this so future readers understand, you’re saying you agree with my statements that backing up to Data Domain is not faster, right? And in fact, if the DBA is required to send uncompressed data to Data Domain, it’ll be even longer than if they were doing compressed backups to a regular file share, right? Just wanna make it absolutely clear to readers – I understand what you’re saying about Data Domain deduping “inline”, but that means after the full size is already sent over the network, and at that point it’s too late for the DBA.
  
- Ken
  September 12, 2017 7:29 pm
  
  Not with DDBoost; only changes are sent across the wire.
  
Nick Weber
November 16, 2009 4:06 pm

I’m in 100% full agreement with you!

- Brent Ozar
  November 16, 2009 4:07 pm
  
  Ok, cool! Thanks for following up so fast! Have a good one!
  
alen
November 17, 2009 12:59 pm

i ran deduped SQL backups for almost a year in parallel with tape backups. In the end we dumped the dedupe system and spent $$$ to upgrade to LTO-4 tape. the dedupe had some amazing compression and it was a lot better than tape. but LTO-4 tapes are $55 for 1.6TB tapes which in the real world come out to almost 3TB of storage per tape.

- Brent Ozar
  November 17, 2009 10:23 pm
  
  Ouch! Was it strictly a cost issue when you got rid of dedupe?
  
alen
November 17, 2009 1:06 pm

forgot to mention, we used a post dedupe system and it actually used SQL 2005 Express as its backend. even had replication functionality built in. and my favorite was when the dedupe tasks ran, it would create a storm of blocking and a lot of the dedupe tasks would fail resulting in more space used. i tried to schedule it and it kind of worked, but never completely.

alen
November 18, 2009 9:18 am

cost was part of it. Over 3 years I think the estimate was 100TB – 200TB of disk at two locations. and us DBAs never trusted the disk and trusted tapes a lot more than disk.

with netbackup a restore to a different server was a 1 step process. with i365 it was 2 steps and several hours longer than with Netbackup.

Anand Shah
December 14, 2009 7:12 pm

Same here…

We were offered de-dupe technologies… about a year back… and when I went into the technicals… i thought to myself… that it is not de-dupe… but dupe…

I then went ahead with an open source version of quest litespeed… called lzop compression….

it basically reduces a 25 gb full / transactional backup to 6 gb within a matter of 3-4 minutes…

http://www.lzop.org/

There are windows binary links from those sites…

- Brent Ozar
  December 15, 2009 6:18 am
  
  Anand – so that we’re clear, LZOP is in no way, shape, or form an “open source version of Quest LiteSpeed.” All it is is a file compression engine, and LiteSpeed has a lot more than that.
  
  Who do you call for support?
  
  Does the author keep it up to date? I read the “News” section of the site, which says:
  
  “As of Oct 2006 there has not been any problem report, and 1.02rc1 will get released as official version 1.02 whenever I find some spare time.”
  
  That doesn’t sound like the kind of product I want to trust my mission-critical SQL Server backups with, and frankly, if I was hiring a DBA, I would avoid candidates who made choices like that. Something to think about…
  
Anand Shah
December 16, 2009 11:25 pm

Brent – I do agree that lzop is only a compression engine and is not an exact copy of quest litespeed… but if you take the concept of fast compression (less than 1/2 the time taken by 7z) for multiple number of databases… it is much better to do that… and then send it over the wire to a secondary location, than to send a whole database or transaction logs…

I wonder if anybody calls the creators of zip software for basic support… all of these compression technologies are open source… so if you are not happy with a particular feature… go ahead… make it better…

There might be many people in the market… who may not be able to purchase quest LiteSpeed… or similar such paid software… for them this is a very good choice…

frankly, I wonder if you can judge a DBA based on the compression software they use… whether it be zip, arc, etc. I am just providing my opinion based on what I did when a choice was available for me to invest anywhere between 10-20 grand on a de-dupe…

As much as sql servers are based on set theory… there is a set of non-DBAs out there in the world, who are part-time DBAs by choice… I do not think they will ever look for a DBA employer… because they are happily doing other things…

A DBA is not on the required list of requirements to use SQL server…

- Brent Ozar
  December 17, 2009 7:04 am
  
  Anand – as someone who works for a software vendor, I can assure you that yes, people do indeed need support. I wish there was such a thing as bug-free software, but I haven’t seen it yet.
  
  You said: “all of these compression technologies are open source… so if you are not happy with a particular feature… go ahead… make it better…” That one argument alone shows that I can’t convince you of why open source isn’t the right answer for everybody. I wish you the best of luck with that software, though.
  
  Out of curiosity – if you’re such a big fan of open source, why use SQL Server? Why aren’t you using MySQL? After all, if it doesn’t have a feature you need, “go ahead… make it better…” 😉
  
Craig
February 2, 2010 2:34 pm

This is a pretty decent article, and sums up nicely some of the frustrations I come into contact with on a day-to-day basis, but it totally ignores systems with deduplication on client AND server. Systems that do this (to minimize both storage requirements AND network traffic) are more efficient than either traditional compression (with a small window size: e.g. LZ usually has a working range of about 32KB) or non-deduped data going out across the network.

It’s hard to get some people to understand detail. This article does it pretty well.

C

Doug
February 17, 2010 8:32 am

While I agree completely, we are currently testing de-duplication devices (DataDomains) with SQL Server backups, both uncompressed and compressed via Litespeed. While doing some investigation, I discovered Quest doesn’t support Litespeed when used with de-duplication technologies. It’s listed on Quest’s support site under solution SOL49562. Here’s the direct link – https://support.quest.com/SUPPORT/index?page=solution&id=SOL49562. You’ll have to log in using your support ID to view it (which you should have if you use Litespeed). For your convenience, here’s the case…

Solution SOL49562
Title
Using de-duping technology (EMC Avamar, DataDomain, NetApp) with LiteSpeed
Problem Description

De-duping technology like EMC Avamar, DataDomain, or NetApp is not supported for LiteSpeed backup files regardless of whether the file is encrypted or not.

Cause
The data is not always in the same place from backup to backup.

Resolution
Unsupported software/platform.

Environment
Product: LiteSpeed for SQL Server
Attachments:
Server OS: Windows – All
Database: SQL Server – All

- Brent Ozar
  February 18, 2010 11:38 am
  
  Hi, Doug! I sent this over to the support department and they’re trying to clarify that article. The person who wrote that article is no longer with Quest, and we’re pretty sure it’s incorrect. We’re tracking it down to make sure.
  
  We can’t “support” dedupe appliances in the sense that we don’t have any in-house, and we don’t test with them. We’re not aware of any issues with them, though.
  
Craig
February 18, 2010 2:03 pm

TSM 6.2 (just announced) does client and server-side dedupe.

Best of both worlds.

C

- Brent Ozar
  February 18, 2010 2:06 pm
  
  Craig – just to be clear, TSM 6.2 was just announced for first delivery next month. I look forward to reading how it performs with SQL Server. I can’t seem to find any documentation on that, though.
  
Craig
February 18, 2010 11:56 pm

Hi Brent,
TSM infocenters (that is, the online TSM manuals) are usually only enabled on the day of general availability.

I am not in the loop on it, but I’ve no reason to expect anything different for the 6.2 release.

For info, the TSM v6.1 infocenter is over here http://publib.boulder.ibm.com/infocenter/tsminfo/v6/index.jsp – usually a new URL would be used for each new infocenter.

The TSM v6.1 server-side deduplication arrangement is described in a redbook at http://www.redbooks.ibm.com/abstracts/sg247718.html – this is likely to be one half of the setup in the 6.2 design.

C

Syed Abdul Majeed
May 3, 2010 1:56 am

Hi Brent,
I’m a bits and pieces IT manager. We have SAP ECC 6 instances running on vSphere 4 on HP blades and an EMC AX-4 iSCSI array. The DB is IBM DB2 and Oracle 11i on RHEL 5.2. As we are in initial stages, we tend to take a snapshot of every stage.
My pain points are:
1) VM snapshots are occupying humongous disk space.
2) I’m planning a DR based on VMware SRM.

My questions are:
1) Can dedupe help me in reducing the size of VMs at the DR site? (I’m talking about a production VM replicated at DR.)

My storage vendor is proposing a NetApp 2040 at production and a 2020 at DR. He says that this solution will dedupe data on production storage, compress it, and send it across to DR. The results are much lower space requirements on DR storage, reduced WAN bandwidth consumption, and zero downtime.

Can you throw some light on it and help me in building the right DR?

Thanks in Advance.

Regards,
Syed.

- Brent Ozar
  May 3, 2010 7:10 am
  
  Syed – you’ve got a lot of good questions in there, but it’s beyond what I could accomplish in a blog comment. Your best bet is to get a trusted consultant involved. Ask your storage vendor for local references that you can talk to. Hope that helps!
  
Udi
July 25, 2010 7:18 am

You are not accurate about dedupe basics. For example, in IBM / Diligent ProtecTier the dedupe is inline. This means backing up 100MB does NOT mean you need 100MB storage!!! The dedupe is done on the fly. Please get the facts in order…

- Brent Ozar
  July 25, 2010 7:28 am
  
  Hi, Udi. If you reread my article carefully, it’s not about the storage space – it’s about pushing the 100MB over the network. Here’s the quote:
  
  “But by that point, you’ve already pushed 100 megabytes over the network, and that’s where the problem comes in for SQL Server.”
  
  Yes, ProtecTier does the dedupe “inline”, but that inline is happening in ProtecTier, not in SQL Server. SQL Server has to send all 100MB over to ProtecTier first, and that’s what takes so long. Granted, it doesn’t take long for 100MB, but most of my clients have databases over a terabyte, and it does indeed take quite a while to shuffle that much data over to ProtecTier. Other solutions handle the compression on the SQL Server before writing them out to storage (or a dedupe appliance.)
  
  I do appreciate your enthusiasm though! Let me know if I’m missing something there. Thanks for your help and clarification.
  
  - Udi
    July 25, 2010 1:01 pm
    
    Thank you. you are absolutely right.
    In any event, my customers are backing up MS SQL to VTL/Dedupe, which means they use the ProtecTier not only to dedupe but also as a VTL. They stream all data types and cannot decide to use another method just for MS SQL…
    It’s not in my power to influence their IT decision making – I have to deliver a good dedupe ratio.
    
    Do you have any good recommendations?
    
    - Brent Ozar
      July 25, 2010 1:04 pm
      
      Well, I’ve got bad news – the entire point of this article was to explain why dedupe doesn’t make sense for MSSQL backups (or Oracle, for that matter). You’re asking me for good recommendations for your employer (IBM, according to your LinkedIn profile) because your customers already bought something that doesn’t work for their needs. (I’m guessing that’s why you’re here on my blog.)
      
      Unfortunately, it’s too late for us to help them. Your best bet is to push IBM to put software on the server itself to reduce the data sent over the wire to your appliances. Sorry, and I wish I had better answers there.
      
Alen
August 2, 2010 2:02 pm

just bought some LTO-4 tapes a month ago for $35 each. every time i mention this to the dedupe/d2d software sales people they don’t bother to try to convince me and hang up.

i’ve only used post dedupe which had a SQL 2005 express backend and one thing i noticed is that the dedupe process was a major PITA. this was due to plain old low tech database locks. i had to schedule the dedupe jobs for servers to not run at the same time because they would block each other.

the way i found this was that the dedupe server ran out of storage space a few months before it was scheduled to do so. with tapes it wouldn’t be a big deal, $700 for 20 LTO-4 tapes. with disk it meant a $5000 minimum expense plus telling the facilities people we need more power. took me a week or two of work to get the space back

SQL_Padre
October 13, 2010 2:22 pm

I am a DBA for a company that is “forced” to use Symantec NetBackup as my SQL backup solution. And I am not the Backup Admin, so I have no control over my own backups. I would much rather use another 3rd party solution that includes compression (because we are using SQL 2005), but their “big” argument is deduplication of files. I think NetBackup does great for file servers, but I have not been impressed with it as a SQL backup solution.

Do you have any experience with Symantec NetBackup? If so would this information pertain to NetBackup as well?

- Brent Ozar
  October 13, 2010 2:26 pm
  
  Howdy! Yes, I’ve got experience with NetBackup. The NetBackup agent may enable you to use compression in conjunction with SQL Server 2008 & 2008 R2, both of which offer compressed backup streams now. If you’re not on 2008, though, you’ll suffer these same problems.
  
Alen
October 13, 2010 2:29 pm

i use netbackup as well. the compression you get depends on the type of tapes you use. we use LTO-4 tapes. the official numbers are 800GB uncompressed and 1.6TB compressed data per tape. i have tapes with close to 3TB of data on them. i asked about it and was told that the compression ratio varies and the advertised numbers are lower than a lot of people see in the real world.

we also do some server/disk backups to a file share like most DBAs for DR and to transfer data to QA. the speed of our tape backups is 2 to 3 times faster than the native SQL backups. and i know for a fact that we haven’t pushed our tape robot to its fastest speed since the NIC and disk system are our limiting factor so far.

Nacho.
December 21, 2010 6:01 am

Hi all, cool note. I have Avamar 5, and for file systems it rocks, but we have an incident that Avamar support cannot resolve: a 456 GB SQL 2005 database takes over 24 hours to restore, and the Avamar server cancels the restore with a timeout after 24 hours. Could it be a dedupe issue?

- Brent Ozar
  December 21, 2010 6:10 am
  
  Nacho – it might, but it’s too tough to tell from here. You’ll want to call for support with Avamar.
  
  - Nacho.
    December 21, 2010 7:43 am
    
    I have had Avamar on the phone for the past 25 days.
    
    - Brent Ozar
      December 21, 2010 7:45 am
      
      Nacho – ouch – in that case, it’s probably not realistic to expect the public to do a better job. If they can’t support their own product after escalating it for 25 days, that tells you something right there.
      
alen
December 21, 2010 2:44 pm

when we tested SQL backups with i365 to disk we had the same restore issue. takes a lot longer to restore from disk when it’s deduped than from tape

Ray
March 9, 2011 6:27 pm

This is a great thread. I’d be curious what your thoughts are around leaving LiteSpeed compression on but turning LiteSpeed encryption off and still using Data Domain as the backup target. This way the amount of data traversing the network is minimized, you’ll still get some additional compression/dedupe in Data Domain on top of the LiteSpeed compression and your backups will be hardware encrypted and replicated to a remote site.

- Brent Ozar
  March 10, 2011 2:52 pm
  
  Ray – since space is at a very expensive premium when using Data Domain, it doesn’t usually make sense to land SQL Server backups on Data Domain to get a very small percentage of compression. It’s not a good ROI on storage expenses, and it’s easier/cheaper to use conventional SAN replication. Hope that helps!
  
Hmm
March 30, 2011 9:58 pm

So much concern about network bandwidth limiting the backup speed… but in a 10Gb/s LAN or upcoming FCoE network the read capacity of the SQL Server spindles becomes the bottleneck. Dedupe does have real cost savings and you should have your network fixed first 🙂
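
To put rough numbers on the bandwidth question, here is a minimal sketch of the wire time for one full backup at 1 Gb and 10 Gb Ethernet, with and without compressing on the server first. The 1 TB size echoes the database sizes mentioned earlier in the thread; the 4:1 compression ratio and the idea that the link runs at full line rate are illustrative assumptions, not measurements.

```python
def transfer_hours(size_gb: float, link_gbit_per_s: float,
                   compression_ratio: float = 1.0) -> float:
    """Hours to push a backup over the wire, assuming the link is the only bottleneck."""
    bits_on_wire = size_gb * 8e9 / compression_ratio  # decimal GB -> bits
    seconds = bits_on_wire / (link_gbit_per_s * 1e9)
    return seconds / 3600


DB_SIZE_GB = 1000      # roughly 1 TB
ASSUMED_RATIO = 4.0    # hypothetical server-side compression ratio

for link in (1, 10):
    raw = transfer_hours(DB_SIZE_GB, link)
    compressed = transfer_hours(DB_SIZE_GB, link, ASSUMED_RATIO)
    print(f"{link:>2} Gb/s: uncompressed {raw:.2f} h, compressed {compressed:.2f} h")
```

The sketch ignores how fast the source drives can be read, which is exactly the bottleneck this comment expects once the network is fast enough.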

- Brent Ozar
  March 30, 2011 10:01 pm
  
  Generally, people who are trying to save a small amount on backup space don’t spend a large amount on 10Gb Ethernet. 10Gb is still staggeringly expensive compared to gig. If you’ve already put in the massive investment for 10G, though, then dedupe makes sense – but that’s an awfully small percentage of companies, wouldn’t you agree?
  
Confused
June 24, 2011 9:00 am

Great post. Learnt a lot from the DBA perspective. I am a backup guy so I guess I am auto-fitting into your mentioned Storage Admin role : )

One point to consider: while your view on the first or single backup is very valid (regarding the trade-off between bandwidth & CPU cycles), what about when we are doing multiple backups of the same database and need to retain them online? Won’t the subsequent backups benefit from already having the majority of the database contents stored?

Plus, EMC Data Domain already supports a “Boost” protocol – in effect, all data leaving the DB server is already in deduplicated format (requirement – the DB server has to run as a Media Server). Would this neutralize the bandwidth concern as well?

- Brent Ozar
  June 24, 2011 9:02 am
  
  Patrick – about multiple backups of the same database, no, there’s no reduction of network traffic just because a bunch of duplicate data already exists on the Data Domain appliance.
  
  The Boost protocol is something else, though – by doing that, it may reduce the bandwidth concern as long as wherever you restore the database is also a Media Server. That’s rarely the case in QA/dev/DR environments because licensing ain’t free. Cha-ching…
  
WAN accel
September 12, 2011 4:46 pm

I too would hate to lose our SQL backup compression, as we saw backup times drop to 40% of pre-2008 times. Our SAN admins are pushing hard toward dedupe, but more so to Riverbed WAN accelerator appliances.

These accelerators work on the same dedupe ideas (and thus will not work well with compressed SQL backups). I assume this is similar to the aforementioned Boost protocol, but Media Server not required.
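
A toy illustration of why compressed streams tend to defeat this kind of matching, assuming fixed-size chunks and SHA-256 fingerprints (real appliances and WAN accelerators use their own, usually variable-size, chunking, so treat this as a sketch of the idea rather than how any product works). The "pages" and the one-page change between the two backups are made up for the example.

```python
import hashlib
import zlib

CHUNK = 4096  # fixed-size chunks, a simplification chosen for this sketch


def chunk_hashes(data: bytes, chunk_size: int = CHUNK) -> list[bytes]:
    """SHA-256 digest of every fixed-size chunk in the stream."""
    return [hashlib.sha256(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]


def duplicate_fraction(previous: bytes, current: bytes) -> float:
    """Share of chunks in `current` whose fingerprint already exists in `previous`."""
    known = set(chunk_hashes(previous))
    hashes = chunk_hashes(current)
    return sum(h in known for h in hashes) / len(hashes)


def make_page(i: int, size: int = 8192) -> bytes:
    """Hypothetical 8 KB 'database page' filled with a repeating, page-specific pattern."""
    pattern = f"row {i:08d} ".encode()
    return (pattern * (size // len(pattern) + 1))[:size]


monday = b"".join(make_page(i) for i in range(200))
tuesday = b"\x00" * 8192 + monday[8192:]  # same backup with a single page changed

raw_dupes = duplicate_fraction(monday, tuesday)
packed_dupes = duplicate_fraction(zlib.compress(monday), zlib.compress(tuesday))
print(f"uncompressed: {raw_dupes:.1%} of chunks already stored")
print(f"compressed:   {packed_dupes:.1%} of chunks already stored")
```

With the uncompressed streams, every chunk except the changed page matches the previous backup. Once each stream is compressed as a whole, a change near the start alters the bytes that follow, so far fewer chunks line up.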

So my question is… would the benefits of both a dedupe device and Riverbed accelerators trump the cons of uncompressed backups?

Mike
January 28, 2013 12:04 pm

A lot changes in our world over 3+ years, and I wonder if you still have the same opinion as you did in 2009? My background is that of a storage architect, but I’ve been fortunate to fill the shoes of a server admin, storage admin, and even DBA over the past couple decades. (OK, I admit, I’m a rank amateur on the DBA front.) Disk and network continue to drop in price (e.g., increase in availability), and yet our businesses continue to saturate whatever we’ve bought.

Most enterprises I’ve consulted in are more worried about the cost of disk than the cost of network, and where I see the benefit of dedupe in your example is in long-term storage cost. Acknowledging that client-side compression will result in a much smaller initial backup set, you also have to pay attention to the long-term and incremental storage cost savings of dedupe. The longer you retain data, the greater your savings will be.
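
A back-of-the-envelope sketch of that retention argument follows. It compares the disk consumed by N retained daily fulls when each is compressed on the server against a dedupe target that stores only new blocks. Every number in it (1 TB database, 4:1 server-side compression, 2% daily change rate, 2:1 compression on the appliance) is a hypothetical input, and the model assumes the dedupe target finds every unchanged block, which real systems will not quite manage.

```python
def compressed_fulls_gb(db_gb: float, days: int, compression_ratio: float) -> float:
    """Disk used when every retained full is stored as its own compressed file."""
    return days * db_gb / compression_ratio


def deduped_fulls_gb(db_gb: float, days: int, daily_change_rate: float,
                     appliance_compression: float = 1.0) -> float:
    """Disk used when only the first full plus each day's changed blocks are stored."""
    unique_gb = db_gb + (days - 1) * db_gb * daily_change_rate
    return unique_gb / appliance_compression


DB_GB = 1000  # hypothetical 1 TB database
for days in (1, 7, 30):
    packed = compressed_fulls_gb(DB_GB, days, compression_ratio=4.0)
    deduped = deduped_fulls_gb(DB_GB, days, daily_change_rate=0.02,
                               appliance_compression=2.0)
    print(f"{days:>2} fulls retained: compressed {packed:.0f} GB, deduped {deduped:.0f} GB")
```

Under these assumptions the compressed backups win for a single full, and the deduped store pulls ahead as retention grows; a higher change rate moves the crossover later.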

Specific to the performance, “Confused” already mentioned DD Boost (specific to Data Domain) that dedupes before sending over the wire. Licensing cost may be a concern, as you mentioned, but that just brings us back to dollars and cents–and hopefully the business is willing to spend what is required to get business done. Maybe it’s not feasible to license non-production environments to take advantage of available technology, but then maybe it’s not feasible to expect non-production environments to have production performance.

In the end, I think it’s a matter of the business deciding where it wants to pay for its ability to conduct business.

- Brent Ozar
  January 28, 2013 12:06 pm
  
  Just so we’re clear – so you’re saying you would spend more for DD Boost in order to get the same performance that you get out of the box with SQL Server for free?
  
  Why pay at all? That’s the question I’ve got. If you want dedupe, back up your SQL Server to a deduped file share. Done. You don’t need expensive agents to do that.
  
  - Mike
    January 28, 2013 1:56 pm
    
    I think we’re on different pages. Your original post identified the network as the bottleneck for backing up to a dedupe device. DD Boost is a plugin that dedupes the backup data before the data hits the network (and then it’s deduped again, against all data on the device once it arrives for increased effectiveness). The benefit could be compared to what you get from server-side compression but also having dedupe on the back end.
    
    But you’re right, if you don’t need to reduce the cost of disk, there’s no point to pay for the cost of deduping to begin with. I could have avoided bringing up cost altogether, because cost is like that proverbial balloon: when you squeeze it at one end it’s going to affect the other end to some degree.
    
    My intended question could be restated as such: If the problem you identified with MSSQL + dedupe is that 100% of the data needs to traverse the network, if that problem is removed then do you believe that dedupe is still a bad idea for SQL Server backups?
    
    - Brent Ozar
      January 28, 2013 2:01 pm
      
      Absolutely – why spend more for something you already get for free in the SQL Server box? Plus, I haven’t seen restore times for dedupe appliances be the same as restoring from regular file shares. Your mileage may vary, though.
      
Ej
February 4, 2013 10:23 am

Hi,

We just recently started using Avamar to back up everything, including an hourly SQL Server transaction log backup. The issue we’re having is that the transaction log (LDF) file keeps growing, so it seems like the Avamar backup is not committing the backup. Do you think this issue is related to the dedupe issue on SQL Server? If not, any idea?

- Brent Ozar
  February 4, 2013 10:32 am
  
  EJ – unfortunately I’m not familiar with the mechanics of what Avamar is doing. At first glance it sounds like it’s only doing full backups, not log backups.
  
najm
March 7, 2013 9:01 am

Hi Brent,
I really enjoy your articles.
Is this article still applicable to Data Domain and deduping? My SAN people think technology has evolved and it is no longer valid. I would appreciate your input.

Thanks,
Najm

- Brent Ozar
  March 7, 2013 9:03 am
  
  What parts of the article do they believe have changed?
  
  - Akos
    October 10, 2013 7:15 am
    
    Sorry Brent, you were (in 2009) and still are wrong on many levels. A reasonable DBA will test/evaluate this to see how the benefits of dedupe, with say Data Domain, apply in his/her environment.
    
    1. It doesn’t slow down backups/restores: the DB can still read in parallel and the optional backup agent can compress the streams during transfer. The backup node can write it to DD uncompressed to get the benefit of dedupe and compression on the appliance. You can also transfer the backup over Fibre Channel. DD has optional modules to perform part of the dedupe on the client.
    
    2. Client application compression: in some limited scenarios this could beat a dedupe appliance when using a very short backup retention. Note that the appliance will not only compress your backups but it will remove redundant blocks from the backups before compressing them. This obviously means that the dedupe appliance will yield the best ratio in most DB backup scenarios. Yes, there are exceptions, such as when a large portion of DB blocks change regularly or you need to use TDE, where dedupe won’t provide any benefit.
    
    Of course if you don’t pick the right configuration the dedupe appliance/software won’t provide the features your company paid for and you’re wasting resources.
    
    Akos
    
    - Brent Ozar
      October 10, 2013 7:17 am
      
      Akos – by all means, I’d encourage you to test it in your own environment and see how it works out for you. Your comments still talk about compressing on the dedupe appliance side, which as I discussed in the post, means that you’re sending more data to the appliance, and that’s slowing down the network transfer rates. If you take away fiber bandwidth for reads in exchange for writing to the appliance, then you’re robbing Peter to pay Paul.
      
Andre
March 14, 2013 5:32 pm

As someone who works for a dedupe storage vendor, our position is simple. We encourage all of our prospective customers to conduct an eval, and go from there.

A couple of folks mentioned the low cost of tapes. If you already have a tape solution in place that fits your needs, by all means keep using it. You’re not the type of customer we are trying to sell to.

On the other hand, when that giant tape library of yours gets EOL’d by the vendor, or your hardware lease is up for a 20yr minimum renewal, or managing the tapes/retention/disaster recovery across X number of branch offices gets painful, or you get any other reason to be in the market for 21st century technology, call us and we’ll go from there.

And yes, more often than not, transitioning to dedupe disk storage will require doing things differently than they were done in the past, and may well run into other infrastructure bottlenecks that went unnoticed before. Hence try before you buy.

Brendan Bartlett
August 19, 2013 11:00 am

The problem with the Try-before-you-buy approach is that it assumes you will be able to compare apples to apples with the dedupe storage vs traditional storage. We went with EMC Avamar but the problem I have with it is it doesn’t tell me the size of the deduped object on the Avamar grid but instead provides a dedupe rate.

Truthfully I believe “dedupe rates” to be a very misleading metric. It’s like going to the grocery store and using the customer card to get the sale prices where they say “We saved you $12!!!!” I’m not as interested in how much I saved as I am in how much I paid.

Backing up an uncompressed file to Avamar is just going to have a better dedupe rate than a compressed one. Did you actually save any expensive Avamar grid space? How would we know?

Given that Avamar is matching blocks with previous backup files, it might be more honest to say the backup file’s size is a combination of one or more previous backups. Then again, I can see that being confusing, and a better way to report it would be the size of the new data, and then also the size of the new data combined with whatever previous data is required to restore it.

I guess maybe it depends on why you want the numbers. If you want them to prove how much Avamar is saving you then the dedupe rate is an excellent metric. If you are looking in order to know how much Avamar grid space is being consumed by your SQL backups then it’s useless.
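
A small sketch of the difference between the two numbers, using invented figures: the same database backed up uncompressed and compressed, with the space each set ends up consuming on the grid. The sizes are hypothetical and are not taken from any Avamar report.

```python
from dataclasses import dataclass


@dataclass
class BackupSet:
    name: str
    logical_gb: float   # size the backup software was handed
    physical_gb: float  # space actually consumed on the appliance

    @property
    def dedupe_ratio(self) -> float:
        return self.logical_gb / self.physical_gb


# Hypothetical numbers, chosen only to show how the two metrics can disagree.
sets = [
    BackupSet("uncompressed fulls", logical_gb=1000, physical_gb=200),
    BackupSet("compressed fulls", logical_gb=250, physical_gb=180),
]

for s in sets:
    print(f"{s.name}: {s.dedupe_ratio:.1f}x dedupe, {s.physical_gb:.0f} GB consumed")
```

In this made-up case the uncompressed set reports the far better dedupe rate, yet the compressed set consumes slightly less grid space, which is the number the bill is based on.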

Jason Yanoff
September 25, 2013 1:48 pm

Nice article. We are considering switching from CommVault to Networker w/ DD Boost to our Data Domain. Can you comment on trying to send a 750 GB (compressed) .BAK file across a WAN to a hot site for DR? Our issue is that our DBAs don’t have any replication and we rely 100% on tape for recovery. The average speed of a tape restore is about 30 GB an hour. I was hoping to use either VDI or VSS to improve the situation but I am getting much resistance from the DBAs. Have you considered the fact that DD Boost with Data Domain allows the SQL box to transfer directly to the Data Domain over fibre? If you consider the significant improvement in bandwidth transferring over 10G fibre vs Cat6 (network), the loss of time from the dedupe may be a wash. Thoughts?
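
For scale, a quick sketch of what those two figures imply: how long a 750 GB restore takes at 30 GB an hour, and what sustained link speed would merely match that rate. It uses decimal gigabytes and ignores protocol overhead, both simplifying assumptions.

```python
def restore_hours(size_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to restore a backup at a steady rate."""
    return size_gb / rate_gb_per_hour


def gb_per_hour_to_mbit_per_s(rate_gb_per_hour: float) -> float:
    """Convert GB/hour (decimal) to megabits per second."""
    return rate_gb_per_hour * 8000 / 3600


BACKUP_GB = 750
TAPE_RATE_GB_PER_HOUR = 30

print(f"tape restore: {restore_hours(BACKUP_GB, TAPE_RATE_GB_PER_HOUR):.0f} hours")
print(f"equivalent link speed: {gb_per_hour_to_mbit_per_s(TAPE_RATE_GB_PER_HOUR):.1f} Mbit/s")
```

Any WAN path that sustains more than that equivalent speed end to end would already beat the tape restore rate quoted above, before dedupe enters the picture.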

- Brent Ozar
  October 10, 2013 7:18 am
  
  Jason – there’s a lot of different questions in here. I’m not sure what you mean by 10G fibre – there’s 10Gb Ethernet, and that might be what you’re thinking of.
  
Conrad Jones
October 27, 2013 5:40 am

Exactly the same as why using Fast Compression on Norton Ghost was quicker than uncompressed. The (otherwise idle) CPU does the compression quicker than the network AND remote disk can.

Sean from Chicago
January 29, 2014 12:04 am

Good read.

1. I’ve used CommVault and I loved that product. I used a large SAN target and wrote the backups from agents on the servers with the data to the media agents connected to the SAN target. Some large servers had access to the SAN over fiber themselves, so essentially they were media agents themselves. Dedupe was done afterwards, so we could write all the data to the cheap SAN at night (yes, the backup SAN was cheap compared to the prod SANs). The backup servers would then run a dedupe job during the day. They were doing nothing during the day, so allowing them to run the post dedupe was a good use of resources.

2. Currently I use Avamar. The agents are running dedupe in software as I understand it, so they only send changed data to the grid. So it’s not taking long for the backup to run. Now maybe for SQL that’s not true… I’ll look into that. But our prod is on a 10 gig network so we never expect the backup job to saturate the network. So even if it was pushing all the data, I don’t see a negative impact. Restore takes what it takes; in the case of prod it’s not slowed down by the network. I don’t see a restore from a big SAN drive being much faster actually. Our tests so far have been about the same. But we have not done restores of our multi-terabyte DBs, just smaller ones.

So again my take on your comments is dedupe is bad. It’s bad because you have to send data over the wire. OK, a restore is going to take what it takes. Now if you’re using a Data Domain in the middle to do the backup/restore maybe that has an impact. Some claim it helps but for SQL I gather it may hurt more than help. However, a straight-up Avamar/CommVault restore is not going to be slower because of dedupe. The data is on the grid/SAN and it’s being sent back to your agent doing the restore. These are expensive items and they push their data pretty darn fast. Maybe faster than one server can accept, to be honest. You should be able to do more than one restore at a time. How much space you’re eating to store four months of daily full backups is a real concern. Sorry but a unique copy of each full backup sucks.

Then it seems you’re saying using native compression is better than dedupe. Umm, the agent is doing its own compression when it sends the data to the grid/media agent. Maybe native compression is better… I don’t know, but compression is still happening, and the agent should be pretty good.

As to backing up to a file share and then backing that up with Avamar/CommVault: that doesn’t work well because of your compression. Frankly the new backup sees it as a new file and you’re stuck with a lot of copies of files that will not compress much. That will eat lots of backup disk.

The real point is how much money you want to spend on backup disk; the usual answer is as little as possible. Let’s spend money on prod SAN disk and not need to keep buying more disk for backup.

Lastly, to anyone still using tape: I hope it is just for long-term storage or an aux copy to use only if the disk target dies. A restore from tape is truly slow. That’s not where I want to be. LTO-3 and LTO-4 are all I’ve used in the last few years, but it’s still slower than a restore from disk. Tape is for restoring something really OLD or if the disk target died, or god help you your site died and this is the off-site tape you are reloading from. Tape is reliable but it should be the last resort for a restore.

In summary, I’m not sure I agree with your overall premise. My understanding is you want to use third party compression or maybe the new native compression over dedupe, and I’m not thinking that is the right answer in an enterprise setup. I’m a little shaky on Avamar SQL backups and the black magic in the background, I admit. But with CommVault, backups were very quick and so were restores. I’m not a Data Domain user and not a big believer, given the expense. It’s another layer of complexity that has certain uses (multi-site) but is not really beneficial for a single-site backup/restore job.

Maybe a rewrite with all features the current generations of backup solutions can bring to bear is needed?

- Brent Ozar
  January 29, 2014 8:18 am
  
  Sean – I read a lot of “maybe” stuff in here. Have you done any A/B testing between the solutions?
  
  - Sean from Chicago
    January 29, 2014 8:38 am
    
    No, different environments = new solutions. Haven’t been on one with both CommVault and Avamar in the same environment.
    
    What I have done is test in each to make sure we can recover from a failure.
    
    As far as the maybes I can’t argue that. I’m not the person posting a blog either. That’s why I am raising some points and suggest you revisit the issue with the current solutions available.
    
    Heck for data domain customers the DD software running on the systems wasn’t available in 09. There are so many folks that keep touting its capabilities. I haven’t used it so I’m staying out of that. But I have used pre and post dedupe and have had zero issue with either. I think I like post a little better but I’m fine with software pre/inline.
    
    - Brent Ozar
      January 29, 2014 8:42 am
      
      Ah, gotcha. As you noted, the blog post was published in 2009. As much as I’d love to drop my work and do A/B testing with million-dollar-hardware I don’t own, I don’t really have the time or hardware to do that testing. 😉
      
      If you do have the hardware (either Commvault or Avamar), it sounds like you’d be a great person to do A/B testing versus native. I’d love to read how that turns out.
      
      Thanks!
      Brent
      
    - Akos
      January 29, 2014 9:57 am
      
      I’ll second your “Sorry but a unique copy of each full backup sucks” when DBAs want to work well with storage and backup admins. Compressed backups don’t dedupe at all, so that approach doesn’t scale.
      
      However, I tried to point out in my earlier post that with DD software we had the option to compress on the client side so the DB backup pieces are sent over to the backup server or appliance compressed. That option has been around for years. Alternatively you can do some of the dedupe on the client side using the backup software.
      
      Based on simulated testing of a month long backup cycle of a 450GB DB I’ve found that “dedupe slows down SQL Server backups” is really “it depends on your environment and picking the right configuration”. I found that it is better than the native backup’s throughput rate and certainly uses a fraction of the backup storage space.
      
      Akos
      
DBA's against Avamar
May 22, 2014 12:49 pm

Here are my concerns with Avamar in particular. I assume that there are other shops with similar issues.

1. Avamar does not integrate with T-SQL. All the good work in creating hundreds of jobs and dozens of scripts and stored procedures to automate our tasks is going down the drain. Furthermore, we are losing flexibility our users are used to.
Desc: We have 7 tier one applications, each having several databases, and around a hundred more lower tier production databases. For each we have an End User Support copy (helpdesk), a QA copy, and one or more Dev copies. They all have to have the full dataset, because the support people usually troubleshoot data corruption, data cleansing, or other issues that require the full data set. All those backups and restores have to happen between late night and early morning hours. (Yes, we do use differential backups.) Restoring is a BIG DEAL for us and happens daily. Those jobs have many more steps regarding security, AD, and sometimes data manipulation. The disk layout is different than in production, so the MOVE clause is always used, and all this is fully automated.
I still don’t know how we are going to do that with Avamar, and EMC is not really helping. They are always referring to GUI, and GUI isn’t good enough for us.

2. Transaction logs contain TransactionIDs and LSNs that are unique by design. I am skeptical how well the dedup will work for them. Could be wrong, though. Compressing trans logs saves a ton of space and makes the backups really fast.

3. I feel uncomfortable with having some client running on my production servers, analyzing data changes on the fly, writing them to the file cache, reading from the file cache… Even if everything is working well in the beginning, how is it going to work in the future with all the patching, service packs, etc.?
I understand that the client is there to do dedupe on the client side, thus saving network bandwidth and storage space at the same time. That’s a lot of work to be done, and at the expense of the database server’s resources. Mind you that those resources include SQL Server licensing costs as well.

4. EMC is a storage company. Buying a backup solution from a storage company is like buying a car from Exxon.

If anyone went down this path, I’d like to hear how they managed.

Thanks.

Wiltchy
July 21, 2014 10:41 am

I’m beginning to think a lot of people are being de-duped by the dedupe vendors ;-). Hi Brent, great seeing and chatting to you at SQLBits last weekend. One of the things you mentioned was about not using SnapManager backups for NetApp (we use NetApp dedupe as well) but using the standard SQL Server transaction log backups on a more regular basis instead, and snapping those, to stop the IO freeze caused by the SnapManager backups. Do you have any further info or links on the best way to achieve this?

Regards
Dave

- Brent Ozar
  July 21, 2014 10:46 am
  
  Howdy Dave! Yes, if you get the documentation from NetApp on SnapManager, they go into minute detail on exactly how to set this up. It’s the default method of setup for replication with their snaps, too. They have hundreds of pages of documentation on it.
  
  - Wiltchy
    July 21, 2014 11:07 am
    
    Excellent stuff. Thank ya kindly pard!!! 🙂
    
Thierry
September 5, 2014 5:00 am

Small question Brent. Regardless whether CPU affinity is good or evil, does SQL take CPU affinity into account when compressing backups?

- Brent Ozar
  September 5, 2014 10:06 am
  
  Thierry – I have no idea, hahaha. That certainly isn’t something I’d bother digging into. 😉 But if affinity masking is important to you, that would be a great test to run in your environment. Plus keep in mind that if you use any third party backup software (Litespeed, BackupExec, etc) all bets are off.
  
Thierry
September 8, 2014 2:55 am

Ok, got it 🙂 Thanks Brent for taking the time to reply

Frank Vila
December 4, 2014 10:06 am

We used EMC’s Data Domain in my last company for various systems to back up to, but never removed SQL compression because we were not told to. We found out after the fact that Data Domain was still able to dedupe compressed SQL 2008 or 2008 R2 backups that were not encrypted. The dedupe rate was not at the same level as uncompressed backups but close to 15x on 4 weeks of nightly full backups. That was on one server with a handful of SQL DBs, so those numbers may not be the same for all environments.

Has anyone here tested Litespeed with no encryption to EMC’s Data Domain with a month’s worth of full backups? We are trying to find a way to keep as many backups on a centralized backup share without going to tape to avoid additional steps when refreshing lower environments to certain days (we do this often).

Rob T
March 12, 2015 9:51 pm

That’s the old way of thinking. Data Domain has been able to use DDBoost ‘based on the OST framework’ for a while now with NetBackup, NetWorker, Avamar, and now directly with SQL Enterprise Studio. This does the dedupe at the client, then compresses the unique data, and sends it to the Data Domain. So if you get a 10x dedupe, that’s about a 10x reduction in data sent over the wire (slightly less, because there is meta-data to account for). So you still get the benefits of compression, plus the benefit of dedupe, since you dedupe the native data before the bit patterns are altered by compression.
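
As a rough sketch of that arithmetic: if only unique data plus some metadata crosses the wire, the effective reduction lands a little under the dedupe factor. The 1% metadata allowance below is a hypothetical figure picked for illustration, not a DD Boost specification, and the sketch leaves out the extra compression of the unique data.

```python
def wire_gb(logical_gb: float, dedupe_factor: float,
            metadata_fraction: float = 0.0) -> float:
    """GB actually sent: unique data plus a metadata allowance."""
    unique_gb = logical_gb / dedupe_factor
    metadata_gb = logical_gb * metadata_fraction
    return unique_gb + metadata_gb


def effective_reduction(logical_gb: float, dedupe_factor: float,
                        metadata_fraction: float = 0.0) -> float:
    """How many times smaller the wire traffic is than the logical backup."""
    return logical_gb / wire_gb(logical_gb, dedupe_factor, metadata_fraction)


BACKUP_GB = 1000  # hypothetical backup size
print(f"{wire_gb(BACKUP_GB, 10, 0.01):.0f} GB on the wire")
print(f"{effective_reduction(BACKUP_GB, 10, 0.01):.1f}x effective reduction")
```

The dedupe factor itself depends on the change rate between backups, so the first backup of a database would see little or no reduction from dedupe alone.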

- Brent Ozar
  March 13, 2015 4:42 am
  
  Rob – awesome, any benchmark white papers you can share about that?
  
Rob T
March 13, 2015 6:11 pm

I’ll see if I can dig some up. If not, I can share some of my benchmarks in an Oracle environment. The dedupe factor (governed by change rate) plays a major role, since it’s effectively the change you are transmitting over the wire. As we all know, the marketing numbers are one thing, but the real world mileage may vary based on your particular change rate. Either way, it’s a major change in deduplication strategies – taking the dedupe up the stack, closer to (or at) the application. It also shifts the data protection capabilities from the storage or backup teams to the DBAs, which in my mind is a good thing – so they can back up and restore themselves, and manage their own retention and policies, but still take advantage of global dedupe pools. That’s where Data Domain really starts to shine, when you can have databases in their native format, VMs in their native format, virtually any backup application, AS/400, mainframe – all going to a single backup target that provides global deduplication for the whole environment. You can start to consolidate your data protection processes to a single target. Pretty swanky!

Rob T
March 13, 2015 6:14 pm

Here’s a link to a primer: https://www.youtube.com/watch?v=NTIgBZiS6D8

S.O.
April 15, 2015 7:55 am

Hi,

What a boring SQL Server admin discussion…:-D The article sounds like a complaint: I was happy with my old SQL backup approach and now I am pushed to use something really bad: deduplication technology instead of SQL backup compression!

Have a look at the stats in the article below and read. Is there anyone who still thinks SQL Server backup is slower when used with a good de-duplicating storage? DD Boost for SQL makes a real difference. It takes half the time when used with a 10G network and is more than 4x better on the still most commonly used 1 Gigabit Ethernet.

http://www.storagereview.com/emc_data_domain_dd2200_review_dd_boost_for_sql

And by the way: Data Domain can do four important things for you (in the exact order): deduplication, compression, encryption for the data at rest (before writing it to disk) and replication (to the second DD – making it a DR solution).

Best regards,
S.O.

Reply
- Brent Ozar
  April 15, 2015 10:06 am
  
  S.O. – interesting how the Storage Review comparison didn’t mention using SQL Server backup compression. Wonder why that is? Hmm.
  
  Best regards,
  Brent
  
  Reply
Robert B
April 22, 2015 9:48 am

Brent,

Is this still valid advice? If so, I have to start mounting the proof against this. My SAN admins are now pushing me to use EMC Data Domain for dedupe, with uncompressed SQL backups. 🙁

Reply
- Brent Ozar
  April 22, 2015 11:04 am
  
  Robert – so, obvious question: what do you think has changed about the post?
  
  Reply
  - Robert B
    April 22, 2015 11:55 am
    
    The underlying theory of your argument has not changed, just didn’t know if EMC Data Domain technology has changed in the 6 or so years since this post to warrant a revision. Thanks!
    
    Reply
saksqldba
April 23, 2015 11:13 am

Our storage folks are pushing for 100% use of symantec netbackup for ALL sql backups. And while I don’t think a “one solution fits all” is ever a realistic expectation – we (sql dbas) are trying to accommodate them because of the $$$$ storage cost reduction.

My real concern is the restore process. While we can make “adjustments” to get the backups completed and deduped to support this decision – it is much more difficult to explain to our end users why it is taking more time to do a database restore at 2 AM when a critical application is down.

Brent, this article, and all the posts, have really helped me understand the backup and dedupe challenges and how vendors are working to improve that (or not).

But am I correct in my understanding that the restore of a sql database – especially larger dbs – will be slower, perhaps even significantly, as a result?

Reply
- Brent Ozar
  April 23, 2015 11:15 am
  
  Saksqldba – great question. Why not run a test yourself in your development environments to see it in action with your own systems? That’ll really help explain it to your storage folks.
  
  Reply
Andrea Caldarone
May 5, 2015 5:12 am

Hello Brent,
EMC DataDomain DD Boost protocol does deduplication client side, i.e. before putting data on the wire.
A client for SQL Server exists, look at this demo video
https://www.youtube.com/watch?v=NTIgBZiS6D8

Reply
- Brent Ozar
  May 5, 2015 5:22 am
  
  Andrea – great, I’d love to hear from customers with real life experiences. (Don’t get me wrong, EMC sales links from GMail accounts are really interesting and all…)
  
  Reply
Lannie
June 5, 2015 3:24 pm

Hi Brent, great post. We are investigating using EMC Avamar for backups right now. I ran some queries against the backup history/media set tables and found that on 50GB databases Avamar is between 24% and 75% slower (even running differentials during the week) than full Dell Litespeed backups. I am doing a write-up on this now and was wondering if you have another source I could use to review the resource topology used between the two types of backups (dedupe vs Litespeed/native compression).

Kind regards,
Lannie

Reply
- Brent Ozar
  June 5, 2015 3:38 pm
  
  Lannie – thanks! Can you rephrase the question about the source a little?
  
  Reply
Lannie
June 5, 2015 3:52 pm

Hi again Brent,

I am looking for an article with a bit more detail on resource use differences between dedupe and native/litespeed backups. From your article and my own testing I can see that dedupe is slower (even when testing backups off server to share). Specifically an article that focuses on native/litespeed compression resource use.

Lannie

Reply
- Brent Ozar
  June 7, 2015 6:57 am
  
  Lannie – you’re not going to find an article on that because the two components involved change so quickly. Anything you’d write would be out of date within 90-120 days.
  
  Reply
Brett
July 2, 2015 5:12 am

Idera SQLsafe 8.0 has just been released, which now lets you back up and restore databases to and from an EMC Data Domain server:

http://community.idera.com/blog/idera/ideras-sql-safe-backup-8-0-is-available-now/

Reply
Andrew
August 10, 2015 11:50 pm

Sometimes, there isn’t much choice and you have to decide dedupe/compression device or compressed backups.

I’ve had an incident where the only suitable storage we had for backups was a Quantum Dxi which did dedupe and compression. It was actually awesome and very fast but when some of our DBAs wrote compressed backups to this device, it caused problems because it couldn’t really dedupe effectively because of the compressed data coming in.

So these things can go both ways. It pays to keep this in mind. DBAs and Sys Admins need to understand each others’ needs and work together! One Team!

Reply
Lannie
August 11, 2015 11:59 am

Hi Brett,

I saw that Idera has a new tool and will be adding it to my wish list at the end of the year. I am working with EMC now, and like Andrew have really no choice when it comes to allowing the media retention processing gap to be filled by EMC for the Infrastructure Team. I do have to admit properly configured DDBoost with Avamar to a Data Domain is very fast.

However, Avamar does NOT have an automation offering for restores, so restores for refreshing Development and Test environments are still being performed with Litespeed, which breaks the LSN chain and forces another initial Avamar backup (but again, I admit, it is FAST). This little gap in the Avamar solution means that I still need to set aside some local storage for backups, and it does cost us in performance generating NON-Avamar backups to a local scratch drive.

If anyone has any other solution for automating refreshes to Dev and Test environments for systems with EMC Avamar backups, please let me know. And for those of you just facing this, it is easy to justify the storage just look at your restore history (it actually shocked me how many I do a year – thank you Dell litespeed for making restore automation to test so easy!).

Reply
- ScriptKiddie
  September 23, 2015 3:14 am
  
  Hi Lannie,
  
  Avamar DOES support automated restores, however not from the Admin console. The key is to combine the mccli command line tool, cron job scheduler and decent shell scripting.
  
  Reply
  - Lannie
    September 23, 2015 11:47 am
    
    Do you have an example of a script that grabs the backup label for the restore? E.g. Restore Last nights backup from prod to test instance.
    
    Reply
    - Brent Ozar
      September 23, 2015 2:42 pm
      
      Lannie – that’s kinda unrelated to the blog post here, so I’d probably post it over at http://DBA.StackExchange.com.
      
      Reply
James Wood
October 20, 2015 6:06 am

Hi Brent, There has not been much reference to using transportable off host snapshots. As an admin of a poor man’s SAN (by comparison to some of the contributors here in any case!) , we use backup exec to initiate hardware snapshots on our Equallogic Arrays. These read only LUNS are then transported to the backup exec server, mounted, and backed up locally from there. Apart from a bit of control traffic, it entirely removes all backup IO from the SQL box – confining traffic between the backup exec server and the array. Providing the array is man enough for the job, (and in our case it is) then this leads on… If I were to take quiescent vss snapshots of the SQL server, transport them to the backup server and then back up to deduped storage, presumably we then have the perfect scenario – uncompressed dedupe compliant sql data, without any overhead to the SQL boxes? PS love how this thread has run for 6 years – doing my bit to keep it alive and kicking!

Reply
- Brent Ozar
  October 20, 2015 6:09 am
  
  James – storage hardware snapshots may have their own deduplication, but that’s not the focus of this post. Thanks!
  
  Reply
  - James Wood
    October 28, 2015 3:42 am
    
    …but finding a non disruptive way to use dedupe technology in a sql environment surely is on topic. We needed to get our nightly backups off site. For those interested, I now take a VSS snapshot of SQL iSCSI LUNS, then mount the snapshots on our backup media server. Backing up 1.91 TB of uncompressed data files to local dedupe storage completes in 6hrs. This backup set is then duplicated off-site over 100mbps link to another backup server with dedupe storage attached. 1.91TB takes just over 3hrs (11,300MB/min). Try doing that without dedupe…
    
    Reply
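A quick sanity check on the off-site figures quoted in the comment above can be sketched in a few lines of Python. This is rough arithmetic only: it assumes 1 TB = 1024 × 1024 MB, that the 100 Mbps link is the only path between the sites, and that the reported 11,300 MB/min is a logical (pre-dedupe) data rate. The function and variable names are illustrative, not from any backup product.

```python
# Rough arithmetic on the figures quoted in the comment above.
# Assumptions (not from the source): 1 TB = 1024 * 1024 MB, and the
# 100 Mbps link is the only path between the two sites.

LINK_MBPS = 100               # link speed quoted in the comment
REPORTED_MB_PER_MIN = 11_300  # duplication rate quoted in the comment
DATA_TB = 1.91                # backup set size quoted in the comment


def link_capacity_mb_per_min(link_mbps: float) -> float:
    """Best-case raw capacity of the link in megabytes per minute."""
    return link_mbps / 8 * 60


def hours_to_copy_raw(data_tb: float, link_mbps: float) -> float:
    """Hours needed to push every byte over the link with no data reduction."""
    data_mb = data_tb * 1024 * 1024
    return data_mb / link_capacity_mb_per_min(link_mbps) / 60


capacity = link_capacity_mb_per_min(LINK_MBPS)
ratio = REPORTED_MB_PER_MIN / capacity

print(f"Raw link capacity: {capacity:.0f} MB/min")
print(f"Reported rate is {ratio:.1f}x the raw link capacity")
print(f"Copying {DATA_TB} TB with no reduction: "
      f"{hours_to_copy_raw(DATA_TB, LINK_MBPS):.1f} hours")
```

Under those assumptions the link tops out at 750 MB/min, so the reported rate is only achievable if a small fraction of the logical data actually crosses the wire, which is the case the commenter is making for dedupe on thin off-site links.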
Rodney Benjamin
December 10, 2015 5:50 am

Hi Brent
We are evaluating DD on our SAP servers of 9TB Oracle DB. Currently we backup the Physical Server PRD 9TB over Networker StorageNode(Fibre att) with LTO4 in 3Hours. Our QA Sap server running on VM does the backup over the Network in 6.5hours. But the restores/refresh of our QA and Sandpit servers from Production takes 6Hours to a physical server and 11 Hours to a VM server. We tested the DD on the VM servers and what was determined is that:
1. CPUs ran at 100% when doing the DD backup, which affects nightly batch jobs.
2. Backup now ran for 9Hours vs 7Hours to Tape.
3. The Second Full backup to DD was not much better at 7hours – although less Network traffic.

I will evaluate what our restore times are but I am expecting it to be about 16hours.

For all intents and purposes it is great that a backup runs fast but to me it’s the restore that matters.

Reply
Salman
February 5, 2016 5:27 pm

Hello Brent, just wanted to make sure this article is still valid, because it is from 2009.

My company is planning to start using DataDomain with DDBoost.

Reply
- Brent Ozar
  February 5, 2016 5:30 pm
  
  Salman – read the comments.
  
  Reply
José
April 19, 2016 3:59 pm

Hey there guys,

I’ve worked with Avamar administration for three years and thought I would drop some info here; hope it is of use to somebody.

As for the concerns about statistics that I read up there, there are dozens of reports in the Tools > Manage Reports section. Also, if you buy Avamar directly from EMC or have a support contract with EMC, they are able to produce several kinds of reports.

As for the automation, you can try either mccli (shell commands are also usable since the Avamar Utility Node runs SUSE; still, for appliance-specific commands you will need to use mccli), or you can install the REST API on a Linux server and create automated XML/JSON scripts. It has its learning curve like any integration solution, but it works really well. For the REST API, please use version 2.0 and avoid version 1.0, which has limited features.

Another piece of info: yes, it takes a bit of processing to do the source dedup; however, in our experience it is really worth it vs transferring everything through the network. You can also do some research on using AvaDomain (Avamar utility grid and administration with the DataDomain repository).

And yes, Avamar might not be suited for every SQL case, but some cases that I read here just need tuning. Keep in mind that not all backup concepts from other software work the same way with Avamar, including some dedup-related ones, since it uses variable-length deduplication. Something worth noting is that Avamar is mostly designed for virtualized environments, and in that regard it is just incomparable to other backup solutions I’ve tried so far (didn’t try Commvault, so can’t say anything about it).

Reply
Simon
June 3, 2016 6:02 am

I think this article is missing a key benefit in regards to de-dupe appliances – getting the backups offsite in a timely manner.

– We have been using Quantum DXis (V1000/V4000 units) for a few years now and we have a very low bandwidth connection to our DR site. The benefits in the reduced data flow to the DR site has provided a massive boost in improving our RPO/RTO times for the business.
– I do think there is a key point in regards to the speed issue of backing up and recovering data uncompressed. This IS a bottleneck, but considering that we now have our midnight backups offsite by 7AM in the morning is a significant leap forward in terms of overall business recovery planning.

– It’s swings and roundabouts – if you have thin bandwidth offsite, these tools are incredibly useful as long as you can survive with the restore times.

– Another point – Quantum DXI V series appliances are worth looking at if you are SMB/midsize company, I like the cost point and management time savings these tools provide..
– Also, with the V series, you can control the disks that are used (it’s a VM product)… disk speed is one of the causes of slow restores, so changing these to SSDs may provide a significant boost to performance… (but this needs testing first).

Reply
- Brent Ozar
  June 3, 2016 6:57 am
  
  Simon – yes, as long as you’re okay with slower backups and restores, and if your goal is to get the backups offsite faster, then dedupe appliances are a great solution to that problem.
  
  Reply
Lannie
June 3, 2016 10:26 am

Hi Brent.

Of course I am not okay with a slower RTO, but there is nothing that the DBA can do about the offsite requirement. So, DBAs are put in the position of deciding between slightly faster backups and restores or always being able to recall a backup. EMC Data Domain dedupe provides me with 12 days of backups between my two remote sites in a timely manner, every day. Until a better solution comes along for replacing tape, compressing and moving backup files to my DR facility, dedupe with EMC Data Domain is an acceptable solution.
That being said, I treat EMC backups as tape backups.
For systems that have a lot of development work I also request scratch storage to do local native backups for automated refreshes of lower environments. So like all third-party solutions for an RDBMS, there are extra costs to consider before adopting. And for VLDBs with a lot of development, I use Litespeed for OLR.

Reply
Karl
August 25, 2016 9:12 am

LOL, seven years later, and the comments are still running strong! I have not seen an IT article in a while that not only garnered this much attention, but is still relevant this many years down the road.

I have been fortunate enough to land at a company that is in the process of cleaning up their wild-wild-west environment. We are looking at several initiatives, and one of them happens to be “modernizing” the database backup solution. Currently, there are some backups going to Isilon storage, some to a DD solution (no DDBoost…) and still some going to a share. The cool thing about all of this is that I get to work with multiple solutions to either prove or disprove cost vs. performance. I am moving forward with an open mind, and trying to keep my preconceived notions at bay. So far, I have heard that Isilon is “expensive” and that DD does not do “what they expected” (they being storage admins; the database admins already knew it wouldn’t buy them what they “expected”).

I think I am going to gather numbers at every turn, benchmarking each of the solutions, so that I can pass them along to my boss on cost-effectiveness of the processes vs storage vs man-hours (because failures happen). If you’re interested, I will share those numbers with you Brent, if you’re still interested in beating this dead horse 😉

Reply
- Brent Ozar
  August 25, 2016 10:19 am
  
  Karl – sure, you’re welcome to share the numbers here on the blog in the comments.
  
  Reply
Ken DeFilipps
November 17, 2016 7:56 am

Great article. However, I think you’re ignoring pieces of this technology. For example, using the EMC Data Domain device and deploying DDBoost on your SQL Servers, backup time and restore time can be significantly reduced, all while sending much less data across the wire. Of course there is a bit of a trade-off on restores due to the fact that your data must be reconstituted at the Data Domain device, but still, that trade-off is well worth it. That same reconstitution of data exists in SQL compressed backups – they must be decompressed before they can be applied. Given the expansive growth of data these days and ever-decreasing “off-hour windows”, deduping technology has been a godsend, in my humble opinion.

Reply
- Brent Ozar
  November 17, 2016 8:52 am
  
  Ken – feel free to read the other comments. I’m eagerly awaiting real benchmark numbers rather than opinions and feelings. You could be the one to deliver!
  
  Reply
  - Ken
    November 17, 2016 9:02 am
    
    Sure. I’m in the middle of a major upgrade at the moment, but I can pull that together and post.
    
    Reply
- Fred Shope
  November 17, 2016 4:05 pm
  
  I guess I never followed up in comments here, but I did follow up with Data Domain testing that Brent requested, and sent in my results. I was surprised by how badly Data Domain’s SQL plugin (DDBoost) actually performs, times were about 3 times as bad.
  We have SQL in its own mtree, shared with about 30 servers on the non-prod side.
  For the test I did 3 iterations, and to get a “control speed” I did DataDomain, compressed native, and regular native. Tests were done with SQL 2008R2 running on Windows 2008R2, on VMware virtual machines.
  I did a backup of a small (100MB) database, then a restore using each method. Then I did a larger (10GB) database backup and restore.
  I logged the time to set up the restore in DDBoost’s GUI; so if you have some way to automate it via command-line it may be a little faster. But I separately logged the time it took to actually perform the operation.
  DataDomain lagged the others by about 3x, and even more when you add in the time to navigate the GUI and wait on a response from the mtree scan to populate the GUI.
  
  I’m looking for other solutions, but the company chose DataDomain before I was hired and has already eliminated most of the tape array hardware.
  
  Reply
  - Brent Ozar
    November 17, 2016 4:08 pm
    
    Fred – thanks for the followup. Yeah, in every client engagement I’ve seen, we were able to easily smoke DDBoost’s speed by doing plain old native SQL Server backups – especially when you talk about pulling a 1-2 week old backup, like Saturday’s full and all the log backups since.
    
    Reply
Chetan
November 21, 2016 11:42 am

I started evaluating DDBoost last week. I saw similar issues: DDBoost backups take longer than native compressed backups (about 1 minute 32 seconds vs 9 minutes 44 seconds). I did a couple more rounds of testing since I thought the dedupe feature would really kick in on the subsequent backups and reduce the duration. However, I still saw 1 minute 21 seconds vs 2 minutes 34 seconds.

SAN Admin mentioned that they see 71% compression ratios though with DDBoost.

So I tested again by increasing number of stripes to 4 and DDboost backup on same database completed in 53 seconds only. Tested again with default setting and went back up to 2 minutes 53 seconds. I guess increasing the number of stripes could be the deal here.

I am still worried about redundancy. SAN admins suggest replication of data domain will reduce the chance of a single point of failure.

Reply
- Fred Shope
  November 21, 2016 12:15 pm
  
  Redundancy by replicating between DataDomain systems is fairly good; it’s one of the features that doesn’t take a lot of careful planning. We replicate bi-directionally between our main site and DR off-site; that way the data can be restored to whichever site needs it.
  Compression/deduplication is generally good with DataDomain, our SQL environments have about 16-40x space reduction (but server image backups are about 320x, for comparison).
  My biggest complaint is the time it takes the DDBoost GUI tool to populate backup sets, it seems like it’s querying absolutely every saveset in the mtree, which could be anywhere from a few hundred to several hundred thousand in my case, which makes it take upwards of a minute. I noticed it does not cache any of the savesets it pulls, so if you test restoring the same recent saveset 3 times, you’ll wait the same total time to navigate the GUI and perform the restore each time. I’m sure it’s a “feature” of how mtrees work; but the system is already doing some very complicated low-level block indexing to enable the deduplication, I wouldn’t think it’s impossible to add some logic or even some caching that speeds up the population of the saveset GUI.
  Then again, maybe it’s how we’re using it; we could have significantly more backup sets than the average system, I’m not sure how carefully they spec’d this before I came on board. EMC support has mentioned a few times that we’re doing non-standard things.
  
  Reply
  - Ken
    November 21, 2016 1:04 pm
    
    I don’t believe it is the way you are using it. If you take a single backup, then query the device using ddbmexptool, you will see that even one backup MAY have many, many pieces.
    
    Reply
  - BradC
    February 15, 2017 10:28 am
    
    Fred, they seem to have fixed the “populate the backup GUI” issue in the latest DDBoost agent version, v3.5. We had lots of issues there, too. They fixed the “hang forever when starting a tran log backup” issue we were seeing as well.
    
    I can concur with Chetan’s observation that upping the stripe count on backups really is necessary to get competitive backup and restore times from the DDBoost Agent. You can stripe a native backup, too (using multiple DISK = targets), but hardly anyone does that.
    
    Reply
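For readers who have not striped a native backup before: SQL Server's BACKUP DATABASE statement accepts several DISK = targets, and the backup is spread across them. The sketch below only builds the T-SQL text; the database name and share paths in the usage lines are hypothetical, and the helper function is an illustration rather than part of any tool mentioned in this thread.

```python
from typing import Sequence


def striped_backup_sql(database: str, stripe_paths: Sequence[str],
                       compress: bool = True) -> str:
    """Build a native BACKUP DATABASE statement striped across several files."""
    if not stripe_paths:
        raise ValueError("at least one stripe path is required")
    # Escape the closing bracket in the identifier and quotes in the paths.
    db = database.replace("]", "]]")
    targets = ",\n   ".join(
        "DISK = N'{}'".format(path.replace("'", "''")) for path in stripe_paths
    )
    options = "COMPRESSION, CHECKSUM" if compress else "CHECKSUM"
    return f"BACKUP DATABASE [{db}]\nTO {targets}\nWITH {options};"


# Hypothetical database and backup share, four stripes.
paths = [rf"\\backupshare\sql\SalesDB_{n}.bak" for n in range(1, 5)]
sql = striped_backup_sql("SalesDB", paths)
print(sql)
```

Restoring a striped backup needs every stripe file listed in the RESTORE statement, so the stripe count is worth recording alongside the backup job definition.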
- Ken
  November 21, 2016 1:06 pm
  
  Something to keep in mind – the hardware on your device. How many ports, are they all 10 gig ports. Biggest mistake I’ve seen is inadvertently running your backup/restore over the 1g admin port.
  
  Reply
BradC
February 15, 2017 12:21 pm

I’m not going to reproduce it all here, but I posted a DBA.SE answer about DDBoost that includes some of those long-awaited “actual test results”: http://dba.stackexchange.com/a/164449/157

All the normal caveats apply, of course (every environment is different, results may vary, do not taunt happy fun ball), but the bottom line for us was that the DDBoost backup agent (with file striping enabled) was slightly slower than pure native compressed backups, but that the trade-off of space savings (especially over time, and across environments with similar copies of the same large dbs) was worth it.

Also, the latest version of the DDBoost agent (v3.5) supports compressed restores as well, which may reduce restore times even further.

Reply
- Brent Ozar
  February 15, 2017 12:28 pm
  
  Brad – great job on the benchmarks! Those are great.
  
  But slightly slower? Holy cow, the restores are 3x-4x slower. Sure, it saves space on the backups, but that can be a real killer for RTO.
  
  Reply
  - BradC
    February 15, 2017 12:35 pm
    
    True, although in our case our base comparison (the “before”) was single-stripe SQL native uncompressed, so pretty much anything was an improvement over that (as long as we striped the backup).
    
    If I get around to benchmarking the new “compressed restore” feature in v3.5, I’ll post those results as well.
    
    Reply
    - Brent Ozar
      February 15, 2017 2:02 pm
      
      Hahaha, that’s awesome. Reminds me of a case where someone had turned on synchronous mirroring years ago. We found out that they didn’t need sync mirroring, switched to async, and the system was suddenly so stunningly fast that they could add all kinds of new query overhead and it still seemed fast, hahaha.
      
      Reply
    - Karl Lambert
      June 21, 2017 8:56 am
      
      We are doing that testing over the next few days. Making a direct comparison between 3rd party tools. (DDBMA & SQLSafe) According to the EMC folks, this *should* make restore times quicker. Then again, we asked what happens to the backup files on a three-day retention, and could not get a straight answer. i.e., DDBoost works because it backs up Delta. What happens when that first backup expires in four days? Does it reconstitute the third day backup? Is that something DD handles on the back end? Oh so many questions left to be answered. They did make the promise though, that 3.5 could do compressed restores. We will see about that 🙂
      
      Reply
      - BradC
        September 14, 2017 1:26 pm
        
        Karl- First of all, just because a backup has *expired* doesn’t mean it has actually been *deleted* from the backup appliance. Ask me how I know… after running for a year…
        
        You have to set up scheduled runs of their “expiry deletion tool”, emc_run_delete. Even this doesn’t FULLY delete them, it simply marks them for deletion, and they are cleaned up via the nightly scheduled “cleanup” routine that runs on the appliance. However this works internally, I’m sure it takes care of all those cross-references and stuff for de-duped blocks that are no longer needed.
Rodney Benjamin
February 16, 2017 10:48 am

Update – We eventually went with Data_Domain with some additions like adding a San-Node (fibre card) and extra disk spindles. Backups now very quick 10TB at 2 hours and restores around 7 hours. Also means less admin – tapes.

Reply
White Limestone
May 16, 2017 7:51 am

Hi Brent,
this article is very helpful.
Is this article still applicable to Data Domain and deduping? My SAN people think the technology has evolved and it is no longer valid. I would appreciate your input.

Thanks,

Reply
- Brent Ozar
  May 16, 2017 7:53 am
  
  I would ask them what’s changed.
  
  Reply
MIke Ruth
May 18, 2017 8:56 am

If we compress the backups, what technologies can we use to replicate the backups offsite? Do any of you have good suggestions for replicating backups offsite efficiently? Compression provides a faster restore rate, but offsite replication is also mandatory.

Reply
- Erik Darling
  May 18, 2017 9:09 am
  
  “Mike” — for questions, head over to https://dba.stackexchange.com/questions
  
  Thanks!
  
  Reply
PatTheDBA
June 21, 2017 12:33 am

Wow! 8 years later and this topic is still hot.

To make a long story short, 5 years ago, we tried using Quantum Dxi Dedup Appliance with our SQL Server compressed backups and, as many noted: Dedup and SQL Server compressed backups didn’t make good friends and the space saving was not there at all. So we went back to backing up our databases on local disks with native SQL Server compressed backups and then copying them to tapes using NetBackup. We still use the DXi for all other files (word documents, excel files, etc.) with great space saving.

The “Dedup” vs “compressed/uncompressed SQL backup” discussion is back at our company. Reading this blog and the comments was really helpful. But since we don’t have access to the equipment mentioned here (DataDomain, Avamar, etc.), I was wondering if any of you had tested and/or would comment on these thoughts, since nobody mentioned them before:

1- Dedup reacts badly to data movements in databases, (for instance, after reindexing, page splits, etc.) even using uncompressed backups
2- Dedup reacts badly to encrypted data (especially TDE) in databases even using uncompressed backups

I think these 2 points should be added to the fact that “Dedupe is a bad idea with SQL Server backups”

Any comments?

Thanks

Reply
- Brent Ozar
  June 21, 2017 7:22 am
  
  Yep, agreed!
  
  Reply
Igor
July 17, 2017 10:42 am

Actually, uncompressed backups are a lot faster if you transfer them with rsync. You also save a lot more space if you back up uncompressed backups using “bup”.
I specifically had to uncompress all of my old compressed backups in order to back them up with bup. It saves so much space while keeping all of the history.

Reply
Ryan Malayter
October 6, 2017 9:50 am

There is such a thing as “dedupe-friendly compression”: basically, you have the compressor reset its dictionary state at some interval in the source data (either after a fixed number of bytes or based on a rolling checksum). See, for example, gzip’s --rsyncable option. This hurts your compression ratio, but only by a few percent for most data sources.
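The idea can be illustrated with a minimal Python sketch that uses the simpler of the two variants described above, a fixed byte interval. It is not how gzip's rsyncable mode or any backup product is implemented; the block size, the sample data, and the function names are all assumptions chosen for the demonstration. Each block is compressed with a fresh dictionary, so a one-byte in-place change in the source alters only one compressed block, and the rest remain byte-identical for a deduplicating target.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical reset interval, in bytes of source data


def compress_in_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Compress each fixed-size block independently (fresh dictionary per block)."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def changed_blocks(old: list, new: list) -> int:
    """Count positions where the compressed blocks differ."""
    return sum(1 for a, b in zip(old, new) if a != b)


# Deterministic, compressible sample data standing in for a backup file.
original = b"".join(b"row %06d payload %06d\n" % (i, i * 7) for i in range(20000))

# Simulate a small in-place change: flip one byte, keep the length the same.
buf = bytearray(original)
buf[100_000] ^= 0xFF
modified = bytes(buf)

old_blocks = compress_in_blocks(original)
new_blocks = compress_in_blocks(modified)

print(f"{len(old_blocks)} compressed blocks, "
      f"{changed_blocks(old_blocks, new_blocks)} differ after the change")
```

A fixed interval only helps when data changes in place. If bytes are inserted or removed, every later block boundary shifts, which is why the rolling-checksum variant picks boundaries from the content itself.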

We back up our SQL 2012+ servers with native SQL backup compression to Win 2012R2 NAS with its built-in post-process dedupe turned on. We see a fairly massive (10:1) dedupe ratio when keeping 30 days of full backups of 2+ TB of SQL databases.

What this tells me is that SQL Server’s compressed backup format is at least *somewhat* dedupe-friendly. I’m betting it resets the compression dictionary every X pages or something (which is what allows for striping backups).

Obviously the deduplication on the backup target can’t compress blocks that are already compressed, but it can dedupe amongst them if they are the same from backup-to-backup.

One other thing, if you’re rebuilding indexes regularly, that will kill deduplication ratios, as all the backed up pages are basically unique after a rebuild. We use SSD’s and index rebuilds simply aren’t necessary as we update stats often.

Reply
- Ryan Malayter
  October 6, 2017 1:08 pm
  
  I now believe the information I posted above is incorrect. I ran a test against two consecutive full SQL-native compressed backups of a very hot 70GB OLTP SQL 2014 application, and the test showed 0% savings. So it appears MSFT does *not* use a “deduplication-friendly” compression scheme with native SQL Server backups.
  Further investigation revealed that some of the systems were sending uncompressed backups to the deduplicated Win2012R2 NAS, which accounted for the large 10:1 deduplication ratio reported by Get-DedupStatus.
  
  So always test your dedup situation on *real* data!
  
  Here’s the output of my test using the DDPEVAL.EXE tool:
  C:\>ddpeval.exe g:\ddpeval_test /V
  Data Deduplication Savings Evaluation Tool Copyright (c) 2012 Microsoft Corporation. All Rights Reserved.
  Evaluated folder: g:\ddpeval_test
  Evaluated folder size: 19.90 GB
  Files in evaluated folder: 2
  Processed files: 2
  Processed files size: 19.90 GB
  Optimized files size: 19.75 GB
  Space savings: 143.48 MB
  Space savings percent: 0
  Optimized files size (no compression): 19.89 GB
  Space savings (no compression): 6.66 MB
  Space savings percent (no compression): 0
  Files excluded by policy: 0
  Files excluded by error: 0
  
  Reply
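The "Space savings percent: 0" lines in the DDPEVAL output above follow from the reported sizes. A small Python check of that arithmetic is sketched below; it assumes the tool's GB figures are 1024-based and that the percentage is shown as a whole number, neither of which is stated in the output itself.

```python
def savings_percent(savings_mb: float, folder_gb: float) -> float:
    """Space savings as a percentage of the evaluated folder size (1 GB = 1024 MB assumed)."""
    return savings_mb / (folder_gb * 1024) * 100


# Figures copied from the DDPEVAL output above.
with_compression = savings_percent(143.48, 19.90)
dedupe_only = savings_percent(6.66, 19.90)

print(f"Savings with compression: {with_compression:.2f}%")
print(f"Savings from dedupe alone: {dedupe_only:.2f}%")
```

Both values are well under one percent, which is consistent with the tool printing 0 for two compressed full backups of the same database.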
  - Brent Ozar
    October 6, 2017 1:09 pm
    
    Yep, that’s why in so many of these comments, you’ll hear me chant a very simple mantra: show me the tests.
    
    Thanks for putting in that legwork, and glad I could help open your eyes on that one! 😀
    
    Reply
    - Ryan Malayter
      October 11, 2017 5:06 pm
      
      Of course, MSFT could trivially add “dedupe-friendly compression” to the SQL backup format, probably without breaking any backwards compatibility (just like gzip --rsyncable produces a valid GZIP file readable by anything). The only thing they need to tweak is the dictionary state during the backup.
      
      This works well for gzipped MySQL backup files and the tarsnap deduplication backup product, according to this article: https://therub.org/2010/11/08/compression-can-play-nice-with-deduplication-and-rsync/
      
      Reply
Ted Locke
October 30, 2017 7:36 am

I did a lot of work with DDBoost at my last company. Setting it up was a big PITA, but we figured out how to modify Ola’s scripts to run the DDBoost agent. Once we did that we figured out all we had to do was install the base software and then create the SQL Jobs using the modified Ola’s scripts to then backup our databases. Another thing I built was a stored procedure that would create the restore scripts for us. This way we cut the amount of clicks down to a minimum. We created another sp that we run once a month to test all of the database backups (because your backups are only as good as your last restore). Between these three scripts we did simplify the backup and restore process (reach out if you would like to see what we did) but even then we found two major glitches with the DD Software.
If you do not have these two particular issues, you will be fine overall, but if you do there are workarounds that add time to roll out. The first problem, if the database has over 90 files (we had one that was over 250 due to partitioning and file groups), the scripts and GUI software will not restore the database to a new location. The problem with the software is that they feed everything from SQL into cmdshell which only allows up to so many characters within the command before it fails. The workaround for this was to extract the BAK flat file from DataDomain to a share and then restore from the share to your new location. This doubled your restore time due to having to extract the flat file (1st restore) and then running the actual SQL restore (2nd restore). The other interesting issue was dealing with a SharePoint Search database and the length of the actual file names. By default when pesky SharePoint creates a database it adds a GUID to the end of the name of files. Even though Windows and SQL have no issue with them, DDBoost Agent does, work around for this is to truncate the filenames when you are restoring them.
Last I heard, EMC does not have a fix for these two issues, and from what we were told, we were doing things wrong, so we had to fix them. My response was, “Oh so this is like Apple telling us we were holding the phone wrong?” Their support staff realized I knew that their response was wrong and then stated they’d look into it. Still haven’t seen it fixed yet.
So there are my two cents on DDBoost… 🙂
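
A rough sketch of why the file count matters for the first issue above: if every relocated file adds its logical name and destination path to a single command line, the line grows linearly with the number of files. The 8191-character figure is the documented maximum command-line length for Windows cmd.exe; whether that is the exact limit the DDBoost tooling hit is an assumption here. The base command length and per-file length are made-up illustrative numbers, and the function names are hypothetical, not DDBoost syntax.

```python
# Back-of-the-envelope estimate only. The sizes used in the demo are
# illustrative assumptions, not values taken from the DDBoost agent.

WINDOWS_CMD_LIMIT = 8191  # maximum cmd.exe command-line length, in characters


def command_length(base_length, per_file_lengths):
    """Total characters: fixed part of the command plus one clause per file."""
    return base_length + sum(per_file_lengths)


def max_files_that_fit(base_length, per_file_length, limit=WINDOWS_CMD_LIMIT):
    """How many equally sized per-file clauses fit under the limit."""
    if per_file_length <= 0:
        raise ValueError("per_file_length must be positive")
    return max(0, (limit - base_length) // per_file_length)


if __name__ == "__main__":
    base = 200     # hypothetical: executable, server, database, options
    per_file = 88  # hypothetical: logical name + destination path + separators

    print(max_files_that_fit(base, per_file))       # 90
    total = command_length(base, [per_file] * 250)  # a 250-file database
    print(total, total > WINDOWS_CMD_LIMIT)         # 22200 True
```

With longer paths the ceiling drops well below 90 files, which is why the practical limit depends on where the files are being restored to rather than on a fixed file count.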

- Brent Ozar
  October 31, 2017 5:47 am
  
  Ted – wow, thanks for the detailed comment! I really appreciate you sharing your experience here for others to see.
  
- Deborah
  November 17, 2017 1:39 pm
  
  @Brent – restore more than 90 datafiles
  1. This is actually a limitation of Windows. When DDBMA builds the command line, it is limited by the maximum line length. So the limit isn’t really 90 datafiles; rather, the command can’t be more than 8191 characters, and more than 90 datafiles would certainly reach that limit if you have long paths.
  2. In V4, DDBMA added the -H option, which fixes this issue.
  
  -H "'source_path';'destination_path'"
  Relocates file paths and all files contained in a path to a different location during a restore operation. The -H command option can be used to relocate multiple file paths in the same command line. The -H option is supported on standard and redirected restores, which includes the following:
  * Normal restore: same server and same instance.
  * Different instance restore: same server and different instance.
  * Restore to different database file.
  * Different server restore.
  
  To relocate individual files, see the table entry for the -C option. You cannot use the -C option with -H.
  
- Nate Arnold
  February 8, 2018 7:04 am
  
  Ted,
  We’re currently testing DataDomain and would very much like to see how to modify Ola Hallengren’s backup solution to fit with DDBoost. Ola’s stuff has become integral for us, and I’d very much like to keep using it with DD.
  
- Kyle Reyes
  February 28, 2018 1:11 pm
  
  Ted,
  Thanks for all of this information! I am in the process of migrating our backups over to an Avamar/DD setup. We currently use the Ola scripts in our environment and would be very interested if you’d be willing to share what you have done so far to get things working. Feel free to shoot me an email directly if you have a moment.
  
  Thanks!
  Kyle Reyes
  kreyes@elandersamericas.com
  
  - Nate Arnold
    May 30, 2018 10:51 am
    
    Kyle/Ted/All,
    I just checked Ola’s backup solution changelog and he added support for DD Boost on May 10 2018. Time to upgrade!!!!
    
- Tom Zu
  June 21, 2021 6:27 pm
  
  Hi Ted, could you share the restore db script from DDBoost?
  
Student4Life
November 7, 2017 3:41 pm

Also, one thing we noticed that some DBAs might not have: if you use the DDBoost SSMS Add-in to restore a FULL database backup, the timestamp of the backup shown in the Add-in GUI is the time when the FULL backup STARTED, NOT the point in time the database will be at once the FULL restore is complete. For example, if you start a full database backup at 6:00 PM and it takes 2 hours to complete, the timestamp of the backup in the DDBoost GUI will be 6:00 PM rather than 8:00 PM. We found that out when we tried to apply native SQL t-log backups to a restored FULL taken by DDBoost: we were only able to apply native SQL t-log backups that took place AFTER 8:00 PM. Anyway, that’s something else for the community to test and confirm regarding the use of DDBoost. And I echo Ted Locke’s statement about it being a PITA to set up. Oh, and wait until you have to do a DDBoost upgrade: make sure the information in the Lockbox does not disappear during that process, otherwise you might be re-entering that information on EACH database server you upgrade. Again, PITA.
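
To make the 6:00 PM versus 8:00 PM example concrete, here is a minimal sketch that sorts log backup times into those taken before the full backup started, during it, and after it finished. It is a clock-time simplification: SQL Server itself decides which log backups can be applied by comparing log sequence numbers, not timestamps. The function name and the sample schedule are hypothetical.

```python
from datetime import datetime


def classify_log_backups(full_started, full_finished, log_backup_times):
    """Group log backup times relative to a full backup's start and finish."""
    groups = {"before_start": [], "during_full": [], "after_finish": []}
    for taken_at in sorted(log_backup_times):
        if taken_at < full_started:
            groups["before_start"].append(taken_at)
        elif taken_at < full_finished:
            groups["during_full"].append(taken_at)
        else:
            groups["after_finish"].append(taken_at)
    return groups


if __name__ == "__main__":
    def at(hour, minute):
        # Hypothetical date; only the times of day matter for the example.
        return datetime(2017, 11, 7, hour, minute)

    groups = classify_log_backups(
        full_started=at(18, 0),   # the time the GUI reportedly shows
        full_finished=at(20, 0),  # the point the restored database is actually at
        log_backup_times=[at(17, 30), at(18, 30), at(19, 30), at(20, 30), at(21, 0)],
    )
    for name, times in groups.items():
        print(name, [t.strftime("%H:%M") for t in times])
```

In the scenario described above, only the `after_finish` group could be applied on top of the restored full backup, even though the GUI timestamp might suggest the `during_full` group belongs in the restore sequence too.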

- Ted Locke
  November 8, 2017 2:20 pm
  
  Every time I heard the DDBMA term “Lockbox”, I always chuckled to myself and thought of the SNL sketch from the 2000 election of Al Gore’s comments on his “Lockbox”. http://www.nbc.com/saturday-night-live/video/cold-opening-gore–bush-first-debate/n11360?snl=1 (look at the 7:41 mark).
  
Deborah
November 17, 2017 1:46 pm

@Student4Life – eCDM (Enterprise Copy Data Manager) from Dell can centrally manage the deployment of the lockbox. Typically the lockbox will not disappear when you do an upgrade. Between V2 and V3, DDBMA did have the directory structure change and it is maybe there that the lockbox went missing because it was relocated. I haven’t found this issue going from V3 to V3.5 to V4 to V4.5. The directory stays the same and the upgrade works without losing the lockbox.

Simon Cho
June 6, 2018 7:29 pm

Thank you, Brent. It is a great article.

I’m also not a big fan of DeDupe technology for DB servers “yet”.
DeDupe is a great technology for reducing size.
But it has the overhead of segment calculation and “licensing”.

In my opinion, the bottom line is that the native compression option shouldn’t be turned off, since it gives us lots of benefits, not just in storage space but also in operational cost.
Typically, DBAs copy backup files to many other locations for many reasons.
However, DeDupe typically doesn’t work well with compressed DB backups.

To get more benefit from DeDupe, three things are required, in my opinion.

1. Regular DB compression should create the same binary for the same data.
– Based on my testing, it doesn’t do that. Because of that, DeDupe vendors’ white papers do not recommend turning on the native compression option.

2. DeDupe should have an option to handle old backups only.
– Typically, we look for recent backup files, such as those from a few days ago.
– So it should be turned off for recent backup files, for better RTO and for safety reasons.
– We can set this up by mixing in other technologies, but that just makes things more complicated and adds more maintenance.

3. The licensing model should be changed.
– Typically, DeDupe products are licensed by the pre-compression size of the backups instead of the actual storage size of the hardware.
– I don’t think this is the right approach, since the compression ratio is not guaranteed across all the different technologies.

One more addition: it should provide a free maintenance window for tasks such as dedupe segment cleanup.
In my humble experience, once the backup storage goes into a maintenance window, many systems are affected.
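
The point about dedupe and compressed backups can be illustrated with a toy experiment. The sketch below is only an illustration under stated assumptions: it uses fixed 512-byte blocks as a stand-in for a dedupe appliance's segmenting, `zlib` as a stand-in for native backup compression, and generated text rows as a stand-in for a backup file. None of these are what SQL Server or any dedupe product actually uses, and all function names are hypothetical.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 512  # toy fixed block size; real appliances do their own segmenting


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block, as a very simple block-level dedupe might."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def shared_fraction(old, new):
    """Fraction of blocks in `new` that already exist somewhere in `old`."""
    known = set(block_hashes(old))
    new_hashes = block_hashes(new)
    return sum(1 for h in new_hashes if h in known) / len(new_hashes)


def make_fake_backup(rows=20000, seed=42):
    """Build compressible, text-like stand-in data (not a real backup file)."""
    rng = random.Random(seed)
    tables = ["orders", "customers", "invoices", "shipments"]
    lines = [
        f"{i:08d},{rng.choice(tables)},{rng.randint(0, 999999)}\n"
        for i in range(rows)
    ]
    return "".join(lines).encode("ascii")


def compare(rows=20000):
    """Return (raw, compressed) shared-block fractions for two near-identical files."""
    first = make_fake_backup(rows)
    changed = bytearray(first)
    changed[100:108] = b"XXXXXXXX"  # same length, one small change near the start
    second = bytes(changed)
    raw = shared_fraction(first, second)
    compressed = shared_fraction(zlib.compress(first), zlib.compress(second))
    return raw, compressed


if __name__ == "__main__":
    raw, compressed = compare()
    print(f"uncompressed copies share {raw:.1%} of blocks")
    print(f"compressed copies share   {compressed:.1%} of blocks")
```

On this toy data the two uncompressed copies share every block except the one that was edited, while the compressed copies share fewer, because a small change early in the input alters the compressed byte stream that follows it.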

ekonagu
March 6, 2019 9:48 pm

Going through the comments, it’s interesting, and I think it’s the perfect place to ask about something I am facing now.
Can we consider Dell EMC IDPA a replacement for SQL Server AlwaysOn or other HA/DR techniques? If there is a backup appliance that does all of this at the server level, do we still need the SQL Server features?

- Brent Ozar
  March 7, 2019 1:20 pm
  
  Ekonagu – for questions, head to a Q&A site or forum.
  
  - ekonagu
    March 8, 2019 3:35 am
    
    ok. i got the answer 🙁
    
Marcy Ashley-SellecK
December 5, 2019 1:52 pm

Hi Brent!
We are starting to use Data Domain in our corporate IT environment, so it is coming to a Finance IT theater near us soon. Since we are using Log Shipping for Dev and Prod, our Log Shipping t-log backups are already handled. Should we change those jobs to write to the DD appliance, or should we have the appliance archive the t-logs over to the dedupe storage (maybe it’s same difference)?
TIA!

- Brent Ozar
  December 5, 2019 2:06 pm
  
  Ooo, that’s a great question that would involve a lot of back and forth analysis. Here are the kinds of questions I’d ask:
  
  * What’s the server’s RPO/RTO for problems like a dropped table?
  * Can we meet those with where the log files are stored today?
  * If we do them to the DD appliance, can we still meet the RPO/RTO?
  * Are we using TDE to encrypt the backups, and if so, how’s the DD appliance’s compression rate with that?
  
  Hope that helps! You don’t have to post the answers here – if you want my help with it, this would be more of a consulting engagement. Just giving you a few questions to think through for starters for free, heh.
  
  - Marcy Ashley-Selleck
    December 5, 2019 2:19 pm
    
    Thanks Brent, this is perfect. With the conversations that have happened already, my preference would be that if DD impacts RPO/RTO at all, it should not even be part of the game plan for DR for SQL; that said, it’s mildly helpful for routine database refresh processes or refreshing the databases on the Dev DR box.
    We’ve had months of testing and we needed some help on this piece before using it in Production. We’re all nearly convinced it’s not worth it, so discussing these questions should get us to the answer. Thank you!
    
Sofia Lindvert
June 3, 2021 10:02 am

Is this still an issue with later SQL Server versions like SQL Server 2019?

- Brent Ozar
  June 3, 2021 10:03 am
  
  Yes.
  
Mirza
September 3, 2021 1:45 pm

We use Commvault backups that do deduplication. We are told not to turn on SQL Server native backup compression. Restores take much longer, beyond our RTO. This article explains well where the bottleneck is. Thanks for the information.

Steffan
October 19, 2021 11:16 pm

I’ve been a backup administrator for decades.
Had a stint where the storage device was a Data Domain. We used it via NFS-mounted filesystems from TSM/Spectrum Protect. It worked very well, but the customer ended up disappointed because of the cost of expanding the Data Domain. Just before I was laid off (everything was moved to India), the Data Domains were running out of space and I was trying to get disk to use for a compressed disk pool.
I did see from the deduplication metrics that only full database backups got good compression. Literally nothing else.
Now I’m at a site where we use LTO8 and deduplicating storage, this time through Spectrum Protect deduplication with Directory Container Pools. All working well. But with the ransomware threats, management is running scared and bought Rubrik. Fortunately it is even faster than Spectrum Protect at backup and restore. We have a 26 TB Epic database and a 36 TB Clarity on Oracle database, and Rubrik (with only one cluster but few clients) set records, probably 70% or more better than Spectrum Protect for backup and restore.

Vamp898
January 24, 2023 11:23 am

We just use out of band deduplication, works fine.

There is no difference in backup or restore; the dedup happens after the backup only.
