
Right now, your transaction log backups are set up to run every 15 minutes, every hour, or maybe every few hours.

It’s time for a reality check. Figure out what internal customer owns the data inside the database, and walk into their office. Ask them how much data they’re comfortable losing.

You’re probably going to have to repeat the question a couple of times because they won’t believe these words are coming out of your mouth. After they eventually recover, they’ll blurt out words like “NONE” and “ZERO” and “ZOMG”. Give them a quick overview of how much it costs to have a no-data-loss system – typically in the millions of dollars – and then ask for another answer.

They’re going to ask for a list of options and their costs. Forget the fine-grained details for now – let’s just assume one of your solutions involves transaction log backups.

The big hand is on the network cable, and the little hand is on the power cable.

Is it more expensive to run your log backups:

  • A – Every hour
  • B – Every 15 minutes
  • C – Every 5 minutes
  • D – Every minute

It’s a trick question – they all cost the same.

“BUT MORE LOG BACKUPS MEAN MORE SLOWDOWNS!”

Nope – it actually means less overhead. To keep the numbers easy, let’s say you generate 60GB of transaction log data per hour. Would you rather back up 1GB per minute in small chunks, or kick the server in the junk every hour trying to back up 60GB all at once? The users would definitely notice the latter – the server will slow to a crawl while it handles that firehose of log data for a sustained period.

“BUT MORE LOG BACKUPS MEAN MORE FILES!”

Yes, but if you think you’re really going to manually restore log backups through the GUI, one at a time, you haven’t been through a real mission-critical crash. Long before disaster strikes, you need to be familiar with a script-based method to automatically restore all of the log files in a directory. Test it, get comfortable with it, and then when the outage hits, you can let your script restore everything ASAP, no clicks required.
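
For illustration, here’s a minimal sketch of that kind of script. It assumes native .trn backups whose file names sort chronologically, a database that has already been restored WITH NORECOVERY from its full backup, and the undocumented xp_dirtree procedure to list the folder – the path and database name are placeholders, so adapt and test it long before you need it:

    -- Sketch: generate RESTORE LOG statements for every .trn file in a folder.
    DECLARE @Folder nvarchar(260) = N'X:\LogBackups\MyDb\';   -- placeholder path
    DECLARE @Db     sysname       = N'MyDb';                  -- placeholder database

    CREATE TABLE #Files (FileName nvarchar(260), Depth int, IsFile int);
    INSERT #Files EXEC master.sys.xp_dirtree @Folder, 1, 1;   -- list files, one level deep

    DECLARE @File nvarchar(260), @sql nvarchar(max);
    DECLARE files_cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT FileName FROM #Files
        WHERE IsFile = 1 AND FileName LIKE N'%.trn'
        ORDER BY FileName;          -- relies on sortable, timestamped file names

    OPEN files_cur;
    FETCH NEXT FROM files_cur INTO @File;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'RESTORE LOG ' + QUOTENAME(@Db)
                 + N' FROM DISK = N''' + @Folder + @File + N''' WITH NORECOVERY;';
        PRINT @sql;                 -- review first; swap PRINT for EXEC sp_executesql @sql
        FETCH NEXT FROM files_cur INTO @File;
    END
    CLOSE files_cur; DEALLOCATE files_cur;
    DROP TABLE #Files;
    -- Finish with RESTORE DATABASE ... WITH RECOVERY (or STOPAT for point-in-time).

The specific script matters less than the habit: whatever tool you use, practice the restore before an outage forces you to.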

“BUT MY LOG BACKUP JOBS CAN’T FINISH IN A MINUTE!”

If you have too many databases and your jobs can’t keep up, it’s time to start breaking up the log backups into multiple jobs. Some of my clients use two jobs, for example – one to back up all odd-numbered databases by the database ID in sys.databases, and another one to back up all the evens. The log backups don’t stress out their servers by any means, and users don’t notice the impact of two simultaneous log backups in two different databases.
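
Here’s a rough sketch of how that odd/even split can look, assuming plain native log backups to a local folder – the path and file naming are illustrative only. Each job runs the same script with a different @Parity value (1 for odd database IDs, 0 for even):

    -- Sketch: back up the log of every user database whose database_id matches @Parity.
    DECLARE @Parity int = 1;        -- Job 1 uses 1 (odd IDs), Job 2 uses 0 (even IDs)
    DECLARE @Db sysname, @sql nvarchar(max), @stamp varchar(15);

    DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
        SELECT name
        FROM sys.databases
        WHERE recovery_model_desc <> 'SIMPLE'   -- log backups only apply here
          AND state_desc = 'ONLINE'
          AND database_id > 4                   -- skip the system databases
          AND database_id % 2 = @Parity;

    OPEN dbs;
    FETCH NEXT FROM dbs INTO @Db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @stamp = REPLACE(REPLACE(REPLACE(CONVERT(varchar(19), GETDATE(), 120),
                             '-', ''), ':', ''), ' ', '_');
        SET @sql = N'BACKUP LOG ' + QUOTENAME(@Db)
                 + N' TO DISK = N''X:\LogBackups\' + @Db + N'_' + @stamp + N'.trn'';';
        EXEC sp_executesql @sql;
        FETCH NEXT FROM dbs INTO @Db;
    END
    CLOSE dbs; DEALLOCATE dbs;

If two jobs still can’t keep up, the same pattern extends to three or four jobs by changing the modulo.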

“BUT SURELY YOU CAN’T BE SERIOUS!”

I’m just as serious as you were when you randomly picked 15 minutes or 1 hour as the log backup schedule. The point is that it’s not for the geeks to decide – it’s for the customer to decide. Ask your business user about the business needs for their data, and then give them a menu of options to meet that goal.

You think they’re all going to insist that all data is mission-critical, but you’d be surprised. Once they see the costs involved with log shipping, AlwaysOn Availability Groups, SAN replication, and so forth, they’re able to make better-informed decisions about which data really is priceless, and which data they could re-enter at minimal cost.

But until you ask them this question, they assume you’re never going to lose data.

  1. What happens with all the “VSS” freeze points? Will this impact performance?
    What happens if I’ve got 200 DBs on one instance in one job?

    Thanks for the advice, Magnus

  2. Do you have suggestions on handling “BUT MORE LOG BACKUPS MEAN MORE FILES!”? I have tried to build an automated restore routine based on backups in a folder and have failed. Anything out there you would recommend looking at?

  3. Interesting view. While I do think these days that an automated restore routine is important, there’s also potentially an issue with alerting/monitoring if you take log backups every minute. You need to know if you have issues, like a full drive, but you also don’t want to be inundated with alerts. Potentially that would lead someone to turn off alerts for the time it might take them to respond, and then not turn them back on.

    I also think there could be an issue here with the size of msdb and its logging activity if you’re a person that takes a full backup once a week, since you need to keep these log backups all the way back to the earliest full.

    I’m not sure I agree, but I’m not sure I disagree. Thanks, Brent, for something to think about.

    Garry: http://www.sqlservercentral.com/scripts/T-Log/69750/

    • Steve – rather than alerting on the jobs, I typically want to flip it around and say, “Give me an alert when the number of databases with >2 minutes of data loss exposure is >0.” That way I have a little bit of flex time, and I only get one alert. (There’s a minimal query sketch of that kind of check at the end of this thread.)

    • Any one-off transaction log backup failure is not that critical as long as the next one succeeds and the backup chain isn’t broken. I have a script-based monitoring solution using CMS to show you at the SQL PASS Summit 2014 that you can schedule at any interval, with notification down to the specific database level.

      • If you back up the transaction log every minute, you surely don’t want to be paged for every backup failure on every database. If the next minute’s log backup is successful, that last one wasn’t immediately critical. You might want to look into the root cause when dawn breaks. I’ve got a solution using CMS, PowerShell, and T-SQL to manage enterprise databases to show you at the SQL PASS Summit 2014 if I get there.
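
    A minimal sketch of the exposure check described above – assuming native log backups recorded in msdb on the same instance, and using 2 minutes purely as an illustrative threshold – could look like this:

      -- Count databases whose newest log backup is more than 2 minutes old;
      -- alert only when this count is greater than zero.
      SELECT COUNT(*) AS databases_at_risk
      FROM sys.databases AS d
      LEFT JOIN (
          SELECT database_name, MAX(backup_finish_date) AS last_log_backup
          FROM msdb.dbo.backupset
          WHERE type = 'L'                       -- log backups only
          GROUP BY database_name
      ) AS b ON b.database_name = d.name
      WHERE d.recovery_model_desc <> 'SIMPLE'
        AND d.state_desc = 'ONLINE'
        AND d.database_id > 4                    -- skip system databases
        AND (b.last_log_backup IS NULL
             OR b.last_log_backup < DATEADD(MINUTE, -2, GETDATE()));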

  4. A question & statement of immeasurable value:
    “Ask your business user about the business needs for their data…”
    “But until you ask them this question, they assume you’re never going to lose data.”

    You mention the cost of various replication strategies, and how the cost can inform decisions about the value of data. Very good advice. The cost of a given replication strategy can extend to significant changes in hardware, data layout, batch workload scheduling, and even application-level code/design. Until all the pieces are put together, it’s hard to know.

    Sometimes discussions of data business value to shape recovery objectives can be tough – my people skills can be stretched in those discussions. But I greatly prefer having those discussions before a data loss crisis – or before a performance crisis in which replication is a contributor – rather than during a crisis, or even in post-mortem.

  5. On the “BUT MORE LOG BACKUPS MEAN MORE FILES!” and “if you think you’re really going to manually restore log backups through the GUI, one at a time”, here is what we are currently doing:

    Perform a full backup of the database overnight to a device.
    Back up log files throughout the day, appending to the backup device holding last night’s full database backup.
    So, if a restore to a point in time is required, through the SSMS GUI, it is easy to view and select any or all log backups to restore through (up to the desired point in time). We don’t have to pull up each log backup file through the GUI.

    Of note, we are not dealing with extremely large databases. Sizes range from a few MB to 250 GB.

    Since “BUT MORE LOG BACKUPS MEAN MORE FILES!” implies to me that each log backup (hourly, minutely, etc.) is going to a unique file name, I am wondering if this is the recommended method (as opposed to our current method) for log backups.

    Hope I’m clear on this.

    Thanks Brent and everyone there! We check out brentozar.com daily.

    • Pat – two big problems with that. First, if you want to sweep the backup files off to tape, it’s much more challenging if they’re currently in use. Second, if something goes wrong during a backup and the file becomes corrupt, all of your backups can be corrupt.

    • Hi Pat, the backup info is in msdb and can be queried (and a restore script automatically generated and saved as a file) from CMS using PowerShell and T-SQL for enterprise database management of hundreds of SQL Servers, including notification and audit. That is what I plan to show at the SQL PASS Summit 2014.

      • This is a shortcoming of SQL Server: there is no centralized backup catalog like the one Oracle RMAN can use. Maybe SQL Server can improve on this in the future in order to be a big enterprise player. In the meantime, CMS and scripts will do for the enterprise. Don’t pay for expensive backup software.

  6. Good food for thought, but there is a considerable cost/benefit, risk/reward analysis I would have to do to change my hourly schedule now, including asking the business. I realize there’s always a risk of a failure – even with clusters and high-tech SANs – but the risk is so minuscule (once in my 15-year career).

    We only put the local backups to tape once/week (keeping in mind the backup drive is presented from a SAN at another location) so we have bigger fish to fry in the near term ;-)

    • Allen – you must be a lucky charm! How often have you had car accidents, and have you thought about canceling your car insurance? No sense in paying for it if you don’t need it.

      Life insurance too – might as well cancel that, because you’ve never died, right?

      Have a good one. ;-)

    • I would ask myself how much data I would like to lose, and then ask the data owner the same question. It’s like the question about physically separating the data files from the log files.
      Is it for a performance reason? Yes, but primarily for disaster recovery.

      Hey Brent, keep up the good work! This can’t be stressed enough.

  7. Would you consider synchronous mirroring/AG a zero-loss solution?

  8. Hi Brent, thanks for that article. I have to say that the 1-hour log backup schedule was picked randomly here too. Or at least we inherited it. My biggest concern here is that TSM (IBM Tivoli) is used to perform backups in our environment, secure them, and ship the tapes off-site, etc… but we do not have any control over it. We can ask for a restore, but I doubt I can ask the TSM admin for something like “can you please take last night’s full + the 600 log backups that were taken after it?” Granted, I am sure TSM can be automated too, but I doubt anyone in charge knows how to do that…

    • Benjamin, where I work, our Tivoli solution has a nice web-app front end that resembles a folder structure. It’s not practically more difficult to select 600 files than it is to select 6.

      Before they gave us R_W access, we had Read access, and could at least give them a screenshot of where the files live.

  9. Pingback: Visual Studio IDE Enhancements - The Daily Six Pack: February 28, 2014

  10. Hi Brent.
    As usual, nice post.

    I’m facing a server with more than 150 databases, all of them in the full recovery model.

    I have developed a T-SQL script that checks the actual space used by each database and launches the log backup based on the size of the log.
    What do you think about this approach?

    It allows us to have more backups for the heavily used DBs and fewer files and backups for the rest.
    So in some way I’m measuring the amount of data we can lose in MB, not in time.

    Greetings from Spain and congratulations on your site, it’s a great resource for SQL Server mere mortals.

    • Jorge – thanks, glad you like the site.

      I would ask the business if they’re okay with it, and if they’re okay with databases being out of sync when you restore. Databases will be restored to different points in time, and that’s usually not okay for the business.

  11. Pingback: (SFTW) SQL Server Links 28/02/14 - SQL Server - SQL Server - Toad World

  12. On the “BUT MORE LOG BACKUPS MEAN MORE FILES!”, in a log shipping scenario I find that the restore process takes 20-30 seconds to process even the smallest TRN file that has no actual data changes, only the relevant markers. So with log backups every 15 minutes, a restore pass takes about 2 minutes every hour even if there is no activity in the DB during that hour. If my math is correct, one-minute log backups would work out to 30 minutes of “restores” every hour, even when there’s nothing to restore. That pretty much makes the warehouse unusable if we prevent user connections while performing the restores.

  13. I was waiting to see this post drop since the Dell backup webcast! Good stuff. Although I have to ask…are there any figures, or example tests with figures, available to support the first assertion, saying that there is less overhead backing up more often? Not sure what the best measurement of that would be…even just cumulative job time over an hour with either 4 backups at 15 min or 60 backups at 1 min, on a busy system. I get the idea that one big long log backup is more likely to be noticeable to the users than a constant stream of small ones – that makes sense – just wondering if people have done the testing…so I don’t have to ;)…to establish how exactly the performance changes as you change the increment…and in which direction!

    Of course, your point about RPO is well taken. Luckily our business has sometimes proposed some laughably lenient expectations on RPO, but if your system can handle it easily, every 5 minutes or every minute sounds like a good idea, I suppose. Regarding the many-files issue, if you aren’t employing some automation for tlog restores, it’s time to get scripting, because doing it manually just sounds terribly laborious and prone to user error (with the CIO tapping his watch behind you).

    • Nick – thanks, glad you liked the post. Like you mentioned, it’s tough to get a single set of measurements that will make everyone happy, but cumulative job time is a great start. If you run those tests, make sure to clean out your MSDB backup history first as well – updating that can affect job times.

  14. Hi Brent,
    A question regarding log shipping in readable (standby) mode when backing up the log every minute. In standard log shipping, with a long transaction, each trn file will be restored and rolled back until the transaction completes (i.e., when rebuilding one index takes 5 minutes, it means log shipping will restore the first file 5 times and roll it back 4 times, restore the second file 4 times and roll it back 3 times, etc.).

    We can work around it with a customized script that restores log shipping only every X minutes (15?) and rolls back open transactions only after the last restored trn file.

    Any recommendations?

    • Eyal – log shipping with readable mode is for running reports on the log shipped subscribers, not disaster recovery. Those two things aren’t compatible. If you need a reporting server, build a reporting server – you won’t want to do restores every minute on that box.

      • Hi Brent,
        I need both a reporting server and disaster recovery. Do you have any suggestion for the reporting server to avoid the multiple log shipping restores and rollbacks when applying the every-minute logs?

        Thanks Eyal

        • Eyal – sure, the backup schedule has nothing to do with the restore schedule. You can just do restores once per day if you want as long as the backup files are available on a central server that isn’t affected by the main server going down.

          • Brent,
            But the issue is the multiple restores and rollbacks of each trn file, caused by the backup frequency vs. long transactions.

            For example, for an index rebuild transaction that takes 5 minutes, log shipping will restore the first file 5 times and roll it back 4 times, restore the second file 4 times and roll it back 3 times, etc.

          • Eyal – I think you’re confusing how rollbacks work with how restores work. If you don’t run RESTORE WITH RECOVERY, then the rollbacks aren’t going to happen multiple times.

  15. Can anyone recommend an automated restore script that would work well with the default folder structure and file names used by Ola Hallengren’s Maintenance Solution script in cases where the backup info in msdb and Ola’s CommandLog table is gone? I’d like to be prepared for a worst-case scenario.

    • Mark – did you read the other comments with the MSSQLtips link?

      • I could easily adapt the MSSQLtips script (Ola’s script uses separate subfolders for the three types of backup files). But if someone else has already done it (or something similar), I’d rather not reinvent that wheel.

        • Ola’s scripts also put those three subfolders under a subfolder for each database. I think that the MSSQLtips script assumes that all of the files are in one folder, but I could be wrong about that.

  16. OK. I agree with Brent’s assertion that our backup strategy should not be based on arbitrary rules, but instead it should be driven by the actual negative impact to the company should data be lost. In some cases, the loss of a single transaction can be devastating…it’s very rare, but it all comes back to a DBA’s favorite pat answer, “It depends”. In this case, it depends on the value of the data that could be lost, the financial and legal ramifications, etc… and that information has to come from the business. This doesn’t mean the business always knows best when it comes to backup strategy, because they do not always fully understand what they are asking for, but that’s why it is important to engage in a dialogue about the pros, cons, risks, costs, etc… with the business in order to come to the proper solution.
    Now, to the main reason I am responding to this post…I feel like I’ve entered the twilight zone. Please tell me what I am missing about this discussion…
    I do not understand the angst about relying upon the SSMS GUI to generate restore scripts. I see recommendations to duplicate functionality that is already available to us natively in SQL Server. I’ve been restoring databases for years using the SSMS GUI with just a few clicks of the mouse, no matter how complicated the restore scenario, how far back I wanted to restore to (as long as the backup files still existed), or the number of backup files involved. In our shop, each backup operation writes to its own distinct file. When formulating a restore, SSMS intelligently navigates the combinations of full, diff, and log backups, and the timeline provides a very simple way to pinpoint your desired recovery point. It even properly traverses forked recovery paths when they exist, something that not even all of the leading third-party tools can do (one example: Idera SQLsafe, which we use extensively here). I have yet to run into a scenario in which SSMS did not provide the simplest and quickest route to the desired restore. To restore a database with SSMS: point, click a few times, and out pops your restore script, complete with hundreds of individual file restores if that is what is required with very frequent log backups like Brent is proposing above. Again, what am I missing? I am genuinely intrigued.

    • I ran out of patience trying to read this. What’s the question?

      • Sorry, I should have led with that. Why the concern over relying upon SSMS to generate restore scripts? What functionality is SSMS not providing that is necessary to restore a database quickly and reliably?

        • One database isn’t a big deal. Try restoring 50 or 100 when a server disappears.

          • OK. After running through some more dramatic scenarios, I am seeing some of what I was missing. We have pretty comprehensive SOPs on how to recover in the worst of situations, and have had to do so on several occasions. But after this exchange and some deep thought (ouch!), I do see a gap in our DR plans: we need to be able to restore msdb to the point of failure in order to have all of the backup data we need to recover fully and quickly when msdb has also been lost. Currently we only do nightly full backups of msdb. I believe this will be easier and more reliable than a home-grown scripting solution based on existing backup files, although having one in reserve as another recovery option wouldn’t hurt…one more layer of protection. I’m not too concerned about handling multiple databases, as we can generate the scripts via SSMS so quickly that by the time one database is done restoring, we can have the next DB set or two ready to go…as long as we have the msdb data to do so. This will definitely spawn some additional requirements discussions in our shop. I always appreciate being challenged to think deeper into issues like this. Just when I think I have it figured out, I discover or learn something new. Keep challenging us!

  17. I’m a fan of Idera’s SQLsafe product, as it gives you a lot of the automation you need when you have lots of databases to backup, and want to be able to quickly find the files for a particular backupset, etc. You can also use their SQL virtual database product to attach a backup without having to restore it, which can be handy with large databases. And you get backup set compression and encryption, but that’s no longer an issue with 2012.

    • I agree, Brian! SQLsafe has been the backbone of our backup infrastructure for years. It provides a very effective enterprise-level backup/restore solution. It also has built-in functionality to back up directly to TSM, bypassing disk. You do give up the ability to use Virtual Database with this approach, but it does allow you to save on disk space for backups and the associated $$$. You can extract a backup file from TSM to disk to use Virtual Database if needed. Not as convenient, but possible. Their console/GUI is great and facilitates quick troubleshooting, browsing, and restores. It even supports point-and-click restoration of databases from one server to another, very handy! The only negatives in my opinion are 1) obviously it adds cost to your infrastructure and 2) it doesn’t navigate forked recovery paths properly.

  18. Pingback: (SFTW) SQL Server Links 28/02/14 • John Sansom

  19. Ya, truly a tricky question to ask, but an hourly differential backup and a transaction log backup once every 10 minutes will do fine for most applications, right?!! OMG

  20. …and don’t call me Shirley.

    Just a quick note on the number of files, from our experience.
    One day out of the blue our full backups started to take a very long time to complete (from a few hours to over a day!). It took us some time to find the cause, which was the large number of small transaction log backup files.
    We’re backing up around 60 instances to a Data Domain, most of them backing up trnlogs every 15 minutes (and 1 full/day).
    A problem with the cleanup job resulted in an increasing number of small files on the Data Domain. Once we manually deleted the older trnlog backup files the duration of the full backups went back to normal.
    Btw, eventually the vendor pointed us in the right direction.
    So, this seems to be a case where more files could be an issue. Just wanted to let you guys know in case someone experiences the same symptoms.
    Chrz

  22. My network administrator says you’re off your rocker! He said that every time you back up the log file, all transactions in the database are going to stop because they can’t write to the log while the log is being backed up. He says that if we follow your suggestion, we will be locking our database every minute and chaos and anarchy will reign.

    I’m trying to avoid a fist fight, or at least a heated argument. Could you provide me with the information I need to keep the peace in our office?

    Thanks,

    • Jeff – you know, that’s a great question. I would actually challenge *him* to prove that. I know for sure it’s not the case, but I can’t find any documentation proving it.

      The network admin is effectively saying, “I believe planes can’t carry margaritas.” You won’t find any documentation that says planes can’t carry margaritas.
