Right now, your transaction log backups are set up to run every 15 minutes, every hour, or maybe every few hours.
It’s time for a reality check. Figure out which internal customer owns the data inside the database, and walk into their office. Ask them how much data they’re comfortable losing.
You’re probably going to have to repeat the question a couple of times because they won’t believe these words are coming out of your mouth. After they eventually recover, they’ll blurt out words like “NONE” and “ZERO” and “ZOMG”. Give them a quick overview of how much it costs to have a no-data-loss system – typically in the millions of dollars – and then ask for another answer.
They’re going to ask for a list of options and their costs. Forget the fine-grained details for now – let’s just assume one of your solutions involves transaction log backups.
Is it more expensive to run your log backups:
- A – Every hour
- B – Every 15 minutes
- C – Every 5 minutes
- D – Every minute
It’s a trick question – they all cost the same.
“BUT MORE LOG BACKUPS MEAN MORE SLOWDOWNS!”
Nope – it actually means less overhead. To keep the numbers easy, let’s say you generate 60GB of transaction log data per hour. Would you rather back up 1GB per minute in small chunks, or kick the server in the junk every hour trying to back up 60GB all at once? The users would definitely notice the latter – the server will slow to a crawl while it handles that firehose of log data for a sustained period.
“BUT MORE LOG BACKUPS MEAN MORE FILES!”
Yes, but if you think you’re really going to manually restore log backups through the GUI, one at a time, you haven’t been through a real mission-critical crash. Long before disaster strikes, you need to be familiar with a script-based method to automatically restore all of the log files in a directory. Test it, get comfortable with it, and then when the outage hits, you can let your script restore everything ASAP, no clicks required.
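To make that concrete, here’s a minimal sketch of the idea using the undocumented xp_dirtree procedure to list the backup files (the path, database name, and file-naming convention are assumptions; it relies on timestamped file names that sort in backup order, as Ola Hallengren’s scripts produce):

```sql
-- Sketch only: list the .trn files in a folder and generate RESTORE commands.
-- Assumes timestamped file names that sort chronologically.
CREATE TABLE #f (subdirectory NVARCHAR(260), depth INT, [file] INT);
INSERT #f EXEC master.sys.xp_dirtree N'X:\Backups\MyDB\LOG', 1, 1;

SELECT N'RESTORE LOG MyDB FROM DISK = N''X:\Backups\MyDB\LOG\'
     + subdirectory + N''' WITH NORECOVERY;'
FROM #f
WHERE [file] = 1 AND subdirectory LIKE N'%.trn'
ORDER BY subdirectory;  -- timestamped names restore in order
```

Paste the generated commands into a query window (or wrap them in dynamic SQL), and finish with a single RESTORE DATABASE … WITH RECOVERY.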
“BUT MY LOG BACKUP JOBS CAN’T FINISH IN A MINUTE!”
If you have too many databases and your jobs can’t keep up, it’s time to start breaking up the log backups into multiple jobs. Some of my clients use two jobs, for example – one to back up all odd-numbered databases by the database ID in sys.databases, and another one to back up all the evens. The log backups don’t stress out their servers by any means, and users don’t notice the impact of two simultaneous log backups in two different databases.
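A rough sketch of that odd/even split (the BACKUP command, path convention, and exclusions here are illustrative assumptions, not any client’s actual job):

```sql
-- Job 1: odd database_ids. Job 2 is identical with "% 2 = 0".
DECLARE @name sysname, @sql NVARCHAR(MAX);
DECLARE odd_dbs CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases
    WHERE database_id % 2 = 1
      AND recovery_model_desc <> 'SIMPLE'
      AND state_desc = 'ONLINE'
      AND database_id > 4;  -- skip master/tempdb/model/msdb
OPEN odd_dbs;
FETCH NEXT FROM odd_dbs INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'BACKUP LOG ' + QUOTENAME(@name)
             + N' TO DISK = N''X:\Backups\' + @name + N'_'
             + REPLACE(CONVERT(NVARCHAR(19), GETDATE(), 126), ':', '')
             + N'.trn'';';
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM odd_dbs INTO @name;
END
CLOSE odd_dbs;
DEALLOCATE odd_dbs;
```

Because database_id is stable between restarts, each database consistently lands in the same job.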
“BUT SURELY YOU CAN’T BE SERIOUS!”
I’m just as serious as you were when you randomly picked 15 minutes or 1 hour as the log backup schedule. The point is that it’s not for the geeks to decide – it’s for the customer to decide. Ask your business user about the business needs for their data, and then give them a menu of options to meet that goal.
You’d think they’re all going to insist that all data is mission-critical, but you’d be surprised. Once they see the costs involved with log shipping, AlwaysOn Availability Groups, SAN replication, and so forth, they’re able to make better-informed decisions about which data really is priceless, and which data could really be re-entered at minimal cost.
But until you ask them this question, they assume you’re never going to lose data.
What happens with all the “VSS” freeze points? Will this impact performance?
What happens if I’ve got 200 DBs on one instance in one job?
Thanks for the advice, Magnus
Magnus – typically VSS freeze points are only used for full backups, not log backups. I would advise against using VSS for log backups.
Do you have suggestions on helping “BUT MORE LOG BACKUPS MEAN MORE FILES!”. I have tried to build an automated restore routine based on backups in a folder and have failed. Anything out there you would recommend looking at?
Garry – sure, in our 6 Month DBA Training Plan emails ( https://www.brentozar.com/go/needs ) we talk about this one from MSSQLTips: http://www.mssqltips.com/sqlservertip/1584/auto-generate-sql-server-restore-script-from-backup-files-in-a-directory/
I like the script you reference and can make it work for most things. But what if you have a directory full of FULL, DIFF, and TLog files for multiple databases that aren’t named in an easily identifiable way? Have you seen a routine that will read the file information for each file and build a restore sequence based on the results?
No, fix the file names.
The easiest way I know of to fix the file names is to use Ola Hallengren’s maintenance scripts.
Just to clarify – reading the file information for each file is an incredibly bad idea due to taking additional time when you can least afford it.
Every time I take a backup, the SQL Agent job automatically outputs the restore script for each database backed up, and we store the scripts on the backup server. I’ll look for the link where I got the idea. That way, all your restore scripts are ready to go for each server.
You can dump the output of RESTORE HEADERONLY together with the file info into a temp table and generate restore scripts from there…
Easiest option is to setup your backups to go to different sub folders for each database, so you have sub folders for FULL, DIFF or LOG etc. for each one – makes it clearer what needs to be restored and in what order.
Gary – that’s an interesting approach. Do you have a script to automate the restores by folder? Or are you talking about doing human intervention for restores?
I wrote that in PowerShell about 9 years ago. Backups dump to a replicated NAS share with an instance -> database file structure. I load all of the files’ metadata into a dataset, iterate through with RESTORE HEADERONLY and RESTORE FILELISTONLY, identify the full, look for diffs, and then the subsequent logs. All written using SMO. Took a while, but it can be done. We use this as the backend for our self-service website which the devs use to copy prod -> non-prod.
Automated scripts are fine, but (at least in SQL 2014) you could use Management Studio too.
It lets you select one or more backup files at once in the Add-Backup-Media dialog. Furthermore, it reads the backup headers and uses only the necessary files, even if you select more. It also supports restore-to-point-in-time (something that is a little bit harder with the scripts).
If you have different folders for FULL / DIFF / LOG, you have to open the Add-Media dialog three times (once for every directory) and multi-select the files to restore (theoretically you could press Ctrl-A to select everything, but it could take some time to read the headers from all the LOG backups of the last month if you do a one-minute backup).
Thomas – yeah, it’s just usually painful when you talk about restoring dozens, hundreds, or thousands of databases through the GUI.
If you automatically and/or pre-build your restore scripts just add “norecovery” by default. It’s easy to add a single command at the end to bring it online. You can also add a commented StopAt to every restore command. It’s then easy to do a replace all on that with a time you want to stop on all. SQL Server does the rest for you, ignoring all that don’t need restoring.
Folders are a nice to have but once you go to scripts they don’t really get you much. Ola Hallengren scripts do all the hard work for us, breaking it up into folders for server/AG, Database and backup type.
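For illustration, a pre-built restore script along the lines described above might look like this (file names and the STOPAT time are placeholders; a replace-all on the commented STOPAT activates the point-in-time stop):

```sql
-- Pre-generated with every log backup; edit only when disaster strikes.
RESTORE LOG MyDB FROM DISK = N'X:\Backups\MyDB_log_0101.trn'
    WITH NORECOVERY /*, STOPAT = '2014-01-01T01:01:00' */;
RESTORE LOG MyDB FROM DISK = N'X:\Backups\MyDB_log_0102.trn'
    WITH NORECOVERY /*, STOPAT = '2014-01-01T01:01:00' */;
-- A single command at the end brings the database online:
RESTORE DATABASE MyDB WITH RECOVERY;
```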
You can set up your primary database as the target for Log Shipping and let Log Shipping create the jobs you are talking about for you. After the initial setup, bring your databases online as the primary. The Log Shipping jobs will still work – just restore with the NORECOVERY option and keep your .trn files in the same directory you specified when setting up LS.
Works like a charm!
I have found sp_RestoreScriptGenie to be incredibly useful. Note that this script assumes that msdb is still available and that the backups are in their original location at the time of backup which limits its utility in some situations.
https://paulbrewer.wordpress.com/sp_restoregene/ is another useful tool
Hi, Garry, I’ll have a script solution for you at the SQL PASS 2014 Summit.
Personally, I prefer to regenerate a restore script with every log file taken. This is then moved, along with the log file, to a dedicated backup storage server. Having a file ready with the restore commands that potentially only needs editing with a STOPAT time is very nice when things go belly up. I would post the one I use here – but at c.380 lines it’s a little long for the comment box, I think.
I’d need to find somewhere to upload and post the link if anyone’s interested?
Sure, consider Github or Pastebin.
I’m interested. Do you have a link?
Have you considered writing your restore scripts to a database on another server? Maybe you have a DBA server. Pushing them to a central server means we can easily automate restore testing for all backups across multiple servers.
Here’s the script I use – makes a list of restore statements starting with the last full backup. One of the best SQL scripts I have ever copied and pasted if I do say so myself 😀
DECLARE @databaseName sysname
DECLARE @backup_set_id_start INT
DECLARE @backup_set_id_end INT

-- set database to be used
SET @databaseName = 'AdventureWorks'

SELECT @backup_set_id_start = MAX(backup_set_id)
FROM msdb.dbo.backupset
WHERE database_name = @databaseName AND type = 'D'

SELECT @backup_set_id_end = MIN(backup_set_id)
FROM msdb.dbo.backupset
WHERE database_name = @databaseName AND type = 'D'
AND backup_set_id > @backup_set_id_start

IF @backup_set_id_end IS NULL SET @backup_set_id_end = 999999999

SELECT backup_set_id, 'RESTORE DATABASE ' + @databaseName + ' FROM DISK = '''
+ mf.physical_device_name + ''' WITH NORECOVERY'
FROM msdb.dbo.backupset b
JOIN msdb.dbo.backupmediafamily mf ON b.media_set_id = mf.media_set_id
WHERE b.database_name = @databaseName
AND b.backup_set_id = @backup_set_id_start
UNION
SELECT backup_set_id, 'RESTORE LOG ' + @databaseName + ' FROM DISK = '''
+ mf.physical_device_name + ''' WITH NORECOVERY'
FROM msdb.dbo.backupset b
JOIN msdb.dbo.backupmediafamily mf ON b.media_set_id = mf.media_set_id
WHERE b.database_name = @databaseName
AND b.backup_set_id >= @backup_set_id_start AND b.backup_set_id < @backup_set_id_end
AND b.type = 'L'
UNION
SELECT 999999999 AS backup_set_id, 'RESTORE DATABASE ' + @databaseName + ' WITH RECOVERY'
ORDER BY backup_set_id
Interesting view. I do think these days that an automated restore routine is important, but there’s also a potential issue with alerting/monitoring if you have log backups every minute. You need to know if you have issues, like a full drive, but you also don’t want to be inundated with alerts. Potentially that would lead someone to turn off alerts for the time it might take them to respond, and then forget to turn them back on.
I also think there could be an issue here with the size of msdb and its logging activity if you’re a person that takes a full backup once a week, since you need to keep these log files back to the earliest full.
I’m not sure I agree, but I’m not sure I disagree. Thanks, Brent, for something to think about.
Steve – rather than alerting on the jobs, I typically want to flip it around and say, “Give me an alert when the number of databases with >2 minutes of data loss exposure is >0.” That way I have a little bit of flex time, and I only get one alert.
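A sketch of that exposure check (the 2-minute threshold and the system-database exclusions are assumptions to tune; alert whenever this query returns rows):

```sql
-- Databases in full/bulk-logged recovery whose last log backup is
-- more than 2 minutes old (or that have never had one).
SELECT d.name,
       DATEDIFF(MINUTE, MAX(b.backup_finish_date), GETDATE()) AS minutes_exposed
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b
       ON b.database_name = d.name AND b.type = 'L'
WHERE d.recovery_model_desc <> 'SIMPLE'
  AND d.name NOT IN ('master', 'model', 'msdb', 'tempdb')
GROUP BY d.name
HAVING MAX(b.backup_finish_date) IS NULL
    OR DATEDIFF(MINUTE, MAX(b.backup_finish_date), GETDATE()) > 2;
```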
Any one-off transaction log backup failure is not that critical when the next one picks up with the backup chain unbroken. I have a script monitoring solution using CMS to show you at the SQL PASS Summit 2014 that you can schedule at any interval, with notification down to the specific database level.
If you back up the transaction log every minute, you sure don’t want to be paged for every per-minute backup failure on every database. If the next minute’s transaction log backup is successful, the last one wasn’t immediately critical. You might want to look into the root cause when dawn breaks. I’ve got a solution with CMS using PowerShell and T-SQL to manage enterprise databases to show you at the SQL PASS Summit 2014, if I get there.
A question & statement of immeasurable value:
“Ask your business user about the business needs for their data…”
“But until you ask them this question, they assume you’re never going to lose data.”
You mention the cost of various replication strategies, and how the cost can inform decisions about the value of data. Very good advice. The cost of a given replication strategy can extend to significant changes in hardware, data layout, batch workload scheduling, and even application-level code/design. Until all the pieces are put together, it’s hard to know.
Sometimes discussions of data business value to shape recovery objectives can be tough – my people skills can be stretched in those discussions. But I greatly prefer having those discussions before a data loss crisis – or before a performance crisis in which replication is a contributor – rather than during a crisis, or even in post-mortem.
On the “BUT MORE LOG BACKUPS MEAN MORE FILES!” and “if you think you’re really going to manually restore log backups through the GUI, one at a time”, here is what we are currently doing:
Perform a full backup of the database overnight to a device.
Backup log files throughout the day, appending to the backup device holding last night’s full database backup.
So, if a restore to a point in time is required, through the SSMS GUI, it is easy to view and select any or all log backups to restore through (up to the desired point in time). We don’t have to pull up each log backup file through the GUI.
Of note, we are not dealing with extremely large databases. Sizes range from MB’s to 250 GB.
Since “BUT MORE LOG BACKUPS MEAN MORE FILES!” implies to me that each log backup (hourly, minutely, etc.) is going to a unique file name, I am wondering if this is the recommended method (as opposed to our current method) for log backups.
Hope I’m clear on this.
Thanks Brent and everyone there! We check out brentozar.com daily.
Pat – two big problems with that. First, if you want to sweep the backup files off to tape, it’s much more challenging if they’re currently in use. Second, if something goes wrong during a backup and the file becomes corrupt, all of your backups can be corrupt.
Hi, Pat, the backup info is in msdb, which can be queried (and a restore script automatically generated and saved as a file) from CMS using PowerShell and T-SQL for the enterprise database management of hundreds of SQL Servers, including notification and audit. That is what I plan to show at the SQL PASS Summit 2014.
This is the short end of SQL Server, because there is no centralized backup catalog like Oracle RMAN’s recovery catalog. Maybe SQL Server can improve on this in the future in order to be a big enterprise player. In the meantime, CMS and scripts will do for the enterprise. Don’t pay for expensive backup software.
Good food for thought, but there is a considerable cost/benefit, risk/reward analysis I would have to do to change my hourly schedule now, to include asking the business. I realize there’s always a risk to having a failure – even with clusters and high-tech SANs, but the risk is so minuscule (once in my 15-year career).
We only put the local backups to tape once/week (keeping in mind the backup drive is presented from a SAN at another location) so we have bigger fish to fry in the near term 😉
Allen – you must be a lucky charm! How often have you had car accidents, and have you thought about canceling your car insurance? No sense in paying for it if you don’t need it.
Life insurance too – might as well cancel that, because you’ve never died, right?
Have a good one. 😉
I would ask myself how much data I would like to lose, and then ask the data owner the same question. It’s like the question about physically separating the data files from the log files.
Is it for a performance reason? Yes, but primarily for disaster recovery.
Hey Brent, you keep up the good work! This can’t be stressed enough.
Would you consider synchronous mirroring/AG a zero-loss solution?
Tobi – if a user accidentally deletes or updates something, how would you recover?
Make a tail log backup and use point-in-time restore. I don’t need 1min backup for that.
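For illustration, the tail-log approach described above generally looks like this (database names, file paths, logical file names, and the STOPAT time are all placeholders; any intermediate log backups taken since the full would need restoring too):

```sql
-- 1. Back up the tail of the log (leaves MyDB in RESTORING state).
BACKUP LOG MyDB TO DISK = N'X:\Backups\MyDB_tail.trn' WITH NORECOVERY;

-- 2. Restore a copy up to just before the accidental delete,
--    then fish the lost rows out of MyDB_copy.
RESTORE DATABASE MyDB_copy FROM DISK = N'X:\Backups\MyDB_full.bak'
    WITH MOVE 'MyDB' TO N'X:\Data\MyDB_copy.mdf',
         MOVE 'MyDB_log' TO N'X:\Data\MyDB_copy.ldf',
         NORECOVERY;
-- (restore any intermediate log backups here, WITH NORECOVERY)
RESTORE LOG MyDB_copy FROM DISK = N'X:\Backups\MyDB_tail.trn'
    WITH STOPAT = '2014-01-01T10:59:00', RECOVERY;
```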
Interesting approach. Have you had to do a restore to fix a single deleted record with that, and how long did it take?
Ah I see. You propose log-shipping with lag? That is indeed a nicer solution.
HA/DR does not replace backups. You can recover from logical corruption much faster, and more precisely, with backups.
Hi Brent, thanks for that article. I have to say that the 1-hour log backup schedule was picked randomly here too – or at least we inherited it. My biggest concern is that TSM (IBM Tivoli) is used to perform backups in our environment, secure them, ship the tapes off-site, etc., but we do not have any control over it. We can ask for a restore, but I doubt I can ask the TSM admin for something like “can you please take last night’s full + the 600 log backups that were taken after it?” Granted, I am sure TSM can be automated too, but I doubt anyone in charge knows how to do that…
Benjamin, where I work, our Tivoli solution has a nice web-app front end that resembles a folder structure. It’s not practically more difficult to select 600 files, than it is to select 6.
Before they gave us R_W access, we had Read access, and could at least give them a screenshot of where the files live.
Benjamin, I am one of those TSM admins. There is a GUI that should make your recovery very user-friendly. It should even be available from the Start menu.
At some point I should have a PowerShell script to drive the process from a CLI perspective. But the TSM TDP for SQL Server has a fairly straightforward CLI command ‘tdpsqlc’ that closely mimics the native SQL Server commands for performing a restore. To be practical with any significant quantity of log files, a script is really needed.
As usual, nice post.
I’m facing a server with more than 150 databases, all of them in the full recovery model.
I have developed a T-SQL script that checks the actual space used by each database’s log and launches the log backup based on the size of the log.
What do you think about this approach?
It allows us to have more backups for the heavily used DBs, and fewer files and backups for the rest.
So in some way I’m measuring the amount of data we can lose in MB, not in time.
Greetings from Spain, and congratulations on your site – it’s a great resource for us SQL Server mere mortals.
Jorge – thanks, glad you like the site.
I would ask the business if they’re okay with it – and whether they’re okay with databases being out of sync when you restore. The databases will be restored to different points in time, and that’s usually not okay for the business.
On the “BUT MORE LOG BACKUPS MEAN MORE FILES!” point, in a log shipping scenario I find that the restore process takes 20-30 seconds to process even the smallest .trn file that has no actual data changes, only the relevant markers. So a restore process takes 2 minutes every hour even if there is no activity in the DB during that hour. If my math is correct, one-minute backups would work out to 30 minutes of “restores” every hour, even when there’s nothing to restore. This pretty much makes the warehouse unusable if we prevent user connections while performing the restores.
Al – if restoring a tiny log takes 20-30 seconds, you’ve got a problem, and you need to get that problem fixed.
I was waiting to see this post drop since the Dell backup webcast! Good stuff. Although I have to ask…are there any figures, or example tests with figures available to support the first assertion, saying that there is less overhead backing up more often? Not sure what the best measurement of that would be…even just cumulative job time over an hour with either 4 backups at 15 min or 60 backups at 1min, on a busy system. I get the idea that one big long log backup is more likely noticeable to the users than a constant stream of small ones, that makes sense, just wondering if people have done the testing…so I don’t have to ;)…to establish how exactly the performance changes as you change the increment…and in which direction!
Of course, your point about RPO is well taken. Luckily our business has sometimes proposed some laughably lenient expectations on RPO, but if your system can handle it easily, every 5 min or 1 min, I suppose sounds like a good idea. Regarding the many-files issue, if you aren’t employing some automation for tlog restores its time to get scripting, because doing it manually just sounds terribly laborious and prone to user error (with the CIO tapping his watch behind you).
Nick – thanks, glad you liked the post. Like you mentioned, it’s tough to get a single set of measurements that will make everyone happy, but cumulative job time is a great start. If you run those tests, make sure to clean out your MSDB backup history first as well – updating that can affect job times.
Hi Brent ,
A question regarding log shipping in readable (standby) mode when backing up the log every minute: in standard log shipping, for a long transaction, each .trn file will be restored and rolled back until the transaction completes (i.e., when an index rebuild takes 5 minutes, log shipping will restore the first file 5 times and roll it back 4 times, restore the second file 4 times and roll it back 3 times, etc.).
We can work around it with a customized script that restores the log shipping files only every X minutes (15?) and rolls back open transactions only on the last restored .trn file.
Any recommendations?
Eyal – log shipping with readable mode is for running reports on the log shipped subscribers, not disaster recovery. Those two things aren’t compatible. If you need a reporting server, build a reporting server – you won’t want to do restores every minute on that box.
Hi Brent ,
I need both a reporting server and disaster recovery. Do you have any suggestions for the reporting server, to avoid the multiple log shipping restores and rollbacks when applying the every-minute logs?
Eyal – sure, the backup schedule has nothing to do with the restore schedule. You can just do restores once per day if you want as long as the backup files are available on a central server that isn’t affected by the main server going down.
But the issue is the multiple restores and rollbacks of each .trn file, caused by the frequency of the backups vs. long transactions.
For example, for an index rebuild transaction that takes 5 minutes, log shipping will restore the first file 5 times and roll it back 4 times, restore the second file 4 times and roll it back 3 times, etc.
Eyal – I think you’re confusing how rollbacks work with how restores work. If you don’t run RESTORE WITH RECOVERY, then the rollbacks aren’t going to happen multiple times.
Can anyone recommend an automated restore script that would work well with the default folder structure and file names used by Ola Hallengren’s Maintenance Solution script in cases where the backup info in msdb and the Ola’s commandlog table is gone? I’d like to be prepared for a worst-case scenario.
Mark – did you read the other comments with the MSSQLtips link?
I could easily adapt the MSSQLtips script (Ola’s script uses separate subfolders for the three types of backup files). But if someone else has already done it (or something similar), I’d rather not reinvent that wheel.
Ola’s scripts also put those three subfolders under a subfolder for each database. I think that the MSSQLtips script assumes that all of the files are in one folder, but I could be wrong about that.
I found these, but haven’t decided which to implement yet:
I thought others might find this information useful.
OK. I agree with Brent’s assertion that our backup strategy should not be based on arbitrary rules; instead, it should be driven by the actual negative impact to the company should data be lost. In some cases, the loss of a single transaction can be devastating. It’s very rare, but it all comes back to a DBA’s favorite pat answer: “It depends.” In this case, it depends on the value of the data that could be lost, the financial and legal ramifications, etc., and that information has to come from the business. This doesn’t mean the business always knows best when it comes to backup strategy, because they do not always fully understand what they are asking for, but that’s why it is important to engage in a dialogue about the pros, cons, risks, costs, etc. with the business in order to come to the proper solution.
Now, to the main reason I am responding to this post…I feel like I’ve entered the twilight zone. Please tell me what I am missing about this discussion………
I do not understand the angst about relying upon the SSMS GUI to generate restore scripts. I see recommendations to duplicate functionality that is already available to us natively in SQL Server. I’ve been restoring databases for years using the SSMS GUI with just a few clicks of the mouse no matter how complicated the restore scenario, how far back I wanted to restore to ( as long as the backup files still existed ), or the number of backup files involved….in our shop, each backup operation writes to its own distinct file. When formulating a restore, SSMS intelligently navigates the combinations of full, diff, and log backups and the timeline provides a very simple way to pinpoint your desired recovery point. It even properly traverses forked recovery paths when they exist, something that not even all of the leading third-party tools can do ( one example, Idera SQLsafe, which we use extensively here ). I have yet to run into a scenario in which SSMS did not provide the simplest and quickest route to the desired restore. To restore a database with SSMS: point, click a few times, and out pops your restore script, complete with hundreds of individual file restores if that is what is required with very frequent log backups like Brent is proposing above. Again, what am I missing? I am genuinely intrigued.
I ran out of patience trying to read this. What’s the question?
Sorry, I should have lead with that. Why the concern over relying upon SSMS to generate restore scripts? What functionality is SSMS not providing that is necessary to restore a database quickly and reliably?
One database isn’t a big deal. Try restoring 50 or 100 when a server disappears.
OK. After running through some more dramatic scenarios, I am seeing some of what I was missing. We have pretty comprehensive SOPs on how to recover in the worst of situations, and have had to do so on several occasions. But after this exchange and some deep thought ( ouch! ), I do see a gap in our DR plans: we need to be able to restore msdb to the point of failure in order to have all of the backup data we need to recover fully and quickly when msdb has also been lost. Currently we only do nightly full backups of msdb. I believe this will be easier and more reliable than a home-grown scripting solution based on existing backup files, although having one in reserve as another recovery option wouldn’t hurt – one more layer of protection. I’m not too concerned about handling multiple databases, as we can generate the scripts via SSMS so quickly that by the time one database is done restoring, we can have the next DB set or two ready to go – as long as we have the msdb data to do so. This will definitely spawn some additional requirements discussions in our shop. I always appreciate being challenged to think deeper into issues like this. Just when I think I have it figured out, I discover or learn something new. Keep challenging us!
I’m a fan of Idera’s SQLsafe product, as it gives you a lot of the automation you need when you have lots of databases to backup, and want to be able to quickly find the files for a particular backupset, etc. You can also use their SQL virtual database product to attach a backup without having to restore it, which can be handy with large databases. And you get backup set compression and encryption, but that’s no longer an issue with 2012.
I agree, Brian! SQLsafe has been the backbone of our backup infrastructure for years. It provides a very effective enterprise-level backup/restore solution. It also has built-in functionality to backup directly to TSM, bypassing disk. You do give up the ability to use Virtual Database with this approach, but it does allow you to save on disk space for backups and the associated $$$. You can extract a backup file from TSM to disk to use Virtual Database if needed. Not as convenient, but possible. Their console/GUI is great and facilitates quick troubleshooting, browsing, and restores. It even supports point-and-click restoration of databases from one server to another, very handy! The only negatives in my opinion are 1) obviously it adds cost to your infrastructure and 2) it doesn’t navigate forked recovery paths properly.
Yeah, truly a tricky question to ask, but an hourly differential backup and a transaction log backup every 10 minutes will do fine for most applications, right?! OMG
…and don’t call me Shirley.
Richie stole my joke 🙂
Though I have never managed a server with this much data or this many databases (and probably never will), I see the logic of a little slowdown over time versus a big wait that might come at a critical moment. I have captured a few of the scripts to practice with. Thanks.
Just a quick note on the number of files, from our experience.
One day out of the blue our full backups started to take a very long time to complete (from a few hours to over a day!). It took us some time to find the cause, which was the large number of small transaction log backup files.
We’re backing up around 60 instances to a Data Domain, most of them backing up trnlogs every 15 minutes (and 1 full/day).
A problem with the cleanup job resulted in an increasing number of small files on the Data Domain. Once we manually deleted the older trnlog backup files the duration of the full backups went back to normal.
Btw, eventually the vendor pointed us in the right direction.
So, this seems to be a case where more files could be an issue. Just wanted to let you guys know in case someone experiences the same symptoms.
Thierry – correct, that’s one of many reasons we don’t recommend folks use dedupe appliances with SQL Server: https://www.brentozar.com/archive/2009/11/why-dedupe-is-a-bad-idea-for-sql-server-backups/
Hey thanks for the tremendously useful link Brent. I was unaware of that and will certainly go have a chat with my colleague on Monday (the one responsible for our DD). Oh I can already picture him turning pale lol 🙂
My network administrator says you’re off your rocker! He said that every time you back up the log file, all transactions in the database are going to stop because they can’t write to the log while the log is being backed up. He says that if we follow your suggestion, we will be locking our database every minute, and chaos and anarchy will reign.
I’m trying to avoid a fist fight, or at least a heated argument. Could you provide me the information I need to keep the peace in our office?
Jeff – you know, that’s a great question. I would actually challenge *him* to prove that. I know for sure it’s not the case, but I can’t find any documentation proving it.
The network admin is effectively saying, “I believe planes can’t carry margaritas.” You won’t find any documentation that says planes can’t carry margaritas.
I don’t think he understood the backup. I expected the physical log file to be locked while it was being copied. I explained the log file structure and that the backup was just a copy of the data within the file, with very distinct start and stop points. He’s on board now.
Well, I am sitting at home watching 20 transaction logs restore over the last 4 hours with no end in sight… it seems that regardless of how big or small my transaction log file is, there’s a huge wait time between each one as they restore. More isn’t better in my case.
Don – you may be falling victim to an MSDB that hasn’t been purged in years. Here’s the background on that: https://www.brentozar.com/archive/2009/05/brents-backup-bottleneck-msdb/
I have tested changing our hourly backups to minutely, and it seems to work fine. I know when I go to management and recommend we change to minutely backups, they’re going to ask why we couldn’t do it more frequently. We have fairly small databases, and the minutely t-log backups are taking ~5 seconds. Is there any reason not to continue your line of thinking and go with 30 seconds? How about 15 seconds? 10? 5?
Dave – sure, in mission-critical environments, some shops have a WHILE 1 = 1 loop that just keeps backing up log files.
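A minimal sketch of that pattern might look like the following (the database name, backup path, and 5-second breather are all hypothetical; a real job would add error handling):

```sql
-- Continuous log backups: each one starts as soon as the previous finishes.
WHILE 1 = 1
BEGIN
    -- Build a unique file name from the current timestamp
    DECLARE @file nvarchar(260) = N'X:\LogBackups\MyDb_log_'
        + REPLACE(REPLACE(CONVERT(nvarchar(19), SYSDATETIME(), 126), ':', ''), '-', '')
        + N'.trn';

    BACKUP LOG MyDb TO DISK = @file WITH CHECKSUM, COMPRESSION;

    WAITFOR DELAY '00:00:05';  -- brief pause so quiet periods don't spray tiny files
END
```

Run it as a SQL Server Agent job set to restart on failure, and you effectively get sub-minute log backups without a fixed schedule.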
Great post Brent!
Great post! We already have our mission-critical databases on a 1-minute transaction log backup schedule and have never experienced any problems with it. A question, though: I notice that when there are heavy transactions happening, the backups take longer to complete, sometimes several minutes, missing the schedules in between. An example would be adding a large table to a transactional replication read-only subscription. On the subscriber, I’ve noticed it can take a long time for backups to complete while the snapshot and distribution agents are loading the schema and data. Is there any way to mitigate this?
I’m not sure this is quite the same situation, but the mention of replication and log backup bottlenecks reminds me of a similar problem we had with log backups and CDC. Basically, make sure your VLFs are well configured (not too many): we had one database with a high VLF count and CDC, and the log reader for CDC/replication had a hard time churning through the log during big, expensive operations like index maintenance. Shrinking and regrowing the log file at fixed intervals to get rid of the massive VLF count noticeably improved performance and resolved our issues.
Brent mentions splitting backup jobs up, but he doesn’t mention how to do this if you’re using Ola Hallengren’s SQL Backup scripts (MaintenanceSolution.sql).
Since I’d never tried it before and needed to split transaction log backups into multiple jobs, I had to modify Ola’s scripts. Here’s how I changed the transaction log backup job:
@description=N'sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d DatabaseAdminTools -Q "DECLARE @odds nvarchar(max); SELECT @odds = COALESCE(@odds + '','', '''') + name FROM sys.databases WHERE database_id > 4 AND database_id % 2 = 1; EXECUTE [dbo].[DatabaseBackup] @Databases = @odds, @Directory = N''H:\Backup'', @CompressionLevel = 1, @Compress = ''Y'', @BackupSoftware = N''SQLBACKUP'', @BackupType = ''LOG'', @Verify = ''Y'', @CleanupTime = 336, @CheckSum = ''Y'', @LogToTable = ''Y'', @Execute = ''Y''" -b'
[DatabaseAdminTools] is the name of the database I keep our maintenance scripts in, yours may differ.
The secret sauce is this line (incredibly simple, but I’d never tried it before):
DECLARE @odds nvarchar(max); SELECT @odds = COALESCE(@odds + ',', '') + name FROM sys.databases WHERE database_id > 4 AND database_id % 2 = 1;
And then use “@odds” as the parameter to @Databases.
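Presumably the second job then uses the complementary list, something along these lines (a sketch following the same pattern):

```sql
-- Even-numbered user databases for the second log backup job
DECLARE @evens nvarchar(max);
SELECT @evens = COALESCE(@evens + ',', '') + name
FROM sys.databases
WHERE database_id > 4        -- skip the system databases
  AND database_id % 2 = 0;   -- evens this time
```

Between the two jobs, every user database is covered, and the two backup streams run in parallel.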
George – HAHAHA, Odds scares me just because it implies gambling, but I like where you’re going with that, heh.
I have a question regarding log shipping. I have set up log shipping and everything went well. Now my questions are: 1. How do I know the transaction log has been restored to the database? 2. Where do the transaction logs come from?
You can use a logshipping monitor server, or you can query logshipping tables to see that information. Example: http://msdn.microsoft.com/en-us/library/ms188331.aspx
For more info on transaction logs, see Jes’ video here: https://www.youtube.com/watch?v=lcmYeE-cqQo
Sorry, I meant the log shipping completed successfully.
I’d like to add that after running these t-log backups once a minute for about half a year now, I’ve started to notice msdb bloat (1,000% growth). I tracked the culprit down to the msdb.dbo.backupset table which contains a row for each backup set. The msdn page references the sp_delete_backuphistory stored procedure as a way to limit the growth of this and other associated tables. I’d recommend anyone setting up a frequent backup schedule consider creating a sql agent job to run this stored procedure and clear out old backup records every so often.
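For example, a nightly Agent job step could trim the history to a rolling window (the 14-day retention here is just an illustration; pick whatever matches your auditing needs):

```sql
-- Purge backup/restore history older than 14 days from msdb,
-- keeping msdb.dbo.backupset and friends from bloating
DECLARE @cutoff datetime = DATEADD(DAY, -14, GETDATE());
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff;
```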
I found that the sp_delete_backuphistory procedure can cause locking issues, even if it’s only deleting a small amount of data, so I delete the backup history manually (as part of the nightly full backup), which works well.
After a nightly full, we delete the prior transaction log backup history, so msdb does not bloat. Although it means the history is not available for prior days, we currently only keep transaction logs since the last full (we might increase that to the last full minus one, for some extra granular restore ability), so this works for us.
Would anyone recommend log shipping every 1 min for a single db server with over 1000 databases? If not, what timing and why?
Brian – start here: https://www.brentozar.com/archive/2015/02/many-databases-can-put-one-sql-server/
This blog post is in Portuguese (I can translate it), but I think it can help when thinking about how many databases a single instance can handle. =)
How is this strategy affected by our VMware administrator performing server snapshots once a day? During the snapshot process, it appears a full DB backup is being performed. Does this ruin my SQL backup chain (full, diff and transaction)?
Pat: your question makes it sound like you’ve never tested your backups. O_O
You want to get started on that right away, because some snapshot products do indeed break the backup chain in a big, ugly way.
You are very observant Brent, we have not tested our backups. There’s no time for that (so says the company), they’ll just work.
Until just two days ago, I didn’t even know that the VMware snapshot backup was backing up the databases as part of its snapshot. According to the backupset history, the backup (from the snapshot) is copy-only, so I think I’m OK, but I’m still not entirely convinced.
I need to test it, and I need to convince the company to spend some time developing server restoration documents. In the meantime, I’ve convinced our VMware manager to snapshot our SQL Servers only once a week instead of daily.
Thanks for the great column, loved the reading of this.
Very interesting, your points about log backups. I’ve read your blog post to the end, and I gather that taking log backups on a per-minute basis only protects your database in situations where you can’t take a tail-log backup when a crash occurs, right?
If I understood it right… don’t you think the probability of getting some of these hundreds or even thousands of log backup files corrupted is higher than that of a crash scenario where you cannot take the tail-log backup?
I’m talking about the costs vs. benefits of that kind of log backup strategy.
Edvaldo – correct. In my decade-plus of working with SQL Server, I’ve seen sooooo many instances where the server was down and unable to take a tail of the log backup. Hardware crashes, storage failures, SQL Server refusing to start, you name it.
If you’re worried about backup files being corrupted, that’s a separate problem – go fix that! Don’t back up to corrupt storage. 😀
Regarding costs (vs. benefit):
Each log backup file is about 92 KB, even if there were no transactions to log. So you would have an overhead of ~129 MB per database per day if you back up the log every minute (vs. only once per day or once per hour).
The total backup time would not increase (aside from that little overhead), because it makes little difference whether you write a 50 GB log backup at once or split it into many small parts (for the I/O subsystem, it should actually be better to write 30 MB every minute over the whole day instead of 50 GB at once).
Restore: on my local PC it takes about 1-2 seconds to “restore” an empty or nearly empty log backup file. If you take log backups every minute, your total restore time could be increased by up to 1,440 backups per day * 2 sec / 60 = 48 min in the worst case (slow restore; crash just before the next full/diff backup; no log shipping set up). If you back up every 5 minutes (as we do), the restore time would be increased by at most ~10 min.
Regarding the chance that one of 1,440 log backups is corrupt vs. the chance that one of, e.g., 24 log backups is corrupt:
– corruption could/should only occur because of I/O errors (firmware errors; backup to a bad tape or a local/external HDD without RAID/parity). Logical errors (because of a bug in SQL Server) should be negligible
– there is a slightly higher chance for many files to be corrupt than for few – if the I/O error is in the file table instead of the data part of the HDD
– if there is an error in a 1-hour backup file, you’ll lose up to 60 min of data (plus all data in the following log backups)
– if there is an error in a 1-minute backup file, you’ll lose only 1 min of data, because you can restore all the files before it
Thomas – great in-depth thinking there!
We take short-interval backups as you’ve suggested. I routinely run a script which restores a VerifyDB to test the viability of those backups, then performs some DBCC checks on the VerifyDB to determine if there might be issues with the resulting database if we ever had to actually restore it. Much more frequently than I would have ever guessed (on average now about 1 run in 9), the script hums along with the FULL, the DIFFERENTIAL, and then at some point, say LogFile = 70 or 220 (or in one of this morning’s failures, LogFile 2), the script fails with the error…
Database ‘VerifyDB’ cannot be opened. It is in the middle of a restore. [SQLSTATE 42000]
I can’t believe that sql server backups are that unreliable. What is the flaw in my process?
Unfortunately, I can’t really troubleshoot your scripts from here.
Just wondered if there might be some switch or a timing problem I was not taking into consideration. In general, the script is something like:
RESTORE DATABASE VerifyDB FROM DISK='Full backup file' WITH STATS = 1, MOVE 'prod_log' TO 'Verify.ldf', MOVE 'prod' TO 'Verify.mdf', NORECOVERY, REPLACE -- restores until Jan 27 2016 10:09PM
RESTORE LOG VerifyDB FROM DISK='tranlog' WITH STATS = 1, FILE = 2, NORECOVERY -- restores until Jan 27 2016 10:01PM
RESTORE LOG VerifyDB FROM DISK='tranlog' WITH STATS = 1, FILE = 3, NORECOVERY -- restores until Jan 27 2016 10:02PM
RESTORE LOG VerifyDB FROM DISK='tranlog' WITH STATS = 1, FILE = 4, NORECOVERY -- restores until Jan 27 2016 10:03PM
RESTORE LOG VerifyDB FROM DISK='tranlog' WITH STATS = 1, FILE = 5, NORECOVERY -- restores until Jan 27 2016 10:04PM
and a few hundred more
Results typically look like
RESTORE DATABASE VerifyDB FROM DISK=’full backup file’ Move…
6 percent processed. [SQLSTATE 01000]
13 percent processed. [SQLSTATE 01000]
20 percent processed. [SQLSTATE 01000]
27 percent processed. [SQLSTATE 01000]
34 percent processed. [SQLSTATE 01000]
41 percent processed. [SQLSTATE 01000]
48 percent processed. [SQLSTATE 01000]
55 percent processed. [SQLSTATE 01000]
62 percent processed. [SQLSTATE 01000]
69 percent processed. [SQLSTATE 01000]
76 percent processed. [SQLSTATE 01000]
83 percent processed. [SQLSTATE 01000]
90 percent processed. [SQLSTATE 01000]
97 percent processed. [SQLSTATE 01000]
100 percent processed. [SQLSTATE 01000]
Processed 1832 pages for database ‘VerifyDB’, file ‘[database]’ on file 1. [SQLSTATE 01000]
Processed 3 pages for database ‘VerifyDB’, file ‘[database_log]’ on file 1. [SQLSTATE 01000]
RESTORE DATABASE successfully processed 1835 pages in 0.196 seconds (73.107 MB/sec). [SQLSTATE 01000]
It often gets a few, sometimes 70 to 200 or more, transaction log files in before…
RESTORE LOG VerifyDB FROM DISK='tranfile' WITH STATS = 1, FILE = (?), NORECOVERY -- restores until Jan 27 2016 10:01PM
Msg 927, Sev 14, State 1, Line 1 : Database ‘VerifyDB’ cannot be opened. It is in the middle of a restore. [SQLSTATE 42000]
Is there a down side to using copy-only full backups in combination with normal transaction log backups?
I ask because we are looking into a SQL 2014 upgrade and using Availability Groups. You can only take copy-only fulls from a secondary, although transaction log backups seem to work well no matter which replica you run them on. As long as you have the copy-only full and an unbroken set of transaction log files that cover the period in which the copy-only full ran, you seem to be able to restore without any trouble.
JohnH – that question isn’t really related to this blog post. When you’ve got a general question about databases, your best bet is to head over to http://DBA.StackExchange.com. (I’d rather teach you how to fish.) Enjoy!
Please feel free to redirect me! I’m asking you as I think you can probably give me quick and straight advice.
We have a single database, not too big (only 60 GB), running on SQL Server 2008 R2. It’s merge replicated every minute to 3 subscribers. I’ve kind of inherited the backup plan, which is a single backup of the database, log, and system databases every evening, including full server backups with Backup Exec.
We’ve had 2 major server failures over the last 10 years; both were recovered with minimal data loss (acceptable to the company) using these backups. The downside is that it took a few hours, but because of our multiple sites, users were just redirected and continued working.
Would we benefit by adopting a more frequent log backup plan?
Jim – rather than asking me, ask your business users. Head on over to http://BrentOzar.com/go/fail and get our HA/DR RPO/RTO worksheet, fill that out, and share it with your business users. They’ll tell you if things need to be better. Hope that helps!
What happens when you’re generating new transaction logs faster than they can be written to disk?
We have our transaction log backups happening every 20 minutes. As a test, I tried reducing it to once per minute as you propose here. What I found is that for a while transaction log backups would occur once per minute, but then there would be a spike in transaction log activity and the next backup would take 20 minutes to complete, then the one after that 20 minutes, and so forth until the next differential or full backup occurred, at which point backups would go back to one per minute for a while until the problem repeated.
For us, it’s a single database being backed up, so creating multiple transaction log backup jobs won’t help either… is there another way around this besides faster disks or eliminating the activity that causes the transaction logs to grow?
Brad – it’s so funny you ask that! That’s exactly how I first wanted to go into consulting. I ran into that exact problem, and we brought in a consultant who said, “This looks impossible – there’s no way you’re doing that many transactions per second. I’ve never seen a server go that fast.”
At that moment, I knew I wanted to be a consultant, because I already had more experience than that guy.
You basically have three options:
* Write fewer transactions
* Speed up the reads on the transaction log drive
* Speed up the writes on the log backup target
Troubleshooting each of those is beyond what I can do fast in a blog post comment, but hopefully that gets you started on the journey. And congrats!
I had to restore 5 minute interval logs from a tape library once. It took well over a day because the tape robot was retrieving tapes all the time, and rewinding / fast forwarding to some very small log backup somewhere on the tape. Now, I use a script to determine the percentage usage of the log file, and trigger a log backup once it’s over 10%. Never looked back.
Peter – have you explained to the business that you can lose hours of data? As long as they’re cool with that…
Why do you write your backups primarily to tape? Get some hard drives (no, don’t use the same SAN your data is on), create the 5-minute log backups on those drives, and let your backup software write those files to tape when a sufficient amount (or time) is reached. When you need to do a restore (and the log files have already been deleted from the disks), you can restore the files from tape to disk (which should be fast if many were written in one large block) and then do a SQL restore from disk.
PS: Bonus points for using the MIRROR_TO option to write a copy of the backup stream to a second location (same drive type, so another HDD and no HDD plus tape).
The risk of not being able to restore the data for days is offset by a mirror setup which we have online. Furthermore, we have the data in the cash registers, so while we had some corruption trouble with the database, which was 10 TB at the time, we never lost a byte of data.
If we back up every minute, or back up based on size, what’s the difference? That increment can be very small. While we rebuild indexes using Ola’s scripts there’s a lot of log activity, but especially during the night there’s hardly any update, delete, or insert at all. No, I’d rather back up logs on demand; I’ve learned the hard way. IMHO there’s no merit to timed log backups.
Peter – sure, as long as nobody ever accidentally runs a delete or an update without a where clause.
Oops. It’s trashed on both the primary and the mirror. Enjoy!
Brent, how do you schedule transaction log backups every minute? Is that using a SQL Server capability? Don’t the backups block, so you can’t run another backup (of the same type) until the current backup finishes?
I can appreciate pushing the frequency to the wall. On our most critical SQL Server DB, we take transaction log backups every 15 minutes. Most complete in seconds. However, there are occasional amazingly large transaction log backups that take over an hour!
We have been using the Windows Task Scheduler and I note that it blocks: if a backup is running when the next is to start, the next isn’t run.
My next script will probably restart itself to keep taking transaction log backups continuously (or sleep so that backups start no more than one minute apart).
What do you think of taking more frequent differential backups? We are still taking daily fulls (which are taking about 5 hours for 3TB). We have also resorted to taking 2 transaction log backups before the full (won’t need to do that when we switch to a continuous transaction log backup model) to keep the amount of log contained in the full to a minimum (and we keep taking transaction log backups every 15 minutes).
Steffan – using SQL Server Agent.
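For the curious, attaching a one-minute schedule to an existing Agent backup job is a one-liner against msdb (the job and schedule names here are hypothetical):

```sql
-- Run the log backup job every minute, all day, every day
EXEC msdb.dbo.sp_add_jobschedule
    @job_name             = N'LogBackup - MyDb',
    @name                 = N'Every minute',
    @freq_type            = 4,   -- daily
    @freq_interval        = 1,
    @freq_subday_type     = 4,   -- subday unit: minutes
    @freq_subday_interval = 1,   -- every 1 minute
    @active_start_time    = 0;   -- starting at midnight
```

Agent also handles the blocking concern: if the job is still running when the schedule fires again, that run is simply skipped rather than stacked.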
If you have log backups that take over an hour, that means you have no point-in-time recovery during that hour. If the server dies, you just lost up to an hour of data. That’s usually a sign that you need to do performance tuning on your backup process: tweaking compression settings, backing up to a faster share, checking your network throughput, etc.
Is it possible to restore Transaction Logs to a database every hour without having to restore the full backup first?
Can you set database to No Recovery, apply transaction Logs, then set to Recover. Do that every hour? is that possible?
John – for Q&A, head over to https://dba.stackexchange.com/
You can set the database to STANDBY. This allows you to read (not write) from it and you can restore the next log backups any time you want (and set it to standby again).
And of course you can execute a RESTORE myDb WITH RECOVERY when you need it as full usable database (but after this you will no longer be able to append more log backups).
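A sketch of that hourly cycle (file names are hypothetical; the standby file holds the undone in-flight transactions so the database stays readable between restores):

```sql
-- Apply the next log backup, leaving the database readable between restores
RESTORE LOG MyDb
    FROM DISK = N'X:\LogBackups\MyDb_log_202301011000.trn'
    WITH STANDBY = N'D:\Standby\MyDb_undo.bak';

-- Later, when you need it as a fully writable database
-- (this ends the restore chain; no more log backups can be applied):
RESTORE DATABASE MyDb WITH RECOVERY;
```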
Hey, Brent. Long-time listener, first-time caller (bet it’s been a minute since anyone heard that). I’m planning our backups and want to implement minutely log backups. My concern is what happens if a log backup runs long? Do the other log backup jobs fail, or stack up and run as soon as the other job ends? My guess is they fail, but I’m curious whether they fail “elegantly” (i.e., the backup chain isn’t broken and everything is fine, just the job fails) or whether it buggers everything up. I see similar questions asked above, but they weren’t exactly the same, and I couldn’t glean my answer from those questions or some google-fu. Thanks for the help.
Jon – why not give it a shot? While one of your Agent jobs is running a log backup, try triggering it again and see what happens.
Hello! I have been very successful with my backup strategy! I created a table which holds all databases and the sort of backup I want for each, e.g. DayendFull, HourlyTransactionLog… I then set up a stored proc for each of these schedules, which uses a cursor to run through each database, with a WHERE clause restricting what gets backed up per my table. This means I get the dynamic results I wanted, and it makes it very easy to set up a new database or remove one from the desired schedule. For each schedule, I audit the process and send an email for any failing database. Once a database is backed up or has failed, I add it to an audit log table, then verify the headers straightaway and log the result to the audit table before going on to the next. This means I have a fully error-handled backup system which audits the results and only lets me know if there are failures.
I do, however, check the results of the audits weekly when I do my full restores onto another instance. You can only fully check your backups with a full restore; with headers only, you may still have issues with your backup set! I do this in a similar way, restoring from all the backups in the folder, and put my results in my audit table for ISO purposes!