Out of all of the scripts in our free First Responder Kit, sp_BlitzBackups is probably the one you’ve used the least. Let’s talk about what comes out of it, and why you should use it more often.
First, let’s define two terms:
- Recovery Point Objective (RPO) – measured in time, it’s how much data you would lose if you restored a backup. For example, if your last backup finished at 1AM this morning, and it’s currently 10AM, you’d lose 9 hours of data if the server went down right now. That’s a 9 hour RPO.
- Recovery Time Objective (RTO) – also measured in time, it’s how much time it would take you to perform a recovery. For example, if the server went down right now, and it took you a total of 1 hour to decide to do a restore, start the restore process, finish the restore, and then let people back into the app again, that’s a 1 hour RTO.
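To make the RPO definition concrete: SQL Server keeps backup history in the msdb database, so a rough sketch of "what's my current RPO right now?" looks something like this. This is a hedged illustration, not how sp_BlitzBackups itself works internally:

```sql
/* Rough sketch: estimate each database's current RPO by checking
   how long ago its most recent backup of any type finished.
   Uses the standard msdb.dbo.backupset history table. */
SELECT  d.name AS database_name,
        MAX(b.backup_finish_date) AS last_backup_finish,
        DATEDIFF(MINUTE, MAX(b.backup_finish_date), SYSDATETIME())
            AS current_rpo_minutes
FROM    sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
        ON b.database_name = d.name
WHERE   d.name <> 'tempdb'
GROUP BY d.name
ORDER BY current_rpo_minutes DESC;
```

A NULL last_backup_finish here means msdb has no backup history for that database at all, which is an even scarier answer than a big number.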
Businesses have a really hard time agreeing on what their desired RPO and RTO are because they wanna lose as little data as possible, and they wanna be down for as short a time as possible. That’s where sp_BlitzBackups comes in. It can’t tell you what your goals are, obviously. Instead, it estimates what you’re actually delivering today.
When you run sp_BlitzBackups, it analyzes the last 7 days of backups. Here’s what it returns:
I know, tiny screenshot – you can click on it if you wanna see the results, but I’ll describe the non-obvious columns for you:
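Running it is a one-liner. At the time of writing, the lookback window is controlled by an @HoursBack parameter, but check the First Responder Kit documentation for the current parameter list:

```sql
/* Default: analyze roughly the last 7 days (168 hours) of backup history */
EXEC dbo.sp_BlitzBackups;

/* Widen the window, e.g. to 14 days, with @HoursBack: */
EXEC dbo.sp_BlitzBackups @HoursBack = 336;
```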
- RPOWorstCaseMinutes – the longest length of time between two successful backups. Say you regularly do log backups every 15 minutes, but between 9:10AM and 9:40AM, the backup target was offline, and no backups could be done. You had successful log backups at 9:00AM and 9:45AM. Your worst case for RPO would be if the server went down at 9:44AM, just before the 9:45AM backup ran, so you’d have a 44-minute worst case RPO.
- RTOWorstCaseMinutes – the longest length of time it would take you to do a restore and get back online. Say you do full backups every night, and log backups every 15 minutes. Your worst case RTO scenario would be if the server went down right before the nightly full backup, because you’d have to restore the prior day’s full backup, plus the entire day’s transaction log backups. To calculate this number, we add up the time spent on all of those backups. Technically, this isn’t accurate because your restores could take longer than the backups did, especially due to the lack of instant file initialization on the transaction log file. It just gives you a rough starting point, though.
- Supporting info – when people see these “worst case” numbers, their next question is, “Holy smokes, when was the time where we might have lost 44 minutes worth of data?” The following columns give you information about what the dates/times were and the backups involved, plus more-info queries so you can examine the history stored in MSDB, like this:
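To see the kind of analysis that backs a number like RPOWorstCaseMinutes, you can run a gap query against msdb yourself. This is a hedged sketch, not sp_BlitzBackups's actual code, and 'YourDatabase' is a hypothetical name (LAG requires SQL Server 2012 or newer):

```sql
/* Sketch: find the longest gap between consecutive successful
   backups of one database over the last 7 days of msdb history. */
WITH backup_gaps AS
(
    SELECT  backup_finish_date,
            LAG(backup_finish_date)
                OVER (ORDER BY backup_finish_date) AS prior_finish
    FROM    msdb.dbo.backupset
    WHERE   database_name = 'YourDatabase'   /* hypothetical name */
      AND   backup_finish_date >= DATEADD(DAY, -7, SYSDATETIME())
)
SELECT  TOP (1)
        prior_finish,
        backup_finish_date,
        DATEDIFF(MINUTE, prior_finish, backup_finish_date) AS gap_minutes
FROM    backup_gaps
WHERE   prior_finish IS NOT NULL
ORDER BY gap_minutes DESC;
```

In the 9:00AM/9:45AM scenario above, this query would surface that 45-minute gap between successful backups as the top row.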
Wanna performance tune your backups too?
Continue scrolling to the right in the result set, and you’ll find:
- FullMBpsAvg, Min, Max – the throughput you’re getting in megabytes per second.
- FullSizeMBAvg, Min, Max – how large your backups are, before compression.
- FullCompressedSizeMBAvg, Min, Max – how large your backups are, after compression.
- Similar columns for Diffs, Logs – these are useful if you need to estimate change rates.
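The raw ingredients for these throughput and size columns also live in msdb. Here's a hedged sketch of how you might compute them per backup yourself, assuming backup_size and compressed_backup_size are in bytes (compressed_backup_size exists in SQL Server 2008 and newer):

```sql
/* Sketch: per-backup throughput (MB/s) and sizes from msdb history,
   similar in spirit to the Full/Diff/Log summary columns. */
SELECT  database_name,
        backup_start_date,
        CASE type WHEN 'D' THEN 'Full'
                  WHEN 'I' THEN 'Diff'
                  WHEN 'L' THEN 'Log'
                  ELSE type END AS backup_type,
        backup_size / 1048576.0 AS size_mb,
        compressed_backup_size / 1048576.0 AS compressed_mb,
        (backup_size / 1048576.0)
          / NULLIF(DATEDIFF(SECOND, backup_start_date,
                            backup_finish_date), 0) AS mbps
FROM    msdb.dbo.backupset
WHERE   backup_finish_date >= DATEADD(DAY, -7, SYSDATETIME())
ORDER BY backup_start_date DESC;
```

The NULLIF guards against sub-second backups producing a divide-by-zero.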
I love using backup throughput as an early warning system, like a canary in the coal mine. If backup throughput suddenly drops, it’s a sign that something went wrong with the storage or the networking. We can’t tell you whether it’s a problem with slower read speeds, slower write speeds, or more traffic on the storage network, but it’s just your clue that it’s time to start troubleshooting – because user queries are likely running more slowly too. And when backup throughput drops simultaneously across multiple SQL Servers, that’s an even bigger clue that something went seriously awry with the company’s overall storage.
When you sit down at your desk each morning, backups aren’t usually the first thing on your mind. Just take a few minutes today to double-check that you’re actually backing this stuff up as frequently as you think, and that downtime will be as short as you think. I wish I had a dollar for every time a client’s DBA got caught with their pants down because backups weren’t scheduled correctly or were regularly failing at specific days/times.
Oh wait – I actually do have a dollar for each of those times. Several dollars, in fact. Look, let’s not get sidetracked.
For over 10 years, across our 90,000 databases, we have been using Redgate SQL Backup. Why? Because of its robustness and versatility, and moreover its excellent compression, which is much better than Microsoft’s.
Every time we reevaluate, it’s a nope for Microsoft.
With all due respect, IMHO, reliability is non-negotiable when choosing a backup implementation. Redgate’s compression is a significant value-add, but across its versions, the product has failed to provide an acceptably reliable solution. Native SQL Server backups have yet to fail me in many years. While compression is directly linked to cost savings, it takes long-term vision to realize the trade-off. I don’t mean to imply the Redgate product doesn’t have its place, but it’s not the backup-solution savior I’d always hoped it would be.
My go-to source for plain and precise information, and all the stuff the big paid-for courses don’t tell you. Thank you!
Thanks for the post, Brent, and thanks for the information, Nicolas Reid – it’s good to hear real-life solutions.
We don’t follow best practices where I work; we follow what the business defines, so every implementation is the same and everyone knows how to support it.
I’m trying to align the two so they’re the same, but it’s a challenge, and I fear we’ll need a disaster before we change our implementation.
Backups only become important when you need to restore, and that’s when the business finds out the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO) don’t align with the business’s needs – but by then it’s too late.
Happy New Year to all and I hope we have a better year this time.
Could it be that the RTOWorstCaseMinutes column will give an incorrect value if you’re using backup software like Veeam?
Hi. As I write in the post: