In this issue of my multi-part series about SQL Server backup compression software, I’m going to talk about some features that database administrators might find useful, but don’t yet exist in the current generation of backup software.
In-Depth Backup & Restore Progress Monitoring
On multi-terabyte data warehouses – especially those running new backup compression software that the DBA may not be familiar with yet – DBAs want to know how the backup is coming along. None of the products on the market do a particularly good job on a centralized console showing up-to-the-minute statuses across a farm of servers. Everybody has pieces of the picture, but no one shows the whole thing at once. Statistics I’d like to see include:
- Status of the restore (is it initializing the data & log files or actually restoring data – very important distinction for large data warehouses)
- Estimated time to completion
- Compression ratio
- Throughput rate (especially compared to this server’s historical average or the datacenter’s average)
- Overlapping jobs (full backups that have run into the transaction log backup window, or jobs that won’t finish in a given amount of time)
- Trend alerting (jobs that are taking longer than normal, jobs getting much lower throughput than normal)
The good news is that by querying each vendor’s activity database directly, an ambitious DBA with time on their hands can produce these reports themselves. The bad news is that few of us have that much time on our hands! Well, I do, but I use it to write my blog, heh.
Ability to Send Emails During Restores
I’m spoiled by Symantec/Veritas NetBackup, which will automatically email the system administrators whenever a restore finishes (regardless of whether it succeeded or not). This feature helps me empower my junior DBAs: while I want them to do their own restores, I also want to know who’s taking advantage of the service, and I also want to know when I need to jump in and help with a failed restore.
Ideally, I would like the ability to enter an email address during the restore process and get an email when the restore finishes. Optionally, it’d be even better to get emails upon each 10% of progress made. My data warehouse restore took several hours, and I wish I didn’t have to keep logging in to check on its progress.
Diagnostic Reporting & Recommendations
The backup software should be able to pinpoint the current backup bottleneck. Some examples might be:
- CPU load – if the CPU is too heavily loaded, the backup software may be waiting on CPU time to compress the backup. It would alert the DBA and suggest a lower compression ratio.
- Source disk throughput – if the source drives aren’t responding quickly enough, and if CPU load is low, the backup software could suggest a higher backup compression ratio. That way, the backups could be compressed smaller without affecting the backup times or the server response times.
- Network throughput – if the backup target is a UNC path, and if the network bandwidth is saturated on the database server, the software could suggest adding network cards for better performance or using a higher compression ratio.
- Target disk throughput – as with our source disk throughput scenario, if the target drives are too slow, the software could suggest higher compression ratios.
This sort of reporting could be an add-on license, or be used as a cross-selling tool for the vendor’s SQL Server performance monitoring software. In Quest’s case, the combination of Quest LiteSpeed for backup and Quest Spotlight for performance monitoring means that the two products have all of the information necessary to make these configuration recommendations.
Restore Source Selection
When it’s time to restore databases, Idera SQLsafe does a great job of integrating history across servers. When selecting a source for the restore, SQLsafe lets the DBA pick from a list of source servers and databases. Idera also includes a button to fetch the most recent backup of that source database including all matching transaction logs.
Quest LiteSpeed does a better job of picking source files off disk, though: their file-picking wizard shows all machines in the network (not just SQL Servers) and lets users drill into each machine’s file shares. LiteSpeed accesses the restore files via UNC path. This beats Idera’s UNC selection method, which requires the end user to manually type in the full UNC path including file name.
The best restore source selection would be a combination of both Idera and Quest’s methods.
Even better, I’d like to see a true disaster recovery wizard. Instead of specifying a single database to restore, I’d like to be able to restore an entire server: fetch the latest full & transaction logs for all databases on a given server, and restore all of them to another target server.
Automatically Skip T-Logs on Simple Mode Databases
Our developers and junior DBAs routinely create new databases without advance warning. I have to know that those databases will get backed up without human intervention, and the only way to do that with today’s backup software is to use a single maintenance plan for all user databases. If we set up individual maintenance plans for different groups of databases, then any new database wouldn’t be backed up until it was manually added to that maintenance plans. That’s not an acceptable risk for us, so we have to use a single maintenance plan for all user databases.
However, on a single server we’ll have some databases in full mode and some in simple, depending on their purposes. For example, we have systems monitoring databases like Ipswitch WhatsUp, and we don’t mind losing transactional history for a day.
The problem comes in with the single maintenance plan for all databases. If we only use one maintenance plan, today’s backup software isn’t smart enough to skip transaction log backups on databases in simple mode. It will try to do a t-log backup, and then send an error because it can’t perform the backup.
One solution would be to have the maintenance plans automatically skip t-log backups for databases in simple mode. Another solution would be to use policy-based management that defines a default policy for any newly created databases, and then lets me move simple-mode databases into their own policy.
Take Corrective Action Inside the Management Console
Both Idera SQLsafe v4.6 and Quest Litespeed v4.8 have dashboards that show recent activity, like which databases haven’t been successfully backed up in a given time range.
That’s a great start, but from that same screen, I want to take corrective action (like starting a backup or scheduling one) and diagnostic actions (like viewing the error logs or the event logs). If something is color-coded red to indicate trouble, then it should have a right-click mechanism to correct the problem.
A while back, I got the nickname of Six Million Dollar Man. Somebody said that if they followed all of my suggestions, our products would take six million dollars to deliver. Here’s a good example of my wacko ideas.
For most of our production servers, we do full backups every day and transaction log backups every 15 minutes. Here’s the thing, though: I don’t care when the fulls happen, and I don’t always know when the server is at its lowest load point (where the backup should be done). The time of the full backup doesn’t really matter – I just want to find the right time to do it.
To make that easier, I want a report that shows the critical hardware metrics for a server so that I can pick the best time for a backup window. It should look at source disk load, target disk load, network throughput, CPU load and the size of the databases, and then suggest the best time window for the full backups.
As if that wasn’t tough enough, I want it to be aware of the backups that are already running, and highlight that load or subtract it from the report. Meaning, if there’s a set of full backups running from 4pm to 6pm, then don’t just show high server load during that window and say it’s the wrong time to run a backup. The backup is creating the load!
And finally, I would want to balance the backup load across multiple servers. I don’t want to try to back up two large data warehouses at the same time to the same file share, or two blades in the same chassis since that would max out my network throughput. I would want the ability to say that full backups should avoid overlapping.
(Pipe dream, right? I know.)
Some of my suggestions are pretty expensive to implement, but they give database administrators an idea of the kinds of features missing in the market now. This list shouldn’t stop a DBA from buying backup compression software – there’s still a huge ROI for backup compression, and I’ll talk about that ROI in my next blog post justifying the purchase price. (I’m writing these a little out of order!)