March 31st is World Backup Day, and while most of the press will be focusing on backing up your priceless lolcat photos, it’s still a good time to talk with management about your database backups.
Your boss will have some simple questions, and you can arm yourself with a few easy queries.
Are We Backing Up Every Database?
Even if you think your backups are running successfully, you have to double-check. I had a client recently who’d copy/pasted jobs across several different servers and databases. While editing one of the jobs, someone made a typo, and they were backing up the master database instead of the user databases. The backup jobs ran successfully every night, but they weren’t actually backing up the data that people really cared about.
This query will list the last full backup for all of your databases, and like all of the queries in this post, the results should come back nearly instantaneously with no blocking. They’re completely safe to run in production.
SELECT d.name ,
       MAX(b.backup_finish_date) AS last_backup_finish_date
FROM master.sys.databases d
LEFT OUTER JOIN msdb.dbo.backupset b
    ON d.name = b.database_name
    AND b.type = 'D'
GROUP BY d.name
ORDER BY 2 DESC
Review that query result and double-check – then ask, “Do I really need to back up all of these databases?” If you’ve got the demo databases AdventureWorks, Northwind, or pubs on your server, and you’re backing them up daily, you’re wasting resources. Get ‘em off your production servers.
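If you’d rather see only the problem children, a small variation on the query above flags databases with no recent full backup. This is just a sketch; the 24-hour window is my assumption, so match it to your own backup schedule:

```sql
-- Databases with no full backup in the last 24 hours.
-- The 24-hour threshold is an assumption; adjust it to your schedule.
SELECT d.name ,
       MAX(b.backup_finish_date) AS last_backup_finish_date
FROM master.sys.databases d
LEFT OUTER JOIN msdb.dbo.backupset b
    ON d.name = b.database_name
    AND b.type = 'D'
WHERE d.name <> 'tempdb' /* tempdb can't be backed up */
GROUP BY d.name
HAVING MAX(b.backup_finish_date) IS NULL
    OR MAX(b.backup_finish_date) < DATEADD(hh, -24, GETDATE())
ORDER BY 2
```

Databases that show up here with a NULL finish date have never been backed up at all on this server.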
Are We Backing Up Corrupt Data?
Just because your backups are succeeding doesn’t mean you’re backing up legit data. The backup process doesn’t do anything like the DBCC CHECKDB process, which checks that the data on disk actually makes sense.
I really, really care about this because if you get a data corruption error, you may have to restore the most recent database backup and see if the data’s corrupt there too. If it is, step back and restore the full backup from the day before that – and the day before that – and the day before that. If you’re lucky, you’ve got a copy of the data from before the IO corruption occurred. If you’re unlucky, you don’t, and then you start looking for an uncorrupted copy of your resume.
The key to success: run DBCC more often than you expire backups. If you only keep 7 days of backups, then you should run DBCC more frequently than that. If you only run DBCC once a month, you won’t find corruption until long after the clean backups are gone.
Here’s how to see the last time DBCC CHECKDB finished successfully on each database (works on SQL Server 2005 and newer; it must be run in a database at compatibility level 90 or higher):
CREATE TABLE #temp
    (
      ParentObject VARCHAR(255) ,
      [Object] VARCHAR(255) ,
      Field VARCHAR(255) ,
      [Value] VARCHAR(255)
    )
CREATE TABLE #DBCCResults
    (
      ServerName VARCHAR(255) ,
      DBName VARCHAR(255) ,
      LastCleanDBCCDate DATETIME
    )

EXEC master.dbo.sp_MSforeachdb
    @command1 = 'USE [?]; INSERT INTO #temp EXECUTE (''DBCC DBINFO WITH TABLERESULTS'')',
    @command2 = 'INSERT INTO #DBCCResults SELECT @@SERVERNAME, ''?'', Value FROM #temp WHERE Field = ''dbi_dbccLastKnownGood''',
    @command3 = 'TRUNCATE TABLE #temp'

--Delete duplicates due to a bug in SQL Server 2008
;
WITH DBCC_CTE
    AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY ServerName, DBName, LastCleanDBCCDate
                                    ORDER BY LastCleanDBCCDate ) AS RowID
         FROM #DBCCResults
       )
    DELETE FROM DBCC_CTE
    WHERE RowID > 1 ;

SELECT ServerName ,
       DBName ,
       CASE LastCleanDBCCDate
         WHEN '1900-01-01 00:00:00.000' THEN 'Never ran DBCC CHECKDB'
         ELSE CAST(LastCleanDBCCDate AS VARCHAR)
       END AS LastCleanDBCCDate
FROM #DBCCResults
ORDER BY 3

DROP TABLE #temp, #DBCCResults ;
If you’re not happy with the results of this query, it’s time to start running DBCC CHECKDB more often. If you’re unable to run it in production due to performance/uptime issues, consider restoring your databases at least once a week to a development or QA or disaster recovery server, and run DBCC CHECKDB there. Red Gate’s SQL Backup 7 Beta even helps schedule this for you.
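On the offload server, the weekly job boils down to a restore followed by a CHECKDB. Here’s a minimal sketch; the database name, backup share, and file paths are all hypothetical placeholders, so substitute your own:

```sql
-- Restore last night's full backup onto a test/DR server, then check it.
-- All names and paths below are hypothetical examples.
RESTORE DATABASE SalesDB_Check
    FROM DISK = N'\\backupserver\sql\SalesDB_Full.bak'
    WITH MOVE N'SalesDB' TO N'D:\Data\SalesDB_Check.mdf' ,
         MOVE N'SalesDB_log' TO N'L:\Logs\SalesDB_Check.ldf' ,
         REPLACE ,       /* overwrite last week's copy */
         STATS = 10 ;    /* progress messages every 10% */

-- NO_INFOMSGS keeps the output down to actual errors.
DBCC CHECKDB (SalesDB_Check) WITH NO_INFOMSGS ;
```

As a bonus, this approach also proves your backups actually restore, which is the whole point of taking them.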
Are The Backups Making It to Tape?
If you’re backing up to disk, and the network admins are backing up your disks to tape, … well, are they? How can you be sure?
The first step is to check the archive flag. Open Windows Explorer and navigate to the folder where you’re writing backups. Right-click on the bar where Name, Date Modified, and Type are shown, and click More. Choose the Attributes column, and your Explorer window will look something like this:
If you see an “A” in the Attributes column, that means the Archive bit is set – the file is ready to be archived. When your backup software runs and sweeps these files to tape, it resets the Archive bit to off, thereby telling you that your file no longer needs to be archived. If you see A bits on database backups from yesterday, that tells you they haven’t made it to tape yet, and it’s time to start asking questions about backup schedules.
Are Our Backups Fast Enough?
In the msdb database, SQL Server saves backup size and duration for all backups. Using those two numbers and the power of math, we can get throughput:
SELECT @@SERVERNAME AS ServerName ,
       YEAR(backup_finish_date) AS backup_year ,
       MONTH(backup_finish_date) AS backup_month ,
       CAST(AVG(( backup_size / ( DATEDIFF(ss, bset.backup_start_date, bset.backup_finish_date) ) / 1048576 )) AS INT) AS throughput_MB_sec_avg ,
       CAST(MIN(( backup_size / ( DATEDIFF(ss, bset.backup_start_date, bset.backup_finish_date) ) / 1048576 )) AS INT) AS throughput_MB_sec_min ,
       CAST(MAX(( backup_size / ( DATEDIFF(ss, bset.backup_start_date, bset.backup_finish_date) ) / 1048576 )) AS INT) AS throughput_MB_sec_max
FROM msdb.dbo.backupset bset
WHERE bset.type = 'D' /* full backups only */
  AND bset.backup_size > 5368709120 /* 5GB or larger */
  AND DATEDIFF(ss, bset.backup_start_date, bset.backup_finish_date) > 1 /* backups lasting over a second */
GROUP BY YEAR(backup_finish_date) ,
         MONTH(backup_finish_date)
ORDER BY @@SERVERNAME ,
         YEAR(backup_finish_date) DESC ,
         MONTH(backup_finish_date) DESC
The awesome part of this query is that it’s not measuring backup duration alone – sure, backups run longer as you add more data. This query’s actually measuring backup throughput, meaning how fast the backups can get data out.
When I review the results of this query, I focus on the throughput_mb_sec_avg field and look for big drops (or increases) over time. If backup speed dropped by 30% in January, I start asking questions about what network or storage changes we made at that time.
Compare that number to the bandwidth rates in Kendra Little’s “How Big Is Your Pipe?” bandwidth reference poster for a rough sanity check. If you can’t get at least the throughput of a 1Gb Ethernet connection, it’s time to start talking to your storage and network admins about teamed network cards, RAID 10 backup targets, and how to get shorter maintenance windows with backup compression.
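To turn a throughput number into something management understands, translate it into backup (or restore) time for your biggest database. A quick back-of-the-napkin sketch; the 500GB size and 80MB/sec throughput are made-up example numbers, so plug in your own:

```sql
-- Rough backup/restore time estimate. Both inputs are hypothetical
-- examples: use your largest database's size and the
-- throughput_MB_sec_avg value from the query above.
DECLARE @db_size_gb INT = 500 ,
        @throughput_mb_sec INT = 80 ;

SELECT ( @db_size_gb * 1024 ) / @throughput_mb_sec / 60 AS estimated_minutes ;
```

That estimated_minutes number is also roughly how long you’ll be down waiting on a restore during a disaster, which is usually the figure that gets management’s attention.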