Backup speed isn’t the sexiest thing DBAs spend time on during the day. It’s kind of boring.
Backups don’t complain that they’re not running fast enough. Users, those are the ones who complain: “My query’s not fast enough. Why can’t I do a cross join between data warehouse tables?” The squeaky wheel gets the lube, so we bend the users over the – wait, where was I going with that?
Backup speed also worsens very gradually over time, taking longer and longer to complete each night. DBAs assume the time increase is caused by more database data, but you know what they say about assume: there’s no lube involved. What if the time has nothing to do with the database size?
In the next couple of posts, I’ll explore some backup bottlenecks I found when doing a Backup Health Check recently for a company with a couple dozen SQL Servers. Today, put your lifejackets on, because we’re in for…
The Perfect MSDB Storm
Any one or two of these things wouldn’t have been a problem, but combine them all together and we had a disaster on our hands:
- SQL Server 2000 – which by itself isn’t a big deal.
- MSDB was on the C drive – which in this case, happened to be a very slow pair of ancient drives. The rest of the server was on a pretty quick SAN with RAID 10 arrays for data and logs, but MSDB was still stuck.
- The server had over 100 databases – again, no big deal.
- Transaction log backups ran every 15 minutes – of all 100 databases. That’s a lot of backup history data pouring into MSDB.
- MSDB cleanup jobs weren’t set up initially – which meant that weeks of backup history data started turning into months, and then into years.
- MSDB had grown to 3gb – on SQL Server 2000, this is a huge problem because it’s not indexed well.
Combine these factors, and we had The Perfect MSDB Storm. I couldn’t add MSDB cleanup jobs at this point because they couldn’t finish in any reasonable time. Search the web for problems like this, and you’ll find people in forums beating their chests and wailing for lifejackets. I tried a couple of stored procs and an ordered-delete trick, but doing these deletes meant holding a lock on the MSDB backup tables. The business wasn’t comfortable not being able to back up while I ran these deletes.
Kids – don’t try this at home. We’re what you call experts. We’ve got years of experience that keeps us safe.
- I scripted out the foreign key constraints on the relevant MSDB tables (backupset, backupfile, backupmediaset, etc)
- I disabled the backup jobs and waited for them to stop
- I dropped the constraints
- I truncated the tables
- I created the foreign key constraints again
- I enabled the backup jobs
The good news: all the backup history was instantly gone without waiting for slow logged deletions.
The bad news: all the backup history was instantly gone. This wasn’t a problem for this client because they used Quest LiteSpeed, and it has its own backup history repository anyway. (Shameless product plug: when your server goes down, what good does a local MSDB backup history do you anyway? LiteSpeed’s repository is centralized on another server, so you can take action on restoring backups faster.) This is not a great solution, and I wouldn’t post the code here for that reason, but it did work.
The bottom line: backup times dropped by 2/3! The nightly backups had been taking two hours, but a whopping 90 minutes of that was spent just updating MSDB tables with the backup history. They now take under 40 minutes. This certainly isn’t a typical result, but take a minute to make really sure that you’re cleaning up your MSDB history regularly.