Backup speed isn’t the sexiest thing DBAs spend time on during the day. It’s kind of boring.
Backups don’t complain that they’re not running fast enough. Users, those are the ones who complain: “My query’s not fast enough. Why can’t I do a cross join between data warehouse tables?” The squeaky wheel gets the lube, so we bend the users over the – wait, where was I going with that?
Backup speed also worsens very gradually over time, taking longer and longer to complete each night. DBAs assume the time increase is caused by more database data, but you know what they say about assume: there’s no lube involved. What if the time has nothing to do with the database size?
In the next couple of posts, I’ll explore some backup bottlenecks I found when doing a Backup Health Check recently for a company with a couple dozen SQL Servers. Today, put your lifejackets on, because we’re in for…
The Perfect MSDB Storm
Any one or two of these things wouldn’t have been a problem, but combine them all together and we had a disaster on our hands:
- SQL Server 2000 – which by itself isn’t a big deal.
- MSDB was on the C drive – which in this case, happened to be a very slow pair of ancient drives. The rest of the server was on a pretty quick SAN with RAID 10 arrays for data and logs, but MSDB was still stuck.
- The server had over 100 databases – again, no big deal.
- Transaction log backups ran every 15 minutes – of all 100 databases. That’s a lot of backup history data pouring into MSDB.
- MSDB cleanup jobs weren’t set up initially – which meant that weeks of backup history data started turning into months, and then into years.
- MSDB had grown to 3gb – on SQL Server 2000, this is a huge problem because it’s not indexed well.
Combine these factors, and we had The Perfect MSDB Storm. I couldn’t add MSDB cleanup jobs at this point because they couldn’t finish in any reasonable time. Search the web for problems like this, and you’ll find people in forums beating their chests and wailing for lifejackets. I tried a couple of stored procs and an ordered-delete trick, but doing these deletes meant holding a lock on the MSDB backup tables. The business wasn’t comfortable not being able to back up while I ran these deletes.
Kids – don’t try this at home. We’re what you call experts. We’ve got years of experience that keeps us safe.
- I scripted out the foreign key constraints on the relevant MSDB tables (backupset, backupfile, backupmediaset, etc)
- I disabled the backup jobs and waited for them to stop
- I dropped the constraints
- I truncated the tables
- I created the foreign key constraints again
- I enabled the backup jobs
The good news: all the backup history was instantly gone without waiting for slow logged deletions.
The bad news: all the backup history was instantly gone. This wasn’t a problem for this client because they used Quest LiteSpeed, and it has its own backup history repository anyway. (Shameless product plug: when your server goes down, what good does a local MSDB backup history do you anyway? LiteSpeed’s repository is centralized on another server, so you can take action on restoring backups faster.) This is not a great solution, and I wouldn’t post the code here for that reason, but it did work.
The bottom line: backup times dropped by 2/3! The nightly backups had been taking two hours, but a whopping 90 minutes of that was spent just updating MSDB tables with the backup history. They now take under 40 minutes. This certainly isn’t a typical result, but take a minute to make really sure that you’re cleaning up your MSDB history regularly.


A very interesting and informative read Brent, thanks!
One question if I may, probably because I’m missing something here but anyway, how did you identify that the MSDB database itself was causing slower execution of the backup jobs? Was it a case of deduction, so to speak, as a result of the combination of all the points you detailed or was there something specific?
Great question. In this particular case, I could tell because the DBAs tried to implement MSDB cleanup jobs and they wreaked havoc. If that hadn’t happened, I could have identified it by doing an sp_who2 during the backup process, by running Profiler to catch the slow statements, or looking at wait stats to see where the holdups were. If the holdups weren’t backups during the backup window, then I could look to see what the slow statements were.
Very useful to know, thanks!
Quest Litespeed. Prayhap it’s time I get me one of those!
Great job as always Brent. When it’s time for me to blog I always read your stuff to get me ramped up!
Ha! Thank you, sir.
I was interested in preserving some history and ended up going with deletes, but tried to optimize as much as possible. This is what I came up with: delbackuphist
Mr Ozar, once again you have wowed me with some crazy DBA knowledge. Thank you for sharing your knowledge with us all. Especially the part where you told us how you knew to look at this. Good times.
Jeremiah out!
“on SQL Server 2000, this is a huge problem because it’s not indexed well.”
Wow! Thank god I’ve never forgotten to setup my history cleanup tasks!
Brent,
Geoff Hiten had a good post on this topic a while back, including adding some suggested performance indexes for msdb. I wonder if having these or other indexes would help make the msdb maintenance process take less time – especially for SQL 2000?
http://weblogs.sqlteam.com/geoffh/archive/2008/01/21/MSDB-Performance-Tuning.aspx
Scott R.
I ran into this problem too. We had over 7 years of uncleared backup history in MS SQL 2000, 160 databases, and transaction log backups every 15 min. The major symptom was that right clicking a database to view properties in Enterprise Manager or SSMS would lock the application for 4-9 minutes.