Scary SQL Surprises: Crouching Tiger, Hidden Replication

Last Updated 7 years ago

The more SQL Servers I see, the more scared I get. There are a staggering number of ways that your server can sneak around behind you and … well, let’s just say you shouldn’t drop the soap in the datacenter.

As part of my health checks, I’m used to verifying that full backups are done regularly, but that’s not enough.

If your database is in full recovery mode, then it needs transaction log backups to clear out the transaction log. Full backups by themselves don’t free up log space – only transaction log backups do it. (This explains why people are still searching the web for how to do BACKUP LOG WITH TRUNCATE_ONLY in SQL 2008 – they need to clear out log space without waiting around for a ginormous log backup.)

However, I ran into an interesting surprise recently at a client. The full backups were working great, the transaction log backups were working great, so the DBA team thought everything was fine.

Except that we found a 16GB database with a 160GB log file.

After the disbelief and panic died down, we ran a simple query to see the problem:

SELECT name, log_reuse_wait_desc FROM sys.databases

By querying sys.databases, we can get all kinds of nifty information about database status. The log_reuse_wait_desc column shows why SQL Server can’t reuse the existing space in the transaction log and must continue growing it out – in our case, to 10x the data size. Some possibilities include:

NOTHING – you’re in good shape
LOG_BACKUP – you need to take a log backup (and preferably schedule it to happen regularly in a way that matches your recovery point objective)
DATABASE_MIRRORING, REPLICATION, and AVAILABILITY_REPLICA – one of your mirrors or replicas is down, and SQL Server is holding on to log data until the replica comes back up

In our case, we saw REPLICATION – which was quite a surprise since the company wasn’t using any SQL Server replication – or so they thought. Turned out that somebody had played around with replication once and tore it down, but didn’t rip out all of the parts. SQL Server was patiently waiting for a dead server to come back online and catch up.

Whoops.

The Moral of the Story

Just because you’ve got the right jobs in place and the jobs are running successfully doesn’t mean they’re doing everything you expect. Yes, transaction log backups are supposed to let SQL Server reuse log space, but there are a lot of prerequisites for that to work.

The more you monitor, the more you can prevent these kinds of problems. The answer isn’t to build your own monitoring tool and continue to add checks like this one by one AFTER they’ve jumped you in the shower. The answer is to reuse wheels that other people have already invented that have extensive checks built in, and learn from other peoples’ mistakes.

Who’s taking your backups? No, really – who’s taking your backups?

Write a Killer Technical Resume in Five Steps

22 Comments. Leave new

Paul Olson
August 30, 2012 9:59 am

Thanks Brent
What centralize monitoring solution have you been impressed with, if cost is not a road block?

Reply
Brad
August 30, 2012 10:13 am

Looking forward to hearing your session at Red Gate SQL in the City [Seattle] in November.

Reply
David
August 30, 2012 10:39 am

Ha! I encountered the exact same issue only a couple months back. A 2GB database, regular log file backups and 90GB odd log file!

The customer had changed or removed mirroring from the database a long time ago, but the database was still in a state that it thought it needed to keep them around.

Reply
Guillaume
August 31, 2012 2:01 am

It sounds like my workplace 😀

Reply
Ayman El-Ghazali
September 4, 2012 9:17 am

I wish I had enough time to do this type of monitoring at work. Unfortunately, I end up with multiple hats many times and we don’t allot enough time for this type of work. I love being proactive, I’ll just keep pushing and hopefully things will change.

Oh yeah, nice post and hilarious. One of my new worst fears now is dropping that bar of soap.

Reply
Lynn Nguyen
September 4, 2012 10:13 am

Thanks Brent. I ran the above query and most of my databases returned “NOTHING” except two of them showed “LOG_BACKUP” in the log_reuse-wait_desc column. I just don’t know why since their log files are being backup every two hours. Any thoughts? Thanks.

Reply
- Brent Ozar
  September 4, 2012 10:16 am
  
  Lynn – that means there’s data in the transaction log that needs to be backed up. The log file might be a little on the small side.
  
  Reply
Lynn Nguyen
September 4, 2012 10:21 am

Thank you so much Brent. That explains it. I reran the query and they all returned “NOTHING” now.

Reply
Josh
October 22, 2013 4:15 pm

I realized that I have a handful of databases where log_reuse_wait_desc=LOG_BACKUP. So they need a LOG backup. But I have all user databases doing LOG backups multiple times a day. But these databases only actually yield FULL and DIFF backups.

So to try to fix: after setting to SIMPLE then back to FULL mode, I am yet unable to get a LOG backup to run. It was say it is successful, but nothing actually gets written. I use the ola hallengren method of backup. Any help on this would be greatly appreciated. Thanks.

Reply
- Brent Ozar
  October 22, 2013 4:18 pm
  
  Hi Josh. I can’t tell exactly what you mean by “but these databases only actually yield FULL and DIFF backups”. This is kind of beyond what I can troubleshoot in a blog post comment. You may consider posting it on http://DBA.StackExchange.com.
  
  Reply
  - Josh
    October 22, 2013 4:24 pm
    
    Sorry, I meant to say that I can successfully get FULL and Differential backups to run, yielding actual backup files, but am unable to get LOG backup files to do the same. Most of the databases are doing all three right on time, no issues, but these few will not actually write out a LOG backup. I will see what I can get at stackexchange as well. Thanks for your time.
    
    Reply
    - Ayman El-Ghazali
      October 22, 2013 4:45 pm
      
      Wow I’m still subscribed to this post 🙂
      
      You know what I would try Josh? Take your full and restore it on a Development server. Try your log backups there. If they don’t work, try to detach and reattach the database without the log file. Then try running log backups again. Could be a bad log file. I’ve had a similar situation and that helped me. Not exactly the same and please do it in a dev environment 🙂
      
      Reply
      - Josh
        October 22, 2013 5:14 pm
        
        Ayman,
        
        Thank you so much for the reply. I just tried both suggestions. I detached and reattached without the LDF and the problem remained.
        
        I then took a recent full backup file and restored it to a dev server and the problem existed there as well.
        
        I then went to double check that the backup was in FULL recovery mode and it wasn’t. Switched it to FULL from SIMPLE and then it worked on the dev side.
        
        Back to production, turns out the db was back in SIMPLE mode as well so I changed it to FULL. But it still won’t do a LOG backup.
        
        I’m trying not to be a burden on the community, honestly. I am just puzzled. Thanks.
Ayman El-Ghazali
October 22, 2013 5:28 pm

you need to do a full backup after changing the recovery model before doing the log backup.

you’re not burdening anyone the community enjoys the challenges of helping others that’s how we grow stronger.

Reply
- Josh
  October 22, 2013 5:31 pm
  
  I failed to mention that in each of the scenarios, I did a full backup prior to attempting the LOG backup.
  
  So, I took this full backup from dev and (after taking backup precautions with the original database) replaced it with the ‘working’ one from dev. Checked to see if it was FULL recovery mode. Yes it was. Made a full backup. Tried to do a LOG backup, same problem. :/
  
  Reply
  - Ayman El-Ghazali
    October 22, 2013 6:09 pm
    
    Well darn, I thought you may have had a bad log file with an easy fix 🙁
    
    I would take a step back and approach this like a doctor; maybe because my Dad is one and I work for a medical society?! No need to put on a lab coat like Brent does in his webinars 🙂 Just look at the symptoms, are they occurring with other DBs on the same server? Check the SQL Logs, Windows Logs, anything interesting in there? I know in the stack exchange post you mentioned that you used Ola’s robust logging, but just be certain and check the SQL/Windows Logs.
    
    Also, what changed the recovery model to simple? That’s a big huge deal to be honest. Looks like you were working under the assumption that it is under the Full Recovery model, who has access to change that? I had an issue where DBs were set to Auto_Close and Auto_Shrink by a vendor, put a policy in to turn that option off on all DBs on a daily basis so they couldn’t do it again. Do you have a gremlin in the system that is playing with your settings?
    
    I’m going to follow the discussion over at stack exchange, very interesting issue.
    
    Reply
Josh
October 22, 2013 5:44 pm

Here is the link to the same issue I cross posted to StackExchange: http://dba.stackexchange.com/questions/52005/how-to-fix-log-reuse-wait-desc-log-backup-log-backup-not-possible
if that helps as well.

Reply
Josh
October 24, 2013 11:32 am

UPDATE: I figured it out!

Because the ola hallengren scripts check for log shipping roles and skip log backups if it finds the current db in either

msdb.dbo.log_shipping_primary_databases

or

msdb.dbo.log_shipping_secondary_databases

AND because the databases I have been fighting with recently DID have old log shipping entries that were not proiperly cleaned out, this script was not able to create the LOG backups. Now that that is all cleaned up, we appear to be good to go.

Many thanks to all who took the time to chime in! -Josh

Reply
- Ayman El-Ghazali
  October 24, 2013 11:48 am
  
  But why would the regular non-Ola Hallengren backups not work? They don’t check for those roles by default do they?
  
  Reply
  - Josh
    October 24, 2013 1:42 pm
    
    Apparently they do. I also saw no way to use a parameter to change it.
    
    Reply
Richard Armstrong-Finnerty
June 15, 2016 9:35 am

Good article and liked the title. Reminds me of a round of the BBC’s I’m Sorry I Haven’t A Clue, when the panel where asked to adapt a movie title to describe handymen at one’s home. My fave was from the ingenious Barry Cryer: Crouching Plumber, Hidden Charges.

Reply