The more SQL Servers I see, the more scared I get. There are a staggering number of ways that your server can sneak around behind you and … well, let’s just say you shouldn’t drop the soap in the datacenter.
As part of my health checks, I’m used to verifying that full backups are done regularly, but that’s not enough.
If your database is in full recovery mode, then it needs transaction log backups to clear out the transaction log. Full backups by themselves don’t free up log space – only transaction log backups do it. (This explains why people are still searching the web for how to do BACKUP LOG WITH TRUNCATE_ONLY in SQL 2008 – they need to clear out log space without waiting around for a ginormous log backup.)
However, I ran into an interesting surprise recently at a client. The full backups were working great, the transaction log backups were working great, so the DBA team thought everything was fine.
Except that we found a 16GB database with a 160GB log file.
After the disbelief and panic died down, we ran a simple query to see the problem:
SELECT name, log_reuse_wait_desc FROM sys.databases
By querying sys.databases, we can get all kinds of nifty information about database status. The log_reuse_wait_desc column shows why SQL Server can’t reuse the existing space in the transaction log and must continue growing it out – in our case, to 10x the data size. Some possibilities include:
- NOTHING – you’re in good shape
- LOG_BACKUP – you need to take a log backup (and preferably schedule it to happen regularly in a way that matches your recovery point objective)
- DATABASE_MIRRORING, REPLICATION, and AVAILABILITY_REPLICA – one of your mirrors or replicas is down, and SQL Server is holding on to log data until the replica comes back up
In our case, we saw REPLICATION – which was quite a surprise since the company wasn’t using any SQL Server replication – or so they thought. Turned out that somebody had played around with replication once and tore it down, but didn’t rip out all of the parts. SQL Server was patiently waiting for a dead server to come back online and catch up.
The Moral of the Story
Just because you’ve got the right jobs in place and the jobs are running successfully doesn’t mean they’re doing everything you expect. Yes, transaction log backups are supposed to let SQL Server reuse log space, but there are a lot of prerequisites for that to work.
The more you monitor, the more you can prevent these kinds of problems. The answer isn’t to build your own monitoring tool and continue to add checks like this one by one AFTER they’ve jumped you in the shower. The answer is to reuse wheels that other people have already invented that have extensive checks built in, and learn from other peoples’ mistakes.