Ran across a funny situation in my test lab, and it’s only funny because it was my test lab.
The bad news: backups started failing a few days ago. One of the databases had a filegroup that wasn't online, and as my maintenance plan looped through the list of databases, it died when it couldn't back up that database. Unfortunately, it was going in alphabetical order, and that database started with a B, so everything after it never got backed up at all.
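For what it's worth, a quick way to spot that kind of landmine before the backup job steps on it is to ask the catalog views for any database file that isn't online. This is my own sanity check, not something the maintenance plan does for you:

```sql
-- Sketch: list any database files that aren't ONLINE.
-- A plain full backup of a database with an offline file will fail.
SELECT  d.name        AS database_name,
        mf.name       AS logical_file_name,
        mf.state_desc
FROM    sys.master_files AS mf
JOIN    sys.databases    AS d
        ON d.database_id = mf.database_id
WHERE   mf.state_desc <> 'ONLINE';
```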
The good news: the separate cleanup jobs still worked great. They were dutifully cleaning out any backups older than a few days.
The worst news: Database Mail had failed, too, and of course it broke a few days before the backups started failing. The DNS servers in my lab had decided to take the week off, so email wasn't making it out, and I never got notified that the backup jobs were failing.
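If I'd been paying attention, msdb would have ratted out Database Mail days earlier. A couple of quick queries (my own habit, nothing fancy) show mail that never left the building and why:

```sql
-- Sketch: check whether Database Mail has been silently failing.
-- Mail that couldn't be sent:
SELECT TOP (20) send_request_date, recipients, subject
FROM   msdb.dbo.sysmail_faileditems
ORDER BY send_request_date DESC;

-- The event log usually names the culprit (SMTP/DNS errors, etc.):
SELECT TOP (20) log_date, description
FROM   msdb.dbo.sysmail_event_log
ORDER BY log_date DESC;
```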
The cleanup jobs worked better than the backup jobs, and the right hand didn’t know what the left hand was doing. In this particular case, the left hand had been amputated at the wrist. In a perfect world – or at least, in a world where my job depended on this data – the maintenance plan jobs would be interconnected so that they wouldn’t delete backup files if the backup job failed. That perfect world would not be my server lab.
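If my job did depend on this data, the cleanup job would open with a guard like the sketch below: check msdb's backup history, and refuse to delete anything unless every online database has a recent full backup. The three-day window and the RAISERROR-to-fail-the-step approach are my assumptions here, not a blessed pattern:

```sql
-- Sketch: first step of the cleanup job. Bail out unless every online
-- database has a full backup recorded in the last 3 days (window is an
-- assumption - match it to your cleanup retention).
IF EXISTS (
    SELECT d.name
    FROM sys.databases AS d
    WHERE d.state_desc = 'ONLINE'
      AND d.name <> 'tempdb'                      -- tempdb can't be backed up
      AND NOT EXISTS (
            SELECT 1
            FROM msdb.dbo.backupset AS bs
            WHERE bs.database_name = d.name
              AND bs.type = 'D'                   -- 'D' = full database backup
              AND bs.backup_finish_date >= DATEADD(DAY, -3, GETDATE())
      )
)
BEGIN
    -- Severity 16 makes the Agent job step fail, so the delete never runs.
    RAISERROR('Recent full backup missing; skipping backup file cleanup.', 16, 1);
    RETURN;
END
```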
No real data was harmed in the making of this blog post, but times like this remind me of just how hard it is to be a good database administrator, and how easy it is to lose data. Have you tested your restores lately? Do you really think you’ve got something more important to do?