Back up every database on the production server. No exceptions.

SQL Server

Before our consulting engagements start, we send clients a couple of queries to run. One of them outputs a list of databases, and the last backup date for each.
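
If you want to run the same check against your own servers, the gist of it is something like this (a simplified sketch, not the exact script we send):

  -- Rough sketch: every database and its most recent full backup
  SELECT  d.name,
          MAX(b.backup_finish_date) AS last_full_backup
  FROM    sys.databases d
  LEFT JOIN msdb.dbo.backupset b
          ON  b.database_name = d.name
          AND b.type = 'D'               -- full backups only
  WHERE   d.name <> 'tempdb'
  GROUP BY d.name
  ORDER BY last_full_backup;             -- never-backed-up databases show up as NULL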

You wouldn’t believe how often this query produces surprises like:

  • “Wait – your script must be wrong. I swear we’re backing things up. We’ll fix that.”
  • “Whoa – it turns out we only run full backups monthly, and differentials weekly, but if we add a new database in the middle of the month, it doesn’t get backed up for weeks. We’ll fix that.”
  • “It’s okay – we don’t need to back up that database because it doesn’t have anything useful.”

The last one is actually the scariest, because we often discover real production data in databases that aren’t supposed to be in use.

For example, at one client, there were two databases: AppV3 and AppV2. AppV2 hadn’t been backed up in months, and the architect assured me it was a really old version of the app that was just kept around for checking data every now and then.

However, as we started the engagement, we discovered that the login queries were still hitting AppV2 and storing user-profile data in it. If they’d lost that database, their users’ data would have been destroyed, and their app would have been down.

If the database is important enough to live on the production database server, then it needs to be considered a production-quality database, and it needs to get backed up. Full stop. Don’t gamble that no one is putting data in it.


29 Comments

  • I also like the “We can reload it from the source” argument for not taking a backup. The cases where that’s easier and faster than restoring are few and far between.

  • I’m in the habit of taking databases offline that should not be used. That makes callers fail and makes noise.
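
    It’s a one-liner per database, something like this (the database name here is just an example):

      -- Take a supposedly-unused database offline; anything still calling it fails loudly
      ALTER DATABASE OldMysteryDB SET OFFLINE WITH ROLLBACK IMMEDIATE;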

  • Customer: No, it’s only a dev/test server, we do not need any backups.

    Me: Ok, is anybody using it?

    Customer: Sure, we got 8 developers working with it every day.

    Me: What is 8 devs x 8 Hours x “number of days lost” in $?

    Customer: Oooh yeaaa ok….

    • And of course they don’t have source control of the DB, hahaha.

      • Haha no no this is the wild west of development!

      • Oh, sometimes they have a “backup” of the dev database – in production. 🙂
        In fact that’s exceedingly common – developers set up the dev DB server, don’t bother to tell anyone, and it goes unmanaged or mismanaged for five or ten years. Then the server hardware fails, is obsolete and thus impossible or unreasonably expensive to repair, and the dev team has to reconstruct from production. They’ve lost days or weeks of in-flight work, details of which they don’t even remember because … no source control. And the chaos only begins there because there’s no excuse for missing a deadline.
        But they don’t need their DBA “messing with their stuff,” they can Google how to set up the SQL server. 🙂

  • @Tobi – Did that exact thing fairly recently. Decommissioning an old Dev server, I took a good bunch of databases offline that the dev team said they didn’t recognize and assumed must be old. Within an hour, massive outcry! Those databases were quite important and very, very live!

    I have put Ola’s scripts in place to catch any new databases that get added, which is a good failsafe measure.

    • We are going through this process at the moment, and fervently wishing previous devs and DBAs had kept an up-to-date inventory of servers and databases to lessen some of the ‘will something important fall over if we take this old mystery database offline?’ worries.

    • Yes, Hallengren’s script is great!

  • If I had a dollar for every database I was told was not essential that caused almost immediate heartbreak when removed, I could be easily retired by now.

    I learned a long time ago to make those backups for my benefit!

    Whether they know about them is inconsequential. I always made a backup prior to dropping and stored it where I could easily retrieve it. I would age those out after a year or so with no inquiries.

  • Says something about my penchant for pedantry that I would lodge an exception or two to this rule…tempdb and resource, for obvious reasons. 🙂

    The gutting horror of finding yourself missing backups when you actually do need them is terrifying enough that I think people benefit from having multiple independent scheduled checks of this…for example having a third party app that does a check on backups being up to date as well as a homebrewed job to do roughly the same thing albeit in a slightly different way. Monitoring systems can fail, just like drives can fail, so having a Redundant Array of Monitoring Solutions can be just as useful as a RAID array for covering yourself.
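
    For what it’s worth, the homebrewed half of that can be as simple as an Agent job running a check like this and alerting on any rows it returns (a sketch; the 24-hour threshold is just an example):

      -- Any database without a full backup in the last 24 hours?
      SELECT d.name
      FROM   sys.databases d
      WHERE  d.name <> 'tempdb'
        AND NOT EXISTS (SELECT 1
                        FROM  msdb.dbo.backupset b
                        WHERE b.database_name = d.name
                          AND b.type = 'D'
                          AND b.backup_finish_date > DATEADD(HOUR, -24, GETDATE()));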

  • And, by extension, clean up and remove databases that are not in use. At some point, unused databases, orphaned users and unused logins will be considered vulnerabilities and you’ll have to clean them up anyway. Much easier to clean them up when they are replaced.

  • The first thing I was taught as a junior DBA is that you are only as good as your last FULL backup. Make sure you have all databases backed up daily on every SQL server and the database backup files are archived to a different server or tape from the physical SQL server. The second item I learned is you need to test the ability to restore a database from the backup files on a regular basis; I routinely implement processes that copy production databases down to development servers using backup files and restore the database backup files to the development servers weekly.
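
    The weekly refresh itself is nothing fancy; at its core it’s just restoring the newest production full backup over the dev copy, along these lines (paths and names are illustrative):

      -- Restore the latest prod full backup onto the dev server
      RESTORE DATABASE AppDB
      FROM DISK = N'\\backupserver\sql\AppDB_FULL.bak'
      WITH MOVE N'AppDB'     TO N'D:\Data\AppDB.mdf',
           MOVE N'AppDB_log' TO N'L:\Logs\AppDB_log.ldf',
           REPLACE, STATS = 10;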

    • Maybe it’s better to say that you are only as good as your last successful test restore of the last FULL backup?

  • I read articles like this from time to time and still find it somewhat shocking and amusing at the same time; I don’t know why I’m still surprised after all my years of working with clients when I hear about this… I just started with a new client last week who also has customers worldwide, and what did I find – no backups anywhere. Again, I was surprised to find this, and yet it is so easy to remedy.

  • Because somebody is always new to the game.

  • This may not be the place to put this but I will ask the question anyways. What do you guys feel is a good backup schedule? Right now by default I do a Full Backup Monthly that runs at the end of the month and I am keeping that for 6 months. I do a Full Backup Weekly and keep that for 2 months. I do a differential backup every day and keep that for a month. I also do an incremental (t-log) backup every hour. I have some databases that get used more than others that run incrementals every 15 minutes, but every hour seems good so far. I keep those backups for 2 weeks. Do you think this is a good schedule or not? Please see the chart below for a better understanding:

    Full-Monthly = end of month = keep for 6 months
    Full-Weekly = End of week = keep for 2 months
    Diff-Nightly = Every day = Keep for 1 month
    Inc-Hourly = Every hour = keep for 2 weeks

    • Kendra Little
      August 20, 2015 4:21 pm

      Hey Joe,

      Brent’s onsite somewhere, it may be a while for him to respond. He wrote this post which can help you out: https://www.brentozar.com/archive/2011/12/letters-that-get-dbas-fired/

      RPO and RTO should drive backup type, frequency, where you store it, and how long you keep it. The thing that jumps out at me first is that many databases will require far more frequent full and transaction log backups than you’re listing, both to prevent data loss and shorten restore time.

  • Yep, this is usually the first thing I look at when I have a new job or client. I’ve found some pretty scary things, like production databases hosted in a managed service provider’s datacentre that hadn’t been backed up in years, or backups stored on the same disks as the databases and only retained for 2 days due to lack of space.

  • We back up every database server, not just production. Developers seem to protest that (which is odd considering it has no impact on them), but paradoxically, if their server died and we said, “Just make a new one,” they’d be mad.

    The elephant in the room, though, is infrastructure tools that back up databases. I can’t tell you the number of times DPM has stopped backing up and nobody looks at the constant error messages being spewed out (because they’re desensitised to error messages from Windows, right?). And it needs to be reconfigured whenever the recovery model changes.

    Same with other solutions like Veeam: enabled on a server in the full recovery model and never taking a log backup, because Infrastructure simply don’t understand the impact.

    Same with NetBackup when database-to-tape backups stop running; they just don’t notice or don’t care. Same with SAN disk snapshots locking up the entire disk and causing apps and databases and clusters to fail.

    This is not to mention poor DBAs who create a maintenance plan and just walk away, never to check it again. AlwaysOn fails over to the other node with no plan in place? Your log backups stop because the plan never made it over there.

    Every time it’s someone other than a competent DBA professing that backups will be their responsibility, it always, always, inevitably, every time, ends in disaster. Which is why I state things very straightforwardly up front and get it in writing – except of course they always try to weasel out later and everyone ends up looking bad except them (somehow).

  • @Chris
    Our “We can reload it from the source” database (500 GB+) went down a while ago. The last backup was months old. Although no actual customer data was lost, it was definitely not a happy feeling. It is now being backed up, though not often.

  • We have a staging database that is used to load data from the ETLs and then insert into the client database. We don’t do a “backup” of the database, because in theory, all the tables are SUPPOSED to be truncated at the end of the ETL. If the database were to disappear, we would want to retain the table structure, but not the data in each table. So, we created a script that runs weekly that will script out each table to a single file, and then they’re all put in a zip file. We also have another process, through a DDL trigger, that will capture the time that a table is dropped from this database (a rough sketch of that trigger is at the end of this comment). This way, if a table is accidentally dropped, we can find out when it was dropped and look through the zip files (with the date in the file name) and find the script to recreate the table.

    This is our only database that we do this with. All other databases (and servers) have a full backup done weekly, a differential done daily, and no transaction log backup. For most databases, we do a restore to a reporting server after the full backup. Some databases get restored to the reporting server after every backup. This way we are checking our backups on a weekly basis. On the reporting server, we do not back up the databases that are restored from the production server. I don’t really see a point to that since they are restored on a weekly (or sometimes daily) basis.
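
    For anyone curious about the drop-capture piece mentioned above, it’s essentially a database-level DDL trigger along these lines (object names here are illustrative, not our real ones):

      -- Log whenever a table is dropped in this database
      CREATE TABLE dbo.DroppedTableLog
      (
          EventTime  DATETIME2     NOT NULL DEFAULT SYSDATETIME(),
          TableName  NVARCHAR(256) NOT NULL,
          LoginName  NVARCHAR(256) NOT NULL,
          EventXml   XML           NULL
      );
      GO

      CREATE TRIGGER trg_LogTableDrop
      ON DATABASE
      FOR DROP_TABLE
      AS
      BEGIN
          DECLARE @e XML = EVENTDATA();

          INSERT dbo.DroppedTableLog (TableName, LoginName, EventXml)
          VALUES (@e.value('(/EVENT_INSTANCE/ObjectName)[1]', 'NVARCHAR(256)'),
                  ORIGINAL_LOGIN(),
                  @e);
      END;
      GO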

  • Charlie Smith
    August 21, 2015 4:21 pm

    Actually there is a corollary to this with respect to mirrored databases. If one DB on a production server is important enough to mirror, all are. You always find dependencies between the databases on a server. The longer they have been there, the more dependencies there will be.

  • @John Hill Why not just back it up after you have truncated all the tables? Backup file will be small, then, and I think easier to restore than “from scripts”; also, to my mind, less chance of something falling through the cracks that scripting might miss

    “Back up every database on the production server. No exceptions.”

    🙂

    Our backup tools skip any database named “RESTORE_%”. We only ever make temporary restores to RESTORE_xxx and we definitely don’t want them to be backed up and hog precious disk space. I suppose one day someone will create a critical database called RESTORE_LOG …

    Any other newly created database is added to the backup routine within a few minutes, with a 10-minute log backup frequency (if Full Recovery Model). If it needs backups less often, then the DBA has to deliberately change the backup routine’s metadata for that DB.
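
    In case it helps anyone, the database-picking part of the routine boils down to something like this (a sketch, not the actual tool):

      -- Everything except temporary RESTORE_ databases (and tempdb) gets backed up
      SELECT name
      FROM   sys.databases
      WHERE  name NOT LIKE 'RESTORE[_]%'
        AND  name <> 'tempdb';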

  • Cody,

    The Ola Hallengren scripts for database maintenance handle the issues with mirroring and availability groups. They can detect the active node, and if you have a network share for the backups to reside on, there is no duplication of backups either.

  • One more corollary: Make sure that ALL of your SQL backup files are being backed up to long-term storage (tape or disk-to-disk). And don’t take the admins’ word for it. Ask to see the configuration so you can understand it and feel comfortable defending it.

    I was in a situation like this a number of years ago. The network admin responsible for the tape backups didn’t think that differential and transaction log backups and weekend full backups were worth keeping, so they quietly excluded them from their tape backups without telling anyone. Unfortunately, I found out about this only after a Sunday evening failure that required a restore, and the newest viable backup available to me was from early Friday morning. (Our policy at the time also required that older SQL backup files be deleted from disk as soon as newer ones were created.) Luckily, the data from those three days could be recreated for this particular database, but in many of our other databases, three days’ data loss would have been catastrophic for the business. Needless to say, the tape backup policies were changed the next day.

    • @Ed

      I envy you if your infrastructure staff are the least bit willing to cooperate. In some places I’ve seen both outsourced and in-house backups repeatedly lost, and the respective managers always found ways to bluff past it and divert blame back to the DBAs, who had hard documentation of repeatedly pointing it out for literally months prior and trying to escalate to management (who didn’t want to “fight” each other by demanding proper service).

      Repeated polite requests (even with management approval) ended with, “You’re being aggressive and not a team player.” A coworker DBA was even let go. Pretty sad! I made sure to leave probably the best referral I’ve ever written on his LinkedIn account for doing the right thing.

      Sadly, it’s the DBA who always suffers in the end, despite whatever logic and documentation trails he/she has to prove it. I mean in dysfunctional companies, anyway.

