A while back, Jes asked who’s taking your backups. Making sure you have good backups is important. How much thought are you giving to handling historical backups? Right now, there’s a really good chance that you’re using a solution based on tape. While tape backups work, there’s a better way.
How Are You Archiving Backups Right Now?
Sending backups to tape isn’t the easiest process. For SQL Server, the process looks something like this: SQL Server backs up the database, the backup files are copied from a central location to tape, and on a regular schedule an administrator takes the tapes out of the backup machine and sends them to an off-site facility. Doesn’t that sound like fun?
In addition to requiring that tapes be added to and removed from a tape robot, magnetic tape has the distinct disadvantage of needing careful storage and handling to prevent damage to the storage media. There has to be a better way.
Offloading Backup Archives to the Cloud
Durable off-site storage is a must for a lot of businesses, and when you don’t have requirements for physical media, I can’t think of a better option than using Amazon S3. Many companies are already making use of Amazon S3 to house durable off-site backups of data. S3 has the advantage of being durable and relatively highly available: it is designed for eleven 9s of durability and four 9s of availability. For this privilege, we pay a pittance (between $0.05 and $0.13 per GB per month). And, let’s face it, that’s a cheap price to pay for being able to expand your archive capabilities on demand.
Amazon Glacier is a relatively new, low-cost, durable storage solution. It looks a lot like S3 but has a distinct price advantage: Glacier costs $0.01 per GB per month. Glacier is built with long-term storage in mind. Storage is incredibly cheap, but retrieval takes longer and costs more. When you need to retrieve data from Glacier, you issue a request and Amazon will notify you when the data is available to download. Typically this takes a few hours, but it’s faster than getting tapes returned from off-site storage.
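To put those prices side by side, here is a rough back-of-the-envelope sketch. It uses only the per-GB storage prices quoted above, which will not match current AWS pricing, and it ignores retrieval and request fees. The 2,000 GB archive size is a made-up figure for illustration.

```python
# Per-GB monthly storage prices as quoted in this post; check current AWS
# pricing before relying on them. Retrieval and request fees are ignored.
S3_PRICE_LOW = 0.05
S3_PRICE_HIGH = 0.13
GLACIER_PRICE = 0.01


def monthly_cost(size_gb: float, price_per_gb: float) -> float:
    """Return the monthly storage cost in dollars, rounded to cents."""
    return round(size_gb * price_per_gb, 2)


if __name__ == "__main__":
    archive_gb = 2000  # hypothetical archive size
    print("S3 (low): ", monthly_cost(archive_gb, S3_PRICE_LOW))
    print("S3 (high):", monthly_cost(archive_gb, S3_PRICE_HIGH))
    print("Glacier:  ", monthly_cost(archive_gb, GLACIER_PRICE))
```

At these prices, a 2,000 GB archive works out to $20 a month in Glacier versus somewhere between $100 and $260 a month in S3.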
Automating the Archive Lifecycle
Until recently, putting data into Glacier required that administrators or developers create a set of scripts to push data into Glacier from S3 as it aged out. While this works, it’s still a manual step: if something happens to the server driving the data movement, data won’t be copied. Earlier this week, Amazon announced support for automatic archiving into Glacier through lifecycle rules.
Lifecycle rules make it easy to automatically move files into Glacier based on a prefix and a relative or absolute timestamp. It’s easy to create groups of backups and archive them on a daily basis. Rules can even be used to expire the files once they’ve been in Glacier for a fixed amount of time. Some businesses are required to keep backups, source data, or even older versions of the code base for a period of time, and marking files for expiration makes it easy to comply with internal and external regulations.
Data lifecycle rules sound like they’re going to be painful to create, right? Thankfully, it’s incredibly easy to put one together. There’s only one step. In this example, files with a name beginning with “archive” will be archived to Glacier after 15 days and deleted from Glacier after 180 days.
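If you’d rather script the rule than click through the console, the same idea can be sketched in Python. This is a minimal sketch, not the only way to do it: it assumes the third-party boto3 library and configured AWS credentials, and the function names, rule ID, and bucket name are all hypothetical. Note that S3 counts both the transition and the expiration days from the object’s creation date, so the 180 days here is total object age, not time spent in Glacier.

```python
import json


def build_archive_rule(prefix="archive", archive_after_days=15, expire_after_days=180):
    """Build a lifecycle configuration that archives, then expires, matching objects.

    S3 measures both day counts from each object's creation date.
    """
    return {
        "Rules": [
            {
                "ID": f"{prefix}-to-glacier",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # match keys starting with the prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_rule(bucket_name, config):
    """Send the configuration to S3. Assumes boto3 is installed and credentials are set up."""
    import boto3  # third-party; imported here so building the rule works without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    # Print the rule for review; call apply_rule("my-backup-bucket", config)
    # with your own (hypothetical here) bucket name to apply it.
    config = build_archive_rule()
    print(json.dumps(config, indent=2))
```

Keeping the rule builder separate from the API call makes it easy to review or version the rule before it touches a real bucket. Be aware that applying a lifecycle configuration replaces any rules already on the bucket, so include every rule you want to keep.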
What Does AWS Glacier Mean For Your Backups?
It probably doesn’t mean anything right now if you aren’t already looking at using AWS. The combination of S3 and Glacier gives DBAs and system administrators another set of options for keeping backups for long periods of time. Automating data motion removes the fallibility of human processes and physical media from the equation. It’s worth considering how you can improve your backup retention, reliability, and recoverability by automating storage of backups using S3 and Glacier.