Long Term Backup Storage With Amazon Glacier

A while back, Jes asked who’s taking your backups. Making sure you have good backups is important. How much thought are you giving to handling historical backups? Right now, there’s a really good chance that you’re using a solution based on tape. While tape backups work, there’s a better way.

How Are You Archiving Backups Right Now?

Sending backups to tape isn’t the easiest process. For SQL Server, the process looks something like this: SQL Server backs up the database; the backup files are copied from a central location to tape; and, on a regular schedule, an administrator takes tapes out of the backup machine and sends them to an off-site facility. Doesn’t that sound like fun?

In addition to requiring someone to add and remove tapes from a tape robot, magnetic tape has the distinct disadvantage of needing careful storage and handling to prevent damage to the media. There has to be a better way.

Offloading Backup Archives to the Cloud

Durable off-site storage is a must for a lot of businesses, and when you don’t have requirements for physical media, I can’t think of a better option than Amazon S3. Many companies are already using S3 to house durable off-site backups of data. S3 has the advantage of being durable and highly available – it is designed for eleven 9s of durability and four 9s of availability. For this privilege, we pay a pittance (between $0.05 and $0.13 per GB per month). And, let’s face it, that’s a cheap price to pay for being able to expand your archive capabilities on demand.

Amazon Glacier is a relatively new, low cost, durable storage solution. It looks a lot like S3 but has a distinct price advantage – Glacier costs $0.01 per GB per month. Glacier is built with long term storage in mind – storage is incredibly cheap but retrieval takes longer and costs more. When you need to retrieve data from Glacier you issue a request and Amazon will notify you when the data is available to download. Typically this takes a few hours, but it’s faster than getting tapes returned from off-site storage.
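To put the price difference in concrete terms, here is a minimal Python sketch that compares monthly storage charges using only the per-GB prices quoted above. The 4,000 GB archive size is a made-up figure for illustration, and the calculation ignores request, retrieval, and transfer fees, so treat it as a rough comparison rather than a bill.

```python
# Per-GB monthly prices quoted in this post; check current pricing before relying on them.
S3_PRICE_LOW = 0.05
S3_PRICE_HIGH = 0.13
GLACIER_PRICE = 0.01


def monthly_storage_cost(size_gb: float, price_per_gb: float) -> float:
    """Return the monthly storage charge in dollars, rounded to cents."""
    return round(size_gb * price_per_gb, 2)


if __name__ == "__main__":
    archive_gb = 4000  # hypothetical archive size, roughly 4 TB
    print("S3, cheapest tier: ", monthly_storage_cost(archive_gb, S3_PRICE_LOW))
    print("S3, priciest tier: ", monthly_storage_cost(archive_gb, S3_PRICE_HIGH))
    print("Glacier:           ", monthly_storage_cost(archive_gb, GLACIER_PRICE))
```

Storage is only part of the picture: Glacier retrievals cost extra and take hours, so the savings apply to backups you rarely expect to read.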

Automating the Archive Lifecycle

Until recently, putting data into Glacier required that administrators or developers create a set of scripts to push data into Glacier from S3 as it aged out. While this works, it’s still a manual step – if something happens to the server driving the data movement, data won’t be copied. Earlier this week, Amazon announced support for automatic archiving into Glacier through lifecycle rules.

Lifecycle rules make it easy to automatically move files into Glacier based on a prefix and a relative or absolute timestamp. It’s easy to create groups of backups and archive them on a daily basis. Rules can even be used to expire the files once they’ve been in Glacier for a fixed amount of time. Some businesses are required to keep backups, source data, or even older versions of the code base for a period of time – marking files for expiration makes it easy to comply with internal and external regulations.
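Because rules match on a key prefix, it helps to name backup files consistently when they are uploaded. The following is a minimal sketch, not a prescribed layout: the prefix/database/date key scheme, the function names, and the bucket and file names in the usage comment are all assumptions made for illustration. The upload itself uses the boto3 S3 client's upload_file call.

```python
from datetime import date


def backup_key(prefix: str, database: str, taken_on: date, filename: str) -> str:
    """Build an S3 key such as 'archive/sales/2012-11-20/sales_full.bak'."""
    return f"{prefix}/{database}/{taken_on.isoformat()}/{filename}"


def upload_backup(s3_client, local_path: str, bucket: str, key: str) -> None:
    """Upload one backup file. s3_client is expected to be a boto3 S3 client."""
    s3_client.upload_file(local_path, bucket, key)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   key = backup_key("archive", "sales", date.today(), "sales_full.bak")
#   upload_backup(boto3.client("s3"), "sales_full.bak", "example-backup-bucket", key)
```

Every key built this way starts with the same prefix, so a single lifecycle rule on that prefix covers all of the backups.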

Data lifecycle rules sound like they’re going to be painful to create, right? Thankfully, it’s incredibly easy to put one together. There’s only one step. In this example, files with a name beginning with “archive” will be archived to Glacier after 15 days and deleted from Glacier after 180 days.

Creating a Data Lifecycle Rule
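The same rule can also be created from code instead of the console. Here is a minimal sketch using the boto3 S3 client's put_bucket_lifecycle_configuration call. The function names, the rule ID, and the bucket name in the usage comment are hypothetical. One assumption worth flagging: in the API, both day counts are measured from the object's creation date, so 180 here means 180 days after upload, not 180 days after the move to Glacier.

```python
def glacier_lifecycle_rule(prefix: str, archive_after_days: int,
                           expire_after_days: int) -> dict:
    """Build a lifecycle configuration that archives, then expires, matching objects."""
    return {
        "Rules": [
            {
                "ID": f"{prefix}-to-glacier",      # hypothetical rule name
                "Filter": {"Prefix": prefix},      # applies to keys starting with prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                # Days are counted from object creation, not from the transition.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_rule(s3_client, bucket: str, configuration: dict) -> None:
    """Apply the configuration. This replaces any existing lifecycle rules on the bucket."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   apply_rule(boto3.client("s3"), "example-backup-bucket",
#              glacier_lifecycle_rule("archive", 15, 180))
```

Since applying a configuration overwrites the bucket's existing rules, fetch and merge the current rules first if the bucket already has some.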

What Does AWS Glacier Mean For Your Backups?

It probably doesn’t mean anything right now if you aren’t already looking at using AWS. The combination of S3 and Glacier gives DBAs and system administrators another set of options for keeping backups for long periods of time. Automating data motion removes the fallibility of human processes and physical media from the equation. It’s worth considering how you can improve your backup retention, reliability, and recoverability by automating storage of backups using S3 and Glacier.

Learn more on our SQL Server in Amazon cloud resources page.


6 Comments

  • This sounds pretty cool. Do you have to use Glacier in conjunction with S3? Or can you use it as a stand-alone service?

    • Jeremiah Peschka
      November 20, 2012 10:59 am

      You can use Glacier on its own as well. Standalone access to Glacier is through the API – as long as you’re comfortable with a programming language, you can build your own tools to push data into Glacier. Now, admittedly, I just told you to go blank yourself, but there are some examples in the AWS Glacier documentation.

  • Until there’s something better than the 1986 ECPA governing law enforcement access to online data, I’m going to be hesitant to utilize any such cloud storage service for anything sensitive or proprietary. The fact that law enforcement can go to Amazon and say, “We want access to XYZ company’s glacier backups older than 180 days because it’s necessary for an ongoing criminal investigation,” and they don’t even need a warrant gives me pause (http://www.americanbar.org/publications/law_practice_today_home/law_practice_today_archive/march12/competing-interests-enforcing-cyber-security-and-protecting-privacy.html). Also, even when there is a warrant, when you see what is going on with the MegaUpload case and the possibility of a bad faith situation between DHS and the FBI (http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10849627), I would be increasingly hesitant.

    I know this makes me sound like an old guy telling kids to get off my lawn, but at least with offsite backups to tape you have the option of encrypting via your method first, meaning the storage company can’t decrypt without your assistance.

    • Jeremiah Peschka
      November 26, 2012 8:35 am

      Nothing says you shouldn’t encrypt your backups before they leave your facilities. S3 can encrypt using AES at the file system level, but it is ultimately up to the consumer to determine what level of security is necessary for their backups. The technology is there; it’s up to us to find ways to use it effectively.

  • I’ve been thinking about cloud storage for backups (not just SQL), and particularly Glacier; however, my concern would be recovery time.

    If I’m storing 4TB of data in Glacier and a disaster occurs, how long is it going to take me to recover that 4TB over my 10Mb Internet connection? WolframAlpha tells me 37 days, which is unacceptable.

    Is it only companies with 100Mb pipes considering cloud storage for backups?

    • Jeremiah Peschka
      March 22, 2013 4:38 pm

      I like to think of using cloud storage as a replacement for/augmentation of off-site tape backup systems. You shouldn’t use either one as your primary backup system, but they can both serve as a way to augment and enhance on-site backup techniques.

