When people say “cloud”, they’re simplifying a lot of different solutions into a single catchphrase. Let’s break out the different options and compare them.
1. SQL Server in Amazon EC2 and Azure VMs
Amazon EC2 is a virtualization platform. Amazon buys servers, installs their secret sauce software, and rents you Windows virtual machines by the hour. Microsoft offers a similar product, Windows Azure Virtual Machines, that just went officially live.
You can rent a blank Windows VM without SQL Server installed, and then install SQL Server yourself just like you would on-premise. That’s a licensing mess, though – you have to use your existing SQL Server licenses or buy new ones for your VMs. That doesn’t make much financial sense. Instead, Amazon and Microsoft will rent you a Windows VM with SQL Server already configured, and your hourly fee includes the SQL Server licensing.
SQL Server runs just as it would in your own datacenter, which means you can use this as a disaster recovery option for your on-premise SQL Servers. You can do log shipping or database mirroring up to SQL in the cloud, running in Amazon EC2 or Microsoft Azure. When disaster strikes, fail over to your EC2/MS infrastructure, and you’re off and running.
The term “cloud” conjures up images of easy scalability and redundancy, but that’s not really the case here. We’re talking about a single virtual machine. This isn’t much different from running SQL Server in a VMware or Hyper-V guest in your own datacenter or in a colo datacenter. You can use all your traditional tools and techniques to manage SQL Server, which is both a pro and a con. If you need to patch it or scale out to multiple servers, there are no tools included here. I still consider this the cloud, though, because the infrastructure and licensing are managed by somebody else. It’s easy to get started with one – or a hundred – virtual SQL Servers with no initial investment.
This method is the most conventional, and as I explain other options, I’m going to move from conventional to really-far-out-there. SQL in EC2 or MS VMs just works – it’s easy to understand and leverage without changing your code or your techniques – but it doesn’t bring a lot of the cloud’s benefits.
2. Amazon RDS for SQL Server
Instead of running SQL Server in an EC2 virtual machine, let’s start giving up a little bit of control in order to get more of the cloud’s benefits. The next layer of clouds is Amazon Relational Database Service (RDS). Here, Amazon builds a Windows VM, installs SQL Server, configures it, and manages both Windows and the SQL Server service for you.
This is still the real, true blue SQL Server you know and love – all of your commands still work exactly the same as you’re used to, as long as you don’t try to access the server’s local drives directly. (Example: you can’t upload a flat file to the server’s C drive and then try to BCP data from that file into SQL Server.)
With Amazon RDS, it’s kinda like Amazon is the DBA, and you’re a very, very powerful developer. You can create and drop databases and users, but you can’t Remote Desktop into the SQL Server, nor can you access the drives.
Amazon RDS takes a few job duties away from you:
- Storage management – want faster storage? Just pick (and pay for) more IOPS. There’s no arguing with the SAN guy.
- Monitoring – Amazon CloudWatch tracks performance metrics and sends you emails when they’re outside of your thresholds.
- Patching – You pick the major/minor versions you want and when you want patches applied.
- Backups – You pick a time window for the full backups, and Amazon manages it using storage snapshots. You can restore from snapshots just by pointing and clicking in the management console.
But herein lies the first real compromise: you can’t restore from anything except snapshots. You can’t upload your own database backup file and restore it. To get data into Amazon RDS, you’ll want to export it to a file, upload that file to Amazon S3 (cloud-based file storage), and then import it. This also means you can’t use Amazon RDS as a participant in log shipping or database mirroring.
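Here’s a rough sketch of the first step in that export, upload, import dance: dumping a table to a flat file. To keep it runnable anywhere, Python’s built-in sqlite3 module stands in for the source database, and the `customers` table, its rows, and the `export_table_to_csv` helper are all made up for illustration. The S3 upload and the import on the RDS side aren’t shown, since they depend on your AWS tooling.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Tuple


def export_table_to_csv(conn: sqlite3.Connection, table: str, path: Path) -> int:
    """Write every row of `table` to a CSV file and return the row count."""
    # Table names can't be bound as query parameters, so reject anything odd.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cursor = conn.execute(f"SELECT * FROM {table}")
    header = [column[0] for column in cursor.description]
    rows_written = 0
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for row in cursor:
            writer.writerow(row)
            rows_written += 1
    return rows_written


def demo_export() -> Tuple[int, List[str]]:
    """Build a tiny stand-in database, export it, and return what landed in the file."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real source server
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    with tempfile.TemporaryDirectory() as folder:
        target = Path(folder) / "customers.csv"
        count = export_table_to_csv(conn, "customers", target)
        lines = target.read_text(encoding="utf-8").splitlines()
    conn.close()
    return count, lines


if __name__ == "__main__":
    print(demo_export())
```

The file that comes out is what you would push up to S3 before importing on the other side. Against a real SQL Server you’d more likely reach for BCP or an export wizard; the shape of the work is the same.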
Microsoft doesn’t have a competitor to Amazon RDS for SQL Server today. Well, I say that, but some shops already manage their SQL Servers this way – they have an internal admin team that manages Windows and SQL. Departments get access to create & drop databases, change code, etc, but they don’t get access to the server’s desktop or backups. This doesn’t really compete with Amazon RDS, though – RDS is for companies who are too small to have this kind of internal engineering infrastructure. (Or for companies that want to get rid of this large engineering burden, I suppose.)
3. VMware vFabric Data Director
vFabric Data Director is a lot like running Amazon RDS in your own datacenter, but you can control the base Windows virtual machines. You build a Windows template to be used by default whenever a new SQL Server is created. VMware vFabric manages the implementation details for backups and high availability.
vFabric also supports Oracle, PostgreSQL, and Hadoop for a single pane of glass to create and manage your database servers. When someone in your company wants a new database instance, the sysadmins open up vFabric, configure it, and within a few minutes, it’s up and running.
vFabric makes sense for 100% virtualization shops that aren’t interested in moving their databases up to the cloud, but want easier database management integrated into their virtualization tools.
Microsoft doesn’t have a competitor to VMware vFabric Data Director today. In theory, you could build your own alternative using System Center and a whole lotta scripting. That is left as an exercise for the reader.
4. Microsoft Windows Azure SQL Database (WASD)
The artist formerly known as SQL Azure takes the least conventional approach of all. While it’s technically built on Microsoft SQL Server, that’s like saying your pants are built on your underpants. They both cover your back end, but you can’t use them interchangeably.
Microsoft essentially built a new product designed for common database storage requirements. As with your underpants, you only get a minimum of feature and command coverage here. The new features and commands Microsoft has been adding to the boxed product for the last couple of versions just aren’t available in WASD, including partitioning, Resource Governor, Service Broker, and CLR. But hey – are you really using those anyway? Most folks aren’t.
Rather than building large databases, WASD encourages developers to shard out their data across multiple smaller databases. While on-premise SQL Servers have had techniques to do this in the past, Microsoft started over and developed a new technique that makes more sense for cloud implementations. Again, though – we’re talking about a difference from the boxed product, something else that developers have to learn differently. As we’ve gone up this ladder into the clouds, we’ve been handling our problems differently. WASD’s sharding technique is a good reminder that once you’ve gone this far up into the cloud, you’re dealing with something very different from SQL Server. You’re not going to take large volumes of code written for SQL Server and simply point them at WASD – you’re going to be doing a lot of testing and code changes.
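To make the sharding idea concrete, here’s a minimal sketch of the kind of routing logic an application ends up owning when data is spread across several small databases. It uses plain hash-based routing, which is a generic approach and not the WASD technique mentioned above. The shard names and the `shard_for` and `group_by_shard` helpers are hypothetical.

```python
import hashlib
from typing import Dict, List

# Hypothetical shard names; a real app would map these to connection strings.
SHARDS: List[str] = [
    "customers_shard_0",
    "customers_shard_1",
    "customers_shard_2",
    "customers_shard_3",
]


def shard_for(key: str, shards: List[str] = SHARDS) -> str:
    """Hash the key so the same key always routes to the same database."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


def group_by_shard(keys: List[str]) -> Dict[str, List[str]]:
    """Group keys by destination shard, e.g. to run one query per database."""
    plan: Dict[str, List[str]] = {}
    for key in keys:
        plan.setdefault(shard_for(key), []).append(key)
    return plan


if __name__ == "__main__":
    print(shard_for("customer-42"))
    print(group_by_shard(["customer-1", "customer-2", "customer-3"]))
```

Even this toy version shows where the code changes come from: every query needs to know which database to hit, anything that spans customers has to fan out across shards, and with simple modulo routing like this, changing the number of shards remaps most keys.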
Like Amazon RDS, there’s no backup/restore functionality here to get your existing data into the cloud. You’ll be exporting your data to a file, loading it into the cloud, and then…staying there. There’s no database mirroring or log shipping to/from Azure SQL Databases to on-premise SQL Servers.
Which One is Right for You?
If you’ve got an existing app, and you just want to cut costs without changing code, you can get started today with any of the first three options. They support the same T-SQL commands and datatypes you already know and love.
If you’ve got an existing app, and you’re willing to make code changes & do testing, you might be able to save even more plus gain new flexibility by going to Windows Azure SQL Database. In most cases, though, the cost savings won’t come anywhere near the costs required for the code changes and code review. We haven’t seen a case yet where the tradeoff made sense for our clients. I’m sure it’s out there – the perfect example would be a small amount of code that the developers know very well and can easily modify, and that produces large server loads in short bursts.
If you’re building a new app from scratch, then let’s take a step back and survey the database industry as a whole. This is an incredible time to be in databases and there are a bunch of really good options we didn’t even cover here.
People bring me in when they’re having data problems. They can’t store data fast enough, they can’t make it reliable enough, they can’t hire people to manage it, etc. When I’m in the conference room, it’s because there’s a fire in the
You know how it is as a DBA, though – DBA means Default Blame Acceptor. Everybody thinks the fire started in the database, but often it’s the SAN, VMware, crappy code, bad third party apps, or any number of combustible materials. The company gets more and more concerned about the growing smoke, and they ask the DBAs, “Who’s a SQL Server expert you can call to put this fire out?” The DBA thinks about my crazy blog posts and blurts my name out – mostly because he wants to find out if I’ll show up in the Richard Simmons costume. (That costs extra.)
Now put yourself in my shoes: I show up in a conference room or on a WebEx, and there’s a huge problem somewhere in the infrastructure. Everybody involved is pointing fingers at each other, and they’re all armed with volumes of reports proving that it’s not their problem. In a matter of 3-4 days, I need to:
- Find the real root cause of the problem
- Prove it to everyone involved using their own language
- Show a few possible solutions and recommend the right one
- Convince them to implement it as quickly as possible
SQL Server isn’t the only fire extinguisher, and I have to know how to put out data fires with other tools. Amazon’s got a ridiculous list of services that are easy to get started with, including:
- Relational Database Service – point, click, and deploy Microsoft SQL Server, Oracle, and MySQL instances. Amazon manages the backups, patching, and security. The MySQL ones even support readable replicas and replication to multiple datacenters.
- DynamoDB – super-fast NoSQL database hosted on SSDs. You pick how fast you want it to go, and Amazon makes it happen.
- Glacier – store your backups in the cloud for $0.01 per gigabyte per month with no cost for incoming data.
- Import/Export – ship them a USB drive, and they’ll hook it up to the cloud. For folks with slow upload links, this is the fastest way to move your data online.
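For a feel of what the Glacier price in the list works out to, here’s a back-of-the-envelope sketch. The one-cent rate is the figure quoted above; the 5,000 GB backup size and the helper names are assumptions for illustration. It covers storage only, since the list says incoming data is free, and it deliberately leaves out any charges for getting data back out.

```python
GLACIER_USD_PER_GB_MONTH = 0.01  # rate quoted in the list above


def monthly_storage_cost(gigabytes: float, rate: float = GLACIER_USD_PER_GB_MONTH) -> float:
    """Storage-only cost for one month, in US dollars."""
    if gigabytes < 0:
        raise ValueError("gigabytes must not be negative")
    return round(gigabytes * rate, 2)


def retention_cost(gigabytes: float, months: int, rate: float = GLACIER_USD_PER_GB_MONTH) -> float:
    """Storage-only cost of keeping the same amount of data for several months."""
    if months < 0:
        raise ValueError("months must not be negative")
    return round(monthly_storage_cost(gigabytes, rate) * months, 2)


if __name__ == "__main__":
    backup_gb = 5000  # hypothetical backup footprint
    print(monthly_storage_cost(backup_gb))   # one month of storage
    print(retention_cost(backup_gb, 12))     # a year of storage
```

Swap in your own backup size and whatever rate is current before quoting numbers to anyone.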
That’s why I’m in Dallas, Texas for a few days attending Amazon Web Services Architect Training. It’s a three-day design session that covers how to design solutions with their services. It’s not going to make me a Certified Master of Cloudiness across their broad range of tools, but that’s not the point. Clients don’t usually want me to do the work myself: they want me to find the right answer fast, get the staff on the right page, and let the staff knock out the work together.
If you’re a data professional, and you’re frustrated when people keep saying it’s a database problem when it’s not, what are you doing to bridge the gap? Are you frustrated that The Other Guy doesn’t know anything about SQL Server? Or are you reaching out to learn The Other Guy’s technology to help him to see where the smoke is coming from?