In the 1950s, the global economy saw a tremendous change – container ships revolutionized global commerce. Shipping costs got 36 times cheaper with the introduction of containerization. What if you could reduce operational costs and revolutionize application and database deployment in the same way?
Containers to the Future
In the last few months, the developer world has been excited about docker. Docker is a container system. Basically, it’s an easy way to do application deployments. Rather than deploy an entire VM, Docker lets developers deploy a consistent stack – the developers can set up individual services with independent configurations and deploy them.
Sounds like a contained database, right?
The best part about these containers is that, if you do it right, you might not need an OS under the covers. Or, if you do, it doesn’t matter which one! The container hold the configuration it needs to run. It doesn’t matter if you deploy on Linux, Solaris, or (maybe) even Windows – as long as that container knows what to do, you’re good.
With the database, should you really care which edition of SQL Server you’re running on as long as it supports the features you need?
Surely My Storage Is Safe From This Madness!
The storage game is rapidly changing. Not that long ago, traditional storage vendors required that you buy expensive and proprietary big systems. That part of the industry was disrupted by modular technology like the Dell EqualLogic. As storage has become cheaper, it has returned into the chassis in specialized servers; you can easily cram 24TB of SSD storage into a server with gear you can buy off the shelf.
Large scale storage is getting even more accessible: Seagate have announced their Kinetic Cloud Storage. It’s storage that uses an API, rather than an operating system. The drives feature new technology and Ethernet connectivity. This makes it possible for application developers to interact with the drives using an API rather than through a storage controller, cache tier, and storage tiering.
The idea might sound crazy, but third party libraries already exist that take advantage of this technology. Come launch time, developers will have drive simulators at their disposal, more libraries, and you’ll have a better idea of what this costs. You’ll be able to put 2.4 petabytes into a full rack of hardware. All without buying a complex storage appliance.
How Much Farther Can It Go?
Think about it:
- Applications can be deployed as containers.
- Storage will soon be deployed as containers.
- Databases are on their way to being deployed as containers.
Just as the cargo container had far reaching implications for the shipping industry, the software container has far reaching implications for our industry. The operating system and even the database platform become raw materials that are used to support the applications deployed on them.
What Does It Mean?
What’s it all mean for IT professionals? It means that the times, they are a changin’. If the application can be deployed as a container, there’s likely to be a decrease in managing the complexity of production applications. Once you can treat storage as an API, you change how storage administrators interact with storage. Over the next few years, containerization is going to change the way you manage the IT infrastructure.
Right now, someone in your company is thinking about moving your operations up into the cloud. If your company is like 95% of companies out there, “the cloud” is really a synonym for Amazon Web Services (AWS). Thinking about a move to AWS is happening across many companies, and it’s not something that you need to be scared of – nobody is going to lose their job. However, it’s important to understand what you and your company can do to make this transition successful, painless, and an opportunity for growth.
How fast is your application right now? Do you know when you’ve hit peak load or even which metrics signify peak load?
Having a performance baseline is the only way to make sure that changes are helping. A general feeling that things are getting better isn’t enough, you need to know. During our engagements with customers, we evaluate how applications are performing today. We gather metrics around disk and CPU performance and look for signs of existing and potential bottlenecks. Once we know where things are, it’s easy to figure out where things are going.
As you’re looking to move to AWS, take a look at your current performance baseline. Are you happy with those numbers? Things probably won’t get better after the move; virtualization has overhead and everything in AWS is virtualized. Once you’ve got a baseline of your performance, it’s easy to test your theories about performance and make decisions about how aggressively you need to tune your application to meet your promises to your customers.
Service Level Agreements
SLAs are so important that near the beginning of every engagement I ask, “Do you have an SLA with your customers? What about internal SLAs between your teams?”
An SLA can be as simple as responding to outages within a certain window or as complex as measuring the performance of specific actions in the application. Having an SLA in place early can make it easy to evaluate your ability to move up to AWS and still maintain acceptable performance. If you know that a specific set of web pages must return faster than an average of 250ms during peak load, you have an easy measurement to determine if performance is acceptable.
Distilling performance down to a set of metrics makes it easy to make decisions, spot problems, and design for the future. In addition to a performance baseline, SLAs also give guarantees. If query performance drifts outside of agreed upon norms, the SLA can describe who is going to work on improving performance and how quickly that work gets scheduled.
SLAs aren’t just performance related – they relate to how fast you can bring a system online in the event of an outage, how you will respond to potential performance issues, and what the shape of that response looks like. Everyone aspires to five 9s of uptime, but what’s important is how you handle unforeseen outages.
An SLA shouldn’t be a detailed, painful document that resembles of a software license. A well designed SLA will serve to drive customer interaction and push teams to take responsibility for making the application perform well.
What does your monitoring environment look like today? If you have monitoring in place, I suspect that you’re looking for problems as they’re happening. To maintain good performance in AWS, you need look for performance problems before they occur.
You can use your existing monitoring, or you can use Amazon’s CloudWatch to create alerts around trends in resource utilization. If you know your performance baselines, you can configure alerts to notify you when things are out of the ordinary. If CPU utilization has never gone higher than 55%, even under peak load, it’s helpful to set up an alarm to fire when CPU utilization has been higher than 60% over a period of 15 minutes.
It’s better to be aware of potential problems than to respond to a fire. Make sure performance warnings are different from alerts about actual problems, but also make sure that you’re doing something about warnings as they arrive. Being proactive about monitoring performance does more than help you keep things running smoothly; proactive monitoring gives you insight into where you can tune your applications and make things better in the long run.
What are your thoughts around AWS? What’s the big picture? Having a vision around AWS is going to be critical to a successful transition.
A vision around AWS can’t be a simple statement like “We’re going to move our operations into the cloud and save a lot of money.” While saving money is a nice goal, the vision around AWS needs to be something more than fork lifting your existing infrastructure into the cloud.
Having a strong vision about how AWS can help your organization meets its goals is critical. This doesn’t have to be highly detailed and include specific features, but make sure that the vision includes an understanding about how you will be deploying and refining your application over time. Moving to AWS is not a quick fix for any problem. Your vision needs to include what today looks like, what your goals look like, and how you’ll be working toward them over the next 3, 6, 9, and 12 months.
Taking full advantage of the rich feature set in AWS takes time, and your vision should reflect how you will make the move and monitor application behavior in order to make good decisions about direction and functionality.
Most important is buy in. Everyone involved needs to buy into the idea that a move to AWS makes sense for the organization. This may be the hardest item to accomplish, but it’s worth making sure that your team is on the same page.
Having a team that’s accepting of the move will make the transition easier. Part of this acceptance is the realization that things will not be the same as they were before. Gone are the days of giant back end servers with hundreds of gigabytes of memory and multiple network cards. Performance is no longer a purchase order away. Your team needs to accept these statements as facts; you can’t easily upgrade your way out of performance problems in AWS.
Instead of buying their way out of problems, the team needs to be committed to investigating new ways to solve their problems. With AWS it’s simple to design a rapid prototype and direct load to the new prototype system. It’s important that everyone is on board with testing changes at any level – from a single function to the entire infrastructure. In AWS these sweeping changes are easy to do, but you need buy in from everyone involved that things don’t need to be the same as they were before.
Drifting Away Into Cloudy Success
If you don’t have all of these factors in place, are you going to fail? Probably not. These are distinct traits that I’ve found in companies who have successfully moved their infrastructure into AWS. The more of these that a company has possessed, the happier I’ve found them to be with AWS. As you’re considering a move into AWS remember that there are more than technical challenges to moving into AWS – there are human and organizational challenges that need to be met in order to ensure success.
We’re happy to announce that we’ve released CorrugatedIron 1.4.0. This release contains a large number of bug fixes, new features, and enhancements. That’s right, brand new Riak and .NET straight to you! Get it while it’s hot or keep reading for more details.
Riak 1.4 Support
CorrugatedIron 1.4.0 brings Riak 1.4.0 feature support including:
- Index pagination & continuations
- Returning index terms with keys
- Better streaming support (now supported by Riak!)
- Query timeout support
- Resetting bucket properties
These new features make it even easier for developers to create rich, full featured applications with Riak and .NET.
CorrugatedIron maintains backwards compatibility with Riak 1.3. Although a large amount of functionality has changed with Riak 1.4, we still allow clients to communicate with the previous version of Riak. In most cases, no code will need to change. Wherever possible, we’ve made sure to let you know in comments or via warnings that code is Riak 1.4 only. Admittedly, you should be able to figure that out – the API is new and you aren’t using it yet.
While we were building support for new Riak 1.4 features, we found a number of places to smooth out the API. We’ve added fluent setters for a number of properties, removed methods marked for deletion, stopped you from storing
nulls in indices (thanks to evanmcc for helping with that one), and host of other syntactic consistency issues.
CorrugatedIron should be much more consistent across the entire API. I’m sure there are more places for us to iron out issues, but we’re aiming to keep improving the API. Speaking of which, there’s now XML documentation for the vast majority of public methods and objects. Within the next few releases, we’ll continue to add additional documentation to the public facing classes so that your IntelliSense is even more useful.
Oh, and all of our unit and integration tests pass.
Get it now!
This also marks our first attempt at locking major and minor version with Riak – throughout the development of Riak 1.4, we’ll keep our major and minor version pegged alongside Riak’s. As Riak development shifts to the next release, so will we. This should make it easier to track which Riak features and functionality are supported by CorrugatedIron.
Until recently, database benchmarks have been performed by vendors in carefully controlled labs or by engineers at companies reporting on application specific workloads. TPC benchmarks, like TPC-H, provide metrics about the number of queries per hour and a cost per query per hour. While these results can give us a guess about the total cost to implement a system, they have no bearing on the cost to operate a system. How much will it cost to maintain a high throughput system?
UC Berkeley’s AMPLab has provided a benchmark that makes it easier to compare both performance and cost of different database solutions. The AMPLab Big Data Benchmark provides a benchmark for several large scale analytic frameworks. Most importantly – it’s possible for anyone to reproduce these benchmarks using the tools provided by AMPLab.
Performance AnalysisFirst and foremost, the AMPLab benchmark provides a performance analysis of four products – Amazon Redshift, Hive, Shark, and Impala. Several query types are used to provide a general view of analytic framework performance. Not all frameworks are implemented in the same way, and providing a broad set of queries makes it possible for users to evaluate how a workload might perform in production.
Several exploratory queries, an aggregate, an aggregate with joins, and custom UDFs are tested at several sizes and with several variations. If these aren’t representative of a given workload, it’s possible to extend the benchmark framework to include representative queries on the sample data set. Ambitious teams could even go so far as to point the benchmark their own data to discover which product provides the most benefit.
A great deal of flexibility is available to let teams benchmark potential solutions in Amazon Web Services – different servers and data sets can be repeatably tested and evaluated before settling on a platform.
Typically, benchmarks are based on hardware, different database engines are compared on the same hardware. Other benchmarks are based on performance: how much performance (based on an arbitrary metric) can be eked out of any set of hardware.
Neither approach addresses the real concern of many businesses: cost.
Interestingly, the AMPLab benchmark is not based on hardware configuration. Instead of fixing on specific hardware types, the AMPLab benchmark is based on a cost metric. All systems were created in Amazon Web Service making it easy to compare cost based on published instance costs. In the case of the initial AMPLab benchmark, the systems cost $8.20 per hour (the Amazon Redshift system cost $8.50 per hour).
This is important for the simple reason that we now have a far more important way to compare the performance of different databases. For $8.20 – $8.50 an hour, on the workloads tested, I can make an easy decision about how I should perform my data analysis.
Taking the AMPLab benchmark a step further, we can customize the benchmark and how our workloads will perform at different cost levels. If you’ve wondered whether you should use one Hadoop variant or another, SQL Server, or Amazon Redshift for cloud analytics, you can easily find out. For teams already using cloud based analytics frameworks, it’s easy to use these benchmarks to determine how workloads would fare on different systems or with different instance sizes.
The AMPLab benchmark produces results that most people in the RDBMS world would be happy about – Amazon Redshift comes out ahead of the competition. Equally unsurprising, the results are fastest when the entire result set can be coerced into memory. What’s surprising, though, is how well newcomers Shark and Impala perform when stacked up against an MPP database like Redshift. Sure, Redshift is about twice as fast as Shark, but Shark is a new product (the first source code commits occurred on April 23, 2011) and I’m sure we can expect big improvements in the future. This is important, though, because it shows that tools like Shark and Impala complete in the same realm as MPP databases like Redshift, Teradata, and PDW.
What’s It All Mean?
Using the AMPLab benchmark we have an easy tool that lets us compare analytic database performance in a hosted environment. We can perform multiple tests to understand how our workload will perform within different database products and hardware environments. Continued improvements to both the underlying database platforms and the test framework itself should lead to interesting discussions, prototypes, and technology decisions.
When people say “cloud”, they’re simplifying a lot of different solutions into a single catchphrase. Let’s break out the different options and compare them.
1. SQL Server in Amazon EC2 and Azure VMs
Amazon EC2 is a virtualization platform. Amazon buys servers, installs their secret sauce software, and rents you Windows virtual machines by the hour. Microsoft offers a similar product, Windows Azure Virtual Machines, that just went officially live.
You can rent a blank Windows VM without SQL Server installed, and then install SQL Server yourself just like you would on-premise. That’s a licensing mess, though – you have to use your existing SQL Server licenses or buy new ones for your VMs. That doesn’t make much financial sense. Instead, Amazon and Microsoft will rent you a Windows VM with SQL Server already configured, and your hourly fee includes the SQL Server licensing.
SQL Server runs just as it would in your own datacenter, which means you can use this as a disaster recovery option for your on-premise SQL Servers. You can do log shipping or database mirroring up to SQL in the cloud, running in Amazon EC2 or Microsoft Azure. When disaster strikes, fail over to your EC2/MS infrastructure, and you’re off and running.
The term “cloud” conjures up images of easy scalability and redundancy, but that’s not really the case here. We’re talking about a single virtual machine. This isn’t much different from running SQL Server in a VMware or Hyper-V guest in your own datacenter or in a colo datacenter. You can use all your traditional tools and techniques to manage SQL Server, which is both a pro and a con. If you need to patch it or scale out to multiple servers, there’s no tools included here. I still consider this the cloud, though, because the infrastructure and licensing are managed by somebody else. It’s easy to get started with one – or a hundred – virtual SQL Servers with no initial investment.
This method is the most conventional, and as I explain other options, I’m going to move from conventional to really-far-out-there. SQL in EC2 or MS VMs just works – it’s easy to understand and leverage without changing your code or your techniques – but it doesn’t bring a lot of the cloud’s benefits.
2. Amazon RDS for SQL Server
Instead of running SQL Server in an EC2 virtual machine, let’s start giving up a little bit of control in order to get more of cloud benefits. The next layer of clouds is Amazon Relational Database Service (RDS). Here, Amazon builds a Windows VM, installs SQL Server, configures it, and manages both Windows and the SQL Server service for you.
This is still the real, true blue SQL Server you know and love – all of your commands still work exactly the same as you’re used to, as long as you don’t try to access the server’s local drives directly. (Example: you can’t upload a flat file to the server’s C drive and then try to BCP data from that file into SQL Server.)
Amazon RDS is kinda like they’re the DBA, and you’re a very, very powerful developer. You can create and drop databases and users, but you can’t Remote Desktop into the SQL Server, nor can you access the drives.
Amazon RDS takes a few job duties away from you:
- Storage management – want faster storage? Just pick (and pay for) more IOPs. There’s no arguing with the SAN guy.
- Monitoring – Amazon CloudWatch tracks performance metrics and sends you emails when they’re outside of your thresholds.
- Patching – You pick the major/minor versions you want and when you want patches applied.
- Backups – You pick a time window for the full backups, and Amazon manages it using storage snapshots. You can restore from snapshots just by pointing and clicking in the management console.
But herein lies the first real compromise: you can’t restore from anything except snapshots. You can’t upload your own database backup file and restore it. To get data into Amazon RDS, you’ll want to export it to a file, upload that file to Amazon S3 (cloud-based file storage), and then import it. This also means you can’t use Amazon RDS as a participant in log shipping or database mirroring.
Microsoft doesn’t have a competitor to Amazon RDS for SQL Server today. Well, I say that, but some shops already manage their SQL Servers this way – they have an internal admin team that manages Windows and SQL. Departments get access to create & drop databases, change code, etc, but they don’t get access to the server’s desktop or backups. This doesn’t really compete with Amazon RDS, though – RDS is for companies who are too small to have this kind of internal engineering infrastructure. (Or for companies that want to get rid of this large engineering burden, I suppose.)
3. VMware vFabric Data Director
vFabric Data Director is a lot like running Amazon RDS in your own datacenter, but you can control the base Windows virtual machines. You build a Windows template to be used by default whenever a new SQL Server is created. VMware vFabric manages the implementation details for backups and high availability.
vFabric also supports Oracle, PostgreSQL, and Hadoop for a single pane of glass to create and manage your database servers. When someone in your company wants a new database instance, the sysadmins open up vFabric, configure it, and within a few minutes, it’s up and running.
vFabric makes sense for 100% virtualization shops who aren’t interested in moving their databases up to the cloud, but they want easier database management integrated into their virtualization tools.
Microsoft doesn’t have a competitor to VMware vFabric Data Director today. In theory, you could build your own alternative using System Center and a whole lotta scripting. That is left as an exercise for the reader.
4. Microsoft Windows Azure SQL Database (WASD)
The artist formerly known as SQL Azure takes the least conventional approach of all. While it’s technically built on Microsoft SQL Server, that’s like saying your pants are built on your underpants. They both cover your back end, but you can’t use them interchangeably.
Microsoft essentially built a new product designed for common database storage requirements. Like your underpants, you only get a minimum of feature and command coverage here. The new features and commands Microsoft has been adding to the boxed product for the last couple of versions just aren’t available in WASD including partitioning, Resource Governor, Service Broker, and CLR. But hey – are you really using those anyway? Most folks aren’t.
Rather than building large databases, WASD encourages developers to shard out their data across multiple smaller databases. While on-premise SQL Servers have had techniques to do this in the past, Microsoft started over and developed a new technique that makes more sense for cloud implementations. Again, though – we’re talking about a difference from the boxed product, something else that developers have to learn differently. As we’ve gone up this ladder into the clouds, we’ve been handling our problems differently. WASD’s partitioning technique is a good reminder that once you’ve gone this far up into the cloud, you’re dealing with something very different from SQL Server. You’re not going to take large volumes of code written for SQL Server and simply point them at WASD – you’re going to be doing a lot of testing and code changes.
Like Amazon RDS, there’s no backup/restore functionality here to get your existing data into the cloud. You’ll be exporting your data to a file, loading it into the cloud, and then…staying there. There’s no database mirroring or log shipping to/from Azure SQL Databases to on-premise SQL Servers.
Which One is Right for You?
If you’ve got an existing app, and you just want to cut costs without changing code, you can get started today with any of the first three options. They support the same T-SQL commands and datatypes you already know and love.
If you’ve got an existing app, and you’re willing to make code changes & do testing, you might be able to save even more plus gain new flexibility by going to Windows Azure SQL Database. In most cases, though, the cost savings won’t come anywhere near the costs required for the code changes and code review. We haven’t seen a case yet where the tradeoff made sense for our clients. I’m sure it’s out there – the perfect example would be a small amount of code that the developers know very well, can easily modify, and produces large server loads in short bursts.
If you’re building a new app from scratch, then let’s take a step back and survey the database industry as a whole. This is an incredible time to be in databases and there’s a bunch of really good options we didn’t even cover here.
People bring me in when they’re having data problems. They can’t store data fast enough, they can’t make it reliable enough, they can’t hire people to manage it, etc. When I’m in the conference room, it’s because there’s a fire in the
You know how it is as a DBA, though – DBA means Default Blame Acceptor. Everybody thinks the fire started in the database, but often it’s the SAN, VMware, crappy code, bad third party apps, or any number of combustible materials. The company gets more and more concerned about the growing smoke, and they ask the DBAs, “Who’s a SQL Server expert you can call to put this fire out?” The DBA thinks about my crazy blog posts and blurts my name out – mostly because he wants to find out if I’ll show up in the Richard Simmons costume. (That costs extra.)
Now put yourself in my shoes: I show up in a conference room or on a WebEx, and there’s a huge problem somewhere in the infrastructure. Everybody involved is pointing fingers at each other, and they’re all armed with volumes of reports proving that it’s not their problem. In a matter of 3-4 days, I need to:
- Find the real root cause of the problem
- Prove it to everyone involved using their own language
- Show a few possible solutions and recommend the right one
- Convince them to implement it as quickly as possible
SQL Server isn’t the only fire extinguisher, and I have to know how to put out data fires with other tools. Amazon’s got a ridiculous list of services that are easy to get started with, including:
- Relational Database Service – point, click, and deploy Microsoft SQL Server, Oracle, and MySQL instances. Amazon manages the backups, patching, and security. The MySQL ones even support readable replicas and replication to multiple datacenters.
- DynamoDB – super-fast NoSQL database hosted on SSDs. You pick how fast you want it to go, and Amazon makes it happen.
- Glacier – store your backups in the cloud for $.01 per gigabyte per month with no cost for incoming data.
- Import/Export – ship them a USB drive, and they’ll hook it up to the cloud. For folks with slow upload links, this is the fastest way to move your data online.
That’s why I’m in Dallas, Texas for a few days attending Amazon Web Services Architect Training. It’s a three-day design session that covers how to design solutions with their services. It’s not going to make me a Certified Master of Cloudiness across their broad range of tools, but that’s not the point. Clients don’t usually want me to do the work myself: they want me to find the right answer fast, get the staff on the right page, and let the staff knock out the work together.
If you’re a data professional, and you’re frustrated when people keep saying it’s a database problem when it’s not, what are you doing to bridge the gap? Are you frustrated that The Other Guy doesn’t know anything about SQL Server? Or are you reaching out to learn The Other Guy’s technology to help him to see where the smoke is coming from?