We all hope that we’re going to succeed beyond our wildest expectations. Startups long for multi-billion dollar IPOs or scaling to hundreds, or even thousands, of servers. Every hosting provider is touting how their new cloud offering will help us scale up to unheard of heights. I’ve built things up and torn them down a few times over my career
Build it to Break
Everything you make is going to break, plan for it.
Whenever possible, design the individual layers of an application to operate independently and redundantly. Start with two of everything – web servers, application servers, even database servers. Once you realize that everything can and will fail, you’ll be a lot happier with your environment, especially when something goes wrong. Well designed applications are built to fail. Architects accept that failure is inevitable. It’s not something that we want to consider, but it’s something that we have to consider.
Distributed architecture patterns help move workloads out across many autonomous servers. Load balancers and web farms help us manage failure at the application server level. In the database world, we can manage failure with clustering, mirroring, and read-only replicas. Everything computer doesn’t have to be duplicated, but we have to be aware of what can fail and how we respond.
Everything is a Feature
As Jeff Atwood has famously said, performance is a feature. The main thrust of Jeff’s article is that making an application fast is a decision that you make during development. Along the same lines, it’s a conscious decision to make an application fault tolerant.
Every decision that has a trade off. Viewing the entire application as a series of trade offs leads to a better understanding about how the application will function in the real world. The difference between being able to scale up and being able to scale out can often come down to understanding key decisions that were made early on.
Scale Out, Not Up
This isn’t as axiomatic as it sounds. Consider this: cloud computing like Azure and AWS is at its most flexible when we can dynamically add servers in response to demand. To effectively scale out means that we need also to be able to scale back in.
Adding additional capacity is usually in the application tier; just add more servers. What happens when we need to scale the database? The current trend is to buy a faster server with faster disks and more memory. This process keeps repeating itself. Hopefully your demand for new servers will continue at a pace that is less than or equal to the pace of innovation. There are other problems with scaling up. As performance increases, hardware gets more expensive for smaller and smaller gains. The difference in cost between the fastest CPU and second fastest CPU is much larger than the performance gained – scaling up often comes at a tremendous cost.
The flip side to scaling up is scaling out. In a scale out environment, extra commodity servers are added to handle additional capacity. One of the easiest ways to manage scaling out the database is to use read-only replica servers to provide scale out reads. Writes are handled on a master server because scaling out writes can get painful. But what if you need to scale out writes? Thankfully, there are many techniques available to horizontally scaling the database layer – features can be broken into distinct data silos, metadata is replicated between all servers while line of business data is sharded, or automated techniques like SQL Azure’s federations can be used.
The most important thing to keep in mind is that it’s just as important to be able to contract as it is to expand. As a business grows it’s easiest to keep purchasing additional servers in response to load. Purchasing more hardware is faster and usually cheaper than tuning code. Once the application reaches a maturity level, it’s important to tune the application to run on fewer resources. Less hardware equates to less maintenance. Less hardware means less cost. Nobody wants to face the other possibility, too – the business may shrink. A user base may erode. A business’s ability to respond to changing costs can be the difference between a successful medium size business and a failed large business.
Buy More Storage
In addition to scaling out your servers, scale out your storage. If you have the opportunity to buy a few huge disks or a large number of small, fast disks give serious thought to buying the small, fast disks. A large number of small, fast drives is going to be able to rapidly respond to I/O requests. More disks working in concert means that less data will need to be read off of each disk.
The trick here is that modern databases are capable of spreading a workload across multiple database files and multiple disks. If multiple files/disks/spindles/logical drives are involved in a query, then it’s possible to read data from disk even faster than if only one very large disk were involved. The principle of scaling out vs. scaling up applies even at the level of scaling your storage – more disks are typically going to be faster than large disks.
You’re Going to Do It Wrong
No matter how smart or experienced your team is, be prepared to make mistakes. There are very few hard and fast implementation guidelines about scaling the business. Be prepared to rapidly iterate through multiple ideas before finding the right mix of techniques and technologies that work well. You may get it right on the first try. It may take a number of attempts to get it right. But, in every case, be prepared to revisit ideas.
On that note, be prepared to re-write the core of your application as you scale. Twitter was originally built with Ruby on Rails. Over time they implemented different parts of the application with different tools. Twitter’s willingness to re-write core components of their infrastructure led them to their current levels of success.
Don’t be afraid to change everything.