Answering questions about performance is almost becoming a catechism: “How can I make my database faster?” Use a caching server. “Which queries should I cache?” All of them. A lot of people stop there. They implement a cache in their application using some kind of in-memory hash table and call it a day. There’s more to it than that, of course. Not all caching solutions are cut from the same cloth.

Back to Basics: Memcached

Memcached is a big hash table: a key/value store that lives entirely in RAM. It has a few basic commands that correspond to the basic CRUD commands in any database, but in memcached CRUD works by finding data based on a specific key – you can’t search through RAM. Like many key/value stores, memcached has no ability to query based on the data stored in the value portion of the key/value pair.
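To make the key/value model concrete, here’s a toy sketch of those basic commands in plain Python – this stands in for a real memcached client, which would speak the memcached protocol over the network. Note that every operation takes an exact key; there is no way to ask “which entries have values matching X”:

```python
# A toy in-memory key/value store illustrating memcached's data model.
# Real memcached is a network server; this only sketches the semantics.
class ToyCache:
    def __init__(self):
        self._data = {}            # the "big hash table"

    def set(self, key, value):    # create or update
        self._data[key] = value

    def get(self, key):           # read by exact key only
        return self._data.get(key)

    def delete(self, key):        # remove
        self._data.pop(key, None)

cache = ToyCache()
cache.set("user:42", "Jeremiah")
print(cache.get("user:42"))   # exact-key lookup works: Jeremiah
print(cache.get("user:*"))    # no wildcard or value search: None
```

The key names here are illustrative; the point is that every read and write is addressed by one exact primary key.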

What’s more, the memcached server has no native support for high availability features – it is just a cache. If the cache server goes down, it goes down. Losing a cache server might not sound terrible, but if your application depends on a fast cache for performance then losing a cache server can cause a critical performance problem.

What’s the solution to high availability with memcached? There isn’t one out of the box. Different drivers can use [consistent hashing][ch] to spread data across multiple memcached servers, and there are libraries that support memcached replication between data centers, but none of these features are baked into the product. Ultimately, it’s up to you to implement high availability yourself.
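Consistent hashing isn’t magic: the driver hashes each server onto a ring, then maps each key to the first server at or after the key’s own hash, so adding or removing one server only remaps a fraction of the keys. A bare-bones sketch of the idea (illustrative only – production clients add virtual nodes and replication on top of this):

```python
import hashlib
from bisect import bisect

# Minimal consistent-hash ring: a key maps to the first server whose
# hash is at or after the key's hash, wrapping around the ring.
class HashRing:
    def __init__(self, servers):
        self._ring = sorted((self._hash(s), s) for s in servers)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key):
        points = [p for p, _ in self._ring]
        idx = bisect(points, self._hash(key)) % len(self._ring)  # wrap
        return self._ring[idx][1]

ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
print(ring.server_for("user:42"))  # always the same server for this key
```

The server names are made up; the useful property is that the mapping is deterministic, so every client that builds the same ring routes a key to the same server without any coordination.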

The upside of memcached is that it has been in development since 2003, it speaks a well-known protocol, and many developers have run into it before. Amazon’s ElastiCache even speaks the memcached protocol; if you move into the cloud, there’s already something waiting for you. The memcached documentation also includes suggestions on ways to cache SQL queries, so your developers will have a leg up when they start examining caching. There are also plugins for many languages, frameworks, and products – there’s a lot of support for memcached out there.

The Middle Ground: Flexible Tools

No matter which software stack your development team is using, there is going to be at least one caching solution that they can pick up and run with. Not all caching tools are as simple as memcached; others are more feature-rich. Microsoft has created AppFabric Cache, and developers in the Java world can use Ehcache. These two packages have the concept of availability baked into their core – both Ehcache and AppFabric Cache can cluster right out of the box. There are multiple advantages to this approach: high availability isn’t a bolt-on that depends on a third-party library, and management becomes much easier.

Since we mainly talk about SQL Server around this place, I’m going to keep talking about AppFabric Cache from here.

Although AppFabric Cache has only been around for a short time, it is being used in a number of places. Caching services are available in Azure, there’s an ASP.NET session state provider, and I’ve personally used AppFabric Cache to supply fast paging in large reports. One of the primary advantages of AppFabric Cache is that it’s very easy for Windows admins to configure and administer – everything is handled using tools similar to those administrators already know.

Like memcached, AppFabric Cache supplies a simple set of APIs to read, write, or delete data. That simplicity makes it a logical choice for developers working with SQL Server and the Microsoft stack. On top of it, AppFabric Cache adds two features to make developers’ jobs easier. The first is cache expiration: instead of relying on the cache to evict data when it is no longer used, developers can specify a time to live when saving a value to the cache. As data is read, the expiration can be refreshed; if data isn’t read for a long time, it is marked as expired and can no longer be read. The second is high availability – developers can save some values in multiple places at once, ensuring that data survives the failure of a cache server.
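The expiration behavior described above can be sketched in a few lines. This is plain Python, not the AppFabric API – the class and method names are made up purely to show the time-to-live-with-refresh-on-read idea:

```python
import time

# Toy cache with a time-to-live that is refreshed on every read
# ("sliding expiration"). Not the AppFabric API - just the concept.
class ExpiringCache:
    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._data = {}   # key -> (value, expires_at)

    def put(self, key, value):
        self._data[key] = (value, time.monotonic() + self._ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]     # expired: can no longer be read
            return None
        self.put(key, value)        # reading refreshes the TTL
        return value

cache = ExpiringCache(ttl_seconds=0.1)
cache.put("report:page:1", "rows 1-50")
print(cache.get("report:page:1"))   # fresh read: value comes back
time.sleep(0.2)
print(cache.get("report:page:1"))   # TTL elapsed: None
```

In a real cluster the high-availability half of the story means each `put` is also copied to a secondary node, which this local sketch doesn’t attempt to show.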

On a feature-by-feature level, AppFabric Cache meets the needs of almost every application. It still has one downside – there’s no way to query data in an ad hoc fashion. Every data access is key/value. It’s possible to use different pools in the cache, but those are only logical separations. If you need to retrieve a range of data, your only option is to build inverted indexes: key/value pairs where the key is the index key (e.g. the state of Oregon) and the value is a list of matching values (e.g. all zip codes in Oregon).
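The inverted-index workaround looks like this in practice – a plain dict stands in for the cache here, and the keys and values are illustrative:

```python
# Simulating a range-style lookup on a key/value cache with an
# inverted index: the index entry's value is a list of other keys.
cache = {}  # stand-in for the key/value store

# Store the individual zip code entries.
cache["zip:97201"] = {"city": "Portland"}
cache["zip:97401"] = {"city": "Eugene"}

# The inverted index: state -> list of zip keys.
cache["state:OR"] = ["zip:97201", "zip:97401"]

# "Query" all zip codes in Oregon: one read for the index entry,
# then one read per referenced key.
oregon_zips = [cache[key] for key in cache["state:OR"]]
print(oregon_zips)
```

The catch, of course, is that your application now has to keep the index entry in sync with the data entries on every write – the cache won’t do it for you.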

Fast and Furious: An In Memory Database

In memory databases have several advantages over pure caches. A cache is a simple key/value database and the value is nothing more than a collection of bytes. Databases, however, offer increased functionality – range scans, sorting, strongly typed data, and a host of rich commands.

Redis is a fast in-memory database. At first glance, you might think that Redis is a lot like a cache – in many ways it looks like a simple in-memory key/value store. However, Redis hides a lot of additional power: sorted sets, lists, queues, and replication are all supported. By combining these features, Redis becomes something more than a cache – it can serve as the primary database for fast querying. It’s easy to store user session properties in a hash, a user’s last 50 viewed pages in a list, and any number of objects as simple strings.
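For example, the “last 50 viewed pages” pattern maps to Redis’s LPUSH and LTRIM list commands: push the newest page onto the front of the list, then trim the list to 50 entries. Here is the same logic sketched in plain Python, with a bounded deque standing in for the Redis list since this sketch can’t assume a running Redis server:

```python
from collections import deque

# Redis pattern: LPUSH pageviews:<user> <url>, then LTRIM 0 49 to keep
# only the newest 50 entries. A deque with maxlen=50 behaves the same
# way locally: old items fall off the far end automatically.
last_pages = deque(maxlen=50)

def record_page_view(url):
    last_pages.appendleft(url)   # like LPUSH: newest item at the front
    # deque(maxlen=50) trims for us, like LTRIM 0 49

for i in range(60):
    record_page_view(f"/article/{i}")

print(len(last_pages))    # 50 - the 10 oldest views fell off
print(last_pages[0])      # /article/59 - most recent view first
```

With a real Redis list you get the same behavior shared across all of your application servers, plus persistence and replication if you want them.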

Once you start working with an in-memory database like Redis, it’s important to think of creative ways to use it; Redis can be far more than a glorified key/value database. Developers are using Redis for real-time metrics and analytics, and Redis has even been called the AK-47 of databases because it’s simple, powerful, and reliable.

Wrapping Up

No matter how you choose to cache, make sure that you start caching in your applications as soon as you can. Yes, it does add extra work for developers to implement caching. But it’s not as much work as many people would like to think. Careful and judicious use of caching will have immediate benefits for application performance – database CPU, memory, and I/O requirements will decrease and application response times will improve.

Jeremiah Peschka
When I’m not working with databases, you’ll find me at a food truck in Portland, Oregon, or at conferences such as DevLink, Stir Trek, and OSCON. My sessions have been highly rated and I pride myself on their quality.
  1. I was with you up until you wrote, “make sure that you start caching in your applications as soon as you can”. How about “build support for caching into your application, but don’t use it unless you have to”?

    Here’s the thing: caching is absolutely what you need in *some* instances, and it covers up bad design the rest of the time. For example, you mentioned that most caches don’t support querying of the value portion of the data. This is good news, because caching should be used to pull known values that are frequently used out of memory as quickly as possible. The “known values” part is what makes “querying cache” an inappropriate technique. If you consistently need the same collection of data (i.e. you have the same query running repeatedly) then that set of data becomes the value and the query parameters are the cache key. If the query is changing and the set isn’t consistent, then caching the data isn’t the right approach.

    How about this for a proposed solution: Your application should be unaware of where the data is coming from. Using something like a Repository pattern, the application should be able to get and put data in a consistent manner regardless of the repository implementation. This means that your repository can start as a simple SQL data access layer and then evolve to use cache, or be swapped for MongoDB, or even graduate to a full-blown CQRS message-based implementation if need be. Regardless of the approach taken, though, architects/designers/developers/DBAs should always ask “Is this the *right* approach or is this a crutch for a bad design?”

    • Great comments. I think you hit the nail right on the head with your comment about building your application to support caching. Too often I see applications that are tightly coupled to their storage implementation. It’s always tempting to whip through application development and accumulate a bit of technical debt in the data access layer because, hopefully, you’ll be able to go through and clean it up later when it’s time to scale. Unfortunately, that never comes around.

    • But isn’t AppFabric going to be dropped by Microsoft themselves? Searching for AppFabric 2.0 always turns up nothing.
      Don’t force us to use Azure AppFabric, please – we want Windows Server AppFabric.

      • I don’t think anyone outside of Microsoft could answer roadmap questions about AppFabric Cache. There is a wealth of options for running a caching server on your own hardware. Even if AppFabric Cache doesn’t go away, I recommend looking into memcached or Redis.

  2. Nice article. Just FYI, your Twitter/FB links at the bottom are duplicating the host URL.


  3. Great article. Can you put together a comparison of the EL caching block vs AppFabric vs memcached?

    • I somewhat did over in the article How Much Cache Do You Have. To the best of my knowledge, the EL caching block is much like first level cache in an ORM – it’s local to the current process that’s hosting the consumer of the caching block (IIS or an application server). The EL caching block won’t share state across multiple servers by default, but it appears that it can be extended to use an external caching provider. AppFabric and MemCache are your more traditional second level cache solutions that can be scaled independently of your application and database tier.

  4. “What’s more, the memcached server has no native support for high availability features”

    If you have your cache spread across 200 machines, and 1 goes down then how much is availability affected?

    Remember memcached was developed by people with massive web traffic.

    • I can’t answer your question without knowing your architecture.

      • Are you targeting cache servers on a feature by feature basis?
      • Is a user pinned to a cache server?
      • Is a user’s cache spread out across all of the cache servers?
      • Have you set up high availability in your memcached clients?

      And, lest we forget, memcached was developed to support LiveJournal and take the load off of MySQL circa 2003.

  5. Hi, I am looking forward to using Redis. I’d like some suggestions for offline caching – i.e. the cache server (Redis) is updated once from the underlying data sources, and the application then queries the cache server to populate the front end.
