Answering questions about performance is almost becoming a catechism: “How can I make my database faster?” Use a caching server. “Which queries should I cache?” All of them. A lot of people stop there. They implement a cache in their application using some kind of in-memory hash table and call it a day. There’s more to it than that, of course. Not all caching solutions are cut from the same cloth.
Back to Basics: Memcached
Memcached is a big hash table: it’s a key/value store that lives entirely in RAM. Memcached has a few basic commands that correspond to the basic CRUD commands in any database, but in memcached CRUD works by looking up data with a specific primary key – you can’t search through RAM. Like many key/value stores, memcached has no ability to query based on the data stored in the value portion of the key/value pair.
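To make that model concrete, here is a toy sketch of the key/value semantics, assuming a plain Python dict stands in for the server’s in-RAM hash table (real memcached is a separate server process; `MiniCache` and its method names are illustrative, not the memcached API):

```python
class MiniCache:
    """Toy stand-in for memcached's in-RAM hash table."""
    def __init__(self):
        self._store = {}  # everything lives in memory; restart = data gone

    def set(self, key, value):       # create/update
        self._store[key] = value

    def get(self, key):              # read by primary key only --
        return self._store.get(key)  # there is no way to query by value

    def delete(self, key):           # delete
        self._store.pop(key, None)

cache = MiniCache()
cache.set("user:42", '{"name": "Ada"}')
print(cache.get("user:42"))   # '{"name": "Ada"}'
cache.delete("user:42")
print(cache.get("user:42"))   # None -- a miss just returns nothing
```

Every operation is a lookup on one key; nothing here can answer “which users are named Ada?” without you maintaining that mapping yourself.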
What’s more, the memcached server has no native support for high availability features – it is just a cache. If the cache server goes down, it goes down. Losing a cache server might not sound terrible, but if your application depends on a fast cache for performance then losing a cache server can cause a critical performance problem.
What’s the solution to high availability with memcached? There isn’t an out-of-the-box one. Different drivers can use [consistent hashing][ch] to spread data across multiple memcached servers, and there are libraries that support memcached replication between data centers, but none of these features are baked right into the product. Ultimately it’s up to you to implement this yourself.
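Consistent hashing is simple enough to sketch. This is a minimal ring with virtual nodes, the technique a client driver uses to decide which memcached server owns a key — a sketch of the idea, not any particular driver’s implementation (the server names are made up):

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # md5 is fine here: we need an even spread, not cryptographic strength
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring: each server gets many virtual points on the
    ring, and a key belongs to the first point clockwise from its hash."""
    def __init__(self, servers, replicas=100):
        self._ring = sorted(
            (_hash(f"{server}:{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key: str) -> str:
        # first virtual node at or past the key's hash, wrapping around
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
print(ring.server_for("user:42"))  # the same key always maps to one server
```

The payoff is that adding or removing a server only remaps the keys near its points on the ring, instead of reshuffling everything the way `hash(key) % server_count` would.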
The upside of memcached is that it has been in development since 2003, it speaks a well known protocol, and many developers have run into it before. Amazon’s ElastiCache even speaks the memcached protocol; if you move into the cloud, there’s already something waiting for you. The memcached documentation also includes suggestions on ways to cache SQL queries, so your developers will have a leg up when they start examining caching. There are also plugins for many languages, frameworks, and products – there’s a lot of support for memcached out there.
The Middle Ground: Flexible Tools
No matter which software stack your development team is using, there is going to be at least one caching solution that they can pick up and run with. Not all caching tools are as simple as memcached; others are more feature-filled. Microsoft has created AppFabric Cache, and developers in the Java world can use Ehcache. These two packages have the concept of availability baked into their core – both Ehcache and AppFabric Cache can cluster right out of the box. There are multiple advantages to this approach: high availability isn’t a bolt-on that depends on a third party library, and management becomes much easier.
Since we mainly talk about SQL Server around this place, I’m going to keep talking about AppFabric Cache from here.
Although AppFabric Cache has only been around for a short period of time, it is being used in a number of places. Caching services are available in Azure, there’s an ASP.NET session state provider, and I’ve personally used AppFabric Cache to supply fast paging in large reports. One of the primary advantages of AppFabric Cache is that it’s very easy for Windows admins to configure and administer – everything is handled using tools similar to those administrators already know.
Like memcached, AppFabric Cache supplies a simple set of APIs to read, write, or delete data. The simplicity of AppFabric Cache makes it a logical choice for developers working with SQL Server and the Microsoft stack. On top of its simplicity, AppFabric Cache adds two features to make developers’ jobs easier. The first feature is cache expiration. Instead of relying on the cache to expire data when it is no longer used, developers can specify a time to live when saving a value to cache. As data is read, the expiration can be refreshed, but if data isn’t read for a long time, it will be marked as expired and can no longer be read. The second feature is high availability – developers can save some values in multiple places at once, ensuring that data will survive the failure of a cache server.
On a feature-by-feature level, AppFabric Cache meets the needs of almost every application. It still has one downside – there’s no way to query data in an ad hoc fashion. Every data access is key/value. It’s possible to use different pools in the cache, but those are only separations in a logical sense. If you need to retrieve a range of data, your only option is to build inverted indexes: key/value pairs where the key is the index key (e.g. the state of Oregon) and the value is a list of indexed values (e.g. all zip codes in Oregon).
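The inverted index workaround looks like this in practice – a sketch using a plain dict to stand in for the cache, with illustrative key names (`state:OR`) rather than any product’s conventions:

```python
from collections import defaultdict

# Inverted index inside a key/value cache: the key is the index key
# ("state:OR") and the value is the list of matching members (zip codes).
index = defaultdict(list)

def index_zip(state: str, zip_code: str):
    # every write to the main cache also appends to the index entry
    index[f"state:{state}"].append(zip_code)

def zips_in(state: str):
    # a "range" query becomes a single key lookup on the index entry
    return index.get(f"state:{state}", [])

index_zip("OR", "97201")
index_zip("OR", "97401")
index_zip("WA", "98101")
print(zips_in("OR"))  # ['97201', '97401']
```

The cost is that you now maintain the index yourself: every write, update, and delete has to touch both the primary entry and its index entries, and keeping the two consistent is your problem, not the cache’s.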
Fast and Furious: An In Memory Database
In-memory databases have several advantages over pure caches. A cache is a simple key/value database and the value is nothing more than a collection of bytes. Databases, however, offer increased functionality – range scans, sorting, strongly typed data, and a host of rich commands.
Redis is a fast in-memory database. At first glance, you might think that Redis is a lot like a cache – it looks in many ways like a simple in-memory key/value store. However, Redis hides a lot of additional power: sorted sets, lists, queues, and replication are all supported features. By combining these features, it becomes possible to use Redis as something more than a cache – it becomes the primary database for fast querying. It’s easy to store user session properties in a hash, a user’s last 50 viewed pages in a list, and any number of objects as simple strings.
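The “last 50 viewed pages” pattern is worth sketching. With a real Redis client you would push onto a list and trim it to a fixed length (Redis’s `LPUSH` and `LTRIM` commands); here a bounded deque models the same capped-list behavior so the example runs without a server:

```python
from collections import deque

class RecentPages:
    """Capped most-recent-first list, modeling Redis's LPUSH + LTRIM
    pattern for tracking a user's recently viewed pages."""
    def __init__(self, limit=50):
        self._pages = deque(maxlen=limit)  # oldest entries fall off the end

    def visit(self, url):
        self._pages.appendleft(url)        # newest page goes to the front

    def recent(self, n=10):
        return list(self._pages)[:n]       # like LRANGE 0 n-1

history = RecentPages(limit=50)
for i in range(60):                        # 60 visits, but only 50 kept
    history.visit(f"/article/{i}")
print(history.recent(3))  # ['/article/59', '/article/58', '/article/57']
```

Because the structure is bounded, memory use per user is fixed no matter how much they browse – the same property you get from trimming the Redis list on every push.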
Once you start working with an in-memory database like Redis, it’s important to start thinking of creative ways to use your database; Redis can be used as far more than a glorified key/value store. Developers are using Redis for realtime metrics and analytics, and Redis has even been called the AK-47 of databases because it’s simple, powerful, and reliable.
No matter how you choose to cache, make sure that you start caching in your applications as soon as you can. Yes, it does add extra work for developers to implement caching. But it’s not as much work as many people would like to think. Careful and judicious use of caching will have immediate benefits for application performance – database CPU, memory, and I/O requirements will decrease and application response times will improve.