Blog

Querying a Key/Value Database – Learning with Riak

On the continuum of databases, key/value databases are the simplest. Data is stored as an opaque value that’s identified by a key. Although key/value databases might seem overly simple, that simplicity hides flexibility. As key/value databases go, Riak provides a solid starting point with a large number of advanced features.

One key, one object… no locks

Key/Value Lookups

Querying a key/value database is simple: you provide the database with a known key and you get back a value. That value can be anything – a PDF, XML, plain text, or the binary representation of an application object.

The beautiful part is that Riak doesn’t care what you’re storing; it just stores bytes.

Although Riak itself only stores a series of bytes on the disk, we have the ability to keep some knowledge of what we’re storing. Metadata is stored with each record: information like the content type, encoding, modification timestamp, and even user-supplied metadata. Although Riak doesn’t worry about the value it stores, we can store enough metadata that a program can make the right decisions about the data.

Finding Data

Riak uses buckets to provide a logical namespace for identifiers in the database. You can think of buckets as roughly analogous to tables in an RDBMS. You wouldn’t store orders in the Employees table, likewise you wouldn’t want to put an order in the employees bucket. When we query Riak, we ask a single bucket to give us back a key. If you’re using CorrugatedIron (the .NET client), retrieving data looks something like this:

// Fetch the stored bytes and metadata for a known key from a bucket
var response = _client.Get(ApplicationName, ObjectId);
var riakObject = response.Value;
// Deserialize the raw bytes back into an application object
var employee = riakObject.GetObject<Employee>();

One upside of Riak’s key/value approach is that data access speeds are predictable. Reading a 100KB object from disk will take a predictable amount of time: time to locate data (max disk seek) + time to read data (100KB) + time to stream data to client. Without a lock manager or any of the other intricacies of an RDBMS involved, predicting database response time is relatively easy. We don’t need to be as concerned about concurrent requests, either, because there are many active servers in a Riak cluster. The load will be spread among all of them.

Once the data has been retrieved, it’s up to the user to figure out what to do with it. Riak sends back metadata and a stream of bytes; Riak has no knowledge of what to do with those bytes, so it leaves that decision up to the caller. This is very different from what we’re used to with an RDBMS like SQL Server, but there’s an advantage to this approach: we can store data in a flexible way – as the application changes, we can easily change the way we store data. There’s no need for an all-or-nothing migration that can prevent features from rolling out.

Migrating Data

With SQL Server, if you want to change the way data is stored on disk, you probably need to re-write all of the data at the same time. This can cause tremendous problems for global businesses – if the fundamental data model behind something changes significantly, migrations can become very complicated.

Remember when I mentioned Riak just stores bytes on the disk? That means if we want to change the way an employee is structured, we can do it by saving the new version of the data. That’s it! From a database perspective, there’s no change; bytes are saved to the disk, bytes are read from the disk. We can slowly change each employee as they’re read, or we could make an upgrade that sweeps across the cluster.

A flexible data schema is a side benefit of a key/value database like Riak. Developers can change the way objects are stored in the application and immediately start saving data in the new format. Migrating data can happen when old versions of the data are read (a sketch in C# follows the list):

  • The data’s schema is read from metadata, either in the Riak headers or from within the object.
  • If the schema is correct, keep moving!
  • If the schema is out of date, kick off a process to migrate the data to the new schema.
  • Save data with the new schema back to disk and keep moving!
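
Here’s a minimal sketch of that read-time migration in C#, modeled on the CorrugatedIron call shown earlier. The SchemaVersion property, the MigrateToCurrentSchema helper, and the exact SetObject/Put signatures are assumptions for illustration – check them against your client:

const int CurrentSchemaVersion = 2;

public Employee GetEmployee(string key)
{
    // Read the stored bytes and deserialize them
    var response = _client.Get("employees", key);
    var riakObject = response.Value;
    var employee = riakObject.GetObject<Employee>();

    // Schema is correct – keep moving!
    if (employee.SchemaVersion == CurrentSchemaVersion)
        return employee;

    // Schema is out of date – migrate it (hypothetical helper)
    employee = MigrateToCurrentSchema(employee);

    // Save the new schema back to disk and keep moving!
    riakObject.SetObject(employee);
    _client.Put(riakObject);

    return employee;
}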

Flexibility is a virtue of key/value databases – the database is as flexible as the developers are imaginative.

Riak that house!

Heavy Lifting with Riak

At some point, the business will need functionality more complex than simple key/value lookups. Maybe somebody will want to query a range of data or see an aggregate view of user activity or sales data. With SQL Server, we can easily write a reporting query with multiple joins, aggregations, and group by statements. With Riak, there’s no such feature.

There are two ways to do large reads with Riak – reading multiple keys or writing a MapReduce query.

Before you think I’m crazy about reading multiple keys, hear me out. When you write a query in SQL Server, you’re searching for something based on known attributes of the data – customers who live in Oregon who have one dog. For most line of business applications, it’s even simpler – there is a fixed set of activities that users can perform and adding any new activities corresponds to adding a new feature. Searches are generally limited in scope and don’t give end users exploratory functionality – it’s a rare line of business application that lets users write their own SQL.

Users request data in ways that are predictable; they want to see new messages, get the price of their auctions, or just see what the top news of the day is. We know what they’re going to ask; the behavior of the application is well known. As a result, it’s easy to look at a user’s activity data, make a decision about the last data they saw, and then retrieve all newer data. The hardest part of this approach is storing the data in a way that makes sense for the application’s queries.

To answer a question like “What’s new since I was here last?” we can create many buckets and time box them. Although buckets are roughly analogous to SQL Server tables, it’s a very rough analogue; buckets are a logical namespace within Riak. In this example, we could create a bucket per minute for messages and then name our buckets appropriately: messages-2013-03-01T13:00:00Z, messages-2013-03-01T13:01:00Z, messages-2013-03-01T13:02:00Z. Requesting the new messages since a user’s last login becomes as simple as looking at the user’s last login, the current time, and determining which ranges of keys contain the right messages.
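
As a sketch – assuming the per-minute messages-<timestamp> bucket naming above and the Get call from earlier – figuring out which buckets to read is just date arithmetic:

// Build the list of per-minute message buckets between the user's
// last login and now, then fetch from each one in parallel.
var lastLogin = new DateTime(2013, 3, 1, 13, 0, 0, DateTimeKind.Utc);
var now = DateTime.UtcNow;

var buckets = new List<string>();
for (var minute = lastLogin; minute <= now; minute = minute.AddMinutes(1))
{
    buckets.Add(string.Format("messages-{0:yyyy-MM-dd}T{0:HH:mm}:00Z", minute));
}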

Submitting many simultaneous requests to Riak will almost certainly be faster than performing a single MapReduce query, and we’ll get to that in a second. More importantly, we can submit many simultaneous requests to Riak and start displaying information as soon as requests start returning data; there’s no need to wait for all the data to be collected and streamed back to our client.

A great deal of application functionality can be written to take advantage of these patterns. Rather than using data normalization techniques to reduce storage and enforce atomicity, we can write data in ways that fit our application’s needs.

What About MapReduce?

MapReduce is on everyone’s lips in the world of NoSQL, distributed databases, and technology punditry. But not all MapReduce is created equal. MapReduce itself refers to a way of thinking about data problems – a function is used to map inputs and then a reducer is applied that performs some kind of aggregation. Riak’s MapReduce operates in this way, too.
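
As a tiny local illustration of that shape – nothing Riak-specific, and the orders collection is hypothetical – LINQ expresses the same map-then-reduce idea:

// Map: project each order down to the fields we care about.
// Reduce: aggregate the mapped values by key.
var totalsByRegion = orders
    .Select(o => new { o.Region, o.Amount })
    .GroupBy(x => x.Region)
    .Select(g => new { Region = g.Key, Total = g.Sum(x => x.Amount) });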

At first glance, it would seem like Riak’s MapReduce functionality would be a great way to handle ad hoc requests for data. Unfortunately, it’s not a good option.

Riak’s MapReduce (MR for short) works like this – a request for an MR job is sent to one server (the coordinator). At this point, a large amount of magic happens, but the first map phase is sent to every node in the cluster. The nodes perform the map phase (in part, by reading every key in the bucket) and then stream the results back to the coordinator. Astute readers might notice some problems with this approach:

  • All data is streamed back to the coordinator (which may run out of memory).
  • Reading all data in the cluster (or even from a large bucket) may cause performance problems as every node in the cluster becomes busy.

MR has some great functionality in it, but you don’t want to use it for ad hoc querying. What you can do is use MR to retrieve multiple objects in one query (just like issuing multiple simultaneous reads), or you can use MR to transform a large amount of data. One of the best uses of MR is to perform a global update on your data – all data in a bucket will be read, manipulated, and then written back to disk.

Let’s say that the business wants to add two more reports to the application. The reports are complex enough that running them on the fly is expensive. You’ve figured out a way that you could do this by pre-aggregating data on write – it’s a lot more work up front but you can do it asynchronously and it means that getting results back to the user is as simple as a single lookup. There’s no work to do computing and aggregating results when they’re read – the work was done when the data hit disk.
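
Here’s a hedged sketch of that write path – the DailySalesSummary type, the bucket names, and the RiakObject/Put shapes are assumptions modeled on CorrugatedIron, not a prescribed design:

public void RecordSale(Sale sale)
{
    // Write the raw sale as usual
    _client.Put(new RiakObject("sales", sale.Id.ToString(), sale));

    // Update the per-day rollup so the report is a single key lookup
    var summaryKey = sale.Timestamp.ToString("yyyy-MM-dd");
    var response = _client.Get("sales-summaries", summaryKey);

    var summary = response.IsSuccess
        ? response.Value.GetObject<DailySalesSummary>()
        : new DailySalesSummary { Date = summaryKey };

    summary.TotalAmount += sale.Amount;
    summary.OrderCount += 1;

    _client.Put(new RiakObject("sales-summaries", summaryKey, summary));
}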

There are other uses for MR, but it’s important for new users to realize that Riak’s MapReduce isn’t going to supply the ad hoc querying capabilities that they’re looking for. MR can be combined with other techniques and technology to supply rich query functionality, but plain MR won’t do it.

N.B. As Gerry points out in the comments, it’s possible to perform a Riak Search or secondary index query to narrow down the initial input to a MapReduce job.

Wrapping Up

Key/value databases like Riak provide a simple way to work with large amounts of data. Riak is an excellent starting point for teams looking to augment SQL Server – it provides an easy-to-use API and can be used to make many workloads easier and faster. Advanced use of Riak may require additional effort from developers and the business alike – it’s necessary to know the queries that will be run and adjust the shape of the data to fit them (potentially by storing multiple copies of data) – but for many teams this additional work yields great payoffs in predictable performance, latency, and scalability.

You Need This One Skill to Succeed in IT

The ability to program in five languages, including one machine-level language? Not it.

Project management skills, up to and including a PMP certification? Not that either.

Excellent oral and written communication skills, as noted on every job description ever? That doesn’t hurt, but can be learned.

All of the best IT professionals I have worked with have excellent problem-solving skills.

That’s it.

We face problems in IT on a regular basis. From the help desk technicians who are asked, “Why am I locked out of my computer?” to the SAN administrators who have to balance the needs of different servers and workloads to the DBA who is asked, “Why is the server so slow?” we are all given problems to solve.

How we go about solving those problems is what sets the great professionals apart from the good or the average.

Problem Solving Methodology

In high school, I was introduced to the scientific method. The process is:

Does this remind you of your server room?

  1. Formulate a question.
  2. Make a hypothesis.
  3. Make a prediction.
  4. Test the hypothesis, with an emphasis on measurement.
  5. Analyze the results.

Can this be applied to your problems? Yes, it can.

Formulate a question – usually, the question is asked of you. “Why is the server slow?” “Why can’t I connect to the database?” “Why is this report execution timing out?”

Make a hypothesis – perhaps a patch was installed on the server or SQL Server the night before. Maybe a network cable could have been unplugged. Maybe a developer changed a line of code in the stored procedure. Make a list of what could affect the system in question, so you have items to test.

Make a prediction – take a guess at what your results will be. If this is an error or problem you’ve encountered before, you’ll probably have a good idea of what to expect. If it’s a new problem, use your past experiences and deductive skills to predict what your changes will do to the system.

Test the hypothesis – make a plan, make a change, and check if the problem is solved. Don’t make three changes and wonder which one fixed it – fix one at a time. Know what success looks like. If a query is slow, know what performance was before the problem occurred, what performance is when the problem is happening, and what acceptable performance looks like. Metrics are important here. You must be able to measure if things improved, stayed the same, or got worse.

Analyze the results – check those metrics. Did you get the results you expected? If so, is the problem resolved? If not, what is the next item on your list to check? Continue to iterate through your list until the problem is solved.

Anyone Can Do This

It doesn’t require a PhD in computer science. It doesn’t require a master’s degree in chemistry. What it does take is a consistent approach to problems every time. It takes curiosity and an ability to see patterns.

With practice, this becomes much easier. Practice your problem-solving skills often. They will make you a great IT professional, and set you apart from the rest of the crowd.

Cassandra and .NET

Cassandra is a popular NoSQL database – it powers portions of Facebook, Netflix, eBay, and a host of other companies. DataStax, the commercial company behind Cassandra, just released the 1.0 version of their Cassandra driver to a bit of fanfare. There’s connection pooling, LINQ support, load balancing, and automatic failover. This isn’t the first .NET client for Cassandra, but it is the first to support CQL 3 – the new Cassandra protocol. Like many first software steps, this one gets off to a rough start.

The Good Stuff

It’s remarkably easy to get started with the DataStax .NET client. Experienced and novice developers alike can take advantage of the pre-built NuGet package for the driver and immediately add it to their project – it’s ridiculously easy.

Data modeling with Cassandra is different from modeling data for other databases. The C# driver from DataStax makes it easy for developers to design and build applications that take advantage of Cassandra’s rich data model.

Take this example (from the github repository):

public class Tweet
{
    // The partition key determines which node stores the row
    [PartitionKey]
    public string author_id;

    // Clustering keys order the rows stored within a partition
    [ClusteringKey(0)]
    public Guid tweet_id;

    // A secondary index allows queries against a non-key column
    [SecondaryIndex]
    public DateTimeOffset date;

    public string body;
}

With a few lines of code, developers can create classes that persist to Cassandra.

I was able to quickly put together a marginally more complex playlist example. The sample creates a playlist using what’s called a wide row. Rather than storing each playlist item in a single row, the playlist data is stored in many columns across the row – there’s one row per playlist and many columns in each row. It was relatively easy to start from a blank slate, create a data model, create tables in the database, and get started writing code.
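
Here’s a sketch of what that wide-row model could look like using the same attribute mapping as the Tweet example above (the names are my own, not the sample’s):

public class PlaylistEntry
{
    // One row per playlist: every entry shares this partition key
    [PartitionKey]
    public Guid playlist_id;

    // The clustering key orders entries within the playlist's row
    [ClusteringKey(0)]
    public int position;

    public Guid song_id;

    public string title;
}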

The Bad Stuff

Not everything was sunshine and rainbows.

For starters, during a load test the driver started reporting my Cassandra cluster as being unavailable for queries. At this point, the driver itself started throwing exceptions and the load test code ultimately failed. While the load test application was busy failing and consuming 65% of my CPUs, I was still able to use the Cassandra shell, cqlsh, to connect to the Cassandra cluster and query my data. It’s clear that this wasn’t a database issue, but I’m not sure why the driver wouldn’t be able to communicate with the database.

Update: I’ve created a ticket on a performance issue I ran into. You can track it as CSHARP-47.

I was never able to get the DataStax C# driver to connect to anything other than a Cassandra instance running on my localhost. Everything else failed with NoHostAvailableException. I would have chalked this up to networking or VM weirdness, but my setup hasn’t caused any problems during work with Riak through CorrugatedIron, Hadoop, HBase, MongoDB, or SQL Server. There have been several other reports of similar problems on the mailing list, so I’m pretty confident it isn’t me.

Nothing says “I love you” like a box of locks

The Ugly Stuff

Locking. The driver itself uses a lot of locks in multiple places. Locks are most frequently used in heavily threaded code to avoid race conditions: if the driver’s connection pool decides that a connection should be released at the same time that a new process comes in and grabs that same connection, there will be undesirable side effects. The downside of locking is that locks can slow down execution.
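
To illustrate why that hurts – this is not the driver’s code, just the general shape of the problem, with Connection and OpenNewConnection as placeholders – a pool guarded by a single coarse lock serializes every borrow and return:

private readonly object _sync = new object();
private readonly Stack<Connection> _idle = new Stack<Connection>();

public Connection Borrow()
{
    lock (_sync)    // every caller queues on this one lock
    {
        return _idle.Count > 0 ? _idle.Pop() : OpenNewConnection();
    }
}

public void Return(Connection connection)
{
    lock (_sync)    // and queues again on the way back
    {
        _idle.Push(connection);
    }
}

As concurrency rises, callers spend more time waiting on the lock than talking to Cassandra, which is consistent with the rising latencies reported below.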

Update: A ticket exists to remove the heavy use of locks in the concurrent pieces of code; it can be tracked as CSHARP-13.

While I was never able to saturate my system enough to see the problem, several developers are reporting that the driver’s connection pooling is resulting in slowdowns under heavy load. Take a look at the increasing 99th percentile latencies in this thread; as the number of concurrent connections increases, so does the latency.

Alternatives

What if you need to connect to Cassandra and you can’t afford to wait for some of the concurrency issues to be fixed? There are a few different routes you can take.

Use a different .NET driver: the DataStax client isn’t the only .NET client for Cassandra, though the older alternatives don’t support CQL 3.

You could also use a driver with a different language. For some teams, this might not be so bad – the developers can give their Haskell skills a whirl.

Or, you could use the magic of IKVM to compile a Java client for the .NET Framework. NativeX didn’t find the functionality they wanted in the .NET drivers, so they used IKVM to compile Hector to run on .NET. They seem quite happy with their solution, or so the slides would lead me to believe.

Conclusion

The DataStax C# driver for Cassandra is a good first release – it has a lot more features than many 1.0 software projects. For teams evaluating Cassandra and .NET, it’s worth a look. There are enough features in place that it will work in many scenarios, and developers needing features that are only available in CQL 3 may be willing to work around the driver’s limitations. Otherwise, I’d hold off on deploying the driver into production – there are enough limiting factors to give me pause.

RICON Recap

RICON was a distributed systems conference hosted by Basho Technologies and the event sponsors. The event was, hands down, a smashing success.

Yes, the conference was put on by a software company, but this was different from most conferences put on by a vendor. RICON was clearly a conference put on by people who love building distributed systems. All of the talks were given by people building distributed systems in the wild. There were no marketing talks, but there were talks about new features, new ideas, and new products, all viewed through the lens of distributed systems.

Both Adron Hall (@adron) and Dan Ostrowski put together good write ups about the talks that they enjoyed. If you want to know more about the talks themselves, check out their posts or start reading the presentations while you wait for the videos. The quality of the presentations was incredibly high and I was not disappointed.

What made RICON great was the focus on the attendees and the community. It was clear that the organizers went out of their way to make sure that attendees had a great time. From the custom hoodies and beautiful signage to the vegan meal options and an amazing vendor-sponsored party, I felt like I was at something bigger and better than a conference. This felt like an inclusive social event where I could learn with and from my technical peers, make new friends, and feel like the dumbest person in the room in the best way possible. It’s not every day that you get to go to dinner with the keynote speaker, a distributed systems researcher, and a team of engineers building a distributed monitoring platform for distributed systems.

RICON drastically changed my reading list. While it’s never been a short list, I’ve re-organized it to help fill the gaps in my knowledge that this event pointed out. The reading isn’t just about distributed systems – I’ve added reading about databases, programming, networking, and even the humanities. In short, RICON did more than get me thinking about how distributed systems can solve the problems I see regularly. RICON got me thinking.

Was the conference worth it? Heck yeah. I got to meet up with old friends, make new friends, and learn a lot about a subject I’m passionate about. I’m happy that Mark Phillips (one of the organizers) reminded me that I should buy a ticket before they sold out (the conference did sell out, by the way). I’m happy that I got a chance to go. And I can promise you that if there’s a RICON 2013, I’m going to try to be the first person to buy a ticket.

You wish your hoodie was this awesome.

Learn to Speak DBA Slang

Ever wonder what those big-company DBAs are saying when they start busting out the cryptic terms?  Learn the slang of database administrators with this handy reference guide.

Wizard of Oz – admin who makes everyone think things are automated, but he’s really just duct-taping things together. “The executives think we’ve got a reporting dashboard, but the Wizard of Oz over there is just copy/pasting data into Excel and hitting Insert Chart before he prints it out.”

Food court – consolidated server with a bunch of unrelated databases. Typically not known for high quality. “The marketing team wants to install a social media program that needs a database, but they don’t have any budget. Put them in the food court.”

The last guy – the speaker in a previous time frame, like yesterday. Used for blaming someone else when it’s the speaker’s own fault.

How RAID 0 looks in the ads.
Photo by Soapbeard

Ride the unicycle – use RAID 0. “The food court was begging for faster performance, so the last guy decided to ride the unicycle.”

Suicide – killing your own query.

Genocide – killing all queries from a certain application.

Two Men and a Truck – generic name for ETL programs like SQL Server Integration Services, Informatica, and DataStage. “We need a nightly job to get data from the sales system to the reporting server. Call Two Men and a Truck.”

Play Tetris – shrinking databases on a server with limited space. “We ran out of space on the L drive again. Run interference while I play Tetris.”

Tinted windows – encryption. “Tell the developers to put tinted windows on the web site database before somebody puts our password list on WikiLeaks.”

Van down by the river – server running ancient, unsupported software. Named because it’s the last thing a database ever sees before it shuffles off this mortal coil.

Blimp – monitoring software. “Jobs are failing all over the place. How’s it look from the blimp?”

100% delicious.
Photo by elizaIO

The Bakery – the department that produces pie charts. Sometimes referred to as Bakery Incorporated.

Escalate it to the documentation team – search Google.

Open a global support ticket – create a StackOverflow question.

56K modem – PCI Express solid state drive like FusionIO or OCZ Z-Drive. Named for their physical resemblance.

Keyser Soze – DBA or developer who looks ordinary but has insanely good skills that hardly anybody knows about. Taken from the movie The Usual Suspects, where the mythical main villain, Keyser Soze, is right in the middle of the group the whole time.

Smoking filtered cigarettes – doing something that appears safe but is really still dangerous.

Groundhog Day – ETL job that reloads all data from scratch every day rather than efficiently processing just the changed data. “The Wizard of Oz populates the database alright – it’s Groundhog Day at midnight.”

Saving Private Ryan – trying to do a row-level restore. Management usually calls for this task without understanding the complexity.

Fireworks store – dangerous server that crashes all the time. “Ever since the Wizard of Oz started writing his own backup software, the food court is turning into a fireworks store.”

Health Insurance – a current backup. “Make sure he has health insurance before you put a 56k modem in him.”

Read the paper – scan the event log looking for problems. “The job failed again last night. I’m going to read the paper.”

Group of blade servers in the wild.
Photo by Kismihok

Trailer park – blade server chassis. “The manufacturing team is bringing in a few new apps next quarter. Is there space in the trailer park?”

Blue jeans – full backups nightly.

Business casual – full backups nightly, log shipping every few minutes. “Does the new project server need blue jeans or business casual?”

Three piece suit – intricate high availability and disaster recovery strategy including clustering, mirroring, and log shipping.

Take a picture, it’ll last longer – advises the listener to perform a snapshot backup to make rollbacks easier. “You’ve been staring at that deployment script for an hour now. Take a picture, it’ll last longer.”

Value meal – Standard Edition server with the database engine, SSIS, SSAS, and SSRS all installed. “They needed SharePoint in a hurry so I gave ‘em a value meal.”

Updating the last step in the Disaster Recovery Plan – working on your résumé.
