Cassandra is a popular NoSQL database – it powers portions of Facebook, Netflix, eBay, and a host of other companies. DataStax, the commercial company behind Cassandra, just released the 1.0 version of their Cassandra driver to a bit of fanfare. There’s connection pooling, LINQ support, load balancing, and automatic failover. This isn’t the first .NET client for Cassandra, but it is the first to support CQL 3 – the new Cassandra protocol. Like many first software steps, this one gets off to a rough start.

The Good Stuff

It’s remarkably easy to get started with the DataStax .NET client. Experienced and novice developers alike can add the pre-built NuGet package for the driver to a project and start writing code immediately.

Data modeling with Cassandra is different from modeling data for other databases. The C# driver from DataStax makes it easy for developers to design and build applications that take advantage of Cassandra’s rich data model.

Take this example (from the GitHub repository):

public class Tweet
{
    [PartitionKey]
    public string author_id;

    [ClusteringKey(0)]
    public Guid tweet_id;

    [SecondaryIndex]        
    public DateTimeOffset date;

    public string body;  
}           

With a few lines of code, developers can define a class that persists to Cassandra.

I was able to quickly put together a marginally more complex playlist example. The sample creates a playlist using what’s called a wide row. Rather than storing each playlist item in its own row, the playlist data is stored in many columns across a single row – there’s one row per playlist, and many columns in each row. It was relatively easy to start from a blank slate, create a data model, create tables in the database, and get started writing code.
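
As a rough sketch of what a wide-row model like this might look like, here is a playlist class written in the same style as the Tweet example. The class name, the field names, and the stand-in attribute declarations are all assumptions for illustration, not the code from my sample; in a real project the attributes would come from the driver. The idea is that the partition key identifies the playlist, and the clustering key orders the items inside that one row.

```csharp
using System;
using System.Reflection;

// Stand-in attributes so this sketch compiles without the driver.
// In a real project they would come from the driver, as in the Tweet example.
[AttributeUsage(AttributeTargets.Field)]
public sealed class PartitionKeyAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Field)]
public sealed class ClusteringKeyAttribute : Attribute
{
    public int Index { get; private set; }
    public ClusteringKeyAttribute(int index) { Index = index; }
}

// Hypothetical model: every item with the same playlist_id lands in the
// same wide row, ordered by position.
public class PlaylistItem
{
    [PartitionKey]
    public Guid playlist_id;

    [ClusteringKey(0)]
    public int position;

    public string title;

    public string artist;
}

public static class PlaylistDemo
{
    // Returns the name of the field marked as the partition key, or null.
    public static string PartitionKeyField()
    {
        foreach (FieldInfo field in typeof(PlaylistItem).GetFields())
        {
            if (field.IsDefined(typeof(PartitionKeyAttribute), false))
                return field.Name;
        }
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(PartitionKeyField());
    }
}
```

Adding a song to a playlist then means adding another entry under the same partition key rather than creating a new top-level row.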

The Bad Stuff

Not everything was sunshine and rainbows.

For starters, during a load test the driver started reporting my Cassandra cluster as being unavailable for queries. At this point, the driver itself started throwing exceptions and the load test code ultimately failed. While the load test application was busy failing and consuming 65% of my CPUs, I was still able to use the Cassandra shell, cqlsh, to connect to the Cassandra cluster and query my data. It’s clear that this wasn’t a database issue, but I’m not sure why the driver wouldn’t be able to communicate with the database.

Update: I’ve created a ticket on a performance issue I ran into. You can track it as CSHARP-47.

I was never able to get the DataStax C# driver to connect to anything other than a Cassandra instance running on my localhost. Everything else failed with NoHostAvailableException. I would have chalked this up to networking or VM weirdness, but my setup hasn’t caused any problems during work with Riak through CorrugatedIron, Hadoop, HBase, MongoDB, or SQL Server. There have been several other reports of similar problems on the mailing list, so I’m pretty confident it isn’t me.

Nothing says "I love you" like a box of locks

The Ugly Stuff

Locking. The driver itself uses a lot of locks in multiple places. Locks are most frequently used in heavily threaded code to avoid race conditions. If the driver’s connection pool decides that a connection should be released at the same time that another thread comes in and grabs that same connection, there will be undesirable side effects. The downside of locking is that threads waiting on a lock do no useful work, which slows down execution when many threads contend for the same lock.
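
To make the trade-off concrete, here is a minimal sketch of two tiny connection pools. This is not the driver’s code: FakeConnection, IConnectionPool, LockedPool, and ConcurrentPool are hypothetical names I made up for illustration. The first pool guards its state with a single lock, so every borrow and return from every thread waits on the same monitor. The second hands the coordination to ConcurrentQueue from the .NET base class library, which avoids that single global lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for a driver connection.
public sealed class FakeConnection
{
    public int Id { get; private set; }
    public FakeConnection(int id) { Id = id; }
}

public interface IConnectionPool
{
    bool TryBorrow(out FakeConnection connection);
    void Return(FakeConnection connection);
}

// Correct, but all callers serialize on one lock.
public sealed class LockedPool : IConnectionPool
{
    private readonly object _gate = new object();
    private readonly Stack<FakeConnection> _idle = new Stack<FakeConnection>();

    public LockedPool(int size)
    {
        for (int i = 0; i < size; i++) _idle.Push(new FakeConnection(i));
    }

    public bool TryBorrow(out FakeConnection connection)
    {
        lock (_gate)
        {
            connection = _idle.Count > 0 ? _idle.Pop() : null;
            return connection != null;
        }
    }

    public void Return(FakeConnection connection)
    {
        lock (_gate) { _idle.Push(connection); }
    }
}

// Same contract, with the thread safety delegated to ConcurrentQueue.
public sealed class ConcurrentPool : IConnectionPool
{
    private readonly ConcurrentQueue<FakeConnection> _idle =
        new ConcurrentQueue<FakeConnection>();

    public ConcurrentPool(int size)
    {
        for (int i = 0; i < size; i++) _idle.Enqueue(new FakeConnection(i));
    }

    public bool TryBorrow(out FakeConnection connection)
    {
        return _idle.TryDequeue(out connection);
    }

    public void Return(FakeConnection connection)
    {
        _idle.Enqueue(connection);
    }
}

public static class PoolDemo
{
    public static void Main()
    {
        IConnectionPool pool = new LockedPool(1);
        FakeConnection first, second;
        Console.WriteLine(pool.TryBorrow(out first));  // one idle connection
        Console.WriteLine(pool.TryBorrow(out second)); // pool is now empty
        pool.Return(first);
        Console.WriteLine(pool.TryBorrow(out second)); // available again
    }
}
```

Both pools behave the same from a single thread. The difference only shows up under contention, which is exactly the situation a database driver’s connection pool lives in.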

Update: A ticket exists to remove the heavy use of locks in the concurrent pieces of code; it can be tracked as CSHARP-13.

While I was never able to saturate my system enough to see the problem, several developers are reporting that the driver’s connection pooling is resulting in slowdowns under heavy load. Take a look at the increasing 99th percentile latencies in this thread; as the number of concurrent connections increases, so does the latency.
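
If you haven’t worked with the metric before, the 99th percentile latency is the value that 99% of requests come in under, so it tracks the slowest requests rather than the average. Here is a minimal sketch of one common way to compute it, the nearest-rank method. The LatencyStats helper is a hypothetical name, and the sample numbers are made up for illustration; they are not taken from the thread.

```csharp
using System;
using System.Collections.Generic;

public static class LatencyStats
{
    // Nearest-rank percentile: sort the samples, then take the value at
    // rank ceil(percent / 100 * n), counting from 1.
    public static double Percentile(IEnumerable<double> samplesMs, int percent)
    {
        if (percent < 1 || percent > 100)
            throw new ArgumentOutOfRangeException("percent");

        List<double> sorted = new List<double>(samplesMs);
        if (sorted.Count == 0)
            throw new ArgumentException("At least one sample is required.", "samplesMs");
        sorted.Sort();

        // Integer ceiling of percent * n / 100 avoids floating-point rounding.
        int rank = (percent * sorted.Count + 99) / 100;
        return sorted[rank - 1];
    }

    public static void Main()
    {
        // Made-up latencies in milliseconds, for illustration only.
        double[] samples = { 12, 15, 11, 14, 250, 13, 12, 16, 11, 13 };
        Console.WriteLine(Percentile(samples, 50));
        Console.WriteLine(Percentile(samples, 99));
    }
}
```

A single slow request barely moves the median but dominates the 99th percentile, which is why that number is the one to watch when a pool starts to back up.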

Alternatives

What if you need to connect to Cassandra and you can’t afford to wait for some of the concurrency issues to be fixed? There are a few different routes you can take.

Using a .NET native driver:

Obviously, you could use a driver in a different language. For some teams, this might not be so bad – the developers can give their Haskell skills a whirl.

Or, you could use the magic of IKVM to compile a Java client for the .NET Framework. NativeX didn’t find the functionality they wanted in the .NET drivers, so they used IKVM to compile Hector to run on .NET. They seem quite happy with their solution, or so the slides would lead me to believe.

Conclusion

The DataStax C# driver for Cassandra is a good first release – it has a lot more features than many 1.0 software projects. For teams evaluating Cassandra and .NET, it’s worth giving it a look. There are enough features in place that it will work in many scenarios. Developers needing features that are only available in CQL3 may be willing to work around the driver’s limitations. Otherwise, I’d hold off on deploying into production with the driver – there are enough limiting factors to give me pause.

  1. Hi Brent,

    I just read your post on Cassandra and found it very useful in learning about the driver. If you know anyone that might be interested in a 3 month opportunity in San Antonio working with a Cloud Computing company utilizing Cassandra feel free to let me know!

    Thanks again,

    Crystal
    crystal@cornerstonetek.com

  2. > I was never able to get the DataStax C# driver to connect to anything other than a Cassandra instance running on my localhost. Everything else failed with NoHostAvailableException.

    We looked into the connection issue (https://datastax-oss.atlassian.net/browse/CSHARP-58) and we’ve put together a (short) wiki page with requirements for connection: https://github.com/datastax/java-driver/wiki/Connection-requirements.

    I hope those details will address this problem.

    • Hey Alex,

      Thanks for that. The documentation says this is for the Java client; should I assume that it also applies to the .NET client? Would it be possible to get someone to clear up the documentation?

  3. Hi Jeremiah,

    I’ve just checked out https://github.com/reuzel/CqlSharp and it’s really nice! It doesn’t support LINQ just yet (it’s on the nice-to-have list), so I’m going to have a go at building a simple “CqlBuilder” class based on the SqlBuilder found in Dapper, but make it work for CQL:
    https://code.google.com/p/dapper-dot-net/source/browse/Dapper.SqlBuilder/SqlBuilder.cs

    We are going to be using Cassandra rather than Riak, so I never got to try your C# driver out properly.

    Cheers
    Jake

  4. Brent, sorry to bother you about this but your blog pages do not for some reason render correctly in Opera. Especially the posts below the topic are shifted to the extreme right. Thanks for sharing. Best – Darek
