Today at Relativity Fest in Chicago, kCura Relativity 9 introduces the option to move some text storage out of Microsoft SQL Server and into kCura’s new Data Grid, a tool built atop the open source Elasticsearch.
Is kCura abandoning SQL Server? No, but understanding what’s going on will help you be a better database administrator and developer.
kCura’s Challenges with Microsoft SQL Server
To recap some of my past posts on Relativity: it creates a new SQL Server database whenever an end user creates a new workspace and starts loading data. Over the coming weeks, data pours into SQL Server at completely unpredictable rates. We have no idea how many documents are going to be acquired from subpoenaed hard drives, file servers, backup tapes, Facebook messages, you name it. I’ve seen workspaces grow from zero to ten terabytes in a single week, all without the systems administration teams even knowing it’s happening. They ran their weekly backup size report, and surprise, surprise, surprise.
Data streams into Relativity during business hours at the very same time hundreds or thousands of document reviewers are running queries against those very same tables. The entire team is under tight time deadlines, and there’s no way to take databases (let alone servers) offline for loads.
And oh yeah, we’re often contractually bound not to lose any attorney work product whatsoever out of the database.
This is all doable with traditional relational databases, but it ain’t easy. It’s made even tougher by the fact that many Relativity hosting partners are understaffed, and many don’t even have a full-time database administrator.
How We Helped kCura Plan for Change
For the last couple of years, Andrew Sieja has repeatedly asked me a tough question: “If you were going to redesign Relativity from the ground up, and anything was on the table, what would it look like?” He faced a classic case of the Innovator’s Dilemma: his team had built a wildly successful product, but innovation was coming from everywhere, and sooner or later somebody was going to beat him to the next level. While SQL Server was getting the job done, alternative data storage platforms beckoned with some really cool advantages, and he needed to adopt them before his competitors did.
kCura brought us in to work through the options out on the market. There are a gazillion new data storage & search options out there, and some of them claim to do a phenomenal job on absolutely everything. (Hint: many of these vendors are bluffing.) We helped them prioritize the features they needed most, and then recommended the right fit for them.
The end result is the newly announced kCura Data Grid, an extremely scalable and performant search platform built atop the open source Elasticsearch. You might recognize Elasticsearch from my demos of Opserver, Stack Exchange’s open source monitoring tool, because Stack also migrated their SQL Server full text search out into Elasticsearch. They’re not alone – Elasticsearch has plenty of high-profile case studies.
The Benefits and Risks of Elasticsearch
We typically see 70-90% of a Relativity workspace’s space consumed by extracted text and audit logging, both of which are great fits for Elasticsearch. Pushing that data out of SQL Server potentially means:
- Reduced storage costs – while SQL Server relies on expensive shared storage (typically $5k-$10k per terabyte), ES achieves redundancy with multiple commodity boxes (typically $1k-$2k per terabyte). This adds up fast for big workspaces.
- Faster search – ES is mind-numbingly fast. Seriously.
- Easier scale-out – it’s really, really hard (and expensive) to scale out a single multi-terabyte database across multiple Microsoft SQL Servers when people can create new databases at any time. (It’s even hard enough just to scale a single known database across multiple servers!) It’s easy to add ES replicas for higher performance and availability.
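
To get a feel for how the storage numbers above add up, here is a rough back-of-the-envelope sketch in Python. It is not a pricing model: the function name and parameters are made up for illustration, the per-terabyte ranges are the typical figures quoted above, and it assumes those figures are per usable terabyte (so any replication overhead on the ES side is already baked in).

```python
def storage_cost_range(total_tb, moved_percent,
                       sql_cost_per_tb=(5_000, 10_000),
                       es_cost_per_tb=(1_000, 2_000)):
    """Estimate storage cost before and after moving data out of SQL Server.

    Hypothetical helper for illustration only. Costs are (low, high) dollar
    ranges per usable terabyte, taken from the typical figures in this post.
    Returns (before, after), each a (low, high) tuple in dollars.
    """
    if not 0 <= moved_percent <= 100:
        raise ValueError("moved_percent must be between 0 and 100")

    # Split the workspace into the part that moves to ES and the part that stays.
    moved_tb = total_tb * moved_percent / 100
    kept_tb = total_tb - moved_tb

    # Before: everything sits on shared storage behind SQL Server.
    before = tuple(total_tb * cost for cost in sql_cost_per_tb)

    # After: the remainder stays on shared storage, the rest lands on commodity boxes.
    after = tuple(
        kept_tb * sql_cost + moved_tb * es_cost
        for sql_cost, es_cost in zip(sql_cost_per_tb, es_cost_per_tb)
    )
    return before, after


# A 10 TB workspace where 80% of the space is extracted text and audit logs.
before, after = storage_cost_range(total_tb=10, moved_percent=80)
print(f"Before: ${before[0]:,.0f} to ${before[1]:,.0f}")
print(f"After:  ${after[0]:,.0f} to ${after[1]:,.0f}")
```

Under those assumptions, a 10 TB workspace with 80% of its data moved goes from roughly $50k-$100k of storage to roughly $18k-$36k. Your own per-terabyte costs will differ, so plug in real numbers before drawing conclusions.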
It’s not a silver bullet, and as with any technology change, there are risks and limitations:
- Security – ES doesn’t have any security built in, so kCura had to build their own.
- Backups – when everything is in one data platform, it’s easy to back up everything at the same moment in time. Split the data, and you run into challenges – but these aren’t really new for Relativity. The databases and the native files couldn’t be backed up to the same point in time either.
- Management – Relativity hosting partners don’t have ES expertise on staff, and they’ll need training as ES becomes a mission-critical part of their infrastructure.
What This Means for SQL Server Developers and DBAs
Microsoft SQL Server is an amazing relational database, a Swiss Army knife of a persistence layer. Sure, it can handle tables and joins, but more than that, it can do things like full text search, spatial data, CLR code execution, and scale-out via multiple methods. You can build a product backed by SQL Server and go a long, long, long way.
I don’t think SQL Server ran out of capabilities here, but kCura needed to plan for orders-of-magnitude growth in storage and search capabilities over the coming years. They grew one hell of a big, powerful business solution with a single database back end, and they’ve got the luxury of a large development staff and a bunch of new data storage options.
Premature optimization is the root of all evil. When you’re building the product you need today, the right database is the one you already know well. As your product grows, keep learning your own database, plus learn the other options out there. The storage and search markets are changing so dramatically every year – don’t make a bet on one today unless you have to, because tomorrow might bring an even better solution for your needs.
For more details about Relativity 9, check out kCura’s Relativity 9 page.