Why I Love kCura RelativityOne Even Though I Don’t Use It

At Relativity Fest this week, kCura showed more details about how their upcoming software-as-a-service offering, hosted in Microsoft Azure, works. I really like where they're going with it.

Presenting at Relativity Fest 2016

I’ve blogged about Relativity before, especially about how it uses SQL Server, but here’s a quick recap:

  • It hosts legal data (think lawsuits, cases, investigations)
  • Every case is its own database
  • New databases are created all the time by end users
  • Lawyers work in these databases, putting in their work product
  • Losing data would be extremely expensive

This makes database administration a real pain. A lot of HA/DR technologies require manual work as new databases are added, or else you have to build your own custom apps to protect newly added databases automatically. Databases appear out of nowhere, suddenly get terabytes of data loaded into them over the span of a week, go idle for weeks, and then suddenly see a fresh burst of user activity. It's really hard to predict and protect this stuff, which also makes budgeting for hardware extremely tough. By the time a case comes in and explodes, you don't have the money for it, and by the time the new hardware arrives, it's too late.
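To give you a sense of the plumbing involved, here's a minimal sketch (not kCura's actual code) of the detection query such a custom auto-protect job might run on a schedule:

-- Find user databases that aren't protected by an Availability Group yet;
-- the job would then add each one to the AG. Assumes SQL Server 2012+.
SELECT d.name
FROM sys.databases AS d
LEFT JOIN sys.dm_hadr_database_replica_states AS rs
  ON d.database_id = rs.database_id
WHERE d.database_id > 4            -- skip the system databases
  AND rs.database_id IS NULL;      -- not in any Availability Group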

It’s absolutely perfect for the cloud.

Today, the first private beta iteration of RelativityOne uses SQL Server 2016 hosted in Azure VMs, protected with Always On Availability Groups. (While that option doesn’t really make sense for on-premises HA failover protection, it works for a hosted version for reasons that are kinda unrelated here.) kCura’s own teams are doing uptime and performance support in real time – just like you’ve been doing for years.

I get excited about this because:

kCura’s DevOps teams are learning the challenges of AGs firsthand. I like to say that Always On Availability Groups are the opposite of the easy button – they’re awesome, but they require a ton of work. kCura’s feeling those pains, and as a result, they’re examining their own software to figure out how to improve the HA/DR situation.

kCura can react faster to performance issues. Before RelativityOne, if a new version of Relativity shipped with some less-than-fast queries, you (the DBA) were the first to know. You’d open a support ticket to alert kCura, and they’d work with you on a fix – but that fix may require a patch, and you might need an outage to apply it, and the business might not give it to you. With RelativityOne, kCura will catch the problem first.

Furthermore, it’s in kCura’s best interest to fix performance issues. When a query burns up a ton of server horsepower, it hits kCura directly in the wallet. They can’t pass on increased Azure resource costs to you if they put out a bad query – they have to temporarily absorb it, and then fix it so they can save money. As they do performance tuning, those same fixes will be in the on-premises version too, since it has the same code base.

Everybody wins:

  • If you use the Azure-based RelativityOne, you can avoid the crappy parts of database administration: backups, CHECKDB, server patching, and outage troubleshooting.
  • If you use the conventional on-premises version, you get the benefit of better application code because the software vendor is now doing the same work you’ve had to do.

And when I eventually do use RelativityOne – because I think all e-discovery software will end up in the cloud sooner or later – I'll really love that we still get query access to the databases. I'm amazed they're giving customers that power – and responsibility.


21 Comments

  • Jeremy Carroll
    October 12, 2016 11:45 am

    My biggest concern with kCura's use of SQL Server is that they have done so little with the Enterprise SKU. I manage many TBs of on-prem (gotta love core licensing…insert sad trombone tunes here) Relativity and Invariant databases, and see no use of table partitioning, compression, or even simple old features like filtered indexes. Hell, how many non-clustered indexes in Relativity have variable-character keys? Almost all of them. If you have ever looked at the Invariant database, especially the Matter table, you will see indexes, all created by some intern using DTA, that barely benefit the user. When an INV database is greater than 100GB, I have to strip the indexes from the DB and replace them with my own. This is a nightmare.

    I for one am very supportive of RelativityOne if for no other reason than that kCura will have to improve their database development and management skills (well, if only I had a nickel for every time I had that thought concerning development teams). My largest complaint about RelativityOne is that it will cost a ton (ever pay for 5TB of SSDs in Azure for SQL Server?) and will force users to pay yet another Microsoft tax (I asked them about SQL Server on Linux, but I doubt we’ll see it soon and even if we do, kCura isn’t going to suddenly run on the LAMP stack). It may also crowd out on-prem companies that provide the same service (me), but that’s just life.

    I look forward to kCura's tackling of HA/DR and Availability Groups, for instance; however, I'd be a hell of a lot happier if they'd take a beginner's course in relational design theory and one of Kim Tripp's classes on basic indexing. That would solve a lot of headaches right now.

    • Jeremy – interesting questions, and lemme tackle these individually.

      Why Relativity doesn’t use table partitioning – it just doesn’t work well in an environment where there are no clear partitioning keys. I’d love to hear you elaborate more on how you’d partition, say, the Document table.

      Why Relativity doesn’t use compression – it only works on in-row data, not off-row data (like big extracted text fields).
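      If you want to see what compression would actually buy you on the in-row data, here's a minimal sketch using SQL Server's built-in estimator (EDDSDBO and Document follow Relativity's naming, but treat this as an illustration, not kCura guidance):

      -- Estimates current vs. compressed size for in-row data only;
      -- off-row LOB data (like extracted text) stays uncompressed either way.
      EXEC sys.sp_estimate_data_compression_savings
        @schema_name      = 'EDDSDBO',
        @object_name      = 'Document',
        @index_id         = NULL,     -- all indexes
        @partition_number = NULL,     -- all partitions
        @data_compression = 'PAGE';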

      Why Relativity doesn’t use filtered indexes – because filtered indexes don’t work with parameterized queries. https://www.brentozar.com/archive/2013/11/filtered-indexes-and-dynamic-sql/ You can totally create your own filtered indexes, though, and users are encouraged to design whatever indexes meet their own search needs.
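      A minimal repro of that limitation, with made-up table and column names:

      -- A filtered index on a flag column:
      CREATE INDEX IX_Document_Relevant ON dbo.Document (DocumentID)
        WHERE IsRelevant = 1;

      -- A literal predicate can use the filtered index:
      SELECT DocumentID FROM dbo.Document WHERE IsRelevant = 1;

      -- The parameterized version can't: the cached plan must be safe for
      -- ANY parameter value, so the optimizer ignores the filtered index.
      EXEC sp_executesql
        N'SELECT DocumentID FROM dbo.Document WHERE IsRelevant = @IsRelevant',
        N'@IsRelevant bit', @IsRelevant = 1;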

      Why isn’t indexing better in Invariant – yeah, that’s actually a separate product, and I haven’t been involved in tuning the indexes on that. I would totally open up a support case there if I were you. But note that everything in the post is really focused on that exact problem: kCura is now going to start seeing the same index tuning challenges you’ve seen in Invariant.

      About SQL Server on Linux – Microsoft has already stated that the licensing costs will be the same, so I’m not sure where you’re going with that.

      About kCura needing a beginner’s course in relational design theory – I would probably ease up a little bit there and spend some time talking to their developers. You might be a little surprised. Many of them have been to both our classes and SQLskills’ classes, so they understand why partitioning, compression, and filtered indexes don’t work well in Relativity. Zing. 😉

      • Jeremy Carroll
        October 12, 2016 12:17 pm

        Thank you for your detailed reply, Brent.

        How would I partition the Document table? Well, I would not attempt to change much, for the reasons you mentioned; but then again, my tables aren’t many columns wide, do not contain NULLs, and I do not store BLOBs unless there is a very legitimate reason to do so (I am a C.J. Date groupie, if that means anything to anyone). kCura has chosen a different approach, which they appear to be working to correct with the introduction of Data Grid for extracted text and audit data. Why create a design that cannot take advantage of much of the Enterprise SKU’s functionality? I assume that the development team did not complete their initial designs with SQL Server Enterprise in mind. As for filtered indexes (and indexing in general), I already create my own indexes and suggest people do likewise and work with kCura where it seems appropriate.

        As for Linux, I am not really going anywhere with the comment. I was just wondering about kCura’s desire to perhaps run within another vendor’s cloud (certainly, one does not need Linux for that) and perhaps even create some version of Relativity that is not so dependent on Microsoft products. It is apparent that they have chosen .NET, SQL Server, and Azure as their tools and are not likely to suddenly switch to PHP and Postgres. So, I have to pay the Bill Gates tax if I go the RelativityOne route.

        As for Invariant and indexing in general, what can I say? If a chosen design does not scale, then all I can do is work with the vendor to improve the codebase. If kCura’s developers are taking both your classes as well as Paul & Kim’s, then I am happy to hear it. I assume that we will see more improvements within SQL Server on both the hosted and on-prem solutions.

        Thank you for replying to my comment, and thank you for giving voice to Relativity on your blog. I find very little online about the topic, especially for those of us who must support the entire infrastructure, including SQL Servers, as well as work on other complicated topics like elastic & analytic searches. I look forward to more in-depth SQL Server skills seminars at next year’s Relativity Fest.

        • Jeremy – ah, it sounds like you’re using Relativity in ways that are pretty different from typical clients. A lot of their smaller shops run Standard Edition, and many (especially big shops) throw in all kinds of columns in the Document table. (I usually see over 100.)

          • Jeremy Carroll
            October 12, 2016 12:56 pm

            Brent,

            I am not unfamiliar with big-shop modifications to Relativity’s base tables. I only work around what they give me. I do run SQL Server Standard for Invariant, since even when there are a couple thousand databases on the instance, few of them are doing anything, and 128GB of RAM can go far in the end. A two-instance A/P cluster seems to behave just fine with Invariant, and using the SIMPLE recovery model for all the INV databases gives me fewer complications/choices with DR. That is one peeve I have with RelativityOne: I’d like to be able to specify that I use only the Standard SKU for Invariant. I wonder if kCura will let me? If not, do you take bribes for quietly communicating my wish list to them? =)

      • As Jeremy alludes to a bit, and being in the eDiscovery business myself, I debate whether to even voice this thought…

        Regarding table partitioning on the Document table: the way eDiscovery works is you start off with a glob of documents (oftentimes millions). Early in the process, one quickly filters out documents that will not be relevant to the particular Matter (or, in initial-discovery scenarios, to analyzing the depth of exposure to the issue being investigated). There’s one partition! Perhaps 80% of documents eliminated just due to some easy filtering (the actual value of 80% is made up, but the gist of it is correct). Next up, review of the remaining documents (20%) to sort based on actual relevance. There’s the next partition, with the final, most important documents being moved over into the 3rd partition.

        This is a perfect scenario for table partitioning, where documents start off on less expensive storage and, as they get promoted into relevance, can take advantage of faster storage options.
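        A minimal sketch of that idea, assuming a hypothetical ReviewTier column (1 = filtered out, 2 = under review, 3 = relevant) and made-up filegroup names:

        -- Map each review tier to progressively faster storage.
        CREATE PARTITION FUNCTION pf_ReviewTier (tinyint)
          AS RANGE LEFT FOR VALUES (1, 2);   -- partitions: tier <= 1, tier 2, tier >= 3

        CREATE PARTITION SCHEME ps_ReviewTier
          AS PARTITION pf_ReviewTier
          TO (SlowStorage, MediumStorage, FastStorage);

        -- The Document table would then be created ON ps_ReviewTier (ReviewTier),
        -- so promoting a document to a higher tier moves its row between
        -- partitions – which is the locking concern raised in the reply below.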

        • Frank – well, the problem with that approach is that your searching usually happens on documents that haven’t been defined as relevant yet. That means you’re focusing the worst workloads on the slowest storage.

          Even worse, when you find a document and mark it as relevant, you’ll be moving it from one partition to the other. That’s a big no-no in partitioned tables – it’s going to be a locking mess.

          Good idea in theory though!

    • Without defending the index names, or trying to argue whether or not they’re optimal, I can at least explain why they exist and where they’re used. If you have more optimal versions of these, let the folks at kCura know what they are!

      _dta_index_Matter_9_21575115__K13_K6
      Goes to _sp_Matter_Insert when we upsert documents in the Matter table during ingestion/retry

      _dta_index_Matter_9_21575115__K2_K7_1_8_9_10
      Goes with _sp_Matter_InOrderByJob where we have to organize data for viewing in the RPC inspector, or when publishing documents to Relativity, or exporting load files

      _dta_index_Matter_9_21575115__K3
      Goes with _sp_Matter_Cleanup when work needs to get rolled back by individual jobid (when a worker crashes, or something blows up)

      _dta_index_Matter_9_21575115__K8_K10_1_9
      _dta_index_Matter_9_21575115__K8_K9_K10_1
      Goes with func_Filter_FileIds when we need to gather up a list of files to view/export/report based on applied filters — this occurs during export, publish to Relativity, reporting and many others

      _dta_index_Matter_9_21575115__K9_1_6
      Goes with _sp_Matter_DistinctStorageByJob, which happens during text extraction and RPC imaging, where we want to ensure we extract/image each distinct doc only once.
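      As an aside, a quick way to inventory these DTA-generated indexes on your own instance (the brackets keep LIKE from treating the underscores as single-character wildcards):

      -- List indexes whose names mark them as Database Tuning Advisor output.
      SELECT OBJECT_NAME(i.object_id) AS table_name, i.name AS index_name
      FROM sys.indexes AS i
      WHERE i.name LIKE '[_]dta[_]index[_]%'
      ORDER BY table_name, index_name;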

      • Jeremy Carroll
        October 13, 2016 11:15 am

        Robert,

        I appreciate the response and explanations. As for giving kCura my changes, why would I do that? It is to my advantage not to, since my changes give me an edge in a non-open-source community, i.e., I pay kCura for this stuff, not the other way round.

        As for the naming conventions, what can I say? They certainly tell us the provenance of the indexes and, to the discerning DBA’s eye, make the stomach turn. I know this matters little to the end users, but I’d appreciate at least some sort of naming convention, if for no other reason than that I could perhaps assume the indexes were not authored by an intern – but one never knows. I know that you know this; I just wanted to leave it here.

        Thanks again for the explanations. They are indeed insightful.

        All the best!

        • Jeremy – you may want to step back and think about this for a minute.

          On one hand, you’re complaining that the vendor is giving you inefficient code.

          On the other hand, you’re saying that the vendor’s inefficiencies are giving you an edge because you can improve their code, and host it better than the other hosting partners.

          It sounds a lot like you’re trying to have your cake and eat it too. There’s a saying in consulting: if you don’t sell the solution, you can make good money prolonging the problem. 😉 Either contribute your code back, or keep your mouth shut about the inefficiencies and make money on them, but you can’t do both. (At least, you can’t do both and look good, heh.)

          • Jeremy Carroll
            October 13, 2016 11:29 am

            Brent,

            I understand the response, but we all have responsibilities to our employers. I am happy to work with kCura to improve their code, but I can’t give away proprietary code. We all work for someone, right? There are many vendors that build on or around kCura, as you know, and they are not giving their work away for free.

            As for shutting my mouth, why do that? May I not have a legitimate complaint about a product and desire that it be improved? And why is the knee-jerk reaction that I have to give away my work to the person from whom I purchased the product? I can complain and make changes where appropriate to advance my own business. If people think I look bad for doing so, what can I say…boo hoo, life’s tough.

          • Jeremy – seriously, just take a breath, walk away from the keyboard for a while, and when you come back, read my comment again.

            Just pick one or the other: be part of the solution, or make money on the problem while keeping your mouth shut about how good your solution is. 😉

  • I think what Brent was trying to say was “shh!! you’re providing an advantage to your clients and now you’ve told kCura how and why!”

    To which my only response is that we’ll look into those indexes (which we admittedly haven’t looked at in a long time) and promise to have those optimized in a patch in the very near future!

    • Jeremy Carroll
      October 13, 2016 1:29 pm

      Gentlemen,

      Brent, I take your point and went and got a sandwich. Grilled cheese never gets old! I do think you perhaps misunderstand my point, though. It is difficult to communicate via blog comments, but I’ll do what I can.
      I cannot give away code that is owned by the company for whom I wrote it. If I did that, it would be back to digging ditches for a livin’, not complaining about software developers.
      I am not against helping kCura where it is practical to do so. For example, my comments concerning the EDDS side of the product and the use of the Enterprise SKU were not meant as destructive criticism. In your first response, you outlined fair solutions to the issues that I raised. Design choices in the past have affected kCura’s ability to make use of Enterprise-level tools – e.g., table partitioning, filtered indexes, page- and row-level compression – that would be useful to those of us who are paying for this solution and have to manage it on the back end. This is especially true when one will be paying for the Enterprise SKU in RelativityOne in Azure and one may have a significantly large footprint (tens of TBs). I assume performance SLAs and kCura support will comprise a large part of RelativityOne’s cost to the user. I’d dislike calling about issues that I know could be solved by code refactoring. I should not have to “partition” the AuditRecord_PrimaryPartition table by archiving it. I’d see the need with SQL Server Standard, or I’d use partitioned views; I would not archive and delete, but you know that already.
      As for Invariant, the indexes as written have not scaled well for me. I doubt that this is particular to my case, but I could be wrong. The database, minus its dynamically created tables, is only a couple dozen tables deep, and only two or three tables really matter from a performance perspective. I know this can be corrected, and I welcome any feedback from kCura and will help where I can.

      All the best. Now, back to my sandwich!

  • Can you comment on kCura’s choice to rely on the default schema behavior rather than to properly schema-qualify object names in their SQL code?

    • Jeremy Carroll
      October 14, 2016 2:46 pm

      Hi Tony,

      I am not sure to whom your comment was directed, but I am wondering what you mean by schema qualification. Within EDDS, kCura uses only the EDDSDBO schema, and I see two-part naming in most (if not all) of their code. Are you looking for four-part naming? By default schema behavior, do you mean that the schema should not be mapped to a correspondingly named user and should be just a container of objects – which it has been since SQL Server 2005? I apologize for all of the questions; I just do not fully understand your query.

      • My comment was addressed to Brent or to anyone who may know more than I do about the subject, so thanks for the response.

        It’s interesting that you’re seeing two-part names, because I mainly see one-part names in kCura queries. For example,
        SELECT artifactId FROM TableName WHERE artifactId = 1234;
        rather than
        SELECT t.artifactId FROM eddsdbo.TableName AS t WHERE t.artifactId = 1234;

        The eddsdbo user (which is the context of application execution) is defined with a default schema of eddsdbo, so the queries behave as expected within the application, but they rely on the default schema property of the user. If a human connects with SSMS and authenticates as a user that does not have a default schema of eddsdbo, the queries are invalid.

        Also… Aaron Bertrand and others have shown conclusively that there is a performance cost associated with omitting the schema-qualifier. It’s small, but measurable, so when every clock cycle counts I’m wondering what the reason would be to omit it.
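        A minimal repro of that failure mode, with made-up user names (it assumes the eddsdbo schema and TableName already exist):

        -- Two users whose only difference is their default schema.
        CREATE USER AppUser   WITHOUT LOGIN WITH DEFAULT_SCHEMA = eddsdbo;
        CREATE USER AdHocUser WITHOUT LOGIN WITH DEFAULT_SCHEMA = dbo;
        GRANT SELECT ON SCHEMA::eddsdbo TO AppUser;

        EXECUTE AS USER = 'AppUser';
        SELECT artifactId FROM TableName WHERE artifactId = 1234;  -- resolves to eddsdbo.TableName
        REVERT;

        EXECUTE AS USER = 'AdHocUser';
        SELECT artifactId FROM TableName WHERE artifactId = 1234;  -- fails: no dbo.TableName
        REVERT;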

        • Jeremy Carroll
          October 14, 2016 4:07 pm

          I assume it is because they are always using the EDDSDBO user, so there would be no cost involved.

          The user EDDSDBO is mapped to the schema EDDSDBO in every database on primary and distributed kCura’s SQL Server instances.

          Do you have a link to Bertrand’s article about the cost of searching for an object within a schema to which one is mapped? I assume that in this case, you are just paying the cost of doing business in SQL Server. It is cheaper than looking for objects mapped to a schema to which one is not mapped.

          • I may have been too quick to attribute this testing to Aaron Bertrand because now I can’t find it. His testing is usually so detailed and accurate that I may have wrongly assumed it was his. If so, I apologize. If not, I hope someone will post a link.

            Bertrand does list this as a bad habit to kick, and the comments add additional ammunition, but the article does not include test results:
            http://sqlblog.com/blogs/aaron_bertrand/archive/2009/10/11/bad-habits-to-kick-avoiding-the-schema-prefix.aspx

            Microsoft’s own documentation is fairly consistent and clear that using a two-part name is a recommended best practice. One reason involves implications for query plan cache and re-use.

            https://msdn.microsoft.com/en-us/library/ff647793.aspx
            https://msdn.microsoft.com/en-us/library/ee343986(v=sql.100).aspx
            https://msdn.microsoft.com/en-us/library/dd283095(v=sql.100).aspx

            Since Relativity really only needs the plan to be re-used by a single user (eddsdbo), there may be no practical difference in this application. My inclination is to stick with the recommended best practice, so I just wondered why kCura would go a different direction.

          • Yeah, this is one of those issues where I’ve noticed it, but it hasn’t been anywhere near the top 10 in terms of performance optimization. When I picked things to recommend to the devs to focus on, that never made the list.

            I like to tell folks, when you write new code, sure, keep things like this in mind. But once you’re live on an app like Relativity, it’s rarely the biggest concern. (Shops with, say, 174 indexes on the Document table, or not doing CHECKDB even once a week, those are bigger concerns for me.)

  • Jeremy Carroll
    October 14, 2016 6:17 pm

    My list for kCura is a mile long, but I’ll never be completely satisfied. I just convinced the devs I work with to use SQL Prompt, and I programmed two-part naming into the style I gave them. If you can’t convince ’em, then trick ’em.

