kCura Relativity is an e-discovery program used by law firms to find evidence quickly. I’ve blogged about performance tuning Relativity, and today I’m going to go a little deeper to explain why DBAs have to be aware of Relativity database contents.
In Relativity, every workspace (case) lives in its own SQL Server database. That one database houses:
- Document metadata – where the document was found, what type of document it is
- Extracted text from each document – the content of emails, spreadsheets, files
- Document tagging and highlighting – things the lawyers discovered about the documents and noted for later review
- Workspace configuration – permissions data about who’s allowed to see what documents
- Auditing trails – who’s searched for what terms, what documents they’ve looked at, and what changes they made
For performance tuners like me, that last one is kinda interesting. I totally understand that we have to capture every activity in Relativity and log it to a table, but log-sourced data has different performance and recoverability requirements than other e-discovery data.
The AuditRecord table is append-only. We don’t go back and modify AuditRecord data – we just constantly add to it, tacking on data at the end of the table. This means it has different backup requirements – I certainly don’t want to be doing a full backup on this table every day, repeatedly backing up the exact same data over and over and over when it will never change.
It will never get deleted. We have to keep this data throughout the life of the case because the client may need to go back through time to see who did what during the case. This means it will keep growing and growing, making backups tougher over time.
In big cases, it can dominate the database. I’ve seen instances where the AuditRecord table consumed more than half of the database size – meaning in a 1TB database, 500GB of the contents are AuditRecord. This means backups will take twice as long.
In the event of a disaster, I don’t need it right away. In our 1TB workspace example, I would prefer to restore the 500GB workspace data first, let the lawyers in to do document review, and then take my time restoring a separate 500GB AuditRecordArchive database.
Splitting AuditRecord Into an Archive Database
In order to pull this off, I need two separate databases:
- Workspace database – the normal EDDS12345 database for workspace 12345. Inside this database, I have all the normal Relativity tables, but only a small AuditRecord table with the current audits – say the last 7 days. As people review documents, I’d log that data into this AuditRecord table.
- AuditRecordArchive database – say, EDDS12345_AuditRecordArchive. In here, I have one AuditRecordArchive table that has all of the AuditRecord data more than 1 week old.
Once I’ve set this up, then I need a job that sweeps the EDDS12345.AuditRecord data into the archive database once per week. Immediately after that job finishes, then I do a full backup and DBCC of EDDS12345_AuditRecordArchive – and then I don’t need to back it up again until the next sweep.
If I want to get really fancy, I don’t do daily DBCCs or index maintenance against that AuditRecordArchive database either. If anything goes wrong with it, like database corruption, I just restore to last week’s full backup and I’m off to the races. This means less downtime for database maintenance.
Great News! kCura Supports This
It’s called “partitioning the AuditRecord table”, and your kCura support contacts can walk you through it. It doesn’t involve SQL Server partitioned tables at all – they just call it partitioning because it’s the same basic concept, only done with application-level code.
However, I don’t recommend doing this by default across all your databases. This technique is going to instantly double the number of databases you have and make your management much more complex. However, I do recommend reviewing your largest workspaces to see if AuditRecord is consuming half or more of the database space. If so, consider partitioning their AuditRecord tables to get faster backups, database maintenance jobs, and restores.
At the risk of sounding like a fanboy, this is one of the reasons I love working with the kCura folks. They really care about database performance, they take suggestions like this, and they implement it in a way that makes a real difference for customers.
This is also why database administrators need to:
- Understand the real business purpose of the biggest tables in their databases
- Build working, productive relationships with their software vendors
- Come up with creative approaches to ease SQL Server pains
- Help the vendors implement these approaches in software