As businesses grow, many teams start considering scale out database solutions but they’re cautious – they want to know how to keep data up to date in multiple places. In theory, you can use log shipping, mirroring, or AlwaysOn Availability groups. While this answers the literal question, what people really mean is usually very different. The people who ask this question aren’t typically concerned about creating a read-only replica of Site A over at Site B, they’re interested in being able to write data anywhere and read data back from anywhere. They want a database that lets them read or write data on any server transparently.
SQL Server Bi-Directional Transactional Replication
First, this has more syllables than any feature has a right to have. Second, this feature is limited to two servers. It’s relatively simple and seems like a good enough option, especially since it’s really just two transactional replicas that point at each other. The downsides of SQL Server Bidirectional Transactional Replication are that:
- The feature uses too many syllables.
- Only two servers can be involved.
- Schema changes are necessary –
IDENTITYcolumns won’t work like you think.
- Did I mention that only two servers can be involved?
SQL Server Merge Replication
This is largely used in server to client environments. You have a SQL Server Express database on field laptops and those laptops are synchronized infrequently. Merge replication has the advantage of smarter conflict resolution than other SQL Server replication options (which typically explode and stop working). Merge replication topology assumes that there’s still only one master server, even though writes can be synchronized in both directions. So, yes, in theory this could work, but data modifications won’t be replicated immediately and a large set of changes could make a merge activity take a long time. In summation – merge replication probably won’t meet your needs because:
- Relies on a single master server.
- Replication is typically batched, not continuous.
- Master failover happens outside of replication.
Yes, I understand that merge replication can be run continuously. Search StackOverflow for merge replication questions and you’ll see what I mean.
SQL Server Peer to Peer Replication
In the SQL Server world, we only have one option for having multiple masters: SQL Server peer to peer replication (okay, there’s a second if you count merge replication). We’ve talked about peer to peer replication before in terms of conflict resolution, managing availability, and as a scale out solution. If you’re interested in how SQL Server peer to peer replication might work for you, feel free to check out those articles. If you’re a bit lazy, here’s the 30,000 foot overview:
- It’s complicated.
- It’s expensive (Enterprise Edition and schema changes are required).
- It probably won’t solve your problem.
- Every node has to be able to handle the write load of all nodes.
What Other Options Are There?
While it’s possible to use SQL Server, it’s always good to consider other options – especially since you need to change your data model to support peer to peer replication. If not SQL Server, what else is there?
Although implementing multi-datacenter seems like a difficult task, it’s already possible, just so long as you’re willing to make some changes. If you’re willing to consider changing your schema, what about the possibility of changing your underlying data storage system?
Before closing this browser window, hear me out.
The features of a software product are what drive the technologies used for implementation. When you need to implement a highly available SQL Server you choose between a few different options based a set of requirements. Why shouldn’t that decision extend to the database itself?
Stretch that Database
Different databases have different capabilities. Databases like Cassandra, HBase, and Riak offer different levels of multi-server and multi-site durability.
HBase offers strong consistency – data is written to multiple servers synchronously, just like SQL Server AlwaysOn Availability Groups using synchronous replicas. Using HBase gives you great single site durability – every write sent to the database server is committed to disk before the client receives a write acknowledgment. There’s no eventual consistency to worry about; HBase is strongly consistent within one datacenter.
If you want disaster recovery for an HBase cluster, you need to deploy HBase replication. HBase replication works by pushing entire log edits through a replication queue – this maintains atomicity for all HBase transactions. Since HBase’s replication operates at a column family level (think of a column family as roughly analogous to an RDBMS table), replication can be configured on a column family by column family basis. HBase replication can’t be used for a multi-master scenario across multiple datacenters, but it can be used to provide an asynchronous disaster recovery scenario.
Cassandra does things a little bit differently. Both HBase and Cassandra derive their data model from Google’s BigTable, but Cassandra differs in how data is moved around. Cassandra is loosely based on Amazon Dynamo – this lets the person using the database decide on the consistency of a write. This choice is one of the primary reasons for teams to choose Cassandra; different features of applications can read and write data at different consistency levels depending on feature requirements instead of platform choice.
Cassandra’s eventual consistency and data replication implementation make Cassandra an excellent choice for designing multi-datacenter applications. Within a Cassandra cluster, it’s possible to configure data replication to be rack and datacenter aware – the database will handle sending writes to other datacenters. Writes can even happen in other data centers and, because of Cassandra’s masterless architecture, those writes will eventually be replicated throughout the database cluster. The gotcha is that the last writer will win, so it’s important that applications be designed to take this into account.
Riak brings something a bit different to the table. Riak is a key/value database that shares a bit in common with Cassandra – they’re both based on Amazon Dynamo. Just like Cassandra, Riak offers tunable consistency levels for reads and writes that give developers the ability to change data access based on application functionality. Riak Enterprise is an add on for Riak that provides datacenter awareness and multiple options for data replication.
Riak has a robust conflict resolution system – although Riak can be configured for last write wins, Riak’s strength comes from the option to create a sibling in the event of a write conflict. A Sibling is just a conflicting copy of the data stored in the database. Siblings become valuable when you need to write data to multiple datacenters without routing writes; instead of routing writes for data ‘A’ to a specific datacenter, users can be directed to the closest datacenter. When data is read, conflicts can be resolved and the correct version of the data is saved back to the database.
It’s difficult to come up with a definitive verdict. Each solution has merit. If writes need to happen anywhere and eventually synchronize across all databases in all datacenters, Riak’s support for siblings and sibling resolution is compelling. Cassandra has compelling developer tooling and a rich suite of drivers. HBase has strong consistency within the same datacenter.
Feature requirements mean that sometimes you need to be able to write your data to one of multiple servers in multiple datacenters. Getting the job done with SQL Server is going to be expensive, difficult, and time consuming. Ultimately, it’s worth looking at other database options to determine if you need to stay with SQL Server or if your data storage can be moved to another platform. Remember – it’s the application requirements that dictate the database platform, not the database platform that dictates application features.