Many Masters, One Truth

As businesses grow, many teams start considering scale out database solutions but they’re cautious – they want to know how to keep data up to date in multiple places. In theory, you can use log shipping, mirroring, or AlwaysOn Availability groups. While this answers the literal question, what people really mean is usually very different. The people who ask this question aren’t typically concerned about creating a read-only replica of Site A over at Site B, they’re interested in being able to write data anywhere and read data back from anywhere. They want a database that lets them read or write data on any server transparently.

SQL Server Bi-Directional Transactional Replication

First, this has more syllables than any feature has a right to have. Second, this feature is limited to two servers. It’s relatively simple and seems like a good enough option, especially since it’s really just two transactional replicas that point at each other. The downsides of SQL Server Bidirectional Transactional Replication are that:

  • The feature uses too many syllables.
  • Only two servers can be involved.
  • Schema changes are necessary – IDENTITY columns won’t work like you think.
  • Did I mention that only two servers can be involved?
You heard the sign: merge!
You heard the sign: merge!

SQL Server Merge Replication

This is largely used in server to client environments. You have a SQL Server Express database on field laptops and those laptops are synchronized infrequently. Merge replication has the advantage of smarter conflict resolution than other SQL Server replication options (which typically explode and stop working). Merge replication topology assumes that there’s still only one master server, even though writes can be synchronized in both directions. So, yes, in theory this could work, but data modifications won’t be replicated immediately and a large set of changes could make a merge activity take a long time. In summation – merge replication probably won’t meet your needs because:

  • Relies on a single master server.
  • Replication is typically batched, not continuous.
  • Master failover happens outside of replication.

Yes, I understand that merge replication can be run continuously. Search StackOverflow for merge replication questions and you’ll see what I mean.

SQL Server Peer to Peer Replication

In the SQL Server world, we only have one option for having multiple masters: SQL Server peer to peer replication (okay, there’s a second if you count merge replication). We’ve talked about peer to peer replication before in terms of conflict resolution, managing availability, and as a scale out solution. If you’re interested in how SQL Server peer to peer replication might work for you, feel free to check out those articles. If you’re a bit lazy, here’s the 30,000 foot overview:

  • It’s complicated.
  • It’s expensive (Enterprise Edition and schema changes are required).
  • It probably won’t solve your problem.
  • Every node has to be able to handle the write load of all nodes.

What Other Options Are There?

While it’s possible to use SQL Server, it’s always good to consider other options – especially since you need to change your data model to support peer to peer replication. If not SQL Server, what else is there?

Although implementing multi-datacenter seems like a difficult task, it’s already possible, just so long as you’re willing to make some changes. If you’re willing to consider changing your schema, what about the possibility of changing your underlying data storage system?

Before closing this browser window, hear me out.

The features of a software product are what drive the technologies used for implementation. When you need to implement a highly available SQL Server you choose between a few different options based a set of requirements. Why shouldn’t that decision extend to the database itself?


Stretch that Database

Different databases have different capabilities. Databases like Cassandra, HBase, and Riak offer different levels of multi-server and multi-site durability.

HBase offers strong consistency – data is written to multiple servers synchronously, just like SQL Server AlwaysOn Availability Groups using synchronous replicas. Using HBase gives you great single site durability – every write sent to the database server is committed to disk before the client receives a write acknowledgment. There’s no eventual consistency to worry about; HBase is strongly consistent within one datacenter.

If you want disaster recovery for an HBase cluster, you need to deploy HBase replication. HBase replication works by pushing entire log edits through a replication queue – this maintains atomicity for all HBase transactions. Since HBase’s replication operates at a column family level (think of a column family as roughly analogous to an RDBMS table), replication can be configured on a column family by column family basis. HBase replication can’t be used for a multi-master scenario across multiple datacenters, but it can be used to provide an asynchronous disaster recovery scenario.

Cassandra does things a little bit differently. Both HBase and Cassandra derive their data model from Google’s BigTable, but Cassandra differs in how data is moved around. Cassandra is loosely based on Amazon Dynamo – this lets the person using the database decide on the consistency of a write. This choice is one of the primary reasons for teams to choose Cassandra; different features of applications can read and write data at different consistency levels depending on feature requirements instead of platform choice.

Cassandra’s eventual consistency and data replication implementation make Cassandra an excellent choice for designing multi-datacenter applications. Within a Cassandra cluster, it’s possible to configure data replication to be rack and datacenter aware – the database will handle sending writes to other datacenters. Writes can even happen in other data centers and, because of Cassandra’s masterless architecture, those writes will eventually be replicated throughout the database cluster. The gotcha is that the last writer will win, so it’s important that applications be designed to take this into account.

Riak brings something a bit different to the table. Riak is a key/value database that shares a bit in common with Cassandra – they’re both based on Amazon Dynamo. Just like Cassandra, Riak offers tunable consistency levels for reads and writes that give developers the ability to change data access based on application functionality. Riak Enterprise is an add on for Riak that provides datacenter awareness and multiple options for data replication.

Riak has a robust conflict resolution system – although Riak can be configured for last write wins, Riak’s strength comes from the option to create a sibling in the event of a write conflict. A Sibling is just a conflicting copy of the data stored in the database. Siblings become valuable when you need to write data to multiple datacenters without routing writes; instead of routing writes for data ‘A’ to a specific datacenter, users can be directed to the closest datacenter. When data is read, conflicts can be resolved and the correct version of the data is saved back to the database.

The Verdict

It’s difficult to come up with a definitive verdict. Each solution has merit. If writes need to happen anywhere and eventually synchronize across all databases in all datacenters, Riak’s support for siblings and sibling resolution is compelling. Cassandra has compelling developer tooling and a rich suite of drivers. HBase has strong consistency within the same datacenter.

Why Care?

Feature requirements mean that sometimes you need to be able to write your data to one of multiple servers in multiple datacenters. Getting the job done with SQL Server is going to be expensive, difficult, and time consuming. Ultimately, it’s worth looking at other database options to determine if you need to stay with SQL Server or if your data storage can be moved to another platform. Remember – it’s the application requirements that dictate the database platform, not the database platform that dictates application features.

Previous Post
CorrugatedIron 1.4
Next Post
How to Find ‘Secret Columns’ in Nonclustered Indexes

14 Comments. Leave new

  • The downsides of SQL Server Bidirectional Transactional Replication are that:
    The feature uses too many syllables.
    Only two servers can be involved.
    Schema changes are necessary – IDENTITY columns won’t work like you think.**

    If the db’s are created by an application that they will support, in what ways would we be dealing with schema changes? Thanks!

    • Jeremiah Peschka
      November 14, 2013 12:30 pm

      I’m not sure I entirely follow your question, but with IDENTITY columns, the identity will increment on each server independently. You need to make sure that you’re using a composite key.

      For other types of schema changes, the usual pattern is to stop writes into the database, disable replication, make schema changes, enabled replication, and then start writes.

      In general – it’s a feature best avoided unless you absolutely need it. Even then, you should avoid it.

    • Chuck Hottle
      March 25, 2015 5:01 pm

      I know that it’s been a while since this was posted but we currently use transactional publications with updatable subscriptions, though we call it two-way, and it is deprecated in SQL Server 2014. Between merge and peer-to-peer, which do you feel would be the best replacement? Thanks.

      • That depends on your use case. I suspect that peer to peer might be the SQL Server solution you need. But if we were consulting, I’d ask you a lot of difficult questions about the problems you’re trying to solve and whether or not you need to accept writes in two locations.

      • How we do active-active-active across data center replication in sql server?

  • I installed p2p on SQL Server 2014 but ran into some issues. Do you have step-by-step instruction how to set up p2p in SQL Server 2014? It will help me a lot to avoid any mistake I may make.

  • Bill Devonshire
    June 18, 2015 9:05 am

    I have 2 AlwaysOn clusters (SQL 2014) and am trying to implement P2P between 2 Availability Groups. Is the feasible? Would you know of any gotchas in this process? I am having trouble when AVG is failed over. Thanks

  • You mentioned the possibility of changing your underlying data storage system, yet not Oracle RAC which allows for multiple Oracle instances writing to a single Oracle database when the storage solution is ASM.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.