Availability, Data Locality, and Peer-to-Peer Replication

I want to make something clear: high availability is not load balancing. The two options don’t have to be mutually exclusive, but they aren’t the same thing. Several months ago, I wrote about resolving write conflicts. Some of the approaches I mentioned for resolving write conflicts (such as taking a “last write wins” approach) involved using peer-to-peer replication. It’s important to understand conflict resolution and peer-to-peer replication. Since I’ve already talked about conflict resolution, let’s dig into how peer-to-peer replication fits into the mix.

Peer-to-Peer Replication and You

Peer-to-peer replication is a special and magical kind of replication, it works in a ring or mesh to make sure that one row’s updates will magically spread to all servers. You’d think that this would mean every server is equal, right?

In some distributed databases actions can take place on any server. Using Riak as an example, when you want to add a new record to the database you can write the record to any server and the record will be routed to the server responsible for handling that data. This is part of the beauty of a specific type of distributed database: the database is a collection of nodes that serve reads and writes without regard for hardware failure. There’s a lot of software trickery that goes into making this work, but it works quite well.

SQL Server’s peer-to-peer replication is a distributed database, just not in the sense that I’ve used the term previously. In SQL Server peer-to-peer replication every node is an exact copy of every other node: the same tables, rows, indexes, and views exist on every node. This is where the difficulty begins – if every row exists on every node, how do we know where to update data? The problem is that we don’t know where to update a row. There is no out of the box mechanism for determining row ownership.

Distributed database systems like Riak, Cassandra, HBase, and Dynamo work by assigning an owner to every record, shouldn’t we do the same thing with SQL Server? When we’re spreading data across a number of servers, we have to ensure that writes go to the correct location, otherwise we need to build a large number of checks in to ensure that all nodes have the appropriate updates and that everyone is working on the correct version of data. Otherwise, we run into conflicts. This is the reason I hinted at using peer-to-peer replication combined with write partitioning and a last write wins method of conflict detection. If changes to a row can only occur on server A, we don’t need to worry about updates on other servers – those updates can be ignored since they did occur not in the correct location.

The difficulty lies in finding a way to do all of this. SQL Server’s replication offers no routing functionality, it just replicates data to the appropriate subscribing servers. In order to make sure that data gets to the right place, there must be another piece to the puzzle. There must be a way to correctly locate data.

Record Ownership

If you absolutely must use peer-to-peer replication as a form of load balancing, record ownership is an important concept to consider. Regardless of whether the distributed database is relational or not, software still needs to be aware of where the definitive version of a record can be found. If there’s no way to determine which version of a record is the definitive version of a record, two updates can occur in different locations. This will undoubtedly lead to painful conflict scenarios. Instead of worrying about handling conflicts, we worry about getting data to the right place. Once we know that the data is in the right place, we can trust our database to be as accurate as possible.

I use the term record instead of row for an important reason: a record represents a complete entity in a system. A row may be part of a record (e.g. a line item in an order) but the record is the complete order.

Record ownership is a tricky thing to think about; how do you determine who owns any single row? What’s a fair and efficient way to handle this? Let’s take a look at different techniques and see how they stack up. Here’s a quick list of possible ways we can distribute row ownership in a database:

  • Random
  • Range-based
  • Static

Random Record Ownership

Randomness is frequently used to ensure an even distribution. Randomly bucketizing data turns out to be a very effective way of ensuring that data will be split very close to evenly across any arbitrary number of locations. The difficulty is in ensuring randomness.

Some systems like Riak and Cassandra use a hash function to distribute data ownership around the database cluster. Different nodes are assigned a range of values – if there are four servers in the distributed database, each one is roughly responsible for 1/4 of the data in the database (I’m simplifying, of course, but you get the drift). Special routing code takes care of getting data to clients and sending writes to the appropriate place. The location of a record is typically determined by applying a hashing function to the record’s key. In this way, we can always find a row at a later date: by applying a function to the key we can quickly find the row even if the number of servers in the cluster has changed.

This mechanism provides a reliable way to uniquely identify data and distribute it among many servers. This technique is difficult to accomplish with SQL Server. There is no peer-to-peer replication functionality in the SQL Server space that makes it easy to say “This record belongs on server A and this record belongs on server B.” There’s a reason for this: peer-to-peer replication is a high availability technology. It exists to make life easier in the unfortunate event that your data center slides into the ocean. It’s possible to build some kind of load balancing layer in SQL Server using SQL Server Service Broker (or just about any other technology), but the point remains that SQL Server doesn’t provide out of the box functionality to automatically implement random record based ownership.

Range-Based Record Ownership

Range-based ownership is far simpler than random record ownership. In range-based ownership a range of records are claimed by a single server. This could be users 1 through 100,000 or it could be users whose names start with ‘A’ through users whose names start with ‘K’. At a quick glance range-based record seems like it doesn’t have many down sides: it’s easy to determine where an appropriate record goes. My data goes to server A, your data goes to server B, his data goes to server C.

Range-based record ownership has a major flaw: some servers will experience more load than others. For example, if we’re partitioning by name we will quickly discover that first names aren’t very unique, at least not in Western cultures. In a survey of first names conducted in the UK, one quarter of women were likely to have only one of ten first names in 1994. One in three women was likely to be named Emily. Needless to say, data distribution will cause skew in the activity distribution on different servers. If one server accumulates a clump of very active users (e.g. a group of active early adopters), that server may experience a higher load than the others.

Designing an effective range-based record ownership scheme for SQL Server peer-to-peer replication is possible but very difficult. The effectiveness of the scheme depends on intimate knowledge of write patterns. Most of us don’t have the time to develop a deep understanding of how data is written and then develop a scheme that takes into account those patterns.

Static Record Ownership

With static record ownership, we assign each record to a server when it is created. This could be as simple as assigning a user to the closest server or it could mean assigning records to a server by some other arbitrary means. However this is accomplished, it’s important to remember that some piece code still must able to determine where a record should go and that the mechanism for identifying that initial location should be general purpose enough to meet your user’s needs in the long term.

There are several common ways to split out data. If you have a system that’s multi-tenant, it becomes easy to assign ownership for all of a single client/customer’s data to a single server. If that customer grows, you can buy a separate server or move them onto a different server with fewer users. Every record ends up having a composite key made up of the record identifier and the client identifier, but this is a small price to pay for clearly being able to separate data responsibility by client.

Another way to split out data is geographically. If I sign up for a service, it’s nice if the primary place to write my data is as close to me as possible. In this case, the service might have three data centers: in LA, in New York, and one in London. Much like using a multi-client architecture, a geographic method to determine ownership would use the location as part of the key for each record – records stored in LA would use a composite key with the data center location (‘LA’) and some other arbitrary key value to identify a unique record.

No matter what scheme you decide to use, static record ownership is an easy way to determine which SQL Server should be responsible for writes to a single record. An advantage of static record ownership is that routing can be handled in the application or a sufficiently sophisticated router can handle routing writes without any additional application code being added to the application – just a few load balancer rules will need to be created or changed

In Summation

Here’s the trick: throughout all of this we’ve ignored that order of events is important. We’ve just assumed that when data is being written, we’re guaranteeing the order of events. If the data is being written to random servers, there’s no guarantee of event order. In a naive system, a record might be written to one server and an update applied to a second server before the original record even shows up! Distributing data is difficult. Randomly distributing data is even more difficult. No matter how you distribute your data or distribute writes, remember that distributing data in SQL Server through peer-to-peer replication is a high availability technology. It can be co-opted for scale out performance improvement, but there are some design decisions that must be made.

Previous Post
SQL Server Interview Questions and Answers – Book Review
Next Post
DBA Darwin Awards: Log File Edition

4 Comments. Leave new

  • So the easy (prescribed?) way to deal with this is to partition your use into R and R/W and send the writes to only one box, and the reads everywhere else. Obviously once your writes outpace one servers things get difficult…
    Great blog Jeremiah. I use P2P quite a bit and learned a lot 🙂

  • To add to what John mentioned, the easy way to handle this is to partition the insert/update/deletes so that conflicts do not occur. Peer-to-Peer Replication has an abysmal conflict resolution policy (highest originator ID wins) and is why I much rather prefer Merge Replication for a bi-directional solution. Custom conflict resolvers for Merge Replication are far less invasive from an application standpoint and can be plugged in by a DBA with no change to application logic.

  • Hi Jeremiah,

    Peer to peer replication failed due to Invalid column name (Source: MSSQLServer, Error number: 207). after this we removed replication and configured from scratch again, but still we are getting the same error.

    I would be grateful if you could help me.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.