Anyone who has worked with replication has run into problems caused by conflicting data. I’ve lost enough sleep over replication to know that having an effective plan for conflict resolution is important to keeping replication running and healthy.
What Causes Conflicts?
There are a few ways that conflicts can show up in a database. In strict transactional replication, it’s possible for there to be a conflict when a row exists in one place and not another.
Let’s say we have a row in
Sales.SalesOrderHeader that is present on both our publisher and subscriber. During some maintenance to delete incorrect orders, the row is incorrectly deleted from the subscriber. An update to the order is correctly written on the publisher. When the command to update the record is run on on the subscriber, there’s a conflict. The command contains instructions to modify a row that doesn’t exist. Replication will be brought to a screeching halt while operations staff figure out how to get the data back onto the subscriber.
In a peer-to-peer replication scenario, conflicts can happen in even more ways than with transactional replication. Many situations are caused by similar scenarios – data is inserted on both servers, data is deleted from both servers, or updates happen to a row that isn’t present. Peer-to-peer replication can also suffer from a situation where rows are different on all servers. SQL Server’s peer-to-peer replication can be configured to use optimistic concurrency; row versions are added to make sure that SQL Server is updating the version of a row it thinks it should be updating. If the versions don’t match, then there’s a conflict that needs to be taken care of.
To look at it a different way, let’s say we have two SQL Servers, one in Los Angeles and one in Charlotte. There’s a user in each city; Alfred is in LA and Barney is in Charlotte. Both Alfred and Barney start updating information about the same customer (customer ALFKI, version 1). Alfred finishes quickly, clicks “Save”, and goes out for coffee. In the background, Alfred’s write is sent to the SQL Server in LA where the row version for ALFKI is compared to Alfred’s row version. Since everything is correct, Alfred’s data is saved. Meanwhile, Barney is a bit slower, but he gets his data finished and he clicks “Save”. Alfred’s data hasn’t replicated to Charlotte yet, so Barney’s save goes through the same way.
At this time, the databases on opposite sides of the country both contain a row for customer ALFKI that has a version number of 2. The problem is that we don’t know which version is correct. When replication kicks off from one server to another (let’s say from LA to Charlotte), the originating server is effectively going to say “Here’s version 2 of row ALFKI.” The receiving server (Charlotte) is going to respond “Uh oh, we got a problem: I also have a version 2 of row ALFKI.” At this point, replication will come to a screeching halt until someone can come in and fix the problem.
Let’s Hug It Out
Unfortunately, conflict resolution isn’t as easy as talking about the problem – there’s no way for SQL Server to figure out the best way to solve the problem. There are a few strategies that can be employed to make the process easier.
- Manual intervention
- Logging conflicts
- Master write server
- Last write wins
- Write partitioning
This is the default way that SQL Server handles replication conflicts. It’s up to operations staff to figure out how the conflict happened and how it needs to be resolved. If there are a number of conflicts, they all need to be resolved before replication can continue. This is a DBA’s worst nightmare and why many DBAs try to stay away from replication.
The more critical the business, the less likely manual intervention is a possibility you want to consider; 3:00AM is unfriendly at the best of times, much less when a critical application is down.
Log Your Conflicts
Rather than require immediate manual intervention, it’s possible to modify the replication stored procedures to log that a conflict has occurred and keep on merrily replicating data.
This approach has some advantages over manual intervention. As conflicts are logged, they can be written to a queue where conflict resolution software and kick into action and attempt to resolve the conflicts automatically before alerting operations staff. Depending on the type of data modifications being made, it’s possible that there could be few or no true conflicts in your application. Conflict detection and resolution logs can be periodically audited to ensure that there are no major problems or to adjust the algorithms for additional scenarios and to increase the number of scenarios where data can be automatically corrected.
Master Write Server
This approach makes your network topology look like an airline’s spoke and hub system: one lone master server is used to serve all writes. There’s no possibility of write conflicts because all writes happen in the same location. The downside of this approach (as mentioned previously) is that there’s only so much optimization that can be done before a server is overloaded by writes, locks, and replication latency.
Last Write Wins
The last write wins approach assumes that writes are always issued in the correct order and the last write being sent to the database is assumed to be correct. Instead of relying on any kind of conflict detection, conflicts are ignored. This approach works well when all data is written in an append only manner – there are few updates, only inserts of new data. This can also work well when O/R-M code is use to only update columns that have changed. The likelihood of conflicting writes frequently can be reduced in these cases.
To be quick about it, write partitioning is a mechanism to ensure that all writes for a single row will always happen on the same server. There are multiple ways write partitioning can be performed. In short, a partitioning key is created by applying a hashing function to a row or application entity. This partitioning key is used to find the server responsible for that chunk of data.
Fully Partitioning Data
The last way to manage conflict resolution is to fully partition data. Peer-to-peer replication scenarios assume that all data belongs on all servers. What if that isn’t a necessity? It’s always possible to split up the data across multiple servers. Rather than write to multiple master servers and wait for merges to occur, it’s possible to spread data out over multiple servers with very little shared data. This removes most places for conflicts to occur and allows the load to be spread out across several servers.
Which Way is Right for Me?
There is no easy answer for this: it depends on both the application and network architecture. Requirements are very different when writes can occur in multiple data centers as opposed to a single data center. Some applications (recording sensor data) will never run into potential write conflicts, while others (content management systems) have the potential for write conflicts.