Resolving Conflicts in the Database

Anyone who has worked with replication has run into problems caused by conflicting data. I’ve lost enough sleep over replication to know that having an effective plan for conflict resolution is important to keeping replication running and healthy.

What Causes Conflicts?

Conflicts can show up in a database in a few different ways. In strict transactional replication, a conflict can occur when a row exists in one place and not another.

Let’s say we have a row in Sales.SalesOrderHeader that is present on both our publisher and subscriber. During some maintenance to delete incorrect orders, the row is mistakenly deleted from the subscriber. An update to the order is correctly written on the publisher. When the command to update the record is run on the subscriber, there’s a conflict: the command contains instructions to modify a row that doesn’t exist. Replication will be brought to a screeching halt while operations staff figure out how to get the data back onto the subscriber.

In a peer-to-peer replication scenario, conflicts can happen in even more ways than with transactional replication. Many are caused by similar scenarios – data is inserted on both servers, data is deleted from both servers, or an update arrives for a row that isn’t present. Peer-to-peer replication can also suffer from a situation where rows are different on every server. SQL Server’s peer-to-peer replication can be configured to use optimistic concurrency: row versions are added to make sure that SQL Server is updating the version of a row it thinks it should be updating. If the versions don’t match, there’s a conflict that needs to be taken care of.
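
To make the version check concrete, here’s a rough sketch of the idea in T-SQL. This is only an illustration of optimistic concurrency in general – the table, column, and variable names are made up for the example, and this isn’t the internal mechanism peer-to-peer replication uses:

    -- Illustration only: an update that refuses to apply unless the stored
    -- version matches the version the writer originally read.
    DECLARE @CustomerID         nchar(5)     = N'ALFKI',
            @CompanyName        nvarchar(40) = N'Alfreds Futterkiste',
            @ExpectedRowVersion int          = 1;

    UPDATE dbo.Customers
    SET    CompanyName = @CompanyName,
           RowVersion  = RowVersion + 1
    WHERE  CustomerID  = @CustomerID
      AND  RowVersion  = @ExpectedRowVersion;

    -- Zero rows updated means someone else changed (or deleted) the row in
    -- the meantime – that's the conflict that has to be taken care of.
    IF @@ROWCOUNT = 0
        RAISERROR('Conflict: the row was changed or deleted by another writer.', 16, 1);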

To look at it a different way, let’s say we have two SQL Servers, one in Los Angeles and one in Charlotte. There’s a user in each city; Alfred is in LA and Barney is in Charlotte. Both Alfred and Barney start updating information about the same customer (customer ALFKI, version 1). Alfred finishes quickly, clicks “Save”, and goes out for coffee. In the background, Alfred’s write is sent to the SQL Server in LA where the row version for ALFKI is compared to Alfred’s row version. Since everything is correct, Alfred’s data is saved. Meanwhile, Barney is a bit slower, but he gets his data finished and he clicks “Save”. Alfred’s data hasn’t replicated to Charlotte yet, so Barney’s save goes through the same way.

At this time, the databases on opposite sides of the country both contain a row for customer ALFKI that has a version number of 2. The problem is that we don’t know which version is correct. When replication kicks off from one server to another (let’s say from LA to Charlotte), the originating server is effectively going to say “Here’s version 2 of row ALFKI.” The receiving server (Charlotte) is going to respond “Uh oh, we got a problem: I also have a version 2 of row ALFKI.” At this point, replication will come to a screeching halt until someone can come in and fix the problem.

Let’s Hug It Out

Unfortunately, conflict resolution isn’t as easy as talking about the problem – there’s no way for SQL Server to figure out the best resolution on its own. There are a few strategies that can be employed to make the process easier:

  • Manual intervention
  • Logging conflicts
  • Master write server
  • Last write wins
  • Write partitioning

Manual Intervention

This is the default way that SQL Server handles replication conflicts. It’s up to operations staff to figure out how the conflict happened and how it needs to be resolved. If there are a number of conflicts, they all need to be resolved before replication can continue. This is a DBA’s worst nightmare and why many DBAs try to stay away from replication.

The more critical the business, the less attractive manual intervention becomes; 3:00 AM is unfriendly at the best of times, much less when a critical application is down.

Log Your Conflicts

Rather than require immediate manual intervention, it’s possible to modify the replication stored procedures to log that a conflict has occurred and keep on merrily replicating data.

This approach has some advantages over manual intervention. As conflicts are logged, they can be written to a queue where conflict resolution software can kick into action and attempt to resolve the conflicts automatically before alerting operations staff. Depending on the type of data modifications being made, there may be few or no true conflicts in your application. The conflict detection and resolution logs can be audited periodically to make sure there are no major problems, and to tune the resolution algorithms so that more scenarios can be corrected automatically.
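
There’s no single right way to wire this up, but the pattern is straightforward: when the procedure that applies a replicated command can’t find the row it’s supposed to touch, write a row to a log table and return instead of raising an error. A schematic sketch is below – the log table and procedure are stand-ins invented for the example, since the real auto-generated replication procedures have one parameter per replicated column:

    -- Hypothetical conflict log table; the name and columns are assumptions.
    CREATE TABLE dbo.ReplicationConflictLog
    (
        ConflictID      int IDENTITY(1,1) PRIMARY KEY,
        TableName       sysname       NOT NULL,
        PrimaryKeyValue nvarchar(200) NOT NULL,
        ConflictType    varchar(50)   NOT NULL,
        LoggedAt        datetime      NOT NULL DEFAULT GETDATE()
    );
    GO

    -- Schematic stand-in for a customized replication update procedure.
    CREATE PROCEDURE dbo.usp_ApplyOrderUpdate
        @SalesOrderID int,
        @Status       tinyint
    AS
    BEGIN
        UPDATE Sales.SalesOrderHeader
        SET    Status = @Status
        WHERE  SalesOrderID = @SalesOrderID;

        -- The row is missing on the subscriber: log the conflict and keep
        -- going instead of raising an error and stopping the agent.
        IF @@ROWCOUNT = 0
            INSERT dbo.ReplicationConflictLog (TableName, PrimaryKeyValue, ConflictType)
            VALUES (N'Sales.SalesOrderHeader',
                    CAST(@SalesOrderID AS nvarchar(200)),
                    'update to a missing row');
    END
    GO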

Master Write Server

This approach makes your network topology look like an airline’s hub-and-spoke system: one lone master server handles all writes. There’s no possibility of write conflicts because all writes happen in the same location. The downside of this approach (as mentioned previously) is that there’s only so much optimization that can be done before a server is overloaded by writes, locks, and replication latency.

Last Write Wins

The last write wins approach assumes that writes are always issued in the correct order and that the last write sent to the database is the correct one. Instead of relying on any kind of conflict detection, conflicts are simply ignored. This approach works well when all data is written in an append-only manner – there are few updates, only inserts of new data. It can also work well when O/R-M code is used to update only the columns that have changed. In these cases, the likelihood of conflicting writes can be reduced.
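
As a rough sketch of how that plays out in code (the table, columns, and procedure here are assumptions for the example, not anything SQL Server generates for you), the write path simply refuses to apply anything older than what’s already stored:

    -- Minimal last write wins sketch: only apply the incoming change if it is
    -- at least as new as the stored row; older writes are silently dropped.
    CREATE PROCEDURE dbo.usp_SaveCustomer_LastWriteWins
        @CustomerID   nchar(5),
        @CompanyName  nvarchar(40),
        @LastModified datetime2
    AS
    BEGIN
        UPDATE dbo.Customers
        SET    CompanyName  = @CompanyName,
               LastModified = @LastModified
        WHERE  CustomerID   = @CustomerID
          AND  LastModified <= @LastModified;
    END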

Write Partitioning

To be quick about it, write partitioning is a mechanism to ensure that all writes for a single row will always happen on the same server. There are multiple ways write partitioning can be performed. In short, a partitioning key is created by applying a hashing function to a row or application entity. This partitioning key is used to find the server responsible for that chunk of data.
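
A minimal sketch of the idea is below. The hash algorithm, the key, and the number of servers are all arbitrary choices for the example – the point is only that the same key always lands on the same server:

    -- Map a partitioning key (a customer ID here) to one of four write servers.
    DECLARE @CustomerID  nvarchar(10) = N'ALFKI',
            @ServerCount int          = 4;

    -- The same key always hashes to the same bucket, so every write for this
    -- customer is routed to the same server.
    SELECT ABS(CAST(HASHBYTES('MD5', @CustomerID) AS bigint)) % @ServerCount
           AS WriteServerNumber;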

Fully Partitioning Data

The last way to manage conflict resolution is to fully partition the data. Peer-to-peer replication scenarios assume that all data belongs on all servers. What if that isn’t a necessity? Rather than writing to multiple master servers and waiting for merges to occur, the data can be split across several servers with very little shared between them. This removes most of the places where conflicts can occur and spreads the load out across several servers.

Which Way is Right for Me?

There is no easy answer for this: it depends on both the application and the network architecture. Requirements are very different when writes can occur in multiple data centers as opposed to a single data center. Some applications (recording sensor data, for example) will never run into write conflicts, while others (content management systems) have plenty of potential for them.

11 Comments

  • Great overview! We generally go for the last write wins approach, since it’s efficient and almost always correct. The almost part, though, makes me think fully partitioning might be a safer bet.

    • Hi Claire / Brent,
      Can you send me the process to set up the last write wins approach? I have configured peer-to-peer replication, but when conflicts happen the rows are not updating the same on all the servers. Replication is still working because I set it to continue on conflicts.

      • Jeremiah Peschka
        October 1, 2013 10:44 am

        Hi Venkat,

        With a quick Google, you’ll probably come across either SQL Server Books Online or this fantastic article on Conflict Resolution that was posted on Database Journal. You have to do a bit more than flip the “continue on conflicts” switch.

  • Have a situation where we are currently looking into how to handle this. The application is tearing up the CPU on our current Master Write database so I would like to move to peer to peer since all data has to be at both sites. The issue is that this is a content management system of sorts and the application does not allow for the data to be partitioned. So all that being said, does peer to peer in SQL Server 2008 allow for the last write wins approach?

    • Great question, Colin. You can effectively force SQL Server peer to peer replication to use a last write wins approach by disabling conflict detection.
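
      Something like this, run at the publisher, is roughly what that looks like in T-SQL – the publication name here is just a placeholder, and it’s worth double-checking the parameter names in Books Online for your version:

          -- Turn off P2P conflict detection so the last command applied wins.
          -- N'P2P_Orders' is a placeholder publication name.
          EXEC sp_configure_peerconflictdetection
              @publication = N'P2P_Orders',
              @action      = N'disable';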

      Unfortunately, I don’t know that peer to peer replication is going to fix your high CPU problem. What’s causing it? Is it inserts? Selects? Updates? Whatever the mix, the next step is to dig in and figure out whether triggers, a large number of indexes, or something else is driving the CPU usage. I bring that up because it’s better to isolate the problem before moving to peer to peer replication and then having to deal with both the high CPU use and peer to peer replication.

      Feel free to email me at jeremiah@brentozar.com – this is beyond what I can do in a blog comment, but I’d love to help.

  • What about merge replication? It has many different types of automatic conflict resolvers built in. Modifying the replication stored procedures is waaaay more work than using merge replication, especially if you have a lot of tables.

    The general recommendation I give to people is to code the application so that conflicts cannot occur or use merge replication.

    • I’d prefer peer-to-peer over merge replication for a few reasons. Peer-to-peer replication is designed for a masterless, distributed topology where you might have servers spread across multiple datacenters. It gives a lot more redundancy than you’ll get with a few machines in a bi-directional replication scenario.

      I’m not a fan of hub and spoke systems because when the hub breaks, you’re SOL. Failure tolerance is an important goal for most people. As I understand it, merge replication works better when you have a large number of slave systems that sync up with a master, as opposed to many peers that are online and operational at the same time.

      That being said, I agree that conflict detection and resolution should be moved out of the database tier. It can still happen, though, so it’s always best to consider it when designing a system.

  • I really like your article as a discussion of how to deal with conflicts in SQL; it’s a nice way to let people know the causes and the ways to handle them. I found another article useful that I’d like to share. It doesn’t give the breadth of conflicts that your article has, but it teaches people how to actually resolve a SQL conflict using SQL queries. It’s very helpful and covers aspects of conflicts that you wouldn’t think about, such as the effect on business data. Here’s the article:

    http://public.madeinengland.co.nz/sql-server-conflict-resolution/

    • Jeremiah Peschka
      January 2, 2013 8:41 pm

      That’s a good link, thanks for sharing. I’ve changed my opinion somewhat since writing this, but that’s a different blog post for a different day.
