Performance Tuning SQL Server Transactional Replication: A Checklist


SQL Server transactional replication is a blessing and a curse. It’s a great developer tool that lets you scale out data to multiple servers, even using Standard Edition. But as your business picks up, your datasets get larger, and your customers grow more demanding, replication can start to fall behind. You need to learn how to tune it to keep up.

Before You Start Tuning Replication…

Make sure you can prove when changes you make to replication improve performance. Or find out quickly if you make something worse. (It’s gonna happen.)

If I’m going to make changes with something as complex as replication, here are my basic requirements:

  • Monitoring must alert the DBA team when replication latency exceeds allowed thresholds
  • Monitoring needs to track historical latency to show if my changes reduce latency
  • I need a production-like staging environment to test my changes.

If you haven’t configured monitoring for transactional replication, read how to do it here. The “easy” and “medium” steps are a small amount of work and are incredibly useful.
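
If you want a quick sanity check of latency yourself, tracer tokens are built into transactional replication. Here's a minimal sketch; the publication name is a placeholder, and both procedures run at the publisher in the published database:

    -- Post a tracer token and remember its id
    DECLARE @token_id int;
    EXEC sys.sp_posttracertoken
        @publication = N'MyPublication',
        @tracer_token_id = @token_id OUTPUT;

    -- Once the token has had time to flow through, check the latency it measured
    EXEC sys.sp_helptracertokenhistory
        @publication = N'MyPublication',
        @tracer_id = @token_id;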

Don’t Skip “Normal” SQL Server Performance Tuning!

I’m going to give you a lot of transactional replication specific performance tuning tips in this post. But don’t skip other elements of SQL Server performance tuning! Wait statistics, virtual file stats, and identifying bottlenecks are still important. Get started with SQL Server performance tuning here.

1) Are You Using the Right Versions of Windows and SQL Server for Replication?

For replication performance, you want to be on Windows Server 2008 and SQL Server 2008 minimum. It really makes a difference.

2) Have You Scaled Up Your Distributor?

When replication performance is important, use a dedicated distributor server so that your distributor doesn’t have to fight with a publisher or subscriber for CPU, memory, network, or storage resources. If you need high availability for the distribution database, you have limited options: failover clustering is pretty much the only way to go.
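
If you're moving to a dedicated distributor, the publisher gets pointed at it with sp_adddistributor. A rough sketch, assuming the remote server (DISTSRV here, a placeholder) has already been configured as a distributor with a matching distributor_admin password:

    -- Run at the publisher to use a remote distributor
    EXEC sp_adddistributor
        @distributor = N'DISTSRV',
        @password = N'SameStrongPasswordAsOnTheDistributor';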

3) Is Replication Really What You Need?

Check your assumptions. Could you get by with Transaction Log Shipping? Are you a good candidate for Availability Groups?

4) Are You Using the Right Type of Subscriptions and Tools for Replication over the WAN?

Argenis Fernandez shares what he learned from tuning transactional replication over wide area networks. This is a great use of “pull” subscriptions.

5) Have You Made Sure Articles Are NOT in More than One Publication?

Kendal Van Dyke shows that having articles in multiple publications can magnify the number of commands in your distribution database. That bloats your distribution database and will slow you way down as activity picks up.

(Note: If you’re using row-filtering on your articles, you may be the exception to this rule.)
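
You can audit this with the replication metadata in the published database. A sketch (these are the real transactional replication system tables, but verify it against your own topology):

    -- Run in the published database: articles that appear in more than one publication
    SELECT a.name AS article_name,
           COUNT(DISTINCT a.pubid) AS publication_count
    FROM dbo.sysarticles AS a
    GROUP BY a.name
    HAVING COUNT(DISTINCT a.pubid) > 1;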

6) Do You Only Run Snapshots When You Need Them?

I’ve come across cases where the Snapshot Agents for a publication were set to run on a schedule, even when replication wasn’t being initialized. I believe that when someone was setting replication up, they had checked off the option to “Schedule the Snapshot Agent to run at the following times” without realizing that it wasn’t needed. Don’t run snapshots on a schedule: it will lock up the publishing database. (When you open “Job Activity Monitor”, these jobs show up with the category “REPL-Snapshot”.)
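
One way to audit for this is to list the Snapshot Agent jobs and see which ones have schedules attached. A sketch against the msdb job tables:

    -- Snapshot Agent jobs and any schedules attached to them
    SELECT j.name AS job_name,
           j.enabled,
           s.name AS schedule_name
    FROM msdb.dbo.sysjobs AS j
    JOIN msdb.dbo.syscategories AS c ON c.category_id = j.category_id
    LEFT JOIN msdb.dbo.sysjobschedules AS js ON js.job_id = j.job_id
    LEFT JOIN msdb.dbo.sysschedules AS s ON s.schedule_id = js.schedule_id
    WHERE c.name = N'REPL-Snapshot';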

7) Are You Using “Immediate Sync” to Your Advantage?

The immediate sync option is hard to spot when you first set up replication. Setting this to false can help minimize the impact of running a replication snapshot if you need to add new articles, or even remove and re-add a few articles. Learn more about it here. As always, test your changes outside of production first! (I personally have a fear of having immediate sync set to true because of this old bug from SQL Server 2005.)

For information on how the immediate_sync setting can also impact your distribution database, read more here.

Thanks to Allen McGuire for his comment reminding us of the benefits of this setting!
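
Here's a sketch of checking and flipping the setting; the publication name is a placeholder, and allow_anonymous has to go to false before immediate_sync can:

    -- Run in the published database: see where you stand
    SELECT name, immediate_sync, allow_anonymous
    FROM dbo.syspublications;

    -- allow_anonymous must be false before immediate_sync can be
    EXEC sp_changepublication
        @publication = N'MyPublication',
        @property = N'allow_anonymous',
        @value = N'false';

    EXEC sp_changepublication
        @publication = N'MyPublication',
        @property = N'immediate_sync',
        @value = N'false';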

8) Are You Replicating Only the Articles and Columns You Need on the Subscriber?

Don’t just “add all.” For scalability, replicate only the articles that must be in replication, and only the columns that need to be replicated. This not only helps overall performance, it also reduces the impact of times when you may need to re-initialize replication. This goes especially for large object (LOB) data types. If you must replicate LOB types, Microsoft recommends that you use the newer types such as nvarchar(max), varbinary(max), etc.
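
Column (vertical) filtering is done per article with sp_articlecolumn. A sketch dropping a hypothetical column from a hypothetical article; note it forces a new snapshot, so test outside production first:

    -- Stop replicating a column the subscriber doesn't need
    EXEC sp_articlecolumn
        @publication = N'MyPublication',
        @article = N'Orders',
        @column = N'InternalNotes',
        @operation = N'drop',
        @force_invalidate_snapshot = 1,
        @force_reinit_subscription = 1;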

9) Do You Set the ‘Replicate Schema Changes’ Subscription Option to False When Needed?

New columns being added to a published article shouldn’t be replicated to the subscriber unless they really need to be there. You can turn off the replication of schema changes by setting the ‘Replicate Schema Changes’ subscription option to ‘false’. (It defaults to ‘true’.)
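
In T-SQL, that option maps to the replicate_ddl publication property. A sketch with a placeholder publication name:

    -- Turn off replication of schema changes for a publication
    EXEC sp_changepublication
        @publication = N'MyPublication',
        @property = N'replicate_ddl',
        @value = 0;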

10) Have You Considered Static Row Filters?

“Static row filters” allow you to include only certain rows in a given publication. There is overhead to applying the row filter itself: Microsoft recommends using row filters only if your replication setup can’t handle replicating the full dataset.
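
If you decide the filter is worth it, it's applied per article. A sketch with hypothetical names; both calls invalidate the snapshot, so plan for re-initialization:

    -- Define the WHERE clause for the article
    EXEC sp_articlefilter
        @publication = N'MyPublication',
        @article = N'Orders',
        @filter_name = N'FLT_Orders_West',
        @filter_clause = N'Region = ''West''',
        @force_invalidate_snapshot = 1,
        @force_reinit_subscription = 1;

    -- Recreate the view replication uses to synchronize the filtered article
    EXEC sp_articleview
        @publication = N'MyPublication',
        @article = N'Orders',
        @view_name = N'VW_Orders_West',
        @filter_clause = N'Region = ''West''',
        @force_invalidate_snapshot = 1,
        @force_reinit_subscription = 1;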

11) Have You Optimized Your Subscriber Databases for Re-initialization?

Face it: re-initialization happens. Unfortunately, it usually happens when you least expect it and have plans to be doing something else. There are a few things you can do to keep re-initialization from making your publication database unusable for long periods.

  • Isolate your replication subscriber tables into their own database, and only keep replicated articles in there. This also typically lets you use recovery models that are optimized for minimal logging in that database to speed up bulk inserts. Consider using synonyms to quickly “repoint” to replicated articles to give you flexibility (see the sketch after this list).
  • Evaluate whether initializing replication from backup could help.
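
Here's a minimal sketch of the synonym trick; the database, schema, and table names are all hypothetical:

    -- Reporting code always references dbo.Customers; the synonym decides where that points
    CREATE SYNONYM dbo.Customers FOR ReplSubscriber.repl.Customers;

    -- During a re-initialization, repoint reports at a static copy without touching app code
    DROP SYNONYM dbo.Customers;
    CREATE SYNONYM dbo.Customers FOR ReportingDB.dbo.Customers_PreInitCopy;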

12) Have You Considered Using Multiple Publications?

There are pros and cons to splitting out publications. Here are the pros:

  • You can isolate large tables that are the biggest problems for snapshotting into their own publications so that they get their own snapshots. That can be helpful if there are other tables you might need to remove and re-add to replication more frequently. (The immediate_sync setting can also help with this, see #7 above.)
  • This will give you multiple distribution agents so changes can be applied to your subscribers in parallel. (This one’s a pro and a con, as you’ll see.)

Cons:

  • This is more work to manage. You should be checking scripts for your entire replication configuration into source control and have documentation on everything. More publications make that whole process harder.
  • All those distribution agents can backfire if you don’t have the resources to support them working on the subscriber at the same time.
  • Be mindful not to put non row-filtered articles in more than one publication as noted above.

13) Are “Subscription Streams” Right for You (or Not)?

This option allows you to raise the number of connections that the distribution agents use to apply changes to the subscriber. But there’s overhead to managing all these threads, and you can get into situations where transactions aren’t fully consistent if you hit problems.

This feature is primarily recommended for situations where you have high network latency and where you are not changing the setting often. Keep in mind that if you’re splitting your articles into multiple publications for other reasons, you’ve already got multiple distribution agents running in parallel.
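
Under the hood this is the -SubscriptionStreams parameter on the distribution agent. A sketch of what the agent command line looks like with it raised; every server and database name here is a placeholder, and a real agent invocation carries more parameters than this:

    distrib.exe -Publisher PUBSRV -PublisherDB PubDB -Publication MyPublication
        -Distributor DISTSRV -Subscriber SUBSRV -SubscriberDB SubDB
        -SubscriptionStreams 4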

14) Are You Replicating Nonclustered Indexes Blindly?

Confession: I did this wrong for years. It’s very easy to set up transactional replication and send all the indexes over to the subscriber: you just set “Copy nonclustered indexes” in the article properties to “true”. But you’re only required to replicate the primary key and unique indexes. There are two big problems with replicating all the nonclustered indexes:

  • It can make re-initialization slower. By default the subscriber will have objects created, bulk load the articles, create “extra” nonclustered indexes, then “catch up” on any changes that came in after the snapshot was pushed. You definitely want to make sure that all “extra” nonclustered indexes are disabled or don’t exist while that bulk load is happening. But if a lot may have changed since the snapshot ran, you may not want the indexes to be created until the very end, anyway. Handling the nonclustered index creation outside of replication gives you that flexibility.
  • It’s very rare for the exact same queries to run on the publisher and subscriber. You usually want nonclustered indexes that are specific to the workload on the subscriber, anyway.

Identify the “extra” nonclustered indexes specific to the queries that run on the subscriber. Script them out, check them into your source control, and have a process to deploy them whenever replication needs to be re-initialized.
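
That deployment script can be as simple as a guarded CREATE INDEX per subscriber-only index. A sketch with hypothetical table and index names:

    -- Run at the subscriber after re-initialization completes
    IF NOT EXISTS (SELECT 1 FROM sys.indexes
                   WHERE name = N'IX_Orders_Subscriber_Reporting'
                     AND object_id = OBJECT_ID(N'dbo.Orders'))
    BEGIN
        CREATE NONCLUSTERED INDEX IX_Orders_Subscriber_Reporting
            ON dbo.Orders (CustomerID, OrderDate)
            INCLUDE (TotalDue);
    END;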

15) Could Publishing Stored Procedure Execution Work?

If your workload is run entirely by stored procedures, this can be a great option. (FYI, there is a bug/hotfix for this in SQL Server 2012/2014 listed below.)
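
Publishing execution is just a different article type. A sketch with a hypothetical procedure name:

    -- Replicate the procedure's execution instead of the individual rows it touches
    EXEC sp_addarticle
        @publication = N'MyPublication',
        @article = N'usp_PurgeOrderHistory',
        @source_object = N'usp_PurgeOrderHistory',
        @type = N'serializable proc exec';  -- 'proc exec' is the less strict variant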

16) Are You Using Read Committed Snapshot Isolation on Replication Subscribers?

It’s common for the distribution agent to have to fight with other processes while it tries to insert, update, and delete rows in the subscriber database. One DBA that I worked with removed a huge amount of blocking and sped up processing by using RCSI on a subscriber database. Learn more about RCSI here.
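
Enabling RCSI is a single database-level setting, but it needs exclusive access to switch on. A sketch for a maintenance window; the database name is a placeholder:

    ALTER DATABASE SubscriberDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    ALTER DATABASE SubscriberDB SET READ_COMMITTED_SNAPSHOT ON;
    ALTER DATABASE SubscriberDB SET MULTI_USER;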

17) Have You Ruled Out Virtual Log File Fragmentation on the Publisher Database?

If you’ve got more than 10K virtual log files on your publication database, it could slow down replication. I’ve particularly seen this cause replication to get behind when a large operation like an index rebuild was run. Our free sp_Blitz® script will diagnose high numbers of VLFs for you.
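
If you'd rather check by hand, DBCC LOGINFO returns one row per VLF:

    -- Run in the publication database; the number of rows returned is your VLF count
    DBCC LOGINFO;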

18) You Haven’t Been Randomly Fiddling with Settings on the Distribution Agent, Have You?

There are a lot of little settings you can change on the Agent Profiles in replication. I’m not a huge fan of changing them unless you can prove they helped your performance, though. In my experience, these are typically “fine tuning” settings to adjust after you have the right architecture in place.

19) Have You Looked Out for Replication Bugs / Fixes?

Man, these can get you. Like any other complicated tool, things can go wrong. Here are a few highlights:

  • KB 2674882 – Deadlocking distribution agents OR replication queries with very high memory grants. This can occur on SQL Server 2005, 2008, or 2008R2.
  • Unexpectedly inactive subscriptions. There are many performance reasons to upgrade from SQL Server 2005, but if you must be on it, then you shouldn’t run anything less than SP4.
  • KB 2958429 – Service Packs Matter. SQL Server 2012 SP2 added some features to replication logging (and even a few improvements to Peer to Peer replication, oddly enough). Scroll to “Highlights for SQL Server 2012 SP2” and expand “Replication” to see the list. If you apply this service pack, you may also want to apply KB 2969896.
  • KB 2897221 – Stack dumps/ non-yielding schedulers if you’re replicating stored procedures. SQL Server 2012 or 2014.
  • KB 949296 – Replication Agents cannot run when you have many agents and the Windows desktop heap is “used up”. (Thanks to Michael Bourgon for suggesting we link to this one.)
  • The Log Reader Agent may fail to start on SQL Server 2012 Service Pack 1.
  • KB 2655789 – FIX: Replication Log Reader Agent fails with “18773” and “18805” error messages if you set the ‘large value types out of row’ table option to ON for the publication article table.

Public Safety Announcement: Replication Needs Special Treatment for Hotfixes and Upgrades

With any hotfixes, it’s always good to review KB 941232, “How to apply a hotfix for SQL Server in a replication topology.” (You’ll need special steps if you’re using Database Mirroring or AlwaysOn Availability Groups.)

Be careful with the steps you take to upgrade and ‘drain’ replicated transactions. If you don’t do this, in some cases you may have big problems during an upgrade.

Got a Transactional Replication Horror Story or Recommendation?

Or do you have a favorite setting I haven’t mentioned here, or something you disagree with? Replication is full of controversy! Share it with us in the comments.


Comments

  • Patrick Cahill
    July 15, 2014 10:06 am

    Very useful post. I’ve been using transactional replication for years, but this gave me some more things to consider. Just an FYI, your link for “configured monitoring for transactional replication” is not working. Thanks for the post.

  • One setting I’ve found invaluable is @immediate_sync – when you use the wizard to set up replication and accept the defaults, it’s set to true. That means any time you add an article to a publication and run the snapshot agent, the snapshot contains all articles. By setting it to false, it will only contain the newly published article(s).

    This is very, very useful. The following post is the one I keep handy for the syntax:
    http://www.replicationanswers.com/TransactionalOptimisation.asp

    • Kendra Little
      July 15, 2014 11:00 am

      Oooo, nice link. I’ll update the post in a bit to add something about immediate_sync.

      I have a fear of having immediate_sync set to true because of the SQL Server 2005 bug I hit that’s listed under “Unexpectedly inactive subscriptions.”

    • Kendra Little
      July 15, 2014 12:25 pm

      This is now #7 in the list. It ranks high enough not to be added at the end, I think!

  • If you have a large shop, you definitely want to change some of the settings for the log reader/distribution agents. For example: CommitBatchSize, CommitBatchThreshold, PacketSize, MaxBcpThreads.

    • Kendra Little
      July 15, 2014 10:57 am

      Hey Taiob,

      These are the settings that I think people need to be careful with and prove that changing them has a positive impact on performance. I’ve known many people to randomly tweak these and not know if it helps or hurts, or if they set them to a good or bad value. Then they’re afraid to change them back.

      So I don’t think it’s bad to tweak these, I just think these are more “fine tuning” settings and would look at everything in this post first, and also make sure I have some way to prove these help (which can be tricky).

      kl

  • You might include something about the whole Desktop Heap issue (TL;DR – too many agents use up a particular memory bucket which can only be a certain size, regardless of the amount of RAM you have).

    We’ve run into it because we have a large number of subscribers and a large number of publications. The desktop heap issue is a silent, deadly one. The first time you hear about it is because it turns out one of your agents hasn’t done anything since it started. If you’re lucky, you have periodic subscribers that fall out of sync, alerting you something’s amiss. I’ve written 3 different automated repl monitors to try and catch them in different ways.

    • Kendra Little
      July 17, 2014 9:09 am

      Ooo, I’m glad you mentioned this! I was reading KB 949296 (http://support.microsoft.com/kb/949296) the other night and wondered, “does this really happen to a lot of people?”

      Couple questions if you wouldn’t mind answering them. Do you have more than 200 agents? Or has it hit you with less than that? (That’s the number they mention in the KB article.)

      What’s the best approach you’ve found to monitoring this?

      I added a link to this to #11 in this post, and also added a mention over in our post on Monitoring Transactional Replication (https://www.brentozar.com/archive/2014/07/monitoring-sql-server-transactional-replication/)

      • Ha! I suspect we’re an outlier (we have our subs broken up because we have several large tables replicated, and different levels of latency are required for others – just because table X is massively updated once a week, we don’t want slowdowns in table Y).

        (Aside: SQL Server 2000 didn’t have this problem, and it was because the agent worked differently, and (IIRC) all pubs going to the same sub used the same agent. When 2005 came out they changed it, which was good (one bad pub couldn’t invalidate others), but caused issues with number of the distrib.exe running.)

        200 seems very… optimistic. We had to change things around 150. The way we initially caught it was because we had a monitor to make sure tables were in sync across servers, and some just wouldn’t sync, or would sync and then others would fail. We wound up writing 3 separate replication monitors for it (they’re on my blog, fwiw).

        The initial fix at that point was to move periodic replications from continuous to every X minutes, and bump up the desktop heap by 64kb (and later 128kb). I don’t remember seeing the error messages that the KB mentions, and I remember the jobs were succeeding. Once we added more subs, though, we had to move to the two-tier architecture, which gave us more heartache (but a ton more flexibility).

        • If you’re an outlier then I’m there with you. I agree with your threshold of 150, ours was a little under that actually. We hit a similar wall with desktop heap on Server 2012/SQL 2012 (same was true for 2005/2008). We’re at 180 agents now (log reader + distribution agents) which run continuously. Our snapshot agents only run once/week and they’re staggered so they don’t overlap. If they all run at the same time then our distribution server will croak. Currently our desktop heap is at 2048KB which I think allows us to go past the 150 mark. I call baloney on the 200 mark in the KB, at least without changing the heap settings.

  • Kendra,
    Thank you for your post. I read it with much interest. I am new to replication and am trying to take it all in. Can you recommend any of your training, in person or video, that would be useful for replication?

  • Also, wanted to add tablediff.exe to the mix. Even though this utility has existed since SQL 2005, we’ve just discovered it. We have it running across our entire environment and it automatically corrects any data discrepancies within our replicated tables. Here’s the MSDN article explaining it for anyone interested.

    http://msdn.microsoft.com/en-us/library/ms162843.aspx

    Thus far the tool has been wonderful and has relieved my team from the manual data synch burden that has plagued us for years. The larger question I have for MSFT is, why can’t this tool just be rolled into the native replication toolkit, why the separate .exe outside of SQL? But I suppose that’s another topic for another day…
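
    For anyone curious, a minimal run looks something like this; the server, database, table, and path names are placeholders, and the table needs a primary key (or similar) for tablediff to compare rows. The -f switch writes a T-SQL fix script:

        tablediff.exe -sourceserver PUBSRV -sourcedatabase PubDB -sourcetable Orders
            -destinationserver SUBSRV -destinationdatabase SubDB -destinationtable Orders
            -f C:\temp\fix_Orders.sql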

  • Thanks Kendra, great stuff! We encountered the “watermark” bug which led to this: http://support.microsoft.com/kb/2922526 – fixed in CU13 (SP2) for SQL 2008 R2. It was nasty to say the least…

    I would strongly recommend that if you do use [@sync_type=] ‘replication support only’ that you only do so on SQL 2008 R2 and higher…

    The -SkipErrors parameter of the Distribution Agent did help (not advised), as did redgate Data Compare!

  • Clayton Hoyt
    August 3, 2014 1:13 pm

    Hi Kendra

    Your RCSI link above hits a 404. Can you post a new link or, better yet, give us your opinion on the use of RCSI on a transactional replication subscription?

    Thanks!

  • I think it’s also important to have structure and find a good way to organise this all. Some things you’ve written here are very applicable to my own behavior, like only making snapshots when I need them. That’s very frustrating, because I often need them and then I have to invest time in this, even though I could rather do a bunch in just one try. But I like the way you think, so I think your checklist is definitely going to help me out.

  • We have a number of publications based off of GreatPlains, and an issue we’ve run into is that many GP tables only have non-clustered indexes. Some of these tables grow to be very large, and once they’re replicated (with the flag to copy non-clustered indexes set to off), regular update statements that run ok on production will take many, many hours (or days) to complete on the subscribers, due to the lack of those indexes.

    Since I dislike the idea of just copying all the indexes to all the subscribers, I’ve been having to watch the distribution run ok until it slows to a crawl, figure out which queries need indexes from prod, stop the distributor for that one sub, deploy a new index and then resume the replication. It’s been touch and go, but it’s working…

    • Marcio – wow, I’m surprised to hear that doing replication is supported with Great Plains.

      • I’ve been to a few shops that have it set up (and it seems to work ok, aside from the occasional maintenance headache), but that’s the first time I’ve heard that it might not actually be supported. Yikes! A bit of googling and I found this: http://support.microsoft.com/kb/926490

        It turns out it is supported for only snapshot and one-way transactional, but there are additional requirements so they’ll play nice. Good to know!

  • An issue we ran into lately had to do with collation. The publisher database had a different collation than the subscriber database (which was the default instance collation). Collation errors and warnings were popping up in queries when comparing character columns to literals and columns from TempDB tables.

    Then I discovered the “CopyCollation” option and recommended we set it to false. This solved the problem – the destination database collation was applied to the character columns.

  • “One gotcha: the row-filter is only evaluated when the row is inserted, not when the row is updated, so you really want this to be a value that doesn’t change.”

    That’s actually incorrect… I tested this on SQL 2012 with two publications configured to use the same set of tables but with different static row filters, each publication was associated with a different subscriber DB and sure enough every insert and *update* done on the master table was reflected in the appropriate subscriber DB with no need to specify the static row filter column in the update operation itself.

    • Thank you for this comment! I just ran through a simple test on SQL Server 2012 and the update worked and the value went through the row filter, so I removed that sentence from the post.

      Here’s what’s going to drive me nuts though: that gotcha warning came from a discussion in the comments of another post, and *I tested it then too* and I got different test results than I see today. So now it’s going to drive me crazy trying to figure out what was different and why I had different results. It’s possible that my test was done incorrectly and the update was still in the distribution database, but I’m usually pretty good at checking for undistributed commands, etc, before believing the results. But my results this morning do match what you’re seeing, so the blog post is now consistent with that.

      • No worries 🙂 I also experienced replication config to be somewhat tricky…

        Here is another “gotcha”: Consider the scenario I described above (1 Publisher w/ 2 publications) now associated with only one Subscriber and configured to publish the same set of tables to different *destination object names* (e.g. PubTable split to SubTable1 and SubTable2 using a row filter). Well… be advised that changing the destination object name will not automatically generate a destination-specific name for the Statement Delivery Stored Procedures (default), which means that the second publication’s SPs will overwrite the first publication’s SPs at the Subscriber. You can imagine what that means… Anyway, manually renaming the SPs during publication definition resolves the problem (the Subscriber will get distinct SPs for each publication), but I doubt this is actually a “supported scenario”.

  • So I have a peculiar issue replicating a table to a non-trusted standalone machine. The table is about 30 gigs, so I’ve set it up on its own Publish/Distribute object at the source database instance. I can get the snapshot to push, and data changes following the snapshot application will replicate for a few hours after the snapshot has finished bulk loading into the destination database — but then updates stop replicating.

    Distributor says: ‘Delivering Replicated Transactions’ with a current [Action Time] found in [Distributor to Subscriber History] tab in Replication Manager.

    No errors or indications of Performance Latency beyond 3-7 seconds are reported on Publisher, Distributor, or Subscription Agents, but [Undistributed Commands] continues to build into the tens of millions as reported in Replication Manager under the specific Subscription Watch List object.

    This is a Transactional non-updatable Replication object.

    This is a PUSH Replication process.

    Replicating from SQL Server 2008 to SQL Server 2008 R2

    Other Replicated Objects from the same database instance with the same logins, destination database and so forth appear to be up-to-date, showing many distributed commands up to current.

    There are no filters on the replication object, no insert/update/delete constraints, no triggers on destination table.

    This is a highly busy updated table in their production environment.

    If I could find an ERROR notification somewhere, or some indication as to why no updates are flowing to the Subscriber — that would be terrific!

    • Oh, ouch. Are some commands being applied to the subscriber and it’s just slow? Or are no commands being applied at all?

      I remember one case where something like this happened and the issue was that indexes on the subscriber tables weren’t correct, and each command was having to do a ton of IO due to index scans. But your scenario could be totally different.

  • gayathri shenoy
    May 21, 2015 11:32 am

    I get random ‘Row not found’ errors from the distributor to the subscriber.

    My subscriptions are not updatable. I’ve tried to verify the logs, no luck. Please help.

    • Kendra Little
      May 22, 2015 10:29 am

      It sounds like someone may have modified rows in your subscriber tables. Even if you aren’t using updatable subscriptions, it’s possible for any user to modify rows in those tables, and this can wreak havoc with replication.

      You may be able to see more information on the failing commands using a trace, but the safest way to fix this is to re-initialize replication if it’s not clear what happened.

  • Good day,

    If I wanted to calculate the memory overhead introduced specifically by transactional replication, how would I go about it? Would it be a DBCC command or a specific DMV? Just some direction.

    Many thanks
    Archie

    • Kendra Little
      June 23, 2015 9:25 am

      Hi Archie,

      The overhead is going to vary by many factors: the number of publications and agents you have, where the distributor is and its cleanup settings, and whether you have it set up as push or pull. The overhead isn’t all inside the SQL Server itself; the agents run as their own programs. And there’s memory for processing, as well as possibly buffer cache for the distribution database. So there’s no simple way to predict or measure that, sorry.

      Kendra

  • Good day,

    I’ve been tuning my transactional replication recently and I’d like to add a couple more tips that helped me:
    1. Strive to use separate distribution database for each publisher.
    2. Set subscription expiration to 48 hours or less depending on your business goals of course.
    3. Set snapshot always available to false.

    P.S. I was using SQL Server 2014

    Thanks,
    Roman

  • We can use the Windows sync tool to sync merge replication, or a batch file with parameters defined in it to help transactional or merge replication sync.
    I have worked on transactional replication and merge replication with 400 subscribers, which gave me lots of exposure to replication. There are many bugs and constraints with replication. In 2005 I had an issue where a table called account had an amount column; when the data was transferred from subscriber to publisher, that column contained garbage values, while the rest of the data within the table was OK. I had a discussion with Microsoft about it and sent them the subscriber and publisher databases; they checked but couldn’t come to a conclusion about why it was happening, due to which the client got frustrated, removed the replication, and migrated the database to Oracle.

  • Remember that some system tables need to be re-indexed, which helps in gaining performance, and also set expiration to 24 hours. For these you can create a job that inserts a record into a table (which should be included in your publication and subscription) and then truncates it on a daily basis, which will keep your subscription from ever expiring, help performance, and make re-initialization unnecessary.

  • Great article and comments…we are getting latency here like crazy, across all subs which seems to be related to the “too much data at year-end” issue 😀

  • Hi

    Thanks for a wonderful post and great discussion. I am in the process of analyzing the performance of a transactional replication setup, and have a query about the same.

    Does the method of publishing stored procedure execution work even if mirroring is set up? As per my understanding, the log of transactions is shipped from the primary server to the mirroring server. So I am not sure whether, if I publish a stored procedure to save the number of commands for replication, those commands would still be generated to maintain mirroring.

    Regards
    Shrikant

    • Kendra Little
      November 2, 2015 1:49 pm

      The setting to publish commands only works for replication, it does not impact mirroring. So if you are mirroring a replication publisher, changing that setting does not impact the mirroring technology.

  • Does merge replication’s snapshot agent, like transactional replication’s, also not need to be scheduled as a regular job?

    • Yep– snapshot is for initialization.

      • Is it that we need to run the snapshot again when an article is added to the publication or a table column is altered?

        • Make sure you have a test environment if you’re going to change articles so you can see exactly what you need to do to get the desired result. Sometimes it doesn’t work like you think it’s going to work (or how it’s documented to work) — I’ve had weird stuff happen particularly when I have had downlevel subscribers.

  • Hello,

    Is there a limit on how many transactions from one table transactional replication can handle? I have to purge historic data, about 5 million rows, from one table which is being replicated.

    Please advise.

    • There’s no hard limit, and performance varies dramatically by environment depending on your tolerance for replication latency, hardware, the size of the tables being replicated, placement of the distribution database, number and location of subscribers, etc. If you’re replicating rows (not stored procedure commands), then every row you delete will need to have a command created in distribution and then it’ll have to be applied to subscribers.

      I was once in an environment with a very low tolerance for latency (5 minutes behind was a huge problem), and we needed to delete millions of rows. It was row based replication, no table partitioning. We wrote code to delete a configurable amount of rows in a loop, and with a configurable delay between runs. We tested the code to death in a similar environment with replication to find the right numbers to start with for rows to delete and delay amount. We then ran the script in bursts in production in less-critical times and watched replication latency super closely while we did it and slowed it down if it became a problem.
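
      A minimal sketch of that kind of throttled purge loop; the table, predicate, batch size, and delay are all placeholders to test and tune in your own environment:

          DECLARE @batch int = 10000;
          WHILE 1 = 1
          BEGIN
              DELETE TOP (@batch) FROM dbo.OrderHistory
              WHERE OrderDate < '20140101';

              IF @@ROWCOUNT < @batch BREAK;  -- nothing left to purge

              WAITFOR DELAY '00:00:05';      -- breathing room for replication to catch up
          END;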

      • Thanks Kendra.

        I thought there was a max limit in the distributor or log reader agent. Of course, it depends on the network connection, server configuration, replication traffic, latency, etc.

        I will start with 100,000 and see how it goes.

        Thanks again
        Amy

      • Hi

        I am a novice in the field of replication, but I see one more solution to the problem.

        Let’s say we add an SP which deletes these millions of rows. Add it to the articles (marked for execution) and then let it get created on the replicated database.

        Once this is done, we invoke this SP, which would generate only one command to delete those rows.

        The approach may not work as-is, because deleting 5 million rows from the database may have other impacts. So it can be done in 500 chunks, each of size 10,000.

        This new SP later can be removed from replication and deleted.

        Another shortcoming of this approach is that you need to generate a snapshot whenever a new article is added to the list. And tables do get locked for some time while generating the snapshot.

        As I said, I am a novice, so I may be missing something. But I would like to hear from you on this.

        Regards
        Shrikant

  • Hi

    I am working on optimizing transactional replication. I am planning to add one stored procedure to the list of articles, to reduce the number of replication commands. I am facing one challenge with it.

    This SP has insert commands. Let’s say it has a loop of insert commands with an iteration count of 1000. So when I replicate this SP, the number of replication commands would be reduced from 1000 to 1. However, this SP inserts data into a table which has an identity column. This column is not an identity column in the replicated database (this is default behavior, I did not do anything specific to achieve it). So the SP which can insert data on the primary side cannot be replicated as-is to insert data in the replicated database; it will fail because the column counts won’t match.

    One way to address this issue would be to temporarily turn IDENTITY_INSERT ON at the start of the SP and turn it OFF at the end. So on the primary side, I can invoke insert commands by passing explicit ids, and the same SP can then work on the replicated database.

    The other way is to have a different version of the SP on the replicated database. However, I am not sure what precautions I need to take to make sure that this SP is not overwritten by the version from the primary database.

    Appreciate your suggestions…

    Regards
    Shrikant

  • Hi,
    I have a question: I am managing 2 SQL Servers.
    1 is the Master Server (Publisher + Distributor)
    2 is the Client Server (Subscriber)
    The client server is basically owned by a client who is buying data from us.
    Both servers are replicated with each other.
    The replication type is transactional replication.
    Only selected tables of a single DB are replicated from the Master Server to the Client Server.
    Now I am going to describe the actual problem. Basically there are several problems I want to discuss.

    Problem 1:
    When I add or delete any data table and take a snapshot of replication, the whole DB vanishes from the customer side, and after that it rebuilds in 20-30 minutes (I have limited access so I don’t get logs).

    Problem 2:
    When I reinitialize the subscription, it takes a lot of time to complete, and monitoring shows partially applied scripts of tables.

    Problem 3:
    Publisher to Distributor shows high command and transaction counts, but Distributor to Subscriber command and transaction counts are actually hundreds of times lower compared to the publisher section.

    Problem 4:
    Latency is very high between Publisher and Distributor (around 5+ minutes) (Server 1); it jumps to 7 minutes, then automatically drops and stays at a few seconds, but the average is really very high. Distributor-Subscriber latency seems fine.

    Problem 5:
    The log size of the master DB increases, occupying around 3.5 GB (normally 3 GB with 80-87% free space), while the data file size also increases and is now 4 GB+ (normally 3.3-3.5 GB).

    Really Appreciate Your Suggestions and Guidelines.

    Thanks in advance,
    Hassan

    • Hassan – when you’ve got a detailed question (or in this case, 5 questions!), you’re probably not going to get the solution you want quickly in a blog post comment.

      You can either write a question (or a couple) over at http://dba.stackexchange.com, or click Contact at the top of the page about getting a consulting engagement started.

  • Kendra,

    I am at a bit of a loss. I had to rebuild the replication process recently and have had nothing but issues, and I cannot find the answers.

    1) SQL 2005 (not for much longer)
    2) Snapshot creation worked great once I got all of the junk out of it
    3) initial reinitializing of subscriber worked (it seems)
    4) The transactional replication isn’t working and isn’t erroring. It is running and saying transactions with commands were delivered, but when I check the actual data, it hasn’t been updated since the snapshot.

  • Hi Kendra

    I would love to hear from you – and Brent – you guys rock and are always my first source for any sql queries/problems I have and need figuring out

    Can you confirm whether the following would work? I’ve done a lot of searching the net but cannot find the answer.

    Publisher = SQL 2014 Enterprise (on its own windows server)
    Distributor = SQL 2014 Standard (on its own windows server)
    Subscriber = SQL 2014 Standard (on its own windows server)

    I have AlwaysOn enabled on the Publisher (primary). Currently we have the distribution database on the Publisher, so that when we fail over to our secondary server, replication stops working due to the distribution database being on the primary (publisher). You can’t add the distribution database to an AlwaysOn Availability Group, so I need to get the distribution database onto another separate server. I’m just struggling with whether that SQL Server instance has to be the same edition as the publisher/primary (Enterprise), OR if I can save money and set it up as SQL Server Standard.

    I’m really just wondering if the distribution database needs to be on a SQL 2014 Enterprise Server as I don’t think my company can afford that

  • Question: we are using transactional replication and we found out that the data on the master for a couple of tables is coming in faster than the Distributor can deliver it. So we get a queue that keeps increasing on the Distributor during high load on the system.
    Basically, the implicit delete that occurs on that Distributor table (default > 48 hours) was changed to 1 hour in order to reduce the time it takes to seek for commands after some time of system load.

    We were wondering if it’s advisable to somehow tell SQL Server to partition the commands table on the Distributor, or what a solution that makes sense would look like?
    Or it might be that we are reaching a SQL threshold as far as replication is concerned (although I would not expect that, as the commands that are stored might only get up to 5 million).

  • If you’re using transactional replication to push data to a secondary server which is used entirely for reporting purposes, should you run index/statistics maintenance on the replicated database? If so, are there any risks in doing so?

    • I follow the same rules for a replicated database as for a non-replicated database. Remember, you are making changes to the data, so based upon the changes you will need to maintain the statistics in order to maintain optimization. BUT the indexes on the replicated tables do not have to match the master tables; only the primary key must be there. Each replicated database is indexed based upon the requirements of the data on that database.

  • I truly appreciate your work, Great post.

  • Thanks for the CHECKLIST, I always get this error and I am trying to find out why it happens.

    “Availability group – replication falling behind”

  • Thank you for your valuable post, I’m impressed. I need help with merge replication performance tuning, could you please assist me? My setup: replication from 1 publisher to 1 subscriber with one publication and subscription, both in a USA site, with 450 current users. Everything is fine in my replication now, but sometimes a slowness issue comes up even though I rebuild indexes and update statistics daily. The main problem is that the client now needs to increase the user count from 450 to 1000 users, which may cause a slowness/performance issue. What steps do I need to take to maintain good service without slowness and performance issues on replication? Specifications: the primary DB is 800 GB, with 1450 tables involved in only one publication (note: I need only one publication), 36 GB of RAM, and a 4-core CPU (note: I cannot improve the hardware specifications); the drives have 5 TB of space. I think I have stated my question clearly. Please help me on this. Thanks in advance.

  • Thanks for this post; this article covered a lot of things related to replication. We have transactional replication set up in our environment. Recently I installed the latest service pack (SQL 2014 SP2) to lower environments. After that upgrade we are seeing many issues with replication. We have 2 publications for the same DB (1 publication is transactional and the 2nd one is snapshot), and the distribution jobs are blocking each other at the subscriber end. The snapshot distribution job is using a “begin tran” command and then going to a sleep state, blocking the other distribution job. I already reinitialized the subscription and dropped and recreated the publications, but we are still seeing this issue. Has anyone faced this type of replication issue before, or any ideas to resolve it? Please post your suggestions.

  • Very helpful. Thanks, Kendra. You have a dead link: ‘Learn more about it here’ I think needs updating to point at http://replicationanswers.com/2017/01/17/adding-an-article-do-i-really-want-a-complete-snapshot/.

  • Jimit trivedi
    June 29, 2019 5:17 am

    Need help. I have set up transactional replication with the Publication and Distribution on 1 Windows server. It has 2 subscriptions, one on the same server and one on another server (remote). I have 10 articles. My concern is that at some point daily the replication channel stops and subscribers are stuck on the status “Delivering Replicated Transactions”. When I checked the processes further, it showed sp_MSupd_dboTableName running continuously, updating the table. Also, yesterday I found one more replication process running with “sys.sp_replcmds” in the input buffer for quite a long period of time, which was blocking other running processes. We are not able to track down the exact issue. Is there any replication configuration which we are missing or not setting up properly? Need help. Thanks in advance.

  • Are there any benchmarks on how fast sql transactional replication can move data for a given configuration (CPU, bandwidth, disk speed, network latency, etc)? Is 1 GB / hour attainable? 10 GB / hour? 100 GB / hour? …

