The best part about transactional replication is that it’s flexible. And the worst thing about transactional replication is that it’s flexible.
A poor replication configuration will cause you performance pains and increase your risks of development and administrative errors. Here’s one important design principle: isolate your replication subscriber tables from one another, and from non-subscriber tables.
Our example environment
We’ve got an environment with two SQL Server databases used for OLTP purposes: SalesDB and CRMDB. Each database uses SQL Server Standard Edition. The databases support different applications, and they’re on totally separate SQL Server instances (and failover clusters).
The SalesDB and CRMDB support completely different applications, but data from selected tables in them both is needed for internal reports. SQL Server transactional replication has been chosen as the technology to keep a copy of the data up to date on the reporting instance. Replication is already in use in the environment and a dedicated SQL Server has been configured for distribution.
There’s still choices to make: how do you configure the subscriber? My recommendation is that each replication subscription be isolated in individual databases. Let’s take a look at why.
The all-in-one subscriber (Avoid This!)
In this model, we have one big database named ReportDB on the Warehouse01 subscriber server. This database holds reporting tables as well as tables from replication subscriptions.
SQL Server will let you do this, but it’s got some downsides:
1. You’ve got a single transaction log (and a single recovery model) for ReportDB. Not all of the data in ReportDB may be recoverable from other sources. If for any reason you need to be in the FULL recovery model for ReportDB, you’re in a sticky situation if you have to reinitialize replication and reload the subscriber tables: you’ve got to do it fully logged, and your transaction log backups will be bigger and slower.
2. It’s easy for DBAs and Developers to screw this up. You may want to drop all the replication subscriber tables at some point if you have to re-initialize replication. I would much rather isolate drop table commands to a database where everything in the database is replication related, and I don’t have to worry about accidentally dropping the wrong thing! Similarly, it’s much easier to set up security so that application accounts can only read subscriber tables (not write to them), when they’re in their own database.
3. You’ve got fewer options when it comes to reinitialization. The ability to initialize a replication subscriber from a restored backup is huge– running snapshots on your publishers makes your users sad.
Isolated subscribers (Much Better Over Time)
Here’s the alternate model, where each subscriber has its own database:
Note that the subscriber databases are named in a way that you can identify their role. (As noted by David in the comments, it’s much easier if you don’t re-use the name of a publishing database.)
This model has a lot of benefits:
- More options to re-initialize
- Supports different recovery models on replication subscriber databases and ReportDB (which can allow minimal logging during bulk inserts if you’re re-initializing that way)
- Safer for administrators
- Easier to set up security
- Easier for new DBAs to and developers to understand (the architecture is much more transparent)
What If I’m Not Doing it This Way? Do I Have to Change All My Code?
Not necessarily. I’ve worked with this with two different models.
Synonyms: ReportDB could contain synonyms for all the replication subscriber tables. This makes the tables seem like they’re in ReportDB, while they’re actually in other databases.
Dynamically configured procedures. In one environment I worked in where replication was common, there was a configuration table where you configured the name of the replication subscriber databases needed by an application. When code was deployed, the installation process for stored procedures checked this configuration table and dynamically compiled the stored procedures referencing the name of the database in that environment. This sounds like more work, but it was very useful for the developers, who might need to deploy builds to instances configured differently than production.
Want To Learn More About Replication?
We’ve got you covered. Start with these articles:
- Introduction to SQL Server Transactional Replication
- Performance Tuning SQL Server Transactional Replication
- Monitoring SQL Server Transactional Replication
4 Comments. Leave new
Great article, Kendra!
I’ve worked through this same scenario, starting with a few tables from one database replicated into another database. Eventually we separated the databases just like you suggested, and named the subscriber database the same as the publisher database. Eventually we found there was confusion and limitations because the replicated databases were named the same, and the subscriber only had a small portion of the data from the publisher.
Therefore we went through another process of renaming the subscriber database. This addressed the confusion problem, and allowed us to restore all 3 databases on the same development instance, and instead of using replication in development we created views to reference the source database. This was a big help, because the developers knew early on what tables they did and did not have access to on their reporting database. Therefore we were able to work with them more carefully to plan for the additional tables they needed, before they were trying to deploy there changes.
We actually never considered using synonyms between the databases, we always used views. This meant we had to ensure the views were updated anytime the underlying tables changes, which was a pain. It looks like the only place that views are required in our scenario is in development to replace the subscription tables. The views are required in that database only because synonym chaining is not allowed.
Great point about the database names. I added a line to the article to specifically call that out so people don’t miss it, because I agree that reusing the publishing database name removes a lot of the transparency you get from that architecture.
For anyone who doesn’t know the problem with views that David is talking about, it’s absolutely a bit of a pain. Meet sp_refreshview, the command you will wish you never had to use: http://msdn.microsoft.com/en-us/library/ms187821.aspx
Kendra,
I have an environment with replication between two SQL Server 2008 R2 Datacenter Edition. I Work with a transactional replication and I am publishing an article at a time.
However, I have an article that has 3 million lines and can not generate snapshot for it. Only lack this article to complete replication. Got any tips for me? pleases help me.
Check out our post on performance tuning replication here: https://www.brentozar.com/archive/2014/07/performance-tuning-sql-server-transactional-replication-checklist/
Among the items in the list are some that talk about alternatives to running snapshots altogether, which sounds like it’d be pretty attractive in your case.