You know how to design indexes, but you’re not sure how good your skills really are. In this quiz-packed session you’ll get a chance to test your skills! Kendra Little will walk you through a set of index design challenges. You’ll have time to answer each problem on your own, then find out whether the SQL Server engine thinks you made the right choice. At the end of the session you’ll get to tally your score (but you can keep it a secret if you like).
Take the quiz while you watch the video. If you’d like to dive straight into the questions, skip to 2:30.
Compare your answers to the group
Our webcast had 416 attendees and 80% of people answered each question. To find out the right answers (and why) watch the video above, but here are the stats on how webcast viewers answered each question.
This is NOT the answer key: don’t assume the majority of people got every question right! This just lets you compare what you thought to what others thought.
Q1) A NonClustered Index is…
Q2) What will this query probably do given this table definition?
Q3) What is most likely the biggest problem with this table?
Q4) INCLUDED columns are…
Q5) A unique constraint is…
A only) 11%
A and B) 11%
A, B, and C) 16%
A, B, C, and D) 14%
A and C) 41%
A, C and D) 8%
A and D) 4%
(There were some other combos, but they all got low percentages)
Q6) Which one of these statements is ALWAYS true?
Q7) How many indexes were written to?
Q8) Why wouldn’t SQL Server Ask for an Index?
SQL Server transactional replication is a blessing and a curse. It’s a great developer tool that lets you scale out data to multiple servers, even using Standard Edition. But as your business picks up, your datasets get larger, and your customers grow more demanding, replication can start to fall behind. You need to learn how to tune it to keep up.
Before You Start Tuning Replication….
Make sure you can prove when changes you make to replication improve performance. Or find out quickly if you make something worse. (It’s gonna happen.)
If I’m going to make changes with something as complex as replication, here are my basic requirements:
- Monitoring must alert the DBA team when replication latency exceeds allowed thresholds
- Monitoring needs to track historical latency to show if my changes reduce latency
- I need a production-like staging environment to test my changes.
If you haven’t configured monitoring for transactional replication, read how to do it here. The “easy” and “medium” steps are a small amount of work and are incredibly useful.
Don’t Skip “Normal” SQL Server Performance Tuning!
I’m going to give you a lot of transactional replication specific performance tuning tips in this post. But don’t skip other elements of SQL Server performance tuning! Wait statistics, virtual file stats, and identifying bottlenecks are still important. Get started with SQL Server performance tuning here.
1) Are You Using the right versions of Windows and SQL Server for Replication?
For replication performance, you want to be on Windows Server 2008 and SQL Server 2008 minimum. It really makes a difference.
2) Have You Scaled up your distributor?
When replication performance is important, use a dedicated distributor server so that your distributor doesn’t have to fight with a publisher or subscriber for CPU, memory, network, or storage resources. If you need high availability for the distribution database, you have limited options: failover clustering is pretty much the only way to go.
3) Is Replication really what you need?
4) Are You Using the right type of subscriptions and tools for replication over the WAN?
Argenis Fernandez shares what he learned from tuning transactional replication over wide area networks. This is a great use of “pull” subscriptions.
5) Have You Made sure Articles are NOT in more than one publication?
Kendal Van Dyke shows that having articles in multiple publications can also magnify the number of commands in your distribution database. That bloats your distribution database and will slow you way down as activity picks up.
(Note: If you’re using row-filtering on your articles, you may be the exception to this rule.)
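If you want to check for this yourself, here is a minimal sketch. It assumes you run it in the publication database on the publisher and that the replication system table dbo.sysarticles is present there. The column aliases are my own naming.

```sql
-- Run in the publication database on the publisher.
-- Lists published objects that appear as articles in more than one publication.
SELECT  OBJECT_SCHEMA_NAME(a.objid) AS article_schema,
        OBJECT_NAME(a.objid)        AS article_table,
        COUNT(DISTINCT a.pubid)     AS publication_count
FROM    dbo.sysarticles AS a
GROUP BY a.objid
HAVING  COUNT(DISTINCT a.pubid) > 1
ORDER BY publication_count DESC;
```

Any row that comes back is worth a look, unless the article is row-filtered on purpose.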
6) Do You Only run Snapshots when you need them?
I’ve come across cases where the Snapshot agents for a publication were set to run on a schedule, even when replication wasn’t being initialized. I believe that when someone was setting replication up, they had checked off the option to “Schedule the Snapshot Agent to run at the following times” without realizing that it wasn’t needed. Don’t run snapshots on a schedule: it will lock up the publishing database. (When you open “Job Activity Monitor” these jobs show up with the category “REPL-Snapshot”.)
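You can also find these jobs with a query instead of clicking through Job Activity Monitor. This is a rough sketch against the msdb job tables. It assumes you run it on the server where the Snapshot Agent jobs live.

```sql
-- Lists Snapshot Agent jobs and any schedules attached to them.
-- A row with an enabled schedule is a candidate for review.
SELECT  j.name    AS job_name,
        j.enabled AS job_enabled,
        s.name    AS schedule_name,
        s.enabled AS schedule_enabled
FROM    msdb.dbo.sysjobs AS j
JOIN    msdb.dbo.syscategories AS c
        ON j.category_id = c.category_id
LEFT JOIN msdb.dbo.sysjobschedules AS js
        ON j.job_id = js.job_id
LEFT JOIN msdb.dbo.sysschedules AS s
        ON js.schedule_id = s.schedule_id
WHERE   c.name = N'REPL-Snapshot'
ORDER BY j.name;
```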
7) Are you Using “Immediate Sync” to your Advantage?
The immediate sync option is hard to spot when you first set up replication. Setting this to false can help minimize the impact of running a replication snapshot if you need to add new articles, or even remove and re-add a few articles. Learn more about it here. As always, test your changes outside of production first! (I personally have a fear of having immediate sync set to true because of this old bug from SQL Server 2005.)
For information on how the immediate_sync can also impact your distribution database, read more here.
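Here is a hedged sketch of checking and changing the setting with the replication stored procedures. The publication name MyPublication is a placeholder. As far as I know, allow_anonymous has to be false before immediate_sync can be set to false, so the sketch changes both in that order. Try it in staging first.

```sql
-- Run in the publication database on the publisher.
-- MyPublication is a hypothetical publication name.

-- Check the current values (look at the immediate_sync and allow_anonymous columns).
EXEC sp_helppublication @publication = N'MyPublication';

-- Turn off anonymous subscriptions first, then immediate sync.
EXEC sp_changepublication
     @publication = N'MyPublication',
     @property    = N'allow_anonymous',
     @value       = N'false';

EXEC sp_changepublication
     @publication = N'MyPublication',
     @property    = N'immediate_sync',
     @value       = N'false';
```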
Thanks to Allen McGuire for his comment reminding us of the benefits of this setting!
8) Are You Replicating only the articles and columns you need on the Subscriber?
Don’t just “add all.” For scalability, replicate only the articles that must be in replication, and only the columns that need to be replicated. This not only helps overall performance, it also reduces the impact of times when you may need to re-initialize replication. This goes especially for large object / LOB data types. If you must replicate LOB types, Microsoft recommends that you use newer types such as nvarchar(max), varbinary(max), etc.
9) Do You Set the ‘Replicate Schema Changes’ subscription option to false when needed?
New columns being added to a published article shouldn’t be replicated to the subscriber unless they really need to be there. You can turn off the replication of schema changes by setting the ‘Replicate Schema Changes’ subscription option to ‘false’. (It defaults to ‘true’.)
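If you prefer scripts to the GUI, the same option can be changed with sp_changepublication, where it goes by the property name replicate_ddl. A minimal sketch, with MyPublication as a placeholder name:

```sql
-- Run in the publication database on the publisher.
-- MyPublication is a hypothetical publication name.
-- 0 = do not replicate schema changes, 1 = replicate them (the default).
EXEC sp_changepublication
     @publication = N'MyPublication',
     @property    = N'replicate_ddl',
     @value       = N'0';
```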
10) Have You Considered static row filters?
“Static row filters” allow you to include only certain rows in a given publication. One gotcha: the row-filter is only evaluated when the row is inserted, not when the row is updated, so you really want this to be a value that doesn’t change. There is overhead to applying the row filter itself: Microsoft only recommends you use the row filters if your replication setup can’t handle replicating the full dataset. Be careful with indexing if you use row filters.
11) Have You Optimized your subscriber databases for re-initialization?
Face it: re-initialization happens. Unfortunately it usually happens when you least expect it and had plans to be doing something else. There are a few things you can do to keep re-initialization from making your publication database unusable for long periods.
- Isolate your replication subscriber tables into their own database, and only keep replicated articles in there. This also typically helps you use recovery models that are optimized for minimal logging in that database to speed up bulk inserts. Consider using synonyms to quickly “repoint” to replicated articles to give you flexibility.
- Evaluate whether initializing replication from backup could help.
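The synonym idea from the list above could look something like this. Every name here (ReplSubscriber, ReplSubscriber_B, dbo.Orders) is made up for illustration. The sketch assumes your application queries dbo.Orders in its own database and never touches the replicated database directly.

```sql
-- In the application database: point a synonym at the replicated table.
CREATE SYNONYM dbo.Orders FOR ReplSubscriber.dbo.Orders;
GO

-- Later, repoint to a freshly initialized copy in another database.
-- Doing the drop and create in one transaction keeps the window small.
BEGIN TRANSACTION;
    DROP SYNONYM dbo.Orders;
    CREATE SYNONYM dbo.Orders FOR ReplSubscriber_B.dbo.Orders;
COMMIT TRANSACTION;
GO
```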
12) Have you Considered Using multiple publications?
There’s pros and cons to splitting out publications. Here’s the pros:
- You can isolate large tables that are the biggest problems for snapshotting into their own publications so that they get their own snapshots. That can be helpful if there are other tables you might need to remove and re-add to replication more frequently. (The immediate_sync setting can also help with this, see #7 above.)
- This will give you multiple distribution agents so changes can be applied to your subscribers in parallel. (This one’s a pro and a con, as you’ll see.)
Here’s the cons:
- This is more work to manage. You should be checking in scripts for your entire replication into source and have documentation on everything. More publications makes that whole process harder.
- All those distribution agents can backfire if you don’t have the resources to support them working on the subscriber at the same time.
- Be mindful not to put non row-filtered articles in more than one publication as noted above.
13) Are “Subscription Streams” Right for You? (or not?)
This option allows you to raise the number of connections that the distribution agents use to apply changes to the subscriber. But there’s overhead to managing all these threads, and you can get into situations where transactions aren’t fully consistent if you hit problems.
This feature is primarily recommended for use in situations where you have high network latency and are not changing the setting often. Keep in mind that if you’re splitting your articles into multiple publications for other reasons, you’ve already got multiple distribution agents running in parallel.
14) Are You replicating non-clustered indexes blindly?
Confession: I did this wrong for years. It’s very easy to set up transactional replication and send all the indexes over to the subscriber: you just set “Copy nonclustered indexes” in the articles property to “true”. But you’re only required to replicate the Primary Key and unique indexes. There’s two big problems with replicating all the nonclustered indexes:
- It can make re-initialization slower. By default the subscriber will have objects created, bulk load the articles, create “extra” nonclustered indexes, then “catch up” on any changes that came in after the snapshot was pushed. You definitely want to make sure that all “extra” nonclustered indexes are disabled or don’t exist while that bulk load is happening. But if a lot may have changed since the snapshot ran, you may not want the indexes to be created until the very end, anyway. Handling the nonclustered index creation outside of replication gives you that flexibility.
- It’s very rare for the exact same queries to run on the publisher and subscriber. You usually want nonclustered indexes that are specific to the workload on the subscriber, anyway.
Identify the “extra” nonclustered indexes specific to the queries that run on the subscriber. Script them out, check them into your source control, and have a process to deploy them whenever replication needs to be re-initialized.
15) Could publishing stored procedure execution work?
If your workload is run entirely by stored procedures, this can be a great option. (FYI, there is a bug/hotfix for this in SQL Server 2012/2014 listed below.)
16) Are You Using Read Committed Snapshot Isolation on replication subscribers?
It’s common for the distribution agent to have to fight with other processes while it tries to insert, update, and delete rows in the subscriber database. One DBA that I worked with removed a huge amount of blocking and sped up processing by using RCSI on a subscriber database. Learn more about RCSI here.
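For reference, turning on RCSI is a one-line change, sketched below with a placeholder database name. WITH ROLLBACK IMMEDIATE kicks out open transactions so the change can complete, which is an assumption about what you can tolerate. RCSI also stores row versions in tempdb, so test the load first.

```sql
-- ReplSubscriber is a hypothetical subscriber database name.
ALTER DATABASE [ReplSubscriber]
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;  -- rolls back open transactions in that database

-- Confirm the setting took effect.
SELECT name, is_read_committed_snapshot_on
FROM   sys.databases
WHERE  name = N'ReplSubscriber';
```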
17) Have You Ruled Out Virtual Log File fragmentation on the publisher database?
If you’ve got more than 10K virtual log files on your publication database, it could slow down replication. I’ve particularly seen this cause replication to get behind when a large operation like an index rebuild was run. Our free sp_Blitz® script will diagnose high numbers of VLFs for you.
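If you just want a quick VLF count by hand, here are two options. Which one applies depends on your version: as far as I know, sys.dm_db_log_info only exists in SQL Server 2016 SP2 and later, while DBCC LOGINFO is undocumented but works on older versions.

```sql
-- Run in the publication database.

-- Older versions: returns one row per virtual log file, so count the rows.
DBCC LOGINFO;

-- SQL Server 2016 SP2 and later: count VLFs directly.
SELECT COUNT(*) AS vlf_count
FROM   sys.dm_db_log_info(DB_ID());
```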
18) You Haven’t Been Randomly Fiddling with Settings on the Distribution Agent, Have You?
There’s a lot of little settings you can change on the Agent Profiles in replication. I’m not a huge fan of changing them unless you can prove they helped your performance, though. These are typically “fine tuning” settings after you have the right architecture in place, in my experience.
19) Have You Looked Out for Replication Bugs / Fixes?
Man, these can get you. Like any other complicated tool, things can go wrong. Here’s a few highlights:
- KB 2674882 – Deadlocking distribution agents OR replication queries with very high memory grants. This can occur on SQL Server 2005, 2008, or 2008R2.
- Unexpectedly inactive subscriptions. There are many performance reasons to upgrade from SQL Server 2005, but if you must be on it then you shouldn’t run anything less than SP4.
- KB 2958429 - Service Packs Matter. SQL Server 2012 SP2 added some features to replication logging (and even a few improvements to Peer to Peer replication, oddly enough). Scroll to “Highlights for SQL Server 2012 SP2” and expand “Replication” to see the list. If you apply this service pack, you may also want to apply KB 2969896.
- KB 2897221 - Stack dumps/ non-yielding schedulers if you’re replicating stored procedures. SQL Server 2012 or 2014.
- KB 949296 – Replication Agents cannot run when you have many agents and the Windows desktop heap is “used up”. (Thanks to Michael Bourgon for suggesting we link to this one.)
Public Safety ANNOUNCEMENT: Replication Needs Special Treatment for Hotfixes and Upgrades
With any hotfixes, it’s always good to review KB 941232, “How to apply a hotfix for SQL Server in a replication topology.” (You’ll need special steps if you’re using Database Mirroring or AlwaysOn Availability Groups.)
Be careful with the steps you take to upgrade and ‘drain’ replicated transactions. If you don’t do this, in some cases you may have big problems during an upgrade.
Got a Transactional Replication Horror Story or Recommendation?
Or do you have a favorite setting I haven’t mentioned here, or something you disagree with? Replication is full of controversy! Share it with us in the comments.
Sometimes you end up in a good plan / bad plan situation: an important query runs just fine most of the time. The query is parameterized, a good execution plan gets re-used, everything is cool.
But sometimes, a “bad plan” gets compiled and starts to be reused. This is “bad” parameter sniffing. “Bad plans” can come in a few varieties: maybe it’s slow for some parameter combinations and can cause page timeouts sometimes. Maybe the “bad” query plan has a very large workspace memory grant that just isn’t needed, and it causes problems because lots of different queries are using it. Then you get all sorts of nasty Resource Semaphore waits and everything gets slow.
Whatever the situation is, sometimes you want to stabilize a particular execution plan that’s “good” for all the different parameters that the query can run with.
Option 1: Change the code
The very best option is changing the code so you don’t have to resort to shenanigans behind the scenes. You can rewrite the TSQL, change indexes, or use hints to get a specific plan. But sometimes this is difficult to do: maybe it’s vendor code you can’t change. Maybe there’s a long code release process and it will take a very long time to get the code changed.
If you can tune the code, absolutely do it. If you can’t, at least get the request to fix the code noted by the vendor or software development team. Don’t skip it altogether, because the options I describe below aren’t all that fantastic.
Option 2: Plan guide that thing
Plan guides are like duct tape: it’s something you want to have on hand for emergency quick fixes, but you don’t want to rely on it long term as a building material. It’s also not suited for every kind of fix.
Plan guides let you do a few things:
- Apply query hints like “optimize for value”, “optimize for unknown”, “recompile”, and “maxdop” to a query
- Selectively turn on trace flags for a query, such as TF 4199 (performance optimizer changes), TF 9481 (older cost-based optimizer if running on SQL Server 2014), TF 2312 (newer cost-based optimizer if running on SQL Server 2014)
- Add selected table hints, like forceseek and specific index hints. (You cannot add NOLOCK or change isolation levels, don’t get too excited.)
- “Template” plan guides let you “force parameterize” a specific query, or enforce “simple parameterization” for a certain query if the database is using forced parameterization
But duct tape isn’t perfect. Here’s the biggest gotchas that I’ve found with plan guides:
- I’ve found that trying to use an index hint in a plan guide can cause queries to silently fail. That’s awkward.
- Plan guides don’t work with all types of queries. I haven’t been able to get them to work with temporary tables or table variables referenced in the query, for example.
- Making sure that a plan guide is working and is picking up your query is tricky. Plan guides are very picky about matching query text exactly.
- Plan guides can make code releases fail. If you’ve got a plan guide that references a stored procedure and something goes to alter it, SQL Server’s going to stop the ALTER with an error.
My biggest advice for plan guides: test them out on a non production system first. Verify that the plan guide is working and that the queries are doing exactly what you want before deploying to production. Treat the plan guide like real code as much as possible: put it into all environments, check it into source, use change control, and document it.
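To make that concrete, here is a sketch of a plan guide that applies OPTIMIZE FOR UNKNOWN to one statement inside a stored procedure. The procedure dbo.GetOrdersByCustomer, the table, the columns, and the guide name are all hypothetical. The text in @stmt has to match the statement inside the procedure exactly, so copy it from the procedure rather than retyping it.

```sql
-- Hypothetical objects: dbo.GetOrdersByCustomer, dbo.Orders, @CustomerId.
EXEC sp_create_plan_guide
     @name            = N'PG_GetOrdersByCustomer_OptimizeForUnknown',
     @stmt            = N'SELECT OrderId, OrderDate FROM dbo.Orders WHERE CustomerId = @CustomerId;',
     @type            = N'OBJECT',
     @module_or_batch = N'dbo.GetOrdersByCustomer',
     @params          = NULL,   -- must be NULL for OBJECT plan guides
     @hints           = N'OPTION (OPTIMIZE FOR UNKNOWN)';

-- Confirm the plan guide exists and is enabled.
SELECT name, is_disabled, scope_type_desc, hints
FROM   sys.plan_guides;

-- To remove it later:
-- EXEC sp_control_plan_guide N'DROP', N'PG_GetOrdersByCustomer_OptimizeForUnknown';
```

Creating the guide doesn’t prove it is being used. You still need to check the execution plan for the query afterwards.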
Option 3: Manually Force the “Right” Parameter Sniffing
If you can’t change the code and a plan guide doesn’t work, you can get a little creative. (By “get a little creative”, I mean that everything I’m about to describe can go horribly wrong.)
First, find the bad plan in cache. In SQL Server 2008 and higher, we get a nifty plan_hash for query plans. You can identify the “bad plan hash” that you don’t want to keep in cache. When it’s in cache, you then:
- Remove it from the cache. You can do this using DBCC FREEPROCCACHE and the plan_handle value (you can get this by running: sp_BlitzCache @results='expert'). Or if it’s a stored procedure, you can use the sp_recompile procedure against the stored procedure to blow away the plan. (Thanks @DBArgenis for this tip!)
- Run a version of the query that puts the plan you want to be in cache. Usually this means running a version of the query with the parameters that give you the “right” plan.
You’ve got the burden of doing some testing. Be careful with how you remove plans from cache: sp_recompile requires a schema level (exclusive) lock, so I don’t recommend running that against tables.
You need to generate the plan in a way that’s going to be re-used and make sure it works. You need to make sure that the plan you’re putting in cache really is good for re-use!
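The two steps might look roughly like this. The plan hash value is a placeholder you would replace with the “bad” hash you identified. The procedure name and parameter value are hypothetical too.

```sql
-- Step 1: find the cached plan with the "bad" plan hash and remove only that plan.
DECLARE @plan_handle varbinary(64);

SELECT TOP (1) @plan_handle = qs.plan_handle
FROM   sys.dm_exec_query_stats AS qs
WHERE  qs.query_plan_hash = 0x0000000000000000;  -- placeholder: your bad plan hash

IF @plan_handle IS NOT NULL
    DBCC FREEPROCCACHE (@plan_handle);

-- Alternative for a stored procedure (hypothetical name):
-- EXEC sp_recompile N'dbo.GetOrdersByCustomer';

-- Step 2: run the query with parameters known to produce the "good" plan,
-- so that plan is the one that gets compiled and cached.
EXEC dbo.GetOrdersByCustomer @CustomerId = 42;  -- hypothetical "good" parameter
```

Afterwards, check the plan hash in cache again to make sure the plan you wanted is the one that stuck.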
Stabilizing execution plans with NORECOMPUTE
If this doesn’t sound quite crazy enough for you, you can go a little farther and try to increase the chances of your “good” plan staying in cache longer. One tool you can use for this is NORECOMPUTE.
When you update statistics on a table with NORECOMPUTE, you tell SQL Server not to automatically update statistics on the table as data changes. Normally, that automatic update kicks in when approximately 20% of the rows in the table have been modified (the algorithm is more complicated, but 20% is pretty easy to remember). Updated statistics will cause the optimizer to consider a new execution plan for your query. So NORECOMPUTE reduces the chance of the “good” plan being bumped out.
If you use NORECOMPUTE, be aware that stale statistics can hurt: some queries may get terrible row estimates. You probably want to manually update statistics for the table at least once a day if data changes in it. You can do this using a built in command like sp_updatestats, custom code you write yourself, or Ola Hallengren’s free index maintenance scripts (see Examples B & D for two options). Just please don’t do it with a maintenance plan.
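A minimal sketch of setting, checking, and undoing NORECOMPUTE on a hypothetical table dbo.Orders:

```sql
-- Update all statistics on the table and turn off automatic updates for them.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN, NORECOMPUTE;

-- Check which statistics have automatic updates turned off.
SELECT name, no_recompute
FROM   sys.stats
WHERE  object_id = OBJECT_ID(N'dbo.Orders');

-- Daily manual refresh while keeping NORECOMPUTE in place
-- (schedule this yourself, for example in an Agent job).
UPDATE STATISTICS dbo.Orders WITH FULLSCAN, NORECOMPUTE;

-- To undo: updating statistics without NORECOMPUTE turns automatic updates back on.
-- UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```

FULLSCAN is my choice for the sketch, not a requirement. On large tables you may want a sampled update instead.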
Document the heck out of this. It’s easy for people to have no idea it’s in place, or find it and change it without knowing what it impacts. If plan guides are duct tape, this is more like Scotch Tape.
These Are Emergency Fixes – Don’t Start Your Performance Tuning Process With These Techniques
A lot of times you don’t need any of this crazy stuff. Remember: many times you can fix these issues with good indexing or simple code changes. Safety first!
It was the best of times, it was the worst of times. I was a SQL Server DBA, and if something went wrong in Transactional Replication I needed to find out about it right away and help keep things healthy, day or night. Here’s what I learned from that experience about monitoring replication.
If you’re just getting started and need an introduction to transactional replication, head over here.
Tracer Tokens Aren’t Really Your Friend
“Tracer Tokens” were introduced in SQL Server 2005. They sound awfully good. Books Online explains that you can automate them using sys.sp_posttracertoken and report on them using sp_helptracertokenhistory.
There’s a big problem: tracer tokens are too patient.
Let’s say my replication is incredibly overwhelmed and I send out a tracer token. I won’t hear back until it reaches its destination or definitively fails. That could be a very, very long time. The fact that it’s potentially unknown means I don’t want to rely heavily on it for monitoring.
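If you want to see the behavior for yourself, this is roughly what posting and checking a token looks like. MyPublication is a placeholder name. The sketch assumes you run it in the publication database on the publisher.

```sql
-- Run in the publication database on the publisher.
DECLARE @tracer_id int;

-- Post a token into the log and capture its id.
EXEC sys.sp_posttracertoken
     @publication     = N'MyPublication',   -- hypothetical publication name
     @tracer_token_id = @tracer_id OUTPUT;

-- Check on it. Latency columns stay NULL until the token
-- actually arrives at the distributor and the subscriber.
EXEC sys.sp_helptracertokenhistory
     @publication = N'MyPublication',
     @tracer_id   = @tracer_id;
```

When replication is badly behind, those NULLs can hang around for a long time, which is the “too patient” problem described above.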
Don’t Rely Too Much on Replication Monitor (REPLMON.exe)
When replication is behind, it’s natural to turn to Replication Monitor. The first five links in “Monitoring Replication” in Books Online point to it, after all.
Replication Monitor isn’t all bad. But don’t depend on it too much, either.
- Replication Monitor is a tool to help you answer the question “how are things doing right now?” It doesn’t baseline or give the kind of historical info that your manager wants to see.
- Replication Monitor may run queries to count the number of undistributed commands that may take a while to run and be performance intensive (particularly when things get backed up in the distributor).
I’ve personally seen some cases where running more than one instance of Replication Monitor while a publication snapshot was being taken also caused blocking. Too many people checking to see “how much longer will this take?” actually caused things to take longer. It’s not just me: Microsoft recommends you avoid running multiple instances of Replication Monitor.
Replication Monitor is useful, but you’re better off if people can get information on replication health without everyone having to run Replmon. You can do this fairly easily by using simpler tools to create dashboards to chart replication latency.
Easy Replication Monitoring: Alert on Latency with Canary Tables
It’s easy to build your own system for tracking replication latency for each publication. Here are the ingredients for the simplest version:
- Add a table named dbo.Canary_PubName to each publication
- dbo.Canary_PubName has a single row with a datetime column in it
- A SQL Server Agent job on the publisher updates the datetime to the current timestamp every minute
- A SQL Server Agent job on the subscriber checks dbo.Canary_PubName every minute and alerts if the difference between the current time and the timestamp is greater than N minutes
It’s very simple to extend this to a simple dashboard using a third party monitoring tool or SQL Server Reporting Services: you simply poll all the dbo.Canary tables and report on the number of minutes of latency on each server.
This simple process gets around the weaknesses of tracer tokens, and also gives you immediate insight into how much latency you have on each subscriber. Bonus: this exact same technique also works well with log shipping and AlwaysOn Availability Groups. Tastes great, less filling.
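Here is one way the canary ingredients could be scripted. The column names, the 10 minute threshold, and the use of UTC time are my assumptions. It also assumes the publisher and subscriber clocks are in sync.

```sql
-- On the publisher, in the publication database: create the canary table,
-- seed its single row, then add the table to the publication as an article.
CREATE TABLE dbo.Canary_PubName (
    CanaryId    int      NOT NULL CONSTRAINT PK_Canary_PubName PRIMARY KEY,
    LastUpdated datetime NOT NULL
);
INSERT dbo.Canary_PubName (CanaryId, LastUpdated)
VALUES (1, GETUTCDATE());
GO

-- Publisher Agent job step, scheduled every minute.
UPDATE dbo.Canary_PubName
SET    LastUpdated = GETUTCDATE()
WHERE  CanaryId = 1;
GO

-- Subscriber Agent job step, scheduled every minute.
-- The error fails the job step, so a job failure notification can do the alerting.
DECLARE @threshold_minutes int = 10;   -- assumed threshold: pick your own N
DECLARE @latency_minutes   int;

SELECT @latency_minutes = DATEDIFF(MINUTE, LastUpdated, GETUTCDATE())
FROM   dbo.Canary_PubName
WHERE  CanaryId = 1;

IF @latency_minutes IS NULL OR @latency_minutes > @threshold_minutes
    RAISERROR(N'Replication latency for PubName is %d minutes.', 16, 1, @latency_minutes);
GO
```

The NULL check covers the case where the canary row never made it to the subscriber at all.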
Medium Replication Monitoring: Notify when Undistributed Commands Rise in the Distribution Database
The distribution database is a special place for Transactional Replication. The log reader agent pulls information on what’s changed from the transaction log of the publication database and translates it into commands that hang out in the distribution database before the changes go out to subscribers.
If you have a lot of data modification occurring on the publisher, you can get a big backup of commands in the distribution database.
If replication performance is important, set up a SQL Server Agent job on your distribution server to regularly check the amount of undistributed commands with a script like Robert Davis provides here. Have it alert you when the commands go above a given threshold.
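As a rough idea of what such a check can look like, here is a sketch of my own (it is not Robert Davis’s script). It assumes the distribution database has the default name “distribution” and uses the MSdistribution_status view. The 500000 threshold is only an example. The view can be slow on a backed-up distributor, so don’t run it every few seconds.

```sql
-- Run on the distributor.
-- Totals undistributed commands per distribution agent and
-- returns only agents above the example threshold.
SELECT  a.name          AS agent_name,
        a.publication,
        a.subscriber_db,
        SUM(s.UndelivCmdsInDistDB) AS undistributed_commands
FROM    distribution.dbo.MSdistribution_status AS s
JOIN    distribution.dbo.MSdistribution_agents AS a
        ON s.agent_id = a.id
GROUP BY a.name, a.publication, a.subscriber_db
HAVING  SUM(s.UndelivCmdsInDistDB) > 500000   -- example threshold
ORDER BY undistributed_commands DESC;
```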
Real world example: When I was the DBA for an environment with mission-critical replication, we would warn when undistributed commands rose above 500K and create a severity-1 ticket when they rose above 1 million. We did this after setting up dashboards to baseline replication latency and also baselining the amount of undistributed commands in distribution, so that we knew what our infrastructure could recover from and what might need DBA attention to recover in time.
Difficult Replication Monitoring: Alert When Individual Articles are Unhealthy
Here’s where things get tricky. It’s very difficult to prove that all articles in replication are healthy. The steps up to this point have tracked latency for the entire publication and bottlenecks in the distribution database. Things get pretty custom if you need to prove that individual tables are all up to date.
I once had a situation where a code release removed some articles from replication, modified the tables and data significantly, then re-added the articles to replication.
There was an issue with the scripts and one of the articles didn’t get put back into replication properly at the end of the process. Replication was working just fine. No script had explicitly dropped the table from the subscriber, so it just hung out there with stale data. The problem wasn’t discovered for a few days, and it was a bit difficult to track down. Unfortunately, the next week was kind of a downer because a lot of data had to be re-processed after that article was fixed.
Here’s what’s tricky: typically some articles change much more often than others. Monitoring individual articles typically requires baselining “normal” latency per article, then writing custom code that checks each article against the allowed latency. This is significantly more difficult for any large articles that don’t have a “Last Modified Date” style column.
(Disclaimer: in the case that you don’t have a “Last Modified” date on your subscriber, I do not suggest layering Change Tracking on top of the replication subscriber. If you are tempted to do that, first read my post on Performance Tuning Change Tracking, then go through all the steps that you would do if you needed to re-initialize replication or make schema changes on articles. You’ll change your mind by the end.)
Special Cases: The “Desktop Heap” is Used Up
This is a special case for replication. If you have a large number of replication agents on a single server (such as 200 or more), you may run into issues where things just silently stop working due to desktop heap exhaustion. This is an issue that can be hard to identify because the agents just stop working!
Canary tables can help monitor for this, but you’ll need a lot of them since this can happen on an agent-by-agent basis. Read more about fixing the desktop heap problem in replication in KB 949296. (Thanks to Michael Bourgon for suggesting we include this.)
Test Your Monitoring out in Staging
The #1 mistake I find with transactional replication is ignoring the staging environment. This is critical to supporting replication and creating effective monitoring for it.
The staging environment isn’t the same thing as development or QA. It’s a place where you have the same number of SQL Server instances as production, and the same replication setup as production. You test changes against staging before they go to production. You can also use it to test replication changes.
Staging is also where you confirm that your replication monitoring works. Data probably doesn’t constantly change in your staging environment, but that’s OK. Use canary tables and get creative to simulate load for test purposes.
Do You Have a Technique for Monitoring Replication Not Listed Here?
Tell us about it in the comments!
Writing blog posts on transactional replication is like revisiting childhood trauma.
— Kendra Little (@Kendra_Little) June 23, 2014
Change Tracking is a developer tool introduced in SQL Server 2008 to help you sync data between SQL Servers, or between SQL Servers and devices. It’s designed to help you answer the request, “I need to get the most recent value for rows that have been modified since I last grabbed data.”
First, I’ll give you the top three reasons people use Change Tracking. Next, I’ll give you queries! If you’re using Change Tracking, these will help you document how it’s configured and measure how big your secret internal tables used by Change Tracking have grown to.
Then we’ll peek into query plans for queries using Change Tracking and talk about some common pitfalls that can impact performance.
Finally, we’ll talk blocking and data consistency. There may be some tradeoffs you need to make to get consistent results out of change tracking.
Why do people use Change Tracking?
Here are the top three reasons that Change Tracking with SQL Server sounds attractive to people:
1. Change Tracking doesn’t send every change over the wire. With a technology like transactional replication, every change that’s made is sent off into replication. If I update a single row a million times, that usually means at least a million “commands” head off into my distribution database. But I don’t always want all of those values. Often, I just want the most recent value, so Change Tracking offers a lightweight solution.
2. Change Tracking doesn’t care what platform your client is running. With a technology like merge replication, all the partners need to run merge replication. With Change Tracking, your client application just needs to come in and talk to a web server. The web server then queries out data that’s changed and sends it back to the client. The client doesn’t need to know a thing about SQL Server. (Clarification: there’s a part of merge replication itself which is also called “Change Tracking”. That’s totally different than what I’m talking about in this post.)
3. Change Tracking is cheap. It works in every edition of SQL Server (Standard Edition, Web Edition, you name it). This isn’t true for all your replication options – Change Data Capture requires Enterprise Edition, for example. Licensing SQL Server can be a huge part of your IT budget, so features that work with Standard Edition are often very attractive.
But as you’ll see in this post, there are tradeoffs, and Change Tracking doesn’t fit every scenario.
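To make the “most recent value for rows that changed since I last synced” idea concrete, here is a sketch of a typical sync query. The table dbo.Orders, its key OrderId, and the stored last sync version are hypothetical. A real client would persist @current_version after each successful sync.

```sql
-- Hypothetical: the client last synced at this version (0 = never).
DECLARE @last_sync_version bigint = 0;

-- Grab the version to store for the next sync.
DECLARE @current_version bigint = CHANGE_TRACKING_CURRENT_VERSION();

-- If cleanup has already removed changes the client needs, it must re-initialize.
IF @last_sync_version < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID(N'dbo.Orders'))
    RAISERROR(N'Last sync version is too old. Client must re-initialize.', 16, 1);

-- One row per changed key, no matter how many times the row was modified.
SELECT  ct.OrderId,
        ct.SYS_CHANGE_OPERATION,   -- I, U, or D
        ct.SYS_CHANGE_VERSION,
        o.OrderDate                -- current value (NULL for deleted rows)
FROM    CHANGETABLE(CHANGES dbo.Orders, @last_sync_version) AS ct
LEFT JOIN dbo.Orders AS o
        ON o.OrderId = ct.OrderId;
```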
Are you cleaning up, and do you have a retention period set?
Change Tracking isn’t free. As you’ll see up ahead, it creates secret tables in your database. The first thing to find out is where you have change tracking on, if you have cleanup enabled, and what it cleans up. Check that out by running this query against any database on your SQL Server instance:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
SELECT db.name AS change_tracking_db,
    is_auto_cleanup_on,
    retention_period,
    retention_period_units_desc
FROM sys.change_tracking_databases ct
JOIN sys.databases db on
    ct.database_id=db.database_id;
GO
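If that query shows cleanup turned off, or a retention period that is longer than you need, the settings can be changed with ALTER DATABASE. A minimal sketch, where the database name and the 2 day retention are placeholders:

```sql
-- MyDatabase and the 2 day retention are hypothetical values.
-- Change Tracking must already be enabled on the database.
ALTER DATABASE [MyDatabase]
    SET CHANGE_TRACKING (AUTO_CLEANUP = ON, CHANGE_RETENTION = 2 DAYS);
```

Pick a retention period that is longer than the longest gap you expect between client syncs.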
Are the right tables in Change Tracking?
You need to know which tables are being tracked. The more tables you track changes on, the more SQL Server has to hoard in other secret tables, and the more it will have to clean up. (Assuming you’re cleaning up. If you don’t clean up, that could be bad.)
Run this query against every database that has Change Tracking enabled to find out which tables have changes tracked:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
SELECT sc.name as tracked_schema_name,
    so.name as tracked_table_name,
    ctt.is_track_columns_updated_on,
    ctt.begin_version /*when CT was enabled, or table was truncated */,
    ctt.min_valid_version /*syncing applications should only expect data on or after this version */ ,
    ctt.cleanup_version /*cleanup may have removed data up to this version */
FROM sys.change_tracking_tables AS ctt
JOIN sys.objects AS so on
    ctt.[object_id]=so.[object_id]
JOIN sys.schemas AS sc on
    so.schema_id=sc.schema_id;
GO
How many committed transactions show in sys.dm_tran_commit_table?
We’ve looked at which tables you’re tracking and how long you’re retaining changes. But how many changes have been committed in that time period? This is the number committed across all the tables in Change Tracking (so the more tables you’re tracking, the bigger this can get).
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
SELECT count(*) AS number_commits,
    MIN(commit_time) AS minimum_commit_time,
    MAX(commit_time) AS maximum_commit_time
FROM sys.dm_tran_commit_table;
GO
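A single total hides whether commits arrive steadily or in bursts. One possible variation, sketched here as an illustration rather than part of the original script set, groups the same view by day:

```sql
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
/* Count tracked commits per calendar day to see how quickly they pile up. */
SELECT CAST(commit_time AS date) AS commit_date,
    COUNT(*) AS number_commits
FROM sys.dm_tran_commit_table
GROUP BY CAST(commit_time AS date)
ORDER BY commit_date;
GO
```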
How big are your hidden Change Tracking tables?
There are two primary places where Change Tracking keeps data about what’s changed:
- sys.syscommittab – this is the system table behind the sys.dm_tran_commit_table view, which you saw above
- sys.change_tracking_[object_id] tables – there is one of these internal tables for each tracked table, named after the tracked table’s object_id (for example, sys.change_tracking_574625090).
This query will show you all the internal tables with their size and rowcount:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
select sct1.name as CT_schema,
    sot1.name as CT_table,
    ps1.row_count as CT_rows,
    ps1.reserved_page_count*8./1024. as CT_reserved_MB,
    sct2.name as tracked_schema,
    sot2.name as tracked_name,
    ps2.row_count as tracked_rows,
    ps2.reserved_page_count*8./1024. as tracked_base_table_MB,
    change_tracking_min_valid_version(sot2.object_id) as min_valid_version
FROM sys.internal_tables it
JOIN sys.objects sot1 on it.object_id=sot1.object_id
JOIN sys.schemas AS sct1 on sot1.schema_id=sct1.schema_id
JOIN sys.dm_db_partition_stats ps1 on it.object_id = ps1.object_id and ps1.index_id in (0,1)
LEFT JOIN sys.objects sot2 on it.parent_object_id=sot2.object_id
LEFT JOIN sys.schemas AS sct2 on sot2.schema_id=sct2.schema_id
LEFT JOIN sys.dm_db_partition_stats ps2 on sot2.object_id = ps2.object_id and ps2.index_id in (0,1)
WHERE it.internal_type IN (209, 210);
GO
Change Tracking Cleanup: No, you’re not crazy
We’ve just looked at two things that might not completely line up: your minimum commit time, and your cleanup/retention period.
If these don’t match, the root cause might be a few things:
- You’ve hit a bug in SQL Server 2008 and cleanup just isn’t running. See KB 973696, “FIX: Change tracking cleanup does not clean up the sys.syscommittab system table in SQL Server 2008”
- The auto cleanup thread hasn’t run yet. This is a background task. I have observed that if I change the retention period on an existing database with Change Tracking, it can take quite a while (about 30 minutes) for CHANGE_TRACKING_MIN_VALID_VERSION to update for a table. The best documentation I can find for this is the comment from Jan 5, 2010 on this Microsoft Connect item.
If you adjust cleanup thresholds and you don’t see cleanup happening right away, you’re not crazy.
I bet I know what you’re thinking right now. “How do I make cleanup run when I want it to?”
Well, I can’t give you a documented way to do it. However, Nic Cain will show you how to use the undocumented sys.sp_flush_commit_table_on_demand procedure. Nic has also written on some limitations that he found in Change Tracking cleanup here, which might require you to run cleanup manually if you have a lot of changes flowing through your system.
This is undocumented, so test carefully and use it at your own risk. You can review the code it runs by running:
exec sp_helptext 'sys.sp_flush_commit_table_on_demand'
As Nic warns, your manual cleanup could conflict with automatic cleanup. So if you choose to use this undocumented procedure, you may need to disable auto-cleanup for a time period. That requires running an ALTER DATABASE command, which might impact your execution plan cache and will require its own locks.
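Pausing and resuming auto-cleanup around a manual cleanup window might look like the sketch below. The database name is a placeholder, and the manual cleanup step is deliberately left as a comment because that procedure is undocumented.

```sql
/* Assumption: [YourDatabase] is a placeholder for a database with Change Tracking enabled. */
ALTER DATABASE [YourDatabase]
SET CHANGE_TRACKING (AUTO_CLEANUP = OFF);
GO

/* ...run and verify your tested manual cleanup here... */

/* Turn auto-cleanup back on when you are done so the internal tables don't keep growing. */
ALTER DATABASE [YourDatabase]
SET CHANGE_TRACKING (AUTO_CLEANUP = ON);
GO
```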
Note: I have not seen this automatically start cleaning up rows immediately after I change the retention period on a restored database. The cleanup procedure relies on results from the hidden internal functions change_tracking_hardened_cleanup_version() and safe_cleanup_version(), and I have not found a way to force an update of the values those functions return. On SQL Server 2012 SP1, I’ve had to wait 30 minutes, as I mentioned above.
So again, even using this procedure, you might at times wonder if you’re crazy, or if Change Tracking is haunted. Be patient, be careful, and write some lightweight code to monitor and alert you if those internal tables get too big for your tastes.
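As one illustration of that kind of lightweight monitoring, the sketch below raises an error when any Change Tracking internal table passes a size threshold. The 1 GB threshold and the error text are assumptions; run it in each database that has Change Tracking enabled.

```sql
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
/* Assumption: 1024 MB is an arbitrary example threshold. */
DECLARE @threshold_mb decimal(18,2) = 1024;

IF EXISTS (
    SELECT 1
    FROM sys.internal_tables it
    JOIN sys.dm_db_partition_stats ps
        ON it.object_id = ps.object_id
        AND ps.index_id IN (0,1)
    WHERE it.internal_type IN (209, 210) /* same internal types as the size query */
    GROUP BY it.object_id
    HAVING SUM(ps.reserved_page_count) * 8. / 1024. > @threshold_mb
)
    RAISERROR('A Change Tracking internal table has grown past the size threshold.', 16, 1);
GO
```

Scheduled as a SQL Server Agent job step, the severity 16 error should fail the step, which you can hook up to whatever alerting you already use.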
Is your code using the CHANGE_TRACKING_MIN_VALID_VERSION function?
Now that you know if cleanup is turned on and how it’s working, check if your code is calling CHANGE_TRACKING_MIN_VALID_VERSION() at any point.
This is important to make sure the data you’re pulling is correct. Let’s say I’m using Change Tracking to move data to a secondary SQL Server. Normally it polls data every hour, but there’s a hardware problem and my secondary server is offline for 3 days. The next time it polls, it needs to make sure that all the data it requires is still on the publishing server.
This means your clients need to store off the highest version they pull down, and check it against CHANGE_TRACKING_MIN_VALID_VERSION() each time they poll.
Your client may also need to use a special isolation level called SNAPSHOT. (More on this soon!)
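A minimal sketch of that check follows. It assumes the client has stored its last synced version somewhere (hard-coded as 2 here) and uses Person.Person from AdventureWorks as the tracked table.

```sql
/* Assumption: @last_sync_version is the version the client saved after its
   previous successful sync. It is hard-coded here for illustration. */
DECLARE @last_sync_version bigint = 2;

IF @last_sync_version < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('Person.Person'))
BEGIN
    /* Cleanup has already removed changes this client never saw.
       The client has to be reinitialized with a full copy of the data. */
    RAISERROR('Last synced version is older than the minimum valid version. Reinitialize this client.', 16, 1);
END
ELSE
BEGIN
    /* Safe to pull incremental changes. */
    SELECT c.BusinessEntityID,
        c.SYS_CHANGE_OPERATION,
        c.SYS_CHANGE_VERSION
    FROM CHANGETABLE(CHANGES Person.Person, @last_sync_version) AS c;
END
GO
```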
How are you querying with CHANGETABLE?
Here’s an example of the columns you can work with when querying:
SELECT p.FirstName,
    p.MiddleName,
    p.LastName,
    c.SYS_CHANGE_VERSION,
    CHANGE_TRACKING_CURRENT_VERSION() AS current_version
FROM Person.Person AS p
JOIN CHANGETABLE(CHANGES Person.Person, 2) AS c ON p.BusinessEntityID = c.BusinessEntityID;
GO
For each row that’s changed in the Person.Person table since revision number 2, I get the latest version of the FirstName, MiddleName and LastName for that row. I can see what the highest version number of those changes was, and I can also see the current highest version number for all tracked changes in the current database.
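One thing to watch with an inner join like the one above: a row that was deleted no longer exists in Person.Person, so it drops out of the results. A variation, sketched here as an illustration, starts from CHANGETABLE and outer joins to the base table so deletes still show up:

```sql
SELECT c.BusinessEntityID,
    c.SYS_CHANGE_OPERATION, /* I = insert, U = update, D = delete */
    c.SYS_CHANGE_VERSION,
    p.FirstName,            /* NULL for deleted rows */
    p.MiddleName,
    p.LastName
FROM CHANGETABLE(CHANGES Person.Person, 2) AS c
LEFT JOIN Person.Person AS p ON p.BusinessEntityID = c.BusinessEntityID;
GO
```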
Could Trace Flag 8295 Help Your Performance? (TF 8295)
Trace Flag 8295 is documented in KB 2476322 and applies to SQL Server 2008 R2 CU7 / SQL Server 2008 SP2 CU4. After the trace flag is turned on, an additional nonclustered index is created on internal tables when you enable Change Tracking for that table.
There’s a big old disclaimer on that KB that says you need to test thoroughly in a non-production environment before making this change. (Hey, that’s a good rule for trace flags in general.) I personally have not tested this trace flag, but it is used to support Change Tracking features by default by some System Center / SCOM products. This blog post by Santos Martinez gives some details.
What do your execution plans look like?
I’ve been giving you lots of ways to look at the internal objects for Change Tracking for a reason: the number of rows in the internal tables impacts performance.
Let’s say that my client last synced at revision number 1 million. A lot of changes have happened since then– 4 million in this table, and a handful of changes in other tables. I run a query like this to see what’s changed:
SELECT th.TransactionID,
    th.ProductID,
    th.TransactionDate
FROM Production.TransactionHistory AS th
JOIN CHANGETABLE(CHANGES Production.TransactionHistory, 1000000) AS c ON th.TransactionID = c.TransactionID
GO
The execution plan looks like this:
All of the tables with thick lines to the right side of the plan are the change tracking related tables. Looking at our STATISTICS IO and STATISTICS TIME output, here’s where we spent all our CPU usage:
Table 'TransactionHistory'. Scan count 0, logical reads 3, physical reads 0
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'change_tracking_574625090'. Scan count 4001000, logical reads 12795765, physical reads 0
Table 'syscommittab'. Scan count 1, logical reads 16158, physical reads 0

SQL Server Execution Times:
CPU time = 7735 ms, elapsed time = 7755 ms.
(For readability, I removed the printed messages for readahead, lob, etc– those were all 0.)
Here’s the story the execution plan told me as I stepped through it:
- First, SQL Server pulled the commit_ts (it’s a timestamp!) and xdes_id (transactional identifier) columns for every commit that occurred since our revision number from sys.syscommittab. It used a function, an index scan, and a merge join to do this. It found 4,001,000 rows with these columns.
- SQL Server then looked for the corresponding TransactionID values in the sys.change_tracking_574625090 table. Unfortunately it did an index seek 4,001,000 times to do this. It found 4,000,000 values matching the commit_ts and xdes_id values. (The other 1K were changes made for a different tracked table.)
- The 4 million rows were pushed into a hash operation to find the unique TransactionIDs for all those revisions. It was a single row. (It had just been changed 4 million times.)
- The TransactionID key was then used to look up the related columns in a single, efficient, clustered index seek. Hey, that part was fast!
The whole process took about 7.8 seconds (7,755 ms elapsed). Of this, 99% of the work was just identifying what had changed.
This may seem like an extreme example, but it’s really not. If you’re using Change Tracking on a very frequently modified table or you’re not cleaning up very often (or at all), your overhead may in fact be far greater. I’ve seen undersized hash joins and tempdb spills in queries like this where revisions have piled up.
Change tracking, statistics, and parameter sniffing
Sometimes Change Tracking can be smart enough to give you a better execution plan based on the revision number you pass in. Whether or not it can do this well depends on two things:
- The freshness of statistics on your tables, including the internal change tracking tables
- Whether or not your revision number might be having a problem with parameter sniffing
If you have highly variable runtimes on reads from your Change Tracking tables, either of these might be part of the problem.
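If you suspect parameter sniffing, one common thing to test is a statement-level recompile, so each call gets a plan built for the revision number it was actually passed. This is a sketch under assumptions: the procedure name is hypothetical, and the recompile costs some CPU on every execution, so measure before adopting it.

```sql
/* Assumption: dbo.GetTransactionHistoryChanges is a hypothetical procedure name. */
IF OBJECT_ID('dbo.GetTransactionHistoryChanges') IS NULL
    EXEC ('CREATE PROCEDURE dbo.GetTransactionHistoryChanges AS RETURN 0');
GO
ALTER PROCEDURE dbo.GetTransactionHistoryChanges
    @last_sync_version bigint
AS
    SELECT th.TransactionID,
        th.ProductID,
        th.TransactionDate
    FROM Production.TransactionHistory AS th
    JOIN CHANGETABLE(CHANGES Production.TransactionHistory, @last_sync_version) AS c
        ON th.TransactionID = c.TransactionID
    OPTION (RECOMPILE); /* compile a fresh plan for this @last_sync_version */
GO
```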
Blocking, Cleanup, and Snapshot Isolation
Let’s talk about two problem scenarios that can happen with Change Tracking. They’re both related to that pesky background cleanup thread that you need to run so that your tables don’t bloat terribly, but which you can’t schedule.
Problem 1: blocking. When the cleanup thread runs, things get slow and you see locking and blocking. Investigation shows that it’s on the Change Tracking tables.
Problem 2: inconsistent results. A client comes in and says, “Hi, I need everything that changed after revision 99!” It checks the minimum revision and it’s below 99, so it begins a transaction and starts pulling down data from multiple tables. While it’s happening, new data is coming into some tables. And also, cleanup starts before it ends, and cleans up through revision 105 on some tables. Oh, we’ve got problems — we will have pulled down MORE revisions for some tables than others. And we will also have missed some revisions for some tables. We’re out of sync.
Because of these two problems, Microsoft strongly recommends that you use transactions and Snapshot isolation when reading and synchronizing data. It will protect you from those inconsistencies, which makes coding simpler, but of course there are tradeoffs. Learn more about implementing Snapshot isolation here.
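A minimal sketch of that pattern is below, assuming the AdventureWorks2012 database with Change Tracking enabled on two tables, and a client that last synced at revision 99. Everything read inside the transaction sees the same point-in-time view, including the version number the client saves for next time.

```sql
ALTER DATABASE AdventureWorks2012 SET ALLOW_SNAPSHOT_ISOLATION ON;
GO
USE AdventureWorks2012;
GO
/* Assumption: the client stored 99 as its last synced version. */
DECLARE @last_sync_version bigint = 99;
DECLARE @this_sync_version bigint;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

    IF @last_sync_version < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('Person.Person'))
        OR @last_sync_version < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('Production.TransactionHistory'))
    BEGIN
        RAISERROR('Change data has been cleaned up past the last synced version. Reinitialize this client.', 16, 1);
    END
    ELSE
    BEGIN
        /* The version this sync is consistent with. Store it for the next poll. */
        SET @this_sync_version = CHANGE_TRACKING_CURRENT_VERSION();

        SELECT c.BusinessEntityID, c.SYS_CHANGE_OPERATION, c.SYS_CHANGE_VERSION
        FROM CHANGETABLE(CHANGES Person.Person, @last_sync_version) AS c;

        SELECT c.TransactionID, c.SYS_CHANGE_OPERATION, c.SYS_CHANGE_VERSION
        FROM CHANGETABLE(CHANGES Production.TransactionHistory, @last_sync_version) AS c;

        SELECT @this_sync_version AS version_to_store_for_next_sync;
    END

COMMIT TRANSACTION;
GO
```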
Takeaways: Performance Tuning Change Tracking
Here are the steps we’ve covered that will help you make Change Tracking go faster:
- Make sure you’re only tracking changes on tables that need it. Extra tables in change tracking will bloat internal tables, and this can impact query performance.
- Make sure that your code properly checks CHANGE_TRACKING_MIN_VALID_VERSION so that you can run cleanup (or you’ll get troublesome bloat in internal tables).
- Validate that your cleanup jobs can keep up (and monitor it).
- Review and test using the undocumented procedure SYS.SP_FLUSH_COMMIT_TABLE_ON_DEMAND for cleanup if needed, but be sure to get approval since… well, it’s undocumented.
- Use query execution plans and STATISTICS IO and STATISTICS TIME to measure how much of your query runtime is being spent on Change Tracking tables.
- Look out for out-of-date statistics on Change Tracking internal tables, and parameter sniffing on procedures using Change Tracking.
- Use SNAPSHOT isolation to ensure you get consistent sets of data, even with cleanup running and new data coming in. It will also help you avoid pesky blocking situations.
Should you use Change Tracking?
As you may have guessed from this post, Change Tracking isn’t something that just configures itself and works perfectly for every application. But do you know of any other built-in data replication features which are like that?
Synchronizing data is hard. Change Tracking offers you some built-in features that may save you the time of writing custom code, but it’s not a great fit for databases with a very high rate of transaction commits to the tracked tables. For these databases there are major downsides when it comes to cleaning up those internal tables. You should also consider that the changes written to all the internal tables are fully logged, which adds further overhead if you have a high level of writes. So you’ve got to choose wisely.
Could writing entirely custom code do better than Change Tracking? That’s up to you!
Are you sure functions in your SQL Server aren’t dragging you down? Join Kendra in this free 30 minute talk to find out the secrets of sneaky scalar functions. You’ll learn which tools misleadingly hide the impact of scalar functions and how you can snoop out the secrets of what’s really going on in your SQL Server.
Note: this is a recording of a live event. Apologies if the text is small– fullscreen view may help read the screenshots.
Run the demo query yourself!
This works with AdventureWorks2012
SELECT ProductID,
    Name,
    [dbo].[ufnGetProductListPrice](ProductID,'2008-01-01') as ListPrice2008,
    [dbo].[ufnGetProductListPrice](ProductID,'2007-01-01') as ListPrice2007
FROM Production.Product;
GO
Free tools I mention during the presentation
SQL Server has a few options for configuring tempdb. It’s great that Microsoft gives some advice for how to battle some kinds of contention that can impact tempdb in KB 2154845, but things can still be a little confusing.
Trace Flag 1118 – Full Extents Only
KB 2154845 advises that Trace Flag 1118 can help in some situations. That trace flag tells SQL Server that it should avoid “mixed extents” and use “full extents”. (Learn more about extents here.)
This means that each newly allocated object in every database on the instance gets its own private 64KB of data. Tempdb is usually the place where most objects are created, so it makes the most difference there.
Since this trace flag is advocated in KB 2154845, it’s clearly documented as safe to use. But for one reason or another it hasn’t made its way into the list of Trace Flags in SQL Server Books Online.
Trace Flag 1117 – Grow All Files in a FileGroup Equally
Trace flag 1117 changes the behavior of file growth: if one data file in a filegroup grows, it forces other files in that filegroup to ALSO grow. This can be useful for tempdb, which is commonly configured with multiple data files as KB 2154845 advises.
Not everyone likes to implement this trace flag, particularly because it impacts every database on the instance and not just tempdb. Personally, I prefer to pre-grow tempdb files so they fill the tempdb drive, just leaving room for any “free space monitoring” you have. Whenever I can avoid using a trace flag, I do it: just less room to hit an edge condition.
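Pre-growing the files might look like the sketch below. The logical file names and the 8GB size are assumptions, so check your own file names with the first query and pick sizes that suit your tempdb drive.

```sql
/* See the current tempdb data files. The size column is in 8KB pages. */
SELECT name,
    physical_name,
    size * 8 / 1024 AS size_mb,
    growth,
    is_percent_growth
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';
GO
/* Assumptions: tempdev and tempdev2 are placeholder logical file names,
   and 8192MB is an arbitrary example. Size every data file equally. */
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 8192MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev2, SIZE = 8192MB);
GO
```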
But Trace Flag 1117 is something that some folks like, and it’s recommended in the Fast Track Architecture Guide from Microsoft.
Trace Flag 1117 is also not in the list of Trace Flags in SQL Server Books Online.
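Before deciding anything, it helps to know whether either flag is already turned on for the instance. A quick check:

```sql
/* The -1 argument reports trace flags that are enabled globally. */
DBCC TRACESTATUS(1117, 1118, -1);
GO
```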
Why Care About Trace Flag Documentation?
Imagine this: a critical database server starts throwing frequent stack dumps which are very hard to interpret. Every time it happens, the instance freezes up. You need to fix it fast. You start up a ticket with Microsoft Support. You want that ticket to move as quickly as it possibly can. And if your manager starts asking about your configuration, it’s really nice to have a single link showing you haven’t turned on anything too weird, right?
Your life is much easier if every trace flag you have enabled is clearly documented as safe to run. A blog post like this, or even a blog post on Technet, doesn’t prove that for you. (Technet’s great, but every blogger is fallible, and there’s some old content out there.)
Trace Flag 1118 is documented in the Tempdb KB, so it’s not all that controversial. But Trace Flag 1117 is more of a judgement call at this point. (It’s in the Fast Track Guide, sure, but are you running Fast Track?)
Clarify the Situation: Vote for My Connect Item!
Both of these trace flags have been around for a while. I suspect that they’re fine to use and that Books Online is just a little out of date. But wouldn’t it be nice to be sure?
One of the most popular posts on this site of all time is a SQL Server Setup Checklist that Brent started way back in March 2008. Over the years, we’ve updated the post and kept it fresh for installing and configuring SQL Server 2005, 2008, 2008R2, 2012, and 2014.
Now we’ve taken it one step further: we’ve created a free PDF of the setup guide for you. Download it, print it, and check items off as you work your way through your next SQL Server installation. We’ll be your co-pilots making sure you don’t miss a critical step.
The guide helps you:
- Validate you’ve got the right basic redundancy in your hardware
- Configure and test HBAs and multi-pathing
- Configure your system drive and page file for SQL Server
- Provision storage for backups
- Create Service Accounts and grant permissions for settings like Instant File Initialization
- Format drives for the best performance
- Set up Anti-Virus with Microsoft recommended best practices
- Install and configure SQL Server
- Set up maintenance to run backups, manage index fragmentation, and check for corruption
- Get basic monitoring and alerting from the SQL Server Agent
- Design the right number of files and layout for tempdb
- Copy SQL Server logins from other servers
- Install free tools and run a health check against SQL Server
Occasionally people ask me if I’m responsible for the cool artwork on our site.
I wish I had those kinds of skills!
The artist behind the artwork on our website is the one and only Eric Larsen. Eric’s been great to work with: he doesn’t laugh at our crazy ideas, he makes thoughtful suggestions, and he’s always open to feedback like, “Could you make the public toilet larger, please?”
Check out this writeup on theispot.com that includes a fun collection of the work Eric has done for our website.
Should you use temp tables or table variables in your code? Join Microsoft Certified Master Kendra Little to learn the pros and cons of each structure, and take a sneak peek at new Memory Optimized Table Variables in SQL Server 2014.
This video is a recording of a live webcast, so please excuse any audio and video imperfections. Want the scripts from the webcast? Scroll on down.
We’ll be changing some things in the database, so I like to restore it to a database named “Showdown” for easy cleanup, but you don’t have to do it that way.
use master;
GO
RESTORE DATABASE Showdown
FROM DISK=N'S:\MSSQL\Backup\AdventureWorks2012-Full Database Backup.bak'
WITH REPLACE,
    MOVE 'AdventureWorks2012_Data' TO N'S:\MSSQL\Data\Showdown_Data.mdf',
    MOVE 'AdventureWorks2012_Log' TO N'S:\MSSQL\Data\Showdown_Log.ldf';
GO
ALTER DATABASE Showdown SET RECOVERY SIMPLE;
GO
ALTER DATABASE Showdown SET COMPATIBILITY_LEVEL=120;
GO
USE Showdown
GO
EXEC dbo.sp_changedbowner @loginame = N'sa', @map = false
GO
Test Temp Tables: Old School
In this example we create a nonclustered index after the temp table is created. By running it repeatedly you can see that the temp object is recreated each time.
IF OBJECT_ID ('dbo.TempObjectFightClub') IS NULL
    EXEC ('CREATE PROCEDURE dbo.TempObjectFightClub AS RETURN 0')
GO
ALTER PROCEDURE dbo.TempObjectFightClub
    @StateProvinceID int
AS
    CREATE TABLE #TestMe (
        i INT IDENTITY PRIMARY KEY CLUSTERED,
        City VARCHAR(60),
        StateProvinceID INT
    );

    CREATE NONCLUSTERED INDEX ixTestMe_StateProvinceID ON #TestMe(StateProvinceID);

    INSERT #TestMe (City,StateProvinceID)
    SELECT City, StateProvinceID
    FROM [Person].[Address];

    SELECT City
    FROM #TestMe
    WHERE StateProvinceID=@StateProvinceID;

    SELECT name, ps.index_id, ps.row_count, create_date, modify_date
    FROM tempdb.sys.objects AS so
    join tempdb.sys.dm_db_partition_stats ps on so.object_id = ps.object_id
    WHERE name LIKE N'#%'
        and row_count > 0
GO
EXEC dbo.TempObjectFightClub @StateProvinceID=1;
GO
Test Table Variables
Here we use the new SQL Server 2014 inline index creation syntax to create a nonclustered index on the table variable. (We couldn’t do that before!) But looking at the execution plan, things get a little weird.
IF OBJECT_ID ('dbo.TempObjectFightClub') IS NULL
    EXEC ('CREATE PROCEDURE dbo.TempObjectFightClub AS RETURN 0')
GO
ALTER PROCEDURE dbo.TempObjectFightClub
    @StateProvinceID int
AS
    DECLARE @TestMe TABLE (
        i INT IDENTITY PRIMARY KEY CLUSTERED (i),
        City VARCHAR(60),
        StateProvinceID INT,
        --This index is created using the 2014 syntax
        INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID)
    );

    INSERT @TestMe (City,StateProvinceID)
    SELECT City, StateProvinceID
    FROM [Person].[Address];

    SELECT City
    FROM @TestMe
    WHERE StateProvinceID=@StateProvinceID;

    SELECT name, ps.index_id, ps.row_count, create_date, modify_date
    FROM tempdb.sys.objects AS so
    join tempdb.sys.dm_db_partition_stats ps on so.object_id = ps.object_id
    WHERE name LIKE N'#%'
        and row_count > 0;
GO
EXEC dbo.TempObjectFightClub @StateProvinceID=1;
GO
Temp Tables Revisited with Inline Index Declaration
This test shows that the new “inline index” creation in SQL Server 2014 lets us create that nonclustered index when the temp table is created, which then allows the cached object to be re-used.
IF OBJECT_ID ('dbo.TempObjectFightClub') IS NULL
    EXEC ('CREATE PROCEDURE dbo.TempObjectFightClub AS RETURN 0')
GO
ALTER PROCEDURE dbo.TempObjectFightClub
    @StateProvinceID int
AS
    CREATE TABLE #TestMe (
        i INT IDENTITY PRIMARY KEY CLUSTERED,
        City VARCHAR(60),
        StateProvinceID INT,
        /* This index is created using the 2014 syntax */
        INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID)
    );

    INSERT #TestMe (City,StateProvinceID)
    SELECT City, StateProvinceID
    FROM [Person].[Address];

    SELECT City
    FROM #TestMe
    WHERE StateProvinceID=@StateProvinceID;

    SELECT name, ps.index_id, ps.row_count, create_date, modify_date
    FROM tempdb.sys.objects AS so
    join tempdb.sys.dm_db_partition_stats ps on so.object_id = ps.object_id
    WHERE name LIKE N'#%'
        and row_count > 0;
GO
EXEC dbo.TempObjectFightClub @StateProvinceID=1;
GO
Memory Optimized Table Variables
Oh Brave New World. First, enable things:
ALTER DATABASE Showdown ADD FILEGROUP [ImaNewFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA
GO
ALTER DATABASE Showdown ADD FILE (
    name='imanewfilegroup_1',
    filename='S:\MSSQL\Data\imanewfilegroup_1'
) TO FILEGROUP [ImaNewFileGroup];
GO
ALTER DATABASE Showdown SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT ON;
GO
Then create the table type:
CREATE TYPE TM1 as TABLE (
    City VARCHAR(60) COLLATE Latin1_General_100_BIN2,
    StateProvinceID INT NOT NULL,
    INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID)
) WITH (MEMORY_OPTIMIZED = ON)
GO
Now, let’s test!
IF OBJECT_ID ('dbo.TempObjectFightClub') IS NULL
    EXEC ('CREATE PROCEDURE dbo.TempObjectFightClub AS RETURN 0')
GO
ALTER PROCEDURE dbo.TempObjectFightClub
    @StateProvinceID int
AS
    DECLARE @TestMe TM1;

    INSERT @TestMe (City,StateProvinceID)
    SELECT City, StateProvinceID
    FROM [Person].[Address];

    SELECT City
    FROM @TestMe
    WHERE StateProvinceID=@StateProvinceID;

    -- Look at the object we're using. Hm, where is it?
    SELECT * FROM sys.dm_db_xtp_object_stats;
    SELECT * FROM tempdb.sys.dm_db_xtp_object_stats;
GO
EXEC dbo.TempObjectFightClub @StateProvinceID=1;
GO