Change Tracking is a developer tool introduced in SQL Server 2008 to help you sync data between SQL Servers, or between SQL Servers and devices. It’s designed to help you answer the request, “I need to get the most recent value for rows that have been modified since I last grabbed data.”
First, I’ll give you the top three reasons people use Change Tracking. Next, I’ll give you queries! If you’re using Change Tracking, these will help you document how it’s configured and measure how big the hidden internal tables used by Change Tracking have grown.
Then we’ll peek into query plans for queries using Change Tracking and talk about some common pitfalls that can impact performance.
Finally, we’ll talk blocking and data consistency. There may be some tradeoffs you need to make to get consistent results out of change tracking.
Why do people use Change Tracking?
Here are the top three reasons that Change Tracking with SQL Server sounds attractive to people:
1. Change Tracking doesn’t send every change over the wire. With a technology like transactional replication, every change that’s made is sent off into replication. If I update a single row a million times, that usually means at least a million “commands” head off into my distribution database. But I don’t always want all of those values. Often, I just want the most recent value, so Change Tracking offers a lightweight solution.
2. Change Tracking doesn’t care what platform your client is running. With a technology like merge replication, all the partners need to run merge replication. With Change Tracking, your client application just needs to come in and talk to a web server. The web server then queries out data that’s changed and sends it back to the client. The client doesn’t need to know a thing about SQL Server. (Clarification: there’s a part of merge replication itself which is also called “Change Tracking”. That’s totally different than what I’m talking about in this post.)
3. Change Tracking is cheap. It works in every edition of SQL Server (Standard Edition, Web Edition, you name it). This isn’t true for all your replication options – Change Data Capture requires Enterprise Edition, for example. Licensing SQL Server can be a huge part of your IT budget, so features that work with Standard Edition are often very attractive.
But as you’ll see in this post, there are tradeoffs, and Change Tracking doesn’t fit every scenario.
Are you cleaning up, and do you have a retention period set?
Change Tracking isn’t free. As you’ll see up ahead, it creates secret tables in your database. The first thing to find out is where you have change tracking on, if you have cleanup enabled, and what it cleans up. Check that out by running this query against any database on your SQL Server instance:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT db.name AS change_tracking_db,
    ct.is_auto_cleanup_on,
    ct.retention_period,
    ct.retention_period_units_desc
FROM sys.change_tracking_databases ct
JOIN sys.databases db on
    ct.database_id = db.database_id;
Are the right tables in Change Tracking?
You need to know which tables are being tracked. The more tables you track changes on, the more SQL Server has to hoard in other secret tables, and the more it will have to clean up. (Assuming you’re cleaning up. If you don’t clean up, that could be bad.)
Run this query against every database that has Change Tracking enabled to find out which tables have changes tracked:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT sc.name as tracked_schema_name,
    so.name as tracked_table_name,
    ctt.begin_version /*when CT was enabled, or table was truncated */,
    ctt.min_valid_version /*syncing applications should only expect data on or after this version */,
    ctt.cleanup_version /*cleanup may have removed data up to this version */
FROM sys.change_tracking_tables AS ctt
JOIN sys.objects AS so on
    ctt.object_id = so.object_id
JOIN sys.schemas AS sc on
    so.schema_id = sc.schema_id;
How many committed transactions show in sys.dm_tran_commit_table?
We’ve looked at which tables you’re tracking and how long you’re retaining changes. But how many changes have been committed in that time period? This is the number committed across all the tables in Change Tracking (so the more tables you’re tracking, the bigger this can get).
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT count(*) AS number_commits,
    MIN(commit_time) AS minimum_commit_time,
    MAX(commit_time) AS maximum_commit_time
FROM sys.dm_tran_commit_table;
How big are your hidden Change Tracking tables?
There are two primary places where Change Tracking keeps data about what’s changed:
- sys.syscommittab – this is the system table behind the sys.dm_tran_commit_table view, which you saw above
- sys.change_tracking_objectid tables – there’s one of these for each table you’re tracking.
This query will show you all the internal tables with their size and rowcount:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
select sct1.name as CT_schema,
sot1.name as CT_table,
ps1.row_count as CT_rows,
ps1.reserved_page_count*8./1024. as CT_reserved_MB,
sct2.name as tracked_schema,
sot2.name as tracked_name,
ps2.row_count as tracked_rows,
ps2.reserved_page_count*8./1024. as tracked_base_table_MB,
change_tracking_min_valid_version(sot2.object_id) as min_valid_version
FROM sys.internal_tables it
JOIN sys.objects sot1 on it.object_id=sot1.object_id
JOIN sys.schemas AS sct1 on
    sot1.schema_id=sct1.schema_id
JOIN sys.dm_db_partition_stats ps1 on
    it.object_id = ps1.object_id
    and ps1.index_id in (0,1)
LEFT JOIN sys.objects sot2 on it.parent_object_id=sot2.object_id
LEFT JOIN sys.schemas AS sct2 on
    sot2.schema_id=sct2.schema_id
LEFT JOIN sys.dm_db_partition_stats ps2 on
    sot2.object_id = ps2.object_id
and ps2.index_id in (0,1)
WHERE it.internal_type IN (209, 210);
Change Tracking Cleanup: No, you’re not crazy
We’ve just looked at two things that might not completely line up: your minimum commit time, and your cleanup/retention period.
If these don’t match, the root cause might be a few things:
- You’ve hit a bug in SQL Server 2008 and cleanup just isn’t running. See KB 973696, “FIX: Change tracking cleanup does not clean up the sys.syscommittab system table in SQL Server 2008”
- The Auto Cleanup Thread hasn’t run yet. This is a background task you can’t schedule. I have observed that if I change the retention period on an existing database with Change Tracking, it can take quite a while – about 30 minutes – for the CHANGE_TRACKING_MIN_VALID_VERSION to update for that table. The best documentation I can find for this is the Jan 5, 2010 comment on this Microsoft Connect item.
If you adjust cleanup thresholds and you don’t see cleanup happening right away, you’re not crazy.
I bet I know what you’re thinking right now. “How do I make cleanup run when I want it to?”
Well, I can’t give you a documented way to do it. However, Nic Cain will show you how to use the undocumented sys.sp_flush_commit_table_on_demand procedure. Nic has also written on some limitations that he found in Change Tracking cleanup here, which might require you to run cleanup manually if you have a lot of changes flowing through your system.
This is undocumented, so test carefully and use it at your own risk. You can review the code it runs by running:
exec sp_helptext 'sys.sp_flush_commit_table_on_demand';
As Nic warns, your manual cleanup could conflict with automatic cleanup. So if you choose to use this undocumented procedure, you may need to disable auto-cleanup for a time period. That requires running an ALTER DATABASE command, which might impact your execution plan cache and will require its own locks.
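As a sketch, the whole sequence might look something like this. (MyDatabase and the retention settings are placeholders – substitute your own values, and remember that the flush procedure is undocumented.)

```sql
-- Hypothetical sketch: pause auto cleanup, flush manually, then re-enable.
-- MyDatabase and CHANGE_RETENTION = 2 DAYS are placeholders -- use your own values.
ALTER DATABASE MyDatabase
    SET CHANGE_TRACKING (AUTO_CLEANUP = OFF);
GO

-- Undocumented! Test carefully and use at your own risk.
EXEC sys.sp_flush_commit_table_on_demand;
GO

ALTER DATABASE MyDatabase
    SET CHANGE_TRACKING (AUTO_CLEANUP = ON, CHANGE_RETENTION = 2 DAYS);
GO
```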
Note: I have not seen this procedure start cleaning up rows immediately after I change the retention period on a restored database. The cleanup procedure relies on results from the hidden internal functions change_tracking_hardened_cleanup_version() and safe_cleanup_version(), and I have not found a way to force the values those functions return to update on SQL Server 2012 SP1 – I’ve had to wait around 30 minutes, as I mention above.
So again, even using this procedure, you might at times wonder if you’re crazy, or if Change Tracking is haunted. Be patient, be careful, and write some lightweight code to monitor and alert you if those internal tables get too big for your tastes.
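A minimal monitoring sketch could be as simple as checking the commit table’s row count against a threshold (the threshold here is a made-up number – tune it for your environment, and wire the check into whatever alerting you already use):

```sql
-- Lightweight monitoring sketch: alert if the commit table holds more rows
-- than a threshold. The threshold is hypothetical -- tune it for your system.
DECLARE @threshold bigint = 10000000;
DECLARE @commit_rows bigint;

SELECT @commit_rows = COUNT(*)
FROM sys.dm_tran_commit_table;

IF @commit_rows > @threshold
    RAISERROR('Change Tracking commit table has %I64d rows; cleanup may be falling behind.',
        16, 1, @commit_rows);
```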
Is your code using the CHANGE_TRACKING_MIN_VALID_VERSION function?
Now that you know if cleanup is turned on and how it’s working, check if your code is calling CHANGE_TRACKING_MIN_VALID_VERSION() at any point.
This is important to make sure the data you’re pulling is correct. Let’s say I’m using Change Tracking to move data to a secondary SQL Server. Normally it polls data every hour, but there’s a hardware problem and my secondary server is offline for 3 days. The next time it polls, it needs to make sure that all the data it requires is still on the publishing server.
This means your clients need to store off the highest version they pull down, and check it against CHANGE_TRACKING_MIN_VALID_VERSION() each time they poll.
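The check itself is simple. A sketch, using a hypothetical stored version number for a client syncing the Person.Person table:

```sql
-- Sketch: validate the client's stored sync version before pulling changes.
-- @last_sync_version stands in for the version the client saved after its last pull.
DECLARE @last_sync_version bigint = 99; /* hypothetical stored value */

IF @last_sync_version < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('Person.Person'))
BEGIN
    -- Cleanup has removed changes this client needs: it must reinitialize
    -- with a full copy of the data instead of an incremental sync.
    RAISERROR('Stored sync version is older than the minimum valid version. Reinitialize.', 16, 1);
END
```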
Your client may also need to use a special isolation level called SNAPSHOT. (More on this soon!)
How are you querying with CHANGETABLE?
Here’s an example of the columns you can work with when querying:
SELECT p.FirstName,
    p.MiddleName,
    p.LastName,
    c.SYS_CHANGE_VERSION,
    CHANGE_TRACKING_CURRENT_VERSION() AS current_version
FROM Person.Person AS p
JOIN CHANGETABLE(CHANGES Person.Person, 2) AS c
    ON p.BusinessEntityID = c.BusinessEntityID;
For each row that’s changed in the Person.Person table since revision number 2, I get the latest version of the FirstName, MiddleName and LastName for that row. I can see what the highest version number of those changes was, and I can also see the current highest version number for all tracked changes in the current database.
Could Trace Flag 8295 Help Your Performance? (TF 8295)
Trace Flag 8295 is documented in KB 2476322 and applies to SQL Server 2008 R2 CU7 / SQL Server 2008 SP2 CU4. After the trace flag is turned on, an additional nonclustered index is created on internal tables when you enable Change Tracking for that table.
There’s a big old disclaimer on that KB that says you need to test thoroughly in a non-production environment before making this change. (Hey, that’s a good rule for trace flags in general.) I personally have not tested this trace flag, but it is used by default to support Change Tracking features in some System Center / SCOM products. This blog post by Santos Martinez gives some details.
What do your execution plans look like?
I’ve been giving you lots of ways to look at the internal objects for Change Tracking for a reason: the number of rows in the internal tables impacts performance.
Let’s say that my client last synced at revision number 1 million. A lot of changes have happened since then– 4 million in this table, and a handful of changes in other tables. I run a query like this to see what’s changed:
SELECT th.*,
    c.SYS_CHANGE_VERSION
FROM Production.TransactionHistory AS th
JOIN CHANGETABLE(CHANGES Production.TransactionHistory, 1000000) AS c
    ON th.TransactionID = c.TransactionID;
The execution plan looks like this:
All of the tables with thick lines to the right side of the plan are the change tracking related tables. Looking at our STATISTICS IO and STATISTICS TIME output, here’s where we spent all our CPU usage:
Table 'TransactionHistory'. Scan count 0, logical reads 3, physical reads 0,
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'change_tracking_574625090'. Scan count 4001000, logical reads 12795765, physical reads 0
Table 'syscommittab'. Scan count 1, logical reads 16158, physical reads 0
SQL Server Execution Times:
CPU time = 7735 ms, elapsed time = 7755 ms.
(For readability, I removed the printed messages for readahead, lob, etc– those were all 0.)
Here’s the story the execution plan told me as I stepped through it:
- First, SQL Server pulled the commit_ts (it’s a timestamp!) and xdes_id (transaction identifier) columns from sys.syscommittab for every commit that occurred since our revision number. It used a function, an index scan, and a merge join to do this. It found 4,001,000 rows with these columns.
- SQL Server then looked for the corresponding TransactionID values in the sys.change_tracking_574625090 table. Unfortunately, it did an index seek 4,001,000 times to do this. It found 4,000,000 values matching the commit_ts and xdes_id pairs. (The other 1,000 were changes made to a different tracked table.)
- The 4 million rows were pushed into a hash operation to find the unique TransactionIDs for all those revisions. It boiled down to a single row. (That one row had just been changed 4 million times.)
- The TransactionID key was then used to look up the related columns in a single, efficient, clustered index seek. Hey, that part was fast!
The whole process took 7.5 seconds. Of this, 99% of the work was just identifying what had changed.
This may seem like an extreme example, but it’s really not. If you’re using Change Tracking on a very frequently modified table, or you’re not cleaning up very often (or at all), your overhead may in fact be far greater. I’ve seen undersized hash joins and tempdb spills in queries like this when revisions have piled up.
Change tracking, statistics, and parameter sniffing
Sometimes Change Tracking can be smart enough to give you a better execution plan based on the revision number you pass in. Whether or not it can do this well depends on two things:
- The freshness of statistics on your tables, including the internal change tracking tables
- Whether or not your revision number might be having a problem with parameter sniffing
If you have highly variable runtimes on reads from your Change Tracking tables, either of these might be part of the problem.
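If you suspect sniffing on the version parameter, one option to test is forcing a fresh plan per execution – a tradeoff, since you pay compilation CPU on every run. A hedged sketch (the @last_sync_version parameter is hypothetical):

```sql
-- Sketch: recompile per execution so the plan reflects the actual version
-- passed in. @last_sync_version is a hypothetical parameter from your sync proc.
SELECT th.TransactionID,
    c.SYS_CHANGE_VERSION
FROM Production.TransactionHistory AS th
JOIN CHANGETABLE(CHANGES Production.TransactionHistory, @last_sync_version) AS c
    ON th.TransactionID = c.TransactionID
OPTION (RECOMPILE);
```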
Blocking, Cleanup, and Snapshot Isolation
Let’s talk about two problem scenarios that can happen with Change Tracking. They’re both related to that pesky background cleanup thread that you need to run so that your tables don’t bloat terribly, but which you can’t schedule.
Problem 1: blocking. When the cleanup thread runs, things get slow and you see locking and blocking. Investigation shows that it’s on the Change Tracking tables.
Problem 2: inconsistent results. A client comes in and says, “Hi, I need everything that changed after revision 99!” It checks the minimum revision and it’s below 99, so it begins a transaction and starts pulling down data from multiple tables. While it’s happening, new data is coming into some tables. And also, cleanup starts before it ends, and cleans up through revision 105 on some tables. Oh, we’ve got problems — we will have pulled down MORE revisions for some tables than others. And we will also have missed some revisions for some tables. We’re out of sync.
Because of these two problems, Microsoft strongly recommends that you use transactions and Snapshot isolation when reading and synchronizing data. It will protect you from those inconsistencies, which makes coding simpler, but of course there are tradeoffs. Learn more about implementing Snapshot isolation here.
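A minimal sketch of that pattern, assuming a database named MyDatabase with Change Tracking already enabled:

```sql
-- One-time setup: allow snapshot isolation (MyDatabase is a placeholder).
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

-- Each sync: capture the version and pull all changed rows inside one
-- snapshot transaction, so every table is read at a consistent point in time.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

    SELECT CHANGE_TRACKING_CURRENT_VERSION() AS sync_version;

    /* pull changes from each tracked table here with CHANGETABLE(CHANGES ...) */

COMMIT;
```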
Backups, Checkpoints, and Change Tracking
There’s a bug where duplicate rows in change tracking tables can cause checkpoints and backups to fail. Get information on the fix in KB 2682488.
Takeaways: Performance Tuning Change Tracking
Here’s the steps we’ve covered that will help you make Change Tracking go faster:
- Make sure you’re only tracking changes on tables that need it. Extra tables in Change Tracking will bloat internal tables, and this can impact query performance.
- Make sure that your code properly checks CHANGE_TRACKING_MIN_VALID_VERSION, so that cleanup can run safely (otherwise you’ll get troublesome bloat in internal tables).
- Validate that your cleanup jobs can keep up (and monitor it).
- Review and test the undocumented procedure sys.sp_flush_commit_table_on_demand for cleanup if needed, but be sure to get approval since… well, it’s undocumented.
- Use query execution plans and STATISTICS IO and STATISTICS TIME to measure how much of your query runtime is being spent on Change Tracking tables.
- Look out for out-of-date statistics on Change Tracking internal tables, and parameter sniffing on procedures using Change Tracking.
- Use SNAPSHOT isolation to ensure you get consistent sets of data, even with cleanup running and new data coming in. It will also help you avoid pesky blocking situations.
Should you use Change Tracking?
As you may have guessed from this post, Change Tracking isn’t something that just configures itself and works perfectly for every application. But do you know of any other built-in data replication features which are like that?
Synchronizing data is hard. Change Tracking offers you some built in features that may save you time from writing custom code, but it’s not a great fit for databases with a very high rate of transaction commits to the tracked tables. For these databases there are major downsides when it comes to cleaning up those internal tables. You should also consider that the changes written to all the internal tables are also fully logged, which adds further overhead if you have a high level of writes. So you’ve got to choose wisely.
Could writing entirely custom code do better than Change Tracking? That’s up to you!