
New sp_Blitz® v35 Checks for Unsupported and Dangerous SQL Server Builds

News broke recently of a dangerous data loss bug in SQL Server 2012 and 2014, and Aaron Bertrand explained which patch levels are affected. It’s kinda tricky, and I’m afraid most people aren’t even going to know about the bug – let alone whether or not they’re on a bad version.

I added build number checking into the latest version of sp_Blitz® so that you can just run it. If your build has the dangerous data loss bug, you’ll get a priority 20 warning about a dangerous build. If you’re running an out-of-support version, like SQL 2012 RTM with no service packs, you’ll get a priority 20 warning about that as well.

Other changes in the last couple of versions:

Changes in v35 – June 18, 2014:

  • John Hill fixed a bug in check 134 looking for deadlocks.
  • Robert Virag improved check 19 looking for replication subscribers.
  • Russell Hart improved check 34 to avoid blocking during restores.
  • Added check 126 for priority boost enabled. It was always in the non-default configurations check, but this one is so bad we called it out.
  • Added checks 128 and 129 for unsupported builds of SQL Server.
  • Added check 127 for unnecessary backups of ReportServerTempDB.
  • Changed fill factor threshold to <80% to match sp_BlitzIndex.

Changes in v34 – April 2, 2014:

  • Jason Pritchard fixed a bug in the plan cache analysis that kept it from returning results when analyzing for high logical reads.
  • Kirby Richter @SqlKirby fixed a bug in check 75 (t-log sizes) that failed on really big transaction log files. (Not even gonna say how big.)
  • Oleg Ivashov improved check 94 (jobs without failure emails) to exclude SSRS jobs.
  • Added @SummaryMode parameter to return only one result set per finding.
  • Added check 124 for Performance: Deadlocks Happening Daily. Looks for more than 10 deadlocks per day.
  • Moved check 121 (Performance: Serializable Locking) to a lower priority (down to 100 from 10); it now only triggers when more than 10 minutes of that wait have accumulated since startup.
  • Changed checks 107-109 for Poison Waits to have higher thresholds, now looking at more than 5 seconds per hour of server uptime. Been up for 10 hours, we look for 50 seconds, that kind of thing.

Download the latest sp_Blitz®, or head over to the introduction page. Enjoy!

Monitoring SQL Server Transactional Replication


“It will catch up eventually. I hope.”

It was the best of times, it was the worst of times. I was a SQL Server DBA, and if something went wrong in Transactional Replication I needed to find out about it right away and help keep things healthy, day or night. Here’s what I learned from that experience about monitoring replication.

If you’re just getting started and need an introduction to transactional replication, head over here.

Tracer Tokens Aren’t Really Your Friend

“Tracer Tokens” were introduced in SQL Server 2005. They sound awfully good. Books Online explains that you can automate them using sys.sp_posttracertoken and report on them using sp_helptracertokenhistory.
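For the curious, posting and checking a token looks roughly like this (a minimal sketch; the publication name is made up, and both procedures run at the publisher in the publication database):

DECLARE @token int;
EXEC sys.sp_posttracertoken
	@publication = 'MyPublication',
	@tracer_token_id = @token OUTPUT;

-- Later, see how far that token got (publisher to distributor to subscriber):
EXEC sys.sp_helptracertokenhistory
	@publication = 'MyPublication',
	@tracer_id = @token;
GO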

There’s a big problem: tracer tokens are too patient.

Let’s say my replication is incredibly overwhelmed and I send out a tracer token. I won’t hear back until it reaches its destination or definitively fails. That could be a very, very long time. The fact that it’s potentially unknown means I don’t want to rely heavily on it for monitoring.

Don’t Rely Too Much on Replication Monitor (REPLMON.exe)

When replication is behind, it’s natural to turn to Replication Monitor. The first five links in “Monitoring Replication” in Books Online point to it, after all.

Replication Monitor isn’t all bad. But don’t depend on it too much, either.

  • Replication Monitor is a tool to help you answer the question “how are things doing right now?” It doesn’t baseline or give the kind of historical info that your manager wants to see.
  • Replication Monitor may run queries to count the number of undistributed commands; those queries can take a while to run and be performance intensive (particularly when things get backed up in the distributor).

I’ve personally seen some cases where running more than one instance of Replication Monitor while a publication snapshot was being taken also caused blocking. Too many people checking to see “how much longer will this take?” actually caused things to take longer. It’s not just me: Microsoft recommends you avoid running multiple instances of Replication Monitor.

ReplMon protip: You can disable automatic refreshing for the Replication Monitor UI, and just refresh the data when you need it. More info in Books Online here. (Thanks to John Samson for this tip.)

Replication Monitor is useful, but you’re better off if people can get information on replication health without everyone having to run Replmon. You can do this fairly easily by using simpler tools to create dashboards to chart replication latency.

Easy Replication Monitoring: Alert on Latency with Canary Tables

It’s easy to build your own system for tracking replication latency for each publication. Here are the ingredients for the simplest version (there’s a quick T-SQL sketch after the list):

  • Add a table named dbo.Canary_PubName to each publication
  • dbo.Canary_PubName has a single row with a datetime column in it
  • A SQL Server Agent job on the publisher updates the datetime to the current timestamp every minute
  • A SQL Server Agent job on the subscriber checks dbo.Canary_PubName every minute and alerts if the difference between the current time and the timestamp is greater than N minutes
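Put together, the simplest version looks something like this (a minimal sketch; the table name, threshold, and use of GETUTCDATE are all up to you):

-- On the publisher, in the published database (then add the table to the publication):
CREATE TABLE dbo.Canary_PubName (LastUpdate datetime NOT NULL);
INSERT dbo.Canary_PubName (LastUpdate) VALUES (GETUTCDATE());
GO
-- Agent job step on the publisher, scheduled every minute:
UPDATE dbo.Canary_PubName SET LastUpdate = GETUTCDATE();
GO
-- Agent job step on the subscriber, scheduled every minute (the alert threshold is hypothetical):
DECLARE @latency_minutes int;
SELECT @latency_minutes = DATEDIFF(MINUTE, LastUpdate, GETUTCDATE())
FROM dbo.Canary_PubName;
IF @latency_minutes > 10
	RAISERROR('Replication latency for PubName is %d minutes.', 16, 1, @latency_minutes);
GO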

It’s easy to extend this into a simple dashboard using a third-party monitoring tool or SQL Server Reporting Services: just poll all the dbo.Canary tables and report the number of minutes of latency on each server.

This simple process gets around the weaknesses of tracer tokens, and also gives you immediate insight into how much latency you have on each subscriber. Bonus: this exact same technique also works well with log shipping and AlwaysOn Availability Groups. Tastes great, less filling.

Medium Replication Monitoring: Notify when Undistributed Commands Rise in the Distribution Database

The distribution database is a special place for Transactional Replication. The log reader agent pulls information on what’s changed from the transaction log of the publication database and translates it into commands that hang out in the distribution database before the changes go out to subscribers.

If you have a lot of data modification occurring on the publisher, you can get a big backlog of commands in the distribution database.

If replication performance is important, set up a SQL Server Agent job on your distribution server to regularly check the number of undistributed commands with a script like Robert Davis provides here. Have it alert you when the commands go above a given threshold.
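If you just want to spot-check a single subscription by hand, the distribution database also ships with a procedure for that (a minimal sketch; every name below is made up, and Robert’s script is a better fit for automated alerting):

USE distribution; -- or whatever your distribution database is named
GO
EXEC sp_replmonitorsubscriptionpendingcmds
	@publisher = 'MyPublisher',
	@publisher_db = 'MyPublishedDB',
	@publication = 'MyPublication',
	@subscriber = 'MySubscriber',
	@subscriber_db = 'MySubscriberDB',
	@subscription_type = 0; -- 0 = push, 1 = pull
GO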

Real world example: When I was the DBA for an environment with mission-critical replication, we would warn when undistributed commands rose above 500K and create a severity-1 ticket when they rose above 1 million. We did this after setting up dashboards to baseline replication latency and baselining the number of undistributed commands in distribution, so that we knew what our infrastructure could recover from on its own and what might need DBA attention to recover in time.

Difficult Replication Monitoring: Alert When Individual Articles are Unhealthy

Here’s where things get tricky. It’s very difficult to prove that all articles in replication are healthy. The steps up to this point have tracked latency for the entire publication and bottlenecks in the distribution database. Things get pretty custom if you need to prove that individual tables are all up to date.

I once had a situation where a code release removed some articles from replication, modified the tables and data significantly, then re-added the articles to replication.

There was an issue with the scripts and one of the articles didn’t get put back into replication properly at the end of the process. Replication was working just fine. No script had explicitly dropped the table from the subscriber, so it just hung out there with stale data. The problem wasn’t discovered for a few days, and it was a bit difficult to track down. Unfortunately, the next week was kind of a downer because a lot of data had to be re-processed after that article was fixed.

Here’s what’s tricky: typically some articles change much more often than others. Monitoring individual articles typically requires baselining “normal” latency per article, then writing custom code that checks each article against the allowed latency. This is significantly more difficult for any large articles that don’t have a “Last Modified Date” style column.

(Disclaimer: in the case that you don’t have a “Last Modified” date on your subscriber, I do not suggest layering Change Tracking on top of the replication subscriber. If you are tempted to do that, first read my post on Performance Tuning Change Tracking, then go through all the steps that you would do if you needed to re-initialize replication or make schema changes on articles. You’ll change your mind by the end.)

Special Cases: The “Desktop Heap” is Used Up

This is a special case for replication. If you have a large number of replication agents on a single server (such as 200 or more), you may run into desktop heap exhaustion, where agents just silently stop working. This issue can be hard to identify because there are no errors: the agents simply stop.

Canary tables can help monitor for this, but you’ll need a lot of them, since this can happen on an agent-by-agent basis. Read more about fixing the desktop heap problem in replication in KB 949296. (Thanks to Michael Bourgon for suggesting we include this.)

Test Your Monitoring out in Staging

The #1 mistake I find with transactional replication is ignoring the staging environment. This is critical to supporting replication and creating effective monitoring for it.

The staging environment isn’t the same thing as development or QA. It’s a place where you have the same number of SQL Server instances as production, and the same replication setup as production. You test changes against staging before they go to production. You can also use it to test replication changes.

Staging is also where you confirm that your replication monitoring works. Data probably doesn’t constantly change in your staging environment, but that’s OK. Use canary tables and get creative to simulate load for test purposes.

Do You Have a Technique for Monitoring Replication Not Listed Here?

Tell us about it in the comments!

Let’s Test Your Backups Together [Video]

You manage SQL Server databases, but you never get the chance to take time out of your busy day to test your backups. You assume that just because the jobs are succeeding, that you’ll be able to restore your databases when disaster strikes. Join Brent Ozar as he walks you through several queries of your MSDB backup history tables, checks your RPO and RTO live, and helps you build a recovery strategy for your production databases in this one-hour video.


Our Sessions at the #SQLPASS Summit 2014

The session list has been published, and we’re excited to say all of us have been selected to speak at the PASS Summit again this year. Here are our sessions:

Are Your Indexes Hurting You or Helping You? – Jes Schultz Borland – Queries need your help! Your mission, should you choose to accept it, is to make great decisions about what indexes are best for your workload. In this session, we’ll review the difference between clustered and nonclustered indexes, show when to use included columns, understand what sargability means, and introduce statistics. You’ll leave this session with the ability to confidently determine why, or why not, SQL Server uses your indexes when executing queries.

Developers: Who Needs a DBA? – Brent Ozar – You store data in SQL Server, but you don’t have enough work to keep a full-time DBA busy. In just one session, you’ll learn the basics of performance troubleshooting, backup, index tuning, and security. Brent Ozar, recovering developer, will teach you the basic care and feeding of a Microsoft SQL Server 2005, 2008, 2012, or 2014 instance and give you scripts to keep you out of trouble.


Dynamic SQL: Build Fast, Flexible Queries – Jeremiah Peschka – Dynamic SQL is a misunderstood and much maligned part of a DBA’s tool kit – it can be used to solve difficult business problems, respond to diverse data needs, and alleviate performance problems. Many DBAs reject dynamic SQL outright as a potential source of SQL injections, being poorly performing, or just for being a hacky solution in general. Not so! Jeremiah Peschka has been making extensive use of dynamic SQL throughout his career to solve a variety of problems. In this session, we’ll be dispelling these misconceptions and demonstrating how dynamic SQL can become a part of every DBA’s tool kit.

From Minutes to Milliseconds: High-Performance SSRS Tuning – Doug Lane – Even though you’re an experienced report developer or administrator, performance tuning for SQL Server Reporting Services still feels as bewildering and hopeless as folding a fitted bed sheet. You’ve made your data sets smaller and timeouts longer, but it’s not enough to remove the slowness dragging down your reporting environment. In this session, you’ll learn how design and configuration choices put pressure on your report server and techniques to relieve that pressure. You’ll see how to configure your Reporting Services databases for speed, streamline your subscription schedules, and use caching for high-demand reports. You’ll also learn some design strategies to lighten your report processing load. If you want to maximize the speed of your Reporting Services environment and minimize the pain of performance tuning, this session is for you.

Lightning Talk: Conquer CXPACKET and Master MAXDOP – Brent Ozar – CXPACKET waits don’t mean you should set MAXDOP = 1. Microsoft Certified Master Brent Ozar will boil it all down and simplify CXPACKET to show you the real problem – and what you should do about it – in one quick 10-minute Lightning Talk.

Why Does SQL Server Keep Asking For This Index? – Kendra Little – SQL Server says you’d really benefit from an index, but you’d like to know why. Kendra Little will give you scripts to find which queries are asking for a specific missing index. You’ll learn to predict how a new index will change read patterns on the table, whether you need to add the exact index SQL Server is requesting, and how to measure performance improvements from your index changes. If you’re comfortable querying SQL Server’s missing index DMVs and looking at an execution plan here and there, this session is for you.

World’s Worst Performance Tuning Techniques – Kendra Little – Could one of your tricks for making queries faster be way off base? Kendra Little is a Microsoft Certified Master in SQL Server and a performance tuning consultant, which means she’s learned lots of lessons from her mistakes. In this session, you will learn how to stop obsessively updating or creating statistics, find alternatives to forcing an index, and deal with an addiction to ‘recompile’ hints.

Pre-Conference Session: psych! We didn’t get picked for an official pre-con this year (I know, right?) so we’re building our own lunar lander. Stay tuned – we’ll get things ironed out in the next week or two so you can make your official plans.

Performance Tuning SQL Server Change Tracking

Change Tracking is a developer tool introduced in SQL Server 2008 to help you sync data between SQL Servers, or between SQL Servers and devices. It’s designed to help you answer the request, “I need to get the most recent value for rows that have been modified since I last grabbed data.”

First, I’ll give you the top three reasons people use Change Tracking. Next, I’ll give you queries! If you’re using Change Tracking, these will help you document how it’s configured and measure how big the secret internal tables used by Change Tracking have grown.

Then we’ll peek into query plans for queries using Change Tracking and talk about some common pitfalls that can impact performance.

Finally, we’ll talk blocking and data consistency. There may be some tradeoffs you need to make to get consistent results out of change tracking.

Why do people use Change Tracking?

Here are the top three reasons that Change Tracking with SQL Server sounds attractive to people:

1. Change Tracking doesn’t send every change over the wire. With a technology like transactional replication, every change that’s made is sent off into replication. If I update a single row a million times, that usually means at least a million “commands” head off into my distribution database. But I don’t always want all of those values. Often, I just want the most recent value, so Change Tracking offers a lightweight solution.

2. Change Tracking doesn’t care what platform your client is running. With a technology like merge replication, all the partners need to run merge replication. With Change Tracking, your client application just needs to come in and talk to a web server. The web server then queries out data that’s changed and sends it back to the client. The client doesn’t need to know a thing about SQL Server. (Clarification: there’s a part of merge replication itself which is also called “Change Tracking”. That’s totally different than what I’m talking about in this post.)

3. Change Tracking is cheap. It works in every edition of SQL Server (Standard Edition, Web Edition, you name it). This isn’t true for all your replication options – Change Data Capture requires Enterprise Edition, for example. Licensing SQL Server can be a huge part of your IT budget, so features that work with Standard Edition are often very attractive.

But as you’ll see in this post, there are tradeoffs, and Change Tracking doesn’t fit every scenario.

Are you cleaning up, and do you have a retention period set?

Change Tracking isn’t free. As you’ll see up ahead, it creates secret tables in your database. The first thing to find out is where you have change tracking on, if you have cleanup enabled, and what it cleans up. Check that out by running this query against any database on your SQL Server instance:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
SELECT
	db.name AS change_tracking_db,
	is_auto_cleanup_on,
	retention_period,
	retention_period_units_desc
FROM sys.change_tracking_databases ct
JOIN sys.databases db on
	ct.database_id=db.database_id;
GO

One of these is not like the other.

Are the right tables in Change Tracking?

You need to know which tables are being tracked. The more tables you track changes on, the more SQL Server has to hoard in other secret tables, and the more it will have to clean up. (Assuming you’re cleaning up. If you don’t clean up, that could be bad.)

Run this query against every database that has Change Tracking enabled to find out which tables have changes tracked:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
SELECT sc.name as tracked_schema_name,
	so.name as tracked_table_name,
	ctt.is_track_columns_updated_on,
	ctt.begin_version /*when CT was enabled, or table was truncated */,
	ctt.min_valid_version /*syncing applications should only expect data on or after this version */ ,
	ctt.cleanup_version /*cleanup may have removed data up to this version */
FROM sys.change_tracking_tables AS ctt
JOIN sys.objects AS so on
	ctt.[object_id]=so.[object_id]
JOIN sys.schemas AS sc on
	so.schema_id=sc.schema_id;
GO

I just have two tables in Change Tracking. You might have a LOT more.

 

How many committed transactions show in sys.dm_tran_commit_table?

We’ve looked at which tables you’re tracking and how long you’re retaining changes. But how many changes have been committed in that time period? This is the number committed across all the tables in Change Tracking (so the more tables you’re tracking, the bigger this can get).

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
SELECT
	count(*) AS number_commits,
	MIN(commit_time) AS minimum_commit_time,
	MAX(commit_time) AS maximum_commit_time
FROM sys.dm_tran_commit_table
GO

At this moment, things were VERY clean.

How big are your hidden Change Tracking tables?

There are two primary places where Change Tracking keeps data about what’s changed:

  • sys.syscommittab – this is the system table behind the sys.dm_tran_commit_table view, which you saw above
  • sys.change_tracking_objectid tables – one of these is created for each tracked table.

This query will show you all the internal tables with their size and rowcount:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
SELECT sct1.name as CT_schema,
	sot1.name as CT_table,
	ps1.row_count as CT_rows,
	ps1.reserved_page_count*8./1024. as CT_reserved_MB,
	sct2.name as tracked_schema,
	sot2.name as tracked_name,
	ps2.row_count as tracked_rows,
	ps2.reserved_page_count*8./1024. as tracked_base_table_MB,
	change_tracking_min_valid_version(sot2.object_id) as min_valid_version
FROM sys.internal_tables it
JOIN sys.objects sot1 on it.object_id=sot1.object_id
JOIN sys.schemas AS sct1 on
	sot1.schema_id=sct1.schema_id
JOIN sys.dm_db_partition_stats ps1 on
	it.object_id = ps1.object_id
	and ps1.index_id in (0,1)
LEFT JOIN sys.objects sot2 on it.parent_object_id=sot2.object_id
LEFT JOIN sys.schemas AS sct2 on
	sot2.schema_id=sct2.schema_id
LEFT JOIN sys.dm_db_partition_stats ps2 on
	sot2.object_id = ps2.object_id
	and ps2.index_id in (0,1)
WHERE it.internal_type IN (209, 210);
GO

Oooo, at this moment things were NOT cleaned up! Compare the size and rows in my sys.syscommittab table to everything else!

Change Tracking Cleanup: No, you’re not crazy

We’ve just looked at two things that might not completely line up: your minimum commit time, and your cleanup/retention period.

If these don’t match, the root cause might be a few things:

  1. You’ve hit a bug in SQL Server 2008 and cleanup just isn’t running. See KB 973696, “FIX: Change tracking cleanup does not clean up the sys.syscommittab system table in SQL Server 2008”
  2. The Auto Cleanup Thread hasn’t run yet. This is a background task. I have observed that if I change the retention period on an existing database with change tracking, it can take quite a while (about 30 minutes in my tests) for the CHANGE_TRACKING_MIN_VALID_VERSION to update for that table. The best documentation I can find for this is the comment from Jan 5, 2010 on this Microsoft Connect item.

If you adjust cleanup thresholds and you don’t see cleanup happening right away, you’re not crazy.

sys.sp_flush_commit_table_on_demand

I bet I know what you’re thinking right now. “How do I make cleanup run when I want it to?”

Well, I can’t give you a documented way to do it. However, Nic Cain will show you how to use the undocumented sys.sp_flush_commit_table_on_demand procedure. Nic has also written on some limitations that he found in Change Tracking cleanup here, which might require you to run cleanup manually if you have a lot of changes flowing through your system.

This is undocumented, so test carefully and use it at your own risk. You can review the code it runs by running:

exec sp_helptext 'sys.sp_flush_commit_table_on_demand'

In search of Sysadmin who loves undocumented procedures and long walks on the Change Tracking internal tables.

As Nic warns, your manual cleanup could conflict with automatic cleanup. So if you choose to use this undocumented procedure, you may need to disable auto-cleanup for a time period. That requires running an ALTER DATABASE command, which might impact your execution plan cache and will require its own locks.

Note: I have not seen this automatically start cleaning up rows immediately after I change the retention period on a restored database. The cleanup procedure relies on results from the hidden internal functions change_tracking_hardened_cleanup_version() and safe_cleanup_version(), and I have not found a way to force those functions to update their values on SQL Server 2012 SP1; I’ve had to wait the 30 minutes I mentioned above.

So again, even using this procedure, you might at times wonder if you’re crazy, or if Change Tracking is haunted. Be patient, be careful, and write some lightweight code to monitor and alert you if those internal tables get too big for your tastes.

Is your code using the CHANGE_TRACKING_MIN_VALID_VERSION function?

Now that you know if cleanup is turned on and how it’s working, check if your code is calling CHANGE_TRACKING_MIN_VALID_VERSION() at any point.

This is important to make sure the data you’re pulling is correct. Let’s say I’m using Change Tracking to move data to a secondary SQL Server. Normally it polls data every hour, but there’s a hardware problem and my secondary server is offline for 3 days. The next time it polls, it needs to make sure that all the data it requires is still on the publishing server.

This means your clients need to store off the highest version they pull down, and check it against CHANGE_TRACKING_MIN_VALID_VERSION() each time they poll.
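Here’s a minimal sketch of that validity check, using the Person.Person example from later in this post (the saved version number is made up):

DECLARE @last_synced_version bigint = 1000000; -- the version this client saved after its last pull
IF @last_synced_version < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('Person.Person'))
BEGIN
	-- Cleanup has already removed changes this client never saw:
	-- it has to re-initialize (pull the whole table) instead of pulling just the changes.
	RAISERROR('Change Tracking data is no longer valid for this client. Reinitialize.', 16, 1);
END
GO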

Your client may also need to use a special isolation level called SNAPSHOT. (More on this soon!)

How are you querying with CHANGETABLE?

Here’s an example of the columns you can work with when querying:

SELECT p.FirstName,
	p.MiddleName,
	p.LastName,
	c.SYS_CHANGE_VERSION,
	CHANGE_TRACKING_CURRENT_VERSION() AS current_version
FROM Person.Person AS p
JOIN  CHANGETABLE(CHANGES Person.Person, 2) AS c
ON p.BusinessEntityID = c.BusinessEntityID;
GO

For each row that’s changed in the Person.Person table since revision number 2, I get the latest version of the FirstName, MiddleName and LastName for that row. I can see what the highest version number of those changes was, and I can also see the current highest version number for all tracked changes in the current database.


Could Trace Flag 8295 Help Your Performance? (TF 8295)

Trace Flag 8295 is documented in KB 2476322 and applies to SQL Server 2008 R2 CU7 / SQL Server 2008 SP2 CU4. After the trace flag is turned on, an additional nonclustered index is created on internal tables when you enable Change Tracking for that table.

There’s a big old disclaimer on that KB that says you need to test thoroughly in a non-production environment before making this change. (Hey, that’s a good rule for trace flags in general.) I personally have not tested this trace flag, but it is used by default to support Change Tracking features in some System Center / SCOM products. This blog post by Santos Martinez gives some details.

(Thanks to Adam Machanic for letting me know about this trace flag, and to André Kamman and Edmund Poillion for helping track down the relevant documentation.)
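If you do decide to test it, the mechanics are the same as any global trace flag; note that the extra index is only built when Change Tracking is enabled for a table after the flag is on (a minimal sketch; the table name is made up, and this belongs in a non-production environment first, per the KB):

DBCC TRACEON (8295, -1); -- global; use a -T8295 startup parameter to survive restarts
GO
ALTER TABLE dbo.MyTrackedTable
	ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF);
GO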

What do your execution plans look like?

I’ve been giving you lots of ways to look at the internal objects for Change Tracking for a reason: the number of rows in the internal tables impacts performance.

Let’s say that my client last synced at revision number 1 million. A lot of changes have happened since then – 4 million in this table, and a handful of changes in other tables. I run a query like this to see what’s changed:

SELECT th.TransactionID,
	th.ProductID,
	th.TransactionDate
FROM Production.TransactionHistory AS th
JOIN  CHANGETABLE(CHANGES Production.TransactionHistory, 1000000) AS c
ON th.TransactionID = c.TransactionID
GO

The execution plan looks like this:


All of the tables with thick lines to the right side of the plan are the change tracking related tables. Looking at our STATISTICS IO and STATISTICS TIME output, here’s where all that CPU time went:

Table 'TransactionHistory'. Scan count 0, logical reads 3, physical reads 0,
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'change_tracking_574625090'. Scan count 4001000, logical reads 12795765, physical reads 0
Table 'syscommittab'. Scan count 1, logical reads 16158, physical reads 0

 SQL Server Execution Times:
   CPU time = 7735 ms,  elapsed time = 7755 ms.

(For readability, I removed the printed messages for readahead, lob, etc. – those were all 0.)

Here’s the story the execution plan told me as I stepped through it:

  • First, SQL Server pulled the commit_ts (it’s a timestamp!) and xdes_id (transaction identifier) columns for every commit that occurred since our revision number from sys.syscommittab. It used a function, an index scan, and a merge join to do this. It found 4,001,000 rows with these columns.
  • SQL Server then looked for the corresponding TransactionID values in the sys.change_tracking_574625090 table. Unfortunately it did an index seek 4,001,000 times to do this. It found 4,000,000 values matching the commit_ts and xdes_ids. (The other 1K were changes made for a different tracked table.)
  • The 4 million rows were pushed into a hash operation to find the unique TransactionIDs for all those revisions. The result was a single row. (It had just been changed 4 million times.)
  • The TransactionID key was then used to look up the related columns in a single, efficient, clustered index seek. Hey, that part was fast!

The whole process took 7.5 seconds. Of this, 99% of the work was just identifying what had changed.

Oops.

This may seem like an extreme example, but it’s really not. If you’re using Change Tracking on a very frequently modified table, or you’re not cleaning up very often (or at all), your overhead may in fact be far greater. I’ve seen undersized hash joins and tempdb spills in queries like this where revisions have piled up.

Change tracking, statistics, and parameter sniffing

Sometimes Change Tracking can be smart enough to give you a better execution plan based on the revision number you pass in. Whether or not it can do this well depends on two things:

  • Whether the statistics on the Change Tracking internal tables are up to date
  • Whether parameter sniffing is working for or against the procedures that query those tables

If you have highly variable runtimes on reads from your Change Tracking tables, either of these might be part of the problem.

Blocking, Cleanup, and Snapshot Isolation

Let’s talk about two problem scenarios that can happen with Change Tracking. They’re both related to that pesky background cleanup thread that you need to run so that your tables don’t bloat terribly, but which you can’t schedule.

Problem 1: blocking. When the cleanup thread runs, things get slow and you see locking and blocking. Investigation shows that it’s on the Change Tracking tables.

Problem 2: inconsistent results. A client comes in and says, “Hi, I need everything that changed after revision 99!” It checks the minimum revision and it’s below 99, so it begins a transaction and starts pulling down data from multiple tables. While it’s happening, new data is coming into some tables. And also, cleanup starts before it ends, and cleans up through revision 105 on some tables. Oh, we’ve got problems — we will have pulled down MORE revisions for some tables than others. And we will also have missed some revisions for some tables. We’re out of sync.

Because of these two problems, Microsoft strongly recommends that you use transactions and Snapshot isolation when reading and synchronizing data. It will protect you from those inconsistencies, which makes coding simpler, but of course there are tradeoffs. Learn more about implementing Snapshot isolation here.
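Here’s a minimal sketch of what the reading side looks like with Snapshot isolation (the database name is made up, and ALLOW_SNAPSHOT_ISOLATION only needs to be enabled once):

ALTER DATABASE MySyncedDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
GO
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
	-- Check CHANGE_TRACKING_MIN_VALID_VERSION first, then pull changes from every
	-- tracked table inside this one transaction so the results are consistent with
	-- each other, even if cleanup or new writes happen while we read.
	SELECT CHANGE_TRACKING_CURRENT_VERSION() AS version_as_of_this_sync;
	-- ... CHANGETABLE queries for each article go here ...
COMMIT;
GO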

Takeaways: Performance Tuning Change Tracking

Here are the steps we’ve covered that will help you make Change Tracking go faster:

  1. Make sure you’re only tracking changes on tables that need it. Extra tables in change tracking will bloat internal tables, and this can impact query performance.
  2. Make sure that your code properly checks CHANGE_TRACKING_MIN_VALID_VERSION so that you can run cleanup (or you’ll get troublesome bloat in internal tables).
  3. Validate that your cleanup jobs can keep up (and monitor it).
  4. Review and test using the undocumented procedure SYS.SP_FLUSH_COMMIT_TABLE_ON_DEMAND for cleanup if needed, but be sure to get approval since… well, it’s undocumented.
  5. Use query execution plans and STATISTICS IO and STATISTICS TIME to measure how much of your query runtime is being spent on Change Tracking tables.
  6. Look out for out-of-date statistics on Change Tracking internal tables, and parameter sniffing on procedures using Change Tracking.
  7. Use SNAPSHOT isolation to ensure you get consistent sets of data, even with cleanup running and new data coming in. It will also help you avoid pesky blocking situations.

Should you use Change Tracking?

As you may have guessed from this post, Change Tracking isn’t something that just configures itself and works perfectly for every application. But do you know of any other built-in data replication features which are like that?

Synchronizing data is hard. Change Tracking offers you some built-in features that may save you the time of writing custom code, but it’s not a great fit for databases with a very high rate of transaction commits to the tracked tables. For those databases there are major downsides when it comes to cleaning up the internal tables. Also consider that the changes written to all the internal tables are fully logged, which adds further overhead if you have a high level of writes. So you’ve got to choose wisely.

Could writing entirely custom code do better than Change Tracking? That’s up to you!

How to Pick a Monitoring Tool

Step 1: Make a list of 5 problems you’ve faced in the last couple of months that you needed alerting on. If you’ve got a help desk ticket system, look at the ticket types that occur most frequently and cause the most outage times.

For me as a DBA, that might be:

  • SQL Server service down
  • Deadlock occurs
  • A runaway query consumes high CPU
  • An Agent job is running more than 2x the time it usually takes to run, and it’s still going
  • Log shipping gets more than 15 minutes behind

Step 2: Set up a lab to repro those problems on demand. This is actually a great way to learn about these problems, by the way – the more you understand how to create these situations, the better you’ll be at detecting and reacting to them.

Step 3: Download the eval editions of the tools you want. All monitoring software vendors give away short-term (10-15 day) versions. Install & configure them to monitor your test lab.

Step 4: Actively evaluate them. Build a spreadsheet with a column for each monitoring tool, and a group of rows for each failure scenario. For each problem, when you trigger it, document:


My favorite monitoring dashboard. I’m looking for four 7’s of uptime.

  • How long it takes to alert you about the correct underlying problem
  • How many false alarms you get (alarms that are unrelated to the real problem)
  • How intuitively obvious the real problem is when looking at the tool’s dashboard (are all the lights flashing red, or is there a single light flashing red exactly where the problem is?)

Step 5: Pick winners and negotiate price. Out of the tools you evaluated, pick at least two that you’re willing to live with. Call each of the vendors and say, “I did a tool evaluation, and it was a tie between you and ___. What’s your best price? I’m going to be asking the other guys too.”

They’re going to want to start talking about value differentiators, like how they’re so much better than the other company because they do ___. Doesn’t matter – you’ve already picked the two tools you’re willing to live with. Let them talk, listen fairly, and then repeat the question: what’s your best price?

You don’t have to pick the cheapest one – there may be one tool you like much more – but at least now you’ve gotten good prices on both, and you can make an informed decision.

Join me in my SQL Server Tools That Cost Money webcast to learn more about what tools are out there and how to evaluate them.

Trace Flags 1117, 1118, and Tempdb Configuration


Unevenly sized tempdb files can cause uneven use – and contention on some critical pages

SQL Server has a few options for configuring tempdb. It’s great that Microsoft gives some advice for how to battle some kinds of contention that can impact tempdb in KB 2154845, but things can still be a little confusing.

Trace Flag 1118 – Full Extents Only

KB 2154845 advises that Trace Flag 1118 can help in some situations. That trace flag tells SQL Server that it should avoid “mixed extents” and use “full extents”. (Learn more about extents here.)

This means that each newly allocated object in every database on the instance gets its own private 64KB of data. Tempdb is usually the place where most objects are created, so it makes the most difference there.
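If you decide to turn it on, the mechanics are simple (a minimal sketch; most people also add a -T1118 startup parameter so the setting survives restarts):

DBCC TRACEON (1118, -1); -- -1 enables it globally for the whole instance
DBCC TRACESTATUS (1118); -- confirm it shows as enabled globally
GO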

Since this trace flag is advocated in KB 2154845, it’s clearly documented as safe to use. But for one reason or another it hasn’t made its way into the list of Trace Flags in SQL Server Books Online.

Trace Flag 1117 – Grow All Files in a FileGroup Equally

Trace flag 1117 changes the behavior of file growth: if one data file in a filegroup grows, it forces other files in that filegroup to ALSO grow. This can be useful for tempdb, which is commonly configured with multiple data files as KB 2154845 advises.

Not everyone likes to implement this trace flag, particularly because it impacts every database on the instance and not just tempdb. Personally, I prefer to pre-grow tempdb files so they fill the tempdb drive, just leaving room for any “free space monitoring” you have. Whenever I can avoid using a trace flag, I do it: just less room to hit an edge condition.
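For reference, pre-growing the files is just one ALTER DATABASE per file (a minimal sketch; the logical file names and sizes below are made up, so check yours in sys.master_files):

ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 20480MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = temp2, SIZE = 20480MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = temp3, SIZE = 20480MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = temp4, SIZE = 20480MB);
GO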

But Trace Flag 1117 is something that some folks like, and it’s recommended in the Fast Track Architecture Guide from Microsoft.

Trace Flag 1117 is also not in the list of Trace Flags in SQL Server Books Online.

Why Care About Trace Flag Documentation?

Imagine this: a critical database server starts throwing frequent stack dumps which are very hard to interpret. Every time it happens, the instance freezes up. You need to fix it fast. You start up a ticket with Microsoft Support. You want that ticket to move as quickly as it possibly can. And if your manager starts asking about your configuration, it’s really nice to have a single link showing you haven’t turned on anything too weird, right?

Your life is much easier if every trace flag you have enabled is clearly documented as safe to run. A blog post like this, or even a blog post on Technet, doesn’t prove that for you. (Technet’s great, but every blogger is fallible, and there’s some old content out there.)

Trace Flag 1118 is documented in the Tempdb KB, so it’s not all that controversial. But Trace Flag 1117 is more of a judgement call at this point. (It’s in the Fast Track Guide, sure, but are you running Fast Track?)

Clarify the Situation: Vote for My Connect Item!

Both of these trace flags have been around for a while. I suspect that they’re fine to use and that Books Online is just a little out of date. But wouldn’t it be nice to be sure?

Vote for my Connect item to update the Trace Flags page in Books Online and add TF 1117 and TF 1118 to the list of approved trace flags.

Announcing the SQL Server Setup Checklist: Free eBook Download


Your new co-pilots for installing SQL Servers

One of the most popular posts on this site of all time is a SQL Server Setup Checklist that Brent started way back in March 2008. Over the years, we’ve updated the post and kept it fresh for installing and configuring SQL Server 2005, 2008, 2008 R2, 2012, and 2014.

Now we’ve taken it one step further: we’ve created a free PDF of the setup guide for you. Download it, print it, and check items off as you work your way through your next SQL Server installation. We’ll be your co-pilots making sure you don’t miss a critical step.

The guide helps you:

  • Validate you’ve got the right basic redundancy in your hardware
  • Configure and test HBAs and multi-pathing
  • Configure your system drive and page file for SQL Server
  • Provision storage for backups
  • Create Service Accounts and grant permissions for settings like Instant File Initialization
  • Format drives for the best performance
  • Set up Anti-Virus with Microsoft recommended best practices
  • Install and configure SQL Server
  • Set up maintenance to run backups, manage index fragmentation, and check for corruption
  • Get basic monitoring and alerting from the SQL Server Agent
  • Design the right number of files and layout for tempdb
  • Copy SQL Server logins from other servers
  • Install free tools and run a health check against SQL Server

Download the free SQL Server Setup Checklist e-Book here.

In-Memory Analytics: What’s Different Between SQL Server and Oracle


Pick a memory, any memory.

Both Microsoft and Oracle have released what they call “in-memory analytics” features. Microsoft started two years ago with SQL Server 2012, and Oracle is following suit next month (July 2014) with Oracle Database In-Memory. Most of us won’t pick our database platform based on just one feature, but it’s important to know the difference between these two so you can have a better discussion around the technical choices that you’re making.

SQL Server Columnstore Indexes

When they first arrived, SQL Server columnstore indexes were touted as memory optimized. By using a columnar data structure and columnar compression techniques, SQL Server columnstore indexes are able to get tremendous improvements in performance.

In effect, SQL Server columnstore indexes are “in-memory” because of the effects of columnar compression. The compression means that the index size is significantly smaller. It’s likely that the whole index can be cached in memory, and it will stay in memory because the index is smaller.

By thinking smarter, Microsoft created a technology that allowed BI professionals to achieve significant speed improvements on data warehouses. There was one problem: the SQL Server 2012 columnstore was read-only.

SQL Server 2014 columnstore fixed the read-only problem by introducing a writable clustered columnstore. When the index is created as a clustered columnstore index (it’s the only index on the table), SQL Server uses a writable deltastore to hold newly inserted rows until they’re compressed into the columnstore. This advantage over SQL Server 2012 has prompted many teams to migrate to SQL Server 2014.
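For comparison, here’s roughly what each flavor looks like in T-SQL (a minimal sketch; the table and column names are made up, and you’d only use one of these on a given table):

-- SQL Server 2012: read-only nonclustered columnstore on top of an existing table
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
	ON dbo.FactSales (OrderDateKey, ProductKey, SalesAmount);
GO
-- SQL Server 2014: writable clustered columnstore, which becomes the table's storage
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;
GO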

The downside of both SQL Server columnstore implementations is that they’re designed for pure data warehouse workloads. If you wanted to combine an OLTP workload with a data warehouse workload on the same system, you’d have some work ahead of you to build something that works with both regular indexes and columnstore indexes. At its core, SQL Server columnstore technology is a data warehouse technology.

Oracle Database In-Memory

Oracle Database In-Memory takes a different approach than SQL Server does. The Oracle feature is designed to be a transparent drop-in: rather than requiring a separate physical structure on disk, a DBA activates the in-memory feature for a given table and waits while Oracle does the rest.

Oracle does this by creating a dual-format data structure. As data is read and written to tables using the in-memory feature, the database builds columnar indexes in-memory and on-the-fly. Yes, this means that you’ll need more memory in your database servers. Many Oracle customers use Oracle RAC for scale-out performance improvements. Dedicating one RAC node to reporting isn’t unheard of and can be used to alleviate the pressure caused by holding a second, albeit compressed, copy of a table or index in memory.

Once the row-level data has been merged into the columnar index, Oracle uses compression to further improve performance. Just like other columnar database engines, compression vastly improves the performance of queries.

Oracle’s offering doesn’t persist the in-memory analytic indexes to disk – they’re purely in-memory indexes. This avoids the overhead of logging and persisting them to disk; there’s no double-write hit. However, there’s one problem with this approach: restarts. Whenever the database engine is restarted, the in-memory analytic indexes have to be rebuilt from disk.

Technical details about the Oracle offering are sparse – we can expect to learn more after Oracle has released the feature to the public.

Comparing In-Memory Analytics

While both companies have produced an in-memory analytic feature of one form or another, direct comparisons are difficult. Microsoft SQL Server columnstore indexes are designed to improve pure data warehouse implementations. Oracle Database In-Memory, on the other hand, is intended to help DBAs get the most out of the hardware they already have. Many business users want to report on live data and this feature makes live reporting a possibility.

How to Use Our HA/DR Planning Worksheets [Video]

You’re in charge of making sure the database never goes down, but your managers won’t give you any budget. You’re tired of feeling guilty like it’s your fault there’s an outage – and now there’s help. Microsoft Certified Master Brent Ozar helps businesses understand that they have to pay the piper if they want high availability. He’ll share his simple two-page worksheet to get the business users on your side in this 30-minute webcast.

Get the worksheets here.

