When learning how Read Committed Snapshot Isolation works in SQL Server, it can be a little tricky to understand how writes behave. The basic way I remember this is “Readers don’t block writers, writers don’t block readers, but writers still block writers.”
But that’s not so easy to understand. Let’s take a look at a simple test showing what it means that “writers still block writers”.
Create our simple test schema
We’re doing a super simple test: we just need a database with READ_COMMITTED_SNAPSHOT enabled and a table with a single row.
```sql
CREATE DATABASE RCSI_Updates;
GO
ALTER DATABASE RCSI_Updates SET READ_COMMITTED_SNAPSHOT ON;
GO
USE RCSI_Updates;
GO
CREATE TABLE dbo.UpdateTest (
    i int identity primary key,
    Color varchar(256) not null
);
GO
INSERT dbo.UpdateTest (Color) VALUES ('White');
GO
```
Start one transaction that does an update (Session #1)
An update begins! To test this out, we’re doing an explicit BEGIN TRAN so that our update will hold locks. Run this in one session in SQL Server Management Studio:
```sql
BEGIN TRAN

    UPDATE dbo.UpdateTest
    SET Color='Red'
    WHERE Color='White';

/* Transaction left open on purpose */
```
Another update begins! But it’s blocked. (Session #2)
Before that update transaction completes, another session comes in. Just open another session window in SSMS and run this:
```sql
UPDATE dbo.UpdateTest
SET Color='Black'
WHERE Color='White';
GO
```
This won’t complete right away, because it’s blocked by your first session.
Back in Session #1, run COMMIT to complete the transaction.

When Session #1 commits, it finishes up and releases its lock. Session #2 is then able to finish, and in its output you'll see:
(0 row(s) affected)
Let’s check the rows in the table:
```sql
SELECT * FROM dbo.UpdateTest;
GO
```
Here’s how things happened:
- Session #1 locked the row and turned the marble red.
- Session #2 was also doing an update, and it didn’t get to the row until Session #1 released the lock.
- At that point it found zero rows to update, because the color had already changed.
If You Wrote the Update Differently, You Might Get a Different Outcome…
Remember, writers do NOT block readers under read committed snapshot isolation. What if someone wrote this as a two part operation (instead of a single update statement)?
```sql
CREATE TABLE #marbles (i int);

INSERT #marbles (i)
SELECT i
FROM dbo.UpdateTest
WHERE Color='White';

UPDATE dbo.UpdateTest
SET Color='Black'
FROM dbo.UpdateTest u
JOIN #marbles m ON u.i=m.i;
GO
```
When we check how things turned out…
In this case, Session #2 wasn’t completely blocked! Things happened differently:
- Session #1 locks the row, just as before.
- Session #2 is doing a SELECT statement, so it uses the version store and isn’t blocked. It’s able to populate the temp table.
- But Session #2 becomes blocked at the point it runs the UPDATE.
- When Session #1 completes, Session #2 is unblocked and updates using the data it’s cached in its temp table.
- Session #2 changes the color of the marble.
I’m not saying that it’s always better to do a single UPDATE or that it’s always better to use a temp table. You just have to know how they behave so you can pick the right code for your situation.
If you need to use a temporary table for performance reasons, but want your read to be blocked if any writer holds a lock, you can achieve that with locking hints. That might cause a lot of blocking, though, just like any time you use locking hints.
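As a sketch of that approach, the READCOMMITTEDLOCK table hint tells a statement to use locking read committed even when the database has READ_COMMITTED_SNAPSHOT on, so the SELECT waits for Session #1's lock instead of reading a row version. Table and column names match the demo above:

```sql
/* The hint makes this SELECT block behind Session #1's open transaction
   instead of reading the 'White' row version from the version store. */
CREATE TABLE #marbles (i int);

INSERT #marbles (i)
SELECT i
FROM dbo.UpdateTest WITH (READCOMMITTEDLOCK)
WHERE Color='White';

UPDATE dbo.UpdateTest
SET Color='Black'
FROM dbo.UpdateTest u
JOIN #marbles m ON u.i=m.i;
GO
```

With this version, the temp table isn't populated until Session #1 commits, so Session #2 sees the already-changed color and updates zero rows, just like the single UPDATE statement.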
What About Snapshot Isolation?
In this post we’ve looked at Read Committed Snapshot Isolation. There’s another isolation level, SNAPSHOT, which behaves a bit differently. If you want to test that out, you can modify the code samples above to reproduce an “update conflict” with both of the sessions using SNAPSHOT isolation.
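As a starting point for that experiment, here's a sketch of what Session #2 might run. It assumes you've also enabled ALLOW_SNAPSHOT_ISOLATION on the RCSI_Updates database from the examples above:

```sql
/* Enable SNAPSHOT isolation for the test database first:
   ALTER DATABASE RCSI_Updates SET ALLOW_SNAPSHOT_ISOLATION ON; */
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRAN;

UPDATE dbo.UpdateTest
SET Color='Black'
WHERE Color='White';

/* While Session #1 holds its open update transaction, this blocks.
   When Session #1 commits, this session is rolled back with error 3960:
   "Snapshot isolation transaction aborted due to update conflict..." */
```

Under SNAPSHOT, instead of quietly updating zero rows, the second writer hits an update conflict error and has to retry.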
This May Sound Complicated, But It’s Not That Bad
When you talk about any isolation level in detail in SQL Server, things get complicated. I don’t think that Read Committed Snapshot Isolation is any more complicated than plain old Read Committed, it’s just that people don’t think too much about Read Committed, since it’s the default!
A question came up in our company chat room the other day: does the forced parameterization database setting help with plan cache bloat caused by dynamic comments or whitespace?
I love the topic of parameterization in SQL Server, probably because it’s one of those things that’s really pretty weird (but seems like it’d be straightforward). So I immediately wanted to test it out and see if SQL Server would do what I thought it would do.
Parameterization reduces compilation and plan cache bloat
For anything but the simplest queries, SQL Server takes literal values in your query literally. If you don’t parameterize the query, SQL Server won’t do it for you. That means that each of these three queries will get its own execution plan (even if the execution plans are all identical):
```sql
SELECT TOP 1 Id
INTO #byebye
FROM dbo.Posts
WHERE Id=1
GO
SELECT TOP 1 Id
INTO #byebye
FROM dbo.Posts
WHERE Id=2
GO
SELECT TOP 1 Id
INTO #byebye
FROM dbo.Posts
WHERE Id=3
GO
```
If you do this a lot, it can impact performance, particularly in a busy transactional database. It costs CPU to compile each query, and SQL Server has to work much harder to manage an execution plan cache that's churning through plans for queries that are never reused.
(For super-simple queries, simple parameterization may be applied. This is also sometimes called “auto parameterization” by people who worked with SQL Server 2000.)
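To see simple parameterization in action, a trivial single-table lookup usually qualifies. This sketch assumes the dbo.Posts table from the examples above; the 'Prepared' plan with the parameterized text is what you're looking for:

```sql
/* A query simple enough for SQL Server to auto-parameterize: */
SELECT Id FROM dbo.Posts WHERE Id = 1;
GO

/* Check the cache: if simple parameterization kicked in, you'll see a
   'Prepared' plan with text like
   (@1 int)SELECT [Id] FROM [dbo].[Posts] WHERE [Id]=@1 */
SELECT cp.objtype, cp.usecounts, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE N'%Posts%';
GO
```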
White space, formatting, and comments can increase compilation and plan cache bloat for parameterized queries
I’m going to use spaces as examples in this post, but basically this is just a shorthand for talking about any variance in the syntax for executing a query.
Basically, these three queries will each get their own execution plan in SQL Server because the string being executed is different– even though it’s only different inside of a comment bracket, and even though the difference is simply the number of spaces in that comment:
```sql
SELECT TOP 1 Id /* Comment = */ INTO #byebye FROM dbo.Posts WHERE Id=@Id
GO
SELECT TOP 1 Id /* Comment =  */ INTO #byebye FROM dbo.Posts WHERE Id=@Id
GO
SELECT TOP 1 Id /* Comment =   */ INTO #byebye FROM dbo.Posts WHERE Id=@Id
GO
```

(The only difference between the three: one, two, or three spaces inside the comment.)
The differing whitespace gives you just as much compilation burn and plan cache bloat as if the query were not parameterized.
‘Forced parameterization’ reduces execution plan bloat for non-parameterized queries
You can’t always re-write your whole application. The ‘forced parameterization’ database setting can help out sometimes — it tells SQL Server to look for those literal values and to try to treat them like parameters. This database setting can dramatically reduce compilation work by enabling execution plan reuse in some databases.
Like anything else, it has some limitations:
- It impacts the whole database. If you only want to do individual queries, you have to use Plan Guides.
- It won’t do anything if the database compatibility level is set to 80. (But plan guides will work.)
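If you only want to force-parameterize one query rather than the whole database, a TEMPLATE plan guide is the documented route. A sketch, using a simplified version of the test query above (table name is from the examples, not something you must have):

```sql
/* Get the parameterized form of the query, then attach a plan guide
   that forces parameterization for just that query shape. */
DECLARE @stmt nvarchar(max), @params nvarchar(max);

EXEC sp_get_query_template
    N'SELECT Id FROM dbo.Posts WHERE Id = 1',
    @stmt OUTPUT,
    @params OUTPUT;

EXEC sp_create_plan_guide
    @name = N'ForceParamForPostsLookup',
    @stmt = @stmt,
    @type = N'TEMPLATE',
    @module_or_batch = NULL,
    @params = @params,
    @hints = N'OPTION (PARAMETERIZATION FORCED)';
```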
The Question: Does the ‘forced parameterization’ setting help with whitespace in query strings?
If SQL Server is smart enough to find those parameters in the string and remove them, shouldn’t it be smart enough to find the whitespace / dynamic comments and ignore them, too?
To figure this out I wrote up some test TSQL and executed it against a restored copy of the StackOverflow database.
Forced parameterization test: non-parameterized queries
For this test, I set up some simple code to execute non-parameterized dynamic SQL which also had different whitespace for each query. I ran the query 1,000 times in a loop, and then checked the execution plan cache to see how many copies of the query I had. Here’s what the code looks like:
```sql
/* This code is for test systems only, not production */
ALTER DATABASE StackOverflow SET PARAMETERIZATION FORCED;
GO
USE StackOverflow;
GO
DBCC FREEPROCCACHE;
GO
SET NOCOUNT ON;

DECLARE @dsql NVARCHAR(MAX) = N'SELECT TOP 1 Id /* Comment = %REPLACE-COMMENT% */ INTO #byebye FROM dbo.Posts WHERE Id=%REPLACE-ID%',
    @dsql2 NVARCHAR(MAX) = N'',
    @i INT = 1;

WHILE @i < 1001
BEGIN
    SET @dsql2 = REPLACE(@dsql, N'%REPLACE-COMMENT%', REPLICATE(N' ', @i));
    SET @dsql2 = REPLACE(@dsql2, N'%REPLACE-ID%', @i);
    PRINT @dsql2;
    EXEC sp_executesql @dsql2;
    SET @i = @i + 1;
END

SELECT TOP 25 qs.[query_hash],
    COUNT(DISTINCT plan_handle) AS number_of_plans,
    COUNT(DISTINCT query_plan_hash) AS distinct_plans,
    SUM(qs.[execution_count]) AS execution_count
FROM sys.dm_exec_query_stats qs
GROUP BY qs.[query_hash]
ORDER BY number_of_plans DESC;
GO
```
Here’s what the queries looked like:
Here’s how many execution plans I got:
Forced parameterization “worked”, and it overcame the different whitespace for the queries! I had one plan run 1,000 times.
Forced parameterization test: parameterized queries
So, what if you have a parameterized query, but it has a problem with dynamic comments/ whitespace? Will turning on forced parameterization help that, too?
```sql
/* This code is for test systems only, not production */
USE master;
GO
ALTER DATABASE StackOverflow SET PARAMETERIZATION FORCED;
GO
DBCC FREEPROCCACHE;
GO
USE StackOverflow;
GO
SET NOCOUNT ON;

DECLARE @dsql NVARCHAR(MAX) = N'SELECT TOP 1 Id /* Comment = %REPLACE-COMMENT% */ INTO #byebye FROM dbo.Posts WHERE Id=@Id',
    @dsql2 NVARCHAR(MAX) = N'',
    @i INT = 1;

WHILE @i < 1001
BEGIN
    SET @dsql2 = REPLACE(@dsql, N'%REPLACE-COMMENT%', REPLICATE(N' ', @i));
    PRINT @dsql2;
    EXEC sp_executesql @dsql2, N'@Id int', @Id = @i;
    SET @i = @i + 1;
END

SELECT TOP 25 qs.[query_hash],
    COUNT(DISTINCT plan_handle) AS number_of_plans,
    COUNT(DISTINCT query_plan_hash) AS distinct_plans,
    --MIN(qs.creation_time) AS min_creation_time,
    --MAX(qs.last_execution_time) AS max_last_execution_time,
    SUM(qs.[execution_count]) AS execution_count
FROM sys.dm_exec_query_stats qs
GROUP BY qs.[query_hash]
ORDER BY number_of_plans DESC;
GO
```
Here’s what the queries looked like:
Here’s how many execution plans I got:
Oh, ouch. Everything does hash out to the exact same query hash, but I got 1 plan for each execution, 1,000 plans total. For a parameterized query, the ‘Forced Parameterization’ database setting didn’t clean up the dynamic comments / whitespace for me.
Recap: what we saw
The ‘forced parameterization’ database setting is smarter than you might think, but it doesn’t fix every problem.
For our test query, it was able to force parameterize a query even if there were comments with different amounts of whitespace in it.
However, for a query that was already parameterized, it didn't fix the issue of dynamic comments/whitespace causing extra compilation and plan cache bloat.
If you’re a production database administrator responsible for backups, corruption checking, and index maintenance on SQL Server, try Ola Hallengren’s free database maintenance scripts. They’re better than yours (trust me), and they give you more flexibility than built-in maintenance plans.
However, the index maintenance defaults aren’t good for everyone. Here’s how they ship:
```sql
CREATE PROCEDURE [dbo].[IndexOptimize]
    @Databases nvarchar(max),
    @FragmentationLow nvarchar(max) = NULL,
    @FragmentationMedium nvarchar(max) = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationHigh nvarchar(max) = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationLevel1 int = 5,
    @FragmentationLevel2 int = 30,
```
The defaults on some of these parameters are a little tricky:
When an index is between @FragmentationLevel1 (5%) and @FragmentationLevel2 (30%), the scripts will perform the actions listed in @FragmentationMedium, in order. First it’ll try an index reorganize, which is an online operation. If the index’s design doesn’t allow for that, IndexOptimize will try to rebuild the index online. If the index’s design or your SQL Server edition doesn’t allow for that, it’ll perform the last resort – an offline index rebuild.
When an index has higher fragmentation than @FragmentationLevel2 (30%), the scripts will try the actions listed in @FragmentationHigh – first an online rebuild, then if that’s not possible, an offline rebuild.
These defaults aren’t dangerous or deadly, but they do have drawbacks.
Index maintenance changes data pages and creates transaction log records. This means larger transaction log backups, which at first doesn’t sound too terribly harmful. However, if you’re using database mirroring, transaction log shipping, AlwaysOn Availability Groups, or anything else that copies your logged transactions around, you’ve put yourself behind. In one case, I saw an all-solid-state server generate ~25GB of transaction logs in under 3 minutes – all of which had to be copied out to the various reporting and disaster recovery servers. This creates an RPO/RTO problem without really solving a performance problem.
Index maintenance changes data pages, too. This also sounds harmless, but if you’re doing differential backups or storage replication, boom, you just inflated your storage and bandwidth costs. The larger your database becomes, the more you’ll start to investigate these recovery options, and the more index maintenance starts to backfire.
Offline index rebuilds cause blocking. With fragmentation as low as 5%, the script defaults have the potential to take tables offline and block queries. I take kind of an extreme view on this: I would rather not take tables offline when there's not a human being around to watch what's happening.
So here are the defaults I prefer:
- FragmentationLevel1 = 30%
- FragmentationLevel2 = 50%
- FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE'
- FragmentationHigh = 'INDEX_REBUILD_ONLINE'
Note that I raised the fragmentation thresholds by a lot, and I removed the offline rebuild operations. I’m not saying I never rebuild indexes offline – I just don’t want to do that by default, with nobody around. When I’ve got tables that are heavily fragmented, and I’m convinced that a rebuild will solve the problem, AND I can only rebuild them offline, then I want to run the stored procedure manually when I can be around to keep an eye on it.
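Passed explicitly to Ola's procedure, those preferences look like this (parameter names are from the IndexOptimize header shown earlier; 'USER_DATABASES' is a supported value for @Databases):

```sql
/* Lightweight maintenance: online operations only, higher thresholds. */
EXECUTE dbo.IndexOptimize
    @Databases = 'USER_DATABASES',
    @FragmentationLevel1 = 30,
    @FragmentationLevel2 = 50,
    @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE',
    @FragmentationHigh = 'INDEX_REBUILD_ONLINE';
```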
How to Change the IndexOptimize Defaults
You could just edit the IndexOptimize creation script. This way, whenever you execute it without any parameters, it will inherit the defaults you set at creation time. If you maintain IndexOptimize across a lot of servers, and you keep a single master version in a central location, this works.
But you have to be doggone sure to remember to change those defaults every single time you download a fresh version from Ola. (You won’t. You’re too busy reading blogs.)
The safer way is to leave his script the way it is, but when you work with SQL Agent jobs to run IndexOptimize, change the job’s settings.
You should probably be doing this anyway because you should create two separate Agent jobs: one to do your normal lightweight optimizations, and a second one for deeper optimizations (like with offline rebuilds). Some folks like to have weekday jobs that only do low-impact maintenance techniques for highly fragmented tables (like if something manages to hit 80% fragmentation on a Tuesday, go ahead and rebuild it online) followed by more invasive techniques on the weekends.
Just keep your backup schedule in mind – if you do full backups weekly on Saturday, followed by differential backups the rest of the week, don’t do your index rebuilds on Sunday. That instantly has the effect of inflating your differential backups all week long. Do the index maintenance first (like Saturday morning) followed by the full backup Saturday evening.
Words are hard. Acronyms are even harder. Here are the words that give me the most trouble when talking to people about SQL Server. And thanks to all my Twitter friends quoted here for their suggestions!
“I have a question about replication…”
“Which kind? Transactional? Merge? Peer to Peer?”
Technically, the word “replication” refers to very specific features to publish and subscribe to data at the table level in SQL Server. But users commonly use the word as a general term to cover other technologies used to replicate data between SQL Servers (see there, I just said “replicate”), so you never quite know what anyone means at first without clarifying.
— Robert L Davis (@SQLSoldier) November 25, 2014
“We use snapshots for that.”
“Oh, a database snapshot? Or a replication snapshot?”
“Oh, no, we do it with the SAN.”
The word snapshot is super-popular when it comes to SQL Server technologies. It’s even the name of an isolation level.
“We’ve got a SQL Server on that cluster.”
“Oh, a Windows failover cluster?”
“Sort of? It’s virtualization.”
You can cluster in many different ways. You may have a cluster of virtualization hosts. Or you could have SQL Server installed in a Windows Failover Cluster. Or you could have both!
Cluster + AG
“We’ve got a cluster and an AG.”
“Is the cluster in the AG?”
“Yes, on all three servers.”
“Cluster” is more confusing than ever, because AlwaysOn Availability Groups require a Windows Failover Cluster. And you might combine a Windows Failover Cluster Instance with your AlwaysOn Availability Group, so there’s a failover cluster and… a cluster. Yeah, there’s a proper way to use these terms (and their related acronyms), but practically speaking nobody understands them. When things get confusing, just surrender and draw a picture. It’s easier.
“We’ve got three SQL Servers in a HAG.”
“In a what?”
I’m not sure why, but some Microsoft blogs and posts started using the acronym HAG for “High Availability Group.” Most SQL Server posts just use the shorter acronym “AG” for “Availability Group”, or spell out AlwaysOn Availability Groups.
“Truncating the log”
“We truncate the log every hour.”
“Oh. What command do you use to run that?”
“Let me look. Oh, it’s BACKUP LOG.”
“Are you using TRUNCATE_ONLY?”
“Truncating the log” is an older phrase that still sticks around. People start saying “truncate” instead of “backup”. They mean completely different things — and it’s really important to clarify what’s being run.
— Joey D'Antoni (@jdanton) November 25, 2014
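For the record, here's the difference in code. The backup path is a placeholder, as is the database name:

```sql
/* A real log backup: the log records are written to a backup file,
   and committed portions of the log can then be reused. */
BACKUP LOG MyDatabase
TO DISK = N'X:\Backups\MyDatabase_log.trn';

/* The old "truncate" throws the log records away without backing them
   up, breaking the log chain. The option was removed in SQL Server 2008;
   on older versions it looked like this: */
-- BACKUP LOG MyDatabase WITH TRUNCATE_ONLY;
```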
“Have you enabled the DAC?”
“No, we don’t use DAC packages. That feature was terrible.”
“No, I mean the Dedicated Admin Connection.”
— Michael J Swart (@MJSwart) November 25, 2014
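If you do mean the Dedicated Admin Connection, it's worth setting up before you need it. A minimal sketch (server name is a placeholder):

```sql
/* By default the DAC only accepts local connections.
   Allow remote DAC connections: */
EXEC sp_configure 'remote admin connections', 1;
RECONFIGURE;

/* Then connect with sqlcmd -A -S YourServerName, or in SSMS by
   prefixing the server name: ADMIN:YourServerName */
```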
Histogram: A target for Extended Events in SQL Server 2012 or later. (But if you’re using an earlier version of Extended Events, it’s called a “Bucketizer”.)
Histogram: A table showing the distribution of values used in a statistic.
These are completely unrelated. Although I guess you could probably use an Extended Events histogram target to run a trace showing if a procedure recompiled because statistics updated on a table (which means that histogram would have changed). I have no idea if that sentence even makes any sense.
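The statistics kind of histogram, at least, is easy to look at. Table and statistic names here are placeholders:

```sql
/* Show the value-distribution histogram for one statistic: */
DBCC SHOW_STATISTICS ('dbo.Posts', 'PK_Posts') WITH HISTOGRAM;
```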
Speaking of statistics…
“Let’s look at the stats for that query.”
“You mean the statistics on the indexes?”
“No, the statistics in dm_exec_query_stats.”
I have a huge problem with this word when I’m teaching. I frequently have to use ‘statistics’, and make sure that what I’m saying isn’t totally misleading because the word can have multiple meanings.
“In memory” data
— Steinar Andersen (@SQLSteinar) November 25, 2014
“We’re really interested in keeping the data ‘In Memory’.”
“Do you mean by using Hekaton?”
“No, we mean by just adding memory to the server.”
It could be worse…
@Kendra_Little The alternative is to go nosql. Give *everything* a new name: I used floop to skadoosh my bazinga cluster and it schlemozzled
— Michael J Swart (@MJSwart) November 25, 2014
Until we have perfect terms, I’ll be here, using the DAC to take a snapshot of your statistics on the cluster.
Index maintenance is confusing. Should you use maintenance plans or customized SQL Agent jobs? Join Kendra to learn the difference between index ‘rebuild’ and ‘reorganize’ commands, and how to avoid the pitfalls of maintenance plans in SQL Server.
It’s a bummer that SQL Server 2012 Standard Edition only allows you 64GB of memory. And it’s good news that the limit goes up to 128GB of memory in SQL Server 2014.
But are you getting as much memory as you can out of Standard Edition?
Check out Microsoft KB 2663912:
Starting with SQL Server 2012, these memory limits are enforced only for the database cache (buffer pool). The rest of the caches in the SQL Server memory manager can use much more memory than is specified by these edition limits. …. Other caches (such as the procedure cache, the metadata cache, and so on) can consume memory up to the size specified by the “max server memory” configuration.
This means that if you’re using Standard Edition of SQL Server 2012 or higher, you should be setting your max server memory higher than the “limit” — because you can use more than you might think!
So if I need to design a VM for a SQL Server that has 300 GB of data and is using SQL Server 2012 Standard Edition with its 64GB limit for data cache, to get the biggest benefit of memory for my licensing dollar I would want something like:
- 96GB of memory for the VM
- SQL Server “max server memory (MB)” set at 88064 (86GB)
Different workloads will use different amounts of memory in non-buffer pool caches — and not all workloads need 64GB of data cache, of course. But for those that do, don’t think “max server memory (MB)” is the cap on all your memory — for SQL Server 2012 and higher, it’s the cap for your buffer pool.
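Setting that cap is a one-liner; a sketch of the configuration from the example above (requires 'show advanced options'):

```sql
/* Set max server memory to 86GB (88064 MB) on the Standard Edition
   instance from the example: */
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 88064;
RECONFIGURE;
```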
Every now and then I run across an automated script that does something a little suspicious. I’m not saying you should never put these things into a script, but if you do, seriously think about the surprise gotchas when someone runs the script:
- DBCC commands other than CHECKDB
- sp_configure (and especially RECONFIGURE afterwards)
- ALTER SERVER CONFIGURATION
- ALTER DATABASE
- ALTER AVAILABILITY GROUP
- CREATE INDEX or DROP INDEX
- And most of the database engine management stored procedures
If you’re doing any of these on a scheduled basis, take a few minutes to document what you’re doing, why, and whether it’s safe to stop doing it. Your successor will thank you.
Believe me, otherwise she’s going to throw you under the bus when you’re gone, and if you haven’t left something behind to defend you, you’re going to look like That Guy.
As a developer, one of the things I can no longer live without is a date table. Who wants to type DATENAME and DATEPART over and over again? Not this guy, and once you have a date table, you won’t want to either.
In this 16-minute video, I’ll give you four reasons why you should stop writing so many date functions and concatenations, and start looking them up in a table instead.
For questions & answers about these, tune in to our Tuesday webcast.
In this video, Doug talks about how changing a column’s data type in SQL Server can be dangerous when there are Access clients using it, and a reference table to help avoid type mismatches when planning data type changes.
Click here to see the list of SQL Server data types according to Access.
The best part about transactional replication is that it’s flexible. And the worst thing about transactional replication is that it’s flexible.
A poor replication configuration will cause you performance pains and increase your risks of development and administrative errors. Here’s one important design principle: isolate your replication subscriber tables from one another, and from non-subscriber tables.
Our example environment
We’ve got an environment with two SQL Server databases used for OLTP purposes: SalesDB and CRMDB. Each database uses SQL Server Standard Edition. The databases support different applications, and they’re on totally separate SQL Server instances (and failover clusters).
The SalesDB and CRMDB support completely different applications, but data from selected tables in them both is needed for internal reports. SQL Server transactional replication has been chosen as the technology to keep a copy of the data up to date on the reporting instance. Replication is already in use in the environment and a dedicated SQL Server has been configured for distribution.
There are still choices to make: how do you configure the subscriber? My recommendation is to isolate each replication subscription in its own database. Let's take a look at why.
The all-in-one subscriber
In this model, we have one big database named ReportDB on the Warehouse01 subscriber server. This database holds reporting tables as well as tables from replication subscriptions.
SQL Server will let you do this, but it’s got some downsides:
1. You’ve got a single transaction log (and a single recovery model) for ReportDB. Not all of the data in ReportDB may be recoverable from other sources. If for any reason you need to be in the FULL recovery model for ReportDB, you’re in a sticky situation if you have to reinitialize replication and reload the subscriber tables: you’ve got to do it fully logged, and your transaction log backups will be bigger and slower.
2. It’s easy for DBAs and Developers to screw this up. You may want to drop all the replication subscriber tables at some point if you have to re-initialize replication. I would much rather isolate drop table commands to a database where everything in the database is replication related, and I don’t have to worry about accidentally dropping the wrong thing! Similarly, it’s much easier to set up security so that application accounts can only read subscriber tables (not write to them), when they’re in their own database.
3. You’ve got fewer options when it comes to reinitialization. The ability to initialize a replication subscriber from a restored backup is huge– running snapshots on your publishers makes your users sad.
Here’s the alternate model, where each subscriber has its own database:
Note that the subscriber databases are named in a way that you can identify their role. (As noted by David in the comments, it’s much easier if you don’t re-use the name of a publishing database.)
This model has a lot of benefits:
- More options to re-initialize
- Supports different recovery models on replication subscriber databases and ReportDB (which can allow minimal logging during bulk inserts if you’re re-initializing that way)
- Safer for administrators
- Easier to set up security
- Easier for new DBAs and developers to understand (the architecture is much more transparent)
What If I’m Not Doing it This Way? Do I Have to Change All My Code?
Not necessarily. I’ve worked with this with two different models.
Synonyms: ReportDB could contain synonyms for all the replication subscriber tables. This makes the tables seem like they’re in ReportDB, while they’re actually in other databases.
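A sketch of the synonym approach; the subscriber database and table names here are placeholders:

```sql
USE ReportDB;
GO
/* Point a synonym in ReportDB at a table living in a dedicated
   replication subscriber database: */
CREATE SYNONYM dbo.SalesOrders
FOR SalesDB_Subscriber.dbo.SalesOrders;
GO
/* Report queries can now reference dbo.SalesOrders as if the table
   were local to ReportDB. */
```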
Dynamically configured procedures. In one environment I worked in where replication was common, there was a configuration table where you configured the name of the replication subscriber databases needed by an application. When code was deployed, the installation process for stored procedures checked this configuration table and dynamically compiled the stored procedures referencing the name of the database in that environment. This sounds like more work, but it was very useful for the developers, who might need to deploy builds to instances configured differently than production.
Want To Learn More About Replication?
We’ve got you covered. Start with these articles:
- Introduction to SQL Server Transactional Replication
- Performance Tuning SQL Server Transactional Replication
- Monitoring SQL Server Transactional Replication