
A question came up in our company chat room the other day: does the forced parameterization database setting help with plan cache bloat caused by dynamic comments or whitespace?

I love the topic of parameterization in SQL Server, probably because it’s one of those things that’s really pretty weird (but seems like it’d be straightforward). So I immediately wanted to test it out and see if SQL Server would do what I thought it would do.

Parameterization reduces compilation and plan cache bloat

For anything but the simplest queries, SQL Server takes literal values in your query literally. If you don’t parameterize the query, SQL Server won’t do it for you. That means that each of these three queries will get its own execution plan (even if the execution plans are all identical):

SELECT TOP 1 Id INTO #byebye FROM dbo.Posts WHERE Id=1
GO
SELECT TOP 1 Id INTO #byebye FROM dbo.Posts WHERE Id=2
GO
SELECT TOP 1 Id INTO #byebye FROM dbo.Posts WHERE Id=3
GO

If you do this a lot, it can impact performance, particularly in a busy transactional database. It costs CPU to compile each query, and SQL Server has to work much harder to manage an execution plan cache that's constantly churning through plans that are never reused.

(For super-simple queries, simple parameterization may be applied. This is also sometimes called “auto parameterization” by people who worked with SQL Server 2000.)
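
If you want to see simple parameterization in action, here's a quick sketch against the same StackOverflow database used in the tests below — the LIKE filter is just a rough way to spot the relevant cache entries:

SELECT Id FROM dbo.Posts WHERE Id = 1;
GO
SELECT Id FROM dbo.Posts WHERE Id = 2;
GO

/* If simple parameterization kicked in, the cache shows a single
   'Prepared' entry with a parameter marker instead of two ad hoc plans */
SELECT cp.objtype, cp.usecounts, st.[text]
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.[text] LIKE N'%Posts%'
    AND st.[text] NOT LIKE N'%dm_exec_cached_plans%';
GO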

White space, formatting, and comments can increase compilation and plan cache bloat for parameterized queries

I’m going to use spaces as the example in this post, but that’s really just shorthand for any variance in the text of the query being executed.

These three queries will each get their own execution plan in SQL Server because the string being executed is different — even though it’s only different inside of a comment, and even though the difference is simply the number of spaces in that comment:

SELECT TOP 1 Id /* Comment =   */ INTO #byebye FROM dbo.Posts WHERE Id=@Id
GO
SELECT TOP 1 Id /* Comment =    */ INTO #byebye FROM dbo.Posts WHERE Id=@Id
GO
SELECT TOP 1 Id /* Comment =     */ INTO #byebye FROM dbo.Posts WHERE Id=@Id
GO

The differing whitespace gives you just as much compilation burn and plan cache bloat as if the query weren’t parameterized at all.

‘Forced parameterization’ reduces execution plan bloat for non-parameterized queries

You can’t always re-write your whole application. The ‘forced parameterization’ database setting can help out sometimes — it tells SQL Server to look for those literal values and to try to treat them like parameters. This database setting can dramatically reduce compilation work by enabling execution plan reuse in some databases.

Like anything else, it has some limitations:

  • It impacts the whole database. If you only want it for individual queries, you have to use Plan Guides (there’s a sketch below).
  • It won’t do anything if the database compatibility level is set to 80. (But plan guides will work.)
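
Here's a minimal sketch of the plan guide route, using the dbo.Posts query from earlier (the guide name is made up, and you'd run this in the target database). sp_get_query_template hands back the parameterized form of the statement, and a TEMPLATE plan guide then forces parameterization for just that query shape:

DECLARE @stmt NVARCHAR(MAX), @params NVARCHAR(MAX);

/* Get the parameterized template for this query shape */
EXEC sp_get_query_template
    N'SELECT Id FROM dbo.Posts WHERE Id = 1',
    @stmt OUTPUT,
    @params OUTPUT;

/* Hypothetical guide name */
EXEC sp_create_plan_guide
    @name = N'ForceParam_Posts_ById',
    @stmt = @stmt,
    @type = N'TEMPLATE',
    @module_or_batch = NULL,
    @params = @params,
    @hints = N'OPTION (PARAMETERIZATION FORCED)';
GO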

The Question: Does the ‘forced parameterization’ setting help with whitespace in query strings?

If SQL Server is smart enough to find those parameters in the string and remove them, shouldn’t it be smart enough to find the whitespace / dynamic comments and ignore them, too?

To figure this out, I wrote up some test T-SQL and executed it against a restored copy of the StackOverflow database.

Forced parameterization test: non-parameterized queries

For this test, I set up some simple code to execute non-parameterized dynamic SQL which also had different whitespace for each query. I ran the query 1,000 times in a loop, and then checked the execution plan cache to see how many copies of the query I had. Here’s what the code looks like:

/* This code is for test systems only, not production */
ALTER DATABASE StackOverflow SET PARAMETERIZATION FORCED;
GO

USE StackOverflow;
GO

DBCC FREEPROCCACHE;
GO

SET NOCOUNT ON;

DECLARE
    @dsql NVARCHAR(MAX)= N'SELECT TOP 1 Id /* Comment = %REPLACE-COMMENT% */ INTO #byebye FROM dbo.Posts WHERE Id=%REPLACE-ID%',
    @dsql2 NVARCHAR(MAX)= N'',
    @i INT = 1;

WHILE @i < 1001
BEGIN
    SET @dsql2=REPLACE(@dsql,N'%REPLACE-COMMENT%',REPLICATE(N' ',@i));
    SET @dsql2=REPLACE(@dsql2,N'%REPLACE-ID%',@i);

    PRINT @dsql2;
    EXEC sp_executesql @dsql2;

    SET @i=@i+1;
END

SELECT TOP 25
    qs.[query_hash],
    COUNT(DISTINCT plan_handle) AS number_of_plans,
    COUNT(DISTINCT query_plan_hash) AS distinct_plans,
    SUM(qs.[execution_count]) AS execution_count
FROM sys.dm_exec_query_stats qs
GROUP BY qs.[query_hash]
ORDER BY number_of_plans DESC;
GO

Here’s what the queries looked like:

[Screenshot: the non-parameterized query strings, each with different comment whitespace and a different Id literal]

Here’s how many execution plans I got:

[Screenshot: plan cache results for the non-parameterized strings — one plan, 1,000 executions]

Forced parameterization “worked”, and it overcame the different whitespace for the queries! I had one plan run 1,000 times.

Forced parameterization test: parameterized queries

So, what if you have a parameterized query that has a problem with dynamic comments / whitespace? Will turning on forced parameterization help that, too?

/* This code is for test systems only, not production */
USE master;
GO
ALTER DATABASE StackOverflow SET PARAMETERIZATION FORCED;
GO

DBCC FREEPROCCACHE;
GO

USE StackOverflow;
GO

SET NOCOUNT ON;

DECLARE
    @dsql NVARCHAR(MAX)= N'SELECT TOP 1 Id /* Comment = %REPLACE-COMMENT% */ INTO #byebye FROM dbo.Posts WHERE Id=@Id',
    @dsql2 NVARCHAR(MAX)= N'',
    @i INT = 1;

WHILE @i < 1001
BEGIN
    SET @dsql2=REPLACE(@dsql,N'%REPLACE-COMMENT%',REPLICATE(N' ',@i));

    PRINT @dsql2;
    EXEC sp_executesql @dsql2, N'@Id int', @Id=@i;

    SET @i=@i+1;
END

SELECT TOP 25
    qs.[query_hash],
    COUNT(DISTINCT plan_handle) AS number_of_plans,
    COUNT(DISTINCT query_plan_hash) AS distinct_plans,
    --MIN(qs.creation_time) AS min_creation_time,
    --MAX(qs.last_execution_time) AS max_last_execution_time,
    SUM(qs.[execution_count]) AS execution_count
FROM sys.dm_exec_query_stats qs
GROUP BY qs.[query_hash]
ORDER BY number_of_plans DESC;
GO

Here’s what the queries looked like:

[Screenshot: the parameterized query strings, differing only in comment whitespace]

Here’s how many execution plans I got:

[Screenshot: plan cache results for the parameterized strings — 1,000 plans, one per execution]

Oh, ouch. Everything does hash out to the exact same query hash, but I got one plan for each execution — 1,000 plans total. For a parameterized query, the ‘forced parameterization’ database setting didn’t clean up the dynamic comments / whitespace for me.

Recap: what we saw

The ‘forced parameterization’ database setting is smarter than you might think, but it doesn’t fix every problem.

For our test query, it was able to force-parameterize a query even when the query contained comments with differing amounts of whitespace.

However, for a query that was already parameterized, it didn’t fix the extra compilation and plan cache bloat caused by dynamic comments/whitespace.


If you’re a production database administrator responsible for backups, corruption checking, and index maintenance on SQL Server, try Ola Hallengren’s free database maintenance scripts. They’re better than yours (trust me), and they give you more flexibility than built-in maintenance plans.

However, the index maintenance defaults aren’t good for everyone. Here’s how they ship:

CREATE PROCEDURE [dbo].[IndexOptimize]
@Databases nvarchar(max),
@FragmentationLow nvarchar(max) = NULL,
@FragmentationMedium nvarchar(max) = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationHigh nvarchar(max) = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationLevel1 int = 5,
@FragmentationLevel2 int = 30,

The defaults on some of these parameters are a little tricky:

When an index is between @FragmentationLevel1 (5%) and @FragmentationLevel2 (30%), the scripts will perform the actions listed in @FragmentationMedium, in order. First it’ll try an index reorganize, which is an online operation. If the index’s design doesn’t allow for that, IndexOptimize will try to rebuild the index online. If the index’s design or your SQL Server edition doesn’t allow for that, it’ll perform the last resort – an offline index rebuild.

When an index has higher fragmentation than @FragmentationLevel2 (30%), the scripts will try the actions listed in @FragmentationHigh – first an online rebuild, then if that’s not possible, an offline rebuild.

These defaults aren’t dangerous or deadly, but they do have drawbacks.

Index maintenance changes data pages and creates transaction log records. This means larger transaction log backups, which at first doesn’t sound too terribly harmful. However, if you’re using database mirroring, transaction log shipping, AlwaysOn Availability Groups, or anything else that copies your logged transactions around, you’ve put yourself behind. In one case, I saw an all-solid-state server generate ~25GB of transaction logs in under 3 minutes – all of which had to be copied out to the various reporting and disaster recovery servers. This creates an RPO/RTO problem without really solving a performance problem.

Index maintenance changes data pages, too. This also sounds harmless, but if you’re doing differential backups or storage replication, boom, you just inflated your storage and bandwidth costs. The larger your database becomes, the more you’ll start to investigate these recovery options, and the more index maintenance starts to backfire.

Offline index rebuilds cause blocking. At fragmentation as low as 5%, the script defaults have the potential to take tables offline and block queries. I take kind of an extreme view on this: I would rather not take tables offline when there’s not a human being around to watch what’s happening.

So here are the defaults I prefer:

  • @FragmentationLevel1 = 30%
  • @FragmentationLevel2 = 50%
  • @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE'
  • @FragmentationHigh = 'INDEX_REBUILD_ONLINE'

Note that I raised the fragmentation thresholds by a lot, and I removed the offline rebuild operations. I’m not saying I never rebuild indexes offline – I just don’t want to do that by default, with nobody around. When I’ve got tables that are heavily fragmented, and I’m convinced that a rebuild will solve the problem, AND I can only rebuild them offline, then I want to run the stored procedure manually when I can be around to keep an eye on it.
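
If it helps, here's what those preferences look like as an actual call — just a sketch, using the parameter names from the proc header above (adjust @Databases, and the database where you installed the proc, to taste):

EXECUTE dbo.IndexOptimize
    @Databases = 'USER_DATABASES',
    @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE',
    @FragmentationHigh = 'INDEX_REBUILD_ONLINE',
    @FragmentationLevel1 = 30,
    @FragmentationLevel2 = 50;
GO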

How to Change the IndexOptimize Defaults

You could just edit the IndexOptimize creation script. This way, whenever you execute it without any parameters, it will inherit the defaults you set at creation time. If you maintain IndexOptimize across a lot of servers, and you keep a single master version in a central location, this works.

But you have to be doggone sure to remember to change those defaults every single time you download a fresh version from Ola. (You won’t. You’re too busy reading blogs.)

The safer way is to leave his script the way it is, and instead override the parameters in the SQL Agent jobs that run IndexOptimize.

You should probably be doing this anyway because you should create two separate Agent jobs: one to do your normal lightweight optimizations, and a second one for deeper optimizations (like with offline rebuilds). Some folks like to have weekday jobs that only do low-impact maintenance techniques for highly fragmented tables (like if something manages to hit 80% fragmentation on a Tuesday, go ahead and rebuild it online) followed by more invasive techniques on the weekends.

Just keep your backup schedule in mind – if you do full backups weekly on Saturday, followed by differential backups the rest of the week, don’t do your index rebuilds on Sunday. That instantly has the effect of inflating your differential backups all week long. Do the index maintenance first (like Saturday morning) followed by the full backup Saturday evening.


Words are hard. Acronyms are even harder. Here are the words that give me the most trouble when talking to people about SQL Server. And thanks to all my Twitter friends quoted here for their suggestions!

Replication

“I have a question about replication…”

“Which kind? Transactional? Merge? Peer to Peer?”

“Availability Groups.”

Technically, the word “replication” refers to very specific features to publish and subscribe to data at the table level in SQL Server. But users commonly use the word as a general term to cover other technologies used to replicate data between SQL Servers (see there, I just said “replicate”), so you never quite know what anyone means at first without clarifying.

Snapshot

“We use snapshots for that.”

“Oh, a database snapshot? Or a replication snapshot?”

“Oh, no, we do it with the SAN.”

The word snapshot is super-popular when it comes to SQL Server technologies. It’s even the name of an isolation level.

Cluster

“We’ve got a SQL Server on that cluster.”

“Oh, a Windows failover cluster?”

“Sort of? It’s virtualization.”

You can cluster in many different ways. You may have a cluster of virtualization hosts. Or you could have SQL Server installed in a Windows Failover Cluster. Or you could have both!

Cluster + AG

“We’ve got a cluster and an AG.”

“Is the cluster in the AG?”

“Yes, on all three servers.”

“Cluster” is more confusing than ever, because AlwaysOn Availability Groups require a Windows Failover Cluster. And you might combine a SQL Server Failover Cluster Instance with your AlwaysOn Availability Group, so there’s a failover cluster and… a cluster. Yeah, there’s a proper way to use these terms (and their related acronyms), but practically speaking nobody understands them. When things get confusing, just surrender and draw a picture. It’s easier.

HAG

“We’ve got three SQL Servers in a HAG.”

“In a what?”

I’m not sure why, but some Microsoft blogs and posts started using the acronym HAG for “High Availability Group.” Most SQL Server posts just use the shorter acronym “AG” for “Availability Group”, or spell out AlwaysOn Availability Groups.

“Truncating the log”

“We truncate the log every hour.”

“Oh. What command do you use to run that?”

“Let me look. Oh, it’s BACKUP LOG.”

“Are you using TRUNCATE_ONLY?”

“No way!”

“Truncating the log” is an older phrase that still sticks around. People start saying “truncate” instead of “backup”. They mean completely different things — and it’s really important to clarify what’s being run.

DAC

“Have you enabled the DAC?”

“No, we don’t use DAC packages. That feature was terrible.”

“No, I mean the Dedicated Admin Connection.”

Histogram

Histogram: A target for Extended Events in SQL Server 2012 or later. (But if you’re using an earlier version of Extended Events, it’s called a “Bucketizer”.)

Histogram: A table showing the distribution of values used in a statistic.

These are completely unrelated. Although I guess you could probably use an Extended Events histogram target to run a trace showing if a procedure recompiled because statistics updated on a table (which means that histogram would have changed). I have no idea if that sentence even makes any sense.

Statistics

Speaking of statistics…

“Let’s look at the stats for that query.”

“You mean the statistics on the indexes?”

“No, the statistics in dm_exec_query_stats.”

I have a huge problem with this word when I’m teaching. I frequently have to use ‘statistics’, and make sure that what I’m saying isn’t totally misleading because the word can have multiple meanings.

“In memory” data

“We’re really interested in keeping the data ‘In Memory’.”

“Do you mean by using Hekaton?”

“No, we mean by just adding memory to the server.”

It could be worse…

Until we have perfect terms, I’ll be here, using the DAC to take a snapshot of your statistics on the cluster.


Index maintenance is confusing. Should you use maintenance plans or customized SQL Agent jobs? Join Kendra to learn the difference between index ‘rebuild’ and ‘reorganize’ commands, and how to avoid the pitfalls of maintenance plans in SQL Server.

Interested in trying out sp_BlitzTrace™ yourself? Download it in our free SQL Server download pack.

You can also read more about rebuild vs reorganize, or sign up for our online course, Database Maintenance Starter Kit for $149.


You may have heard the term “commodity hardware” thrown around when describing solutions like Redis, Elasticsearch, or kCura’s new Data Grid.

Commodity hardware refers to cheap, standardized servers that are easy to buy off the shelf from any vendor. Here’s a typical example of a 2U, two-CPU commodity hardware server:

[Photo: Supermicro SYS-1028R-TDW]

Say we buy a few parts to get ’er started:

  • Two Intel Xeon E5-2623 v3’s (quad core) – $900 total
  • 128GB RAM (using 8GB DIMMs) – $1,920
  • Two 512GB SSDs for fast storage – $450
  • Six 4TB hard drives for slow storage – $900
  • Grand total: $5,070

Not bad. Want a little more power? Here’s a faster config:

  • Two Intel Xeon E5-2623 v3’s (quad core) – $900 total
  • 256GB RAM (using 16GB DIMMs) – $3,500
  • Eight 1TB SSDs – $2,600
  • Grand total: $7,900

The term “commodity hardware” used to mean really crappy gear, but when you look at these numbers, that’s not necessarily the case anymore. You can build yourself quite the army of pizza boxes.

When vendors say, “You can deploy our solution on commodity hardware,” they’re not saying you’re going to get amazing performance with 16GB of RAM and a couple of spinning rusty frisbees. It’s time to reset your expectations about what commodity means.

Who needs disk when you have memory?

It’s a bummer that SQL Server 2012 Standard Edition only allows you 64GB of memory. And it’s good news that the limit goes up to 128GB of memory in SQL Server 2014.

But are you getting as much memory as you can out of Standard Edition?

Check out Microsoft KB 2663912:

Starting with SQL Server 2012, these memory limits are enforced only for the database cache (buffer pool). The rest of the caches in the SQL Server memory manager can use much more memory than is specified by these edition limits. … Other caches (such as the procedure cache, the metadata cache, and so on) can consume memory up to the size specified by the “max server memory” configuration.

This means that if you’re using Standard Edition of SQL Server 2012 or higher, you should be setting your max server memory higher than the “limit” — because you can use more than you might think!

So if I need to design a VM for a SQL Server that has 300 GB of data and is using SQL Server 2012 Standard Edition with its 64GB limit for data cache, to get the biggest benefit of memory for my licensing dollar I would want something like:

  • 96GB of memory for the VM
  • SQL Server “max server memory (MB)” set at 88064 (86GB)
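
Here's that setting as T-SQL — a minimal sketch, assuming a VM dedicated to SQL Server where 88064 MB (86GB) is the right number for your workload:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
/* 88064 MB = 86GB */
EXEC sp_configure 'max server memory (MB)', 88064;
RECONFIGURE;
GO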

Different workloads will use different amounts of memory in non-buffer pool caches — and not all workloads need 64GB of data cache, of course. But for those that do, don’t think “max server memory (MB)” is the cap on all your memory — for SQL Server 2012 and higher, it’s the cap for your buffer pool.

Thanks to Bob Ward, who mentioned this in a recent talk at SQL Intersection. (And I suspect he’s the one behind this great KB.)


Every now and then I run across an automated script that does something a little suspicious. I’m not saying you should never put these things into a script, but if you do, seriously think about the surprise gotchas when someone runs the script:

  1. DBCC commands other than CHECKDB
  2. sp_configure (and especially RECONFIGURE afterwards)
  3. ALTER SERVER CONFIGURATION
  4. ALTER DATABASE
  5. ALTER AVAILABILITY GROUP
  6. CREATE INDEX or DROP INDEX
  7. KILL
  8. SHUTDOWN
  9. And most of the database engine management stored procedures

If you’re doing any of these on a scheduled basis, take a few minutes to document what you’re doing, why, and whether it’s safe to stop doing it. Your successor will thank you.
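
A hedged starting point for that documentation pass: this sketch just greps your SQL Agent job steps for a few of the commands above. You'll still need to eyeball the results, since it will also match legitimate things like DBCC CHECKDB:

SELECT j.name AS job_name,
    s.step_name,
    s.command
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobsteps AS s ON s.job_id = j.job_id
WHERE s.command LIKE '%DBCC%'
    OR s.command LIKE '%sp_configure%'
    OR s.command LIKE '%ALTER DATABASE%'
    OR s.command LIKE '%KILL%'
    OR s.command LIKE '%SHUTDOWN%';
GO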

Believe me, otherwise she’s going to throw you under the bus when you’re gone, and if you haven’t left something behind to defend you, you’re going to look like That Guy.


As a developer, one of the things I can no longer live without is a date table. Who wants to type DATENAME and DATEPART over and over again? Not this guy, and once you have a date table, you won’t want to either.

In this 16-minute video, I’ll give you four reasons why you should stop writing so many date functions and concatenations, and start looking them up in a table instead.
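
If you've never built one, here's a bare-bones sketch — the table name and columns are purely illustrative, and you'd add whatever attributes your queries keep recomputing (fiscal periods, holidays, and so on):

CREATE TABLE dbo.DateLookup (
    DateKey     DATE         NOT NULL PRIMARY KEY,
    [Year]      INT          NOT NULL,
    [MonthName] NVARCHAR(30) NOT NULL,
    [DayName]   NVARCHAR(30) NOT NULL,
    IsWeekend   BIT          NOT NULL
);
GO

/* Compute DATENAME/DATEPART once per date, then look it up forever */
DECLARE @d DATE = '20000101';
WHILE @d < '20300101'
BEGIN
    INSERT dbo.DateLookup (DateKey, [Year], [MonthName], [DayName], IsWeekend)
    SELECT @d, YEAR(@d), DATENAME(MONTH, @d), DATENAME(WEEKDAY, @d),
        CASE WHEN DATENAME(WEEKDAY, @d) IN (N'Saturday', N'Sunday') THEN 1 ELSE 0 END;
    SET @d = DATEADD(DAY, 1, @d);
END
GO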

For questions & answers about these, tune in to our Tuesday webcast.


Performance tuning of independent software vendor (ISV) databases is a little tricky.

To understand who does what, let’s think through all of the work required with building and hosting a database application:

Who adds indexes and tunes queries?

Typically the two parties involved – the vendor and the customer – start at opposite ends of the table. The software vendor starts by designing their tables, and gradually works down. The customer starts by buying hardware, and works their way up.

In the middle, things kinda fall apart.

When I was a DBA, I thought it was the vendor’s responsibility to put the right indexes on their tables. It’s their app, right? Aren’t they supposed to know how people query their own app? Why am I responsible for reinventing that wheel? Will I lose support if I add indexes?

Now that I’ve worked with a lot of ISVs, I see things differently. Here are their challenges:

Every customer uses the app differently. Some customers really hammer a particular feature hard, whereas others never use it at all. If you index for every possible feature/query, then the database will be dead slow for delete/update/insert operations.

Customers don’t give feature usage or index usage data back to the ISV. Software vendors have no way of knowing what indexes are working well out in the field, or sometimes even know what queries are running on the server. This comes back to our responsibilities grid at the beginning of the post – often even the customers aren’t running any performance monitoring software on their app at all.

Customers won’t upgrade to current versions quickly. Sometimes the ISV fixes performance by improving queries or adding the right indexes, but they only add it in the current version of the software. It takes development time to backport the fixes to older versions, and software vendors only have so much time. If you ask end users if they’d rather pay for new features or backporting stuff to older versions, they vote with their wallet.

Some customers have really bad DBAs (or none at all). Time and again, I’ve seen “DBAs” add dozens – or in one case hundreds – of indexes to an ISV’s tables in an effort to improve performance. The first few indexes make things better, so they start solving every problem by adding more indexes. You can guess how that ends up. As a result, the ISV’s support team starts making rules like “no customer is allowed to add indexes.”

Building a Successful Relationship With Your ISV

When you install a software vendor’s database, show them the list of responsibilities at the top of this post. Say, “I agree to do the stuff in my column – if you’ve got any guidance on things you’d like me to do for your application, send ‘em over, and I’ll be glad to follow them.” For example, the vendor may have a list of trace flags they’d like to enable for their app.

Then say, “For the stuff in the middle, who’s responsible?”

If the vendor agrees to tune indexes, then you’ll probably need to give them index usage data and plan cache data on a regular basis to help them tune. I like to be my software vendor’s best customer – I’ll say, “I’ll send you the output of sp_BlitzIndex® and sp_BlitzCache® every week or every month, whatever works for you. I’ll be glad to take my time working with you on the output, and together, we can make the software experience awesome for everybody.”

You’ll be surprised by how often they accept because they’re desperate to find customers willing to give them some diagnostic data about how their software works in the wild.

If the vendor says you’re responsible for tuning indexes, then:

  • Instead of dropping their indexes, disable them, and let their support team know which ones you’re disabling (there’s a sketch of this after the list)
  • Create your own indexes with your company’s initials, like BGO_CustomerNumber_CustomerName, so you can quickly identify which ones are yours versus which shipped with the product
  • Run sp_BlitzIndex® every month to make sure your indexes are actually helping, not hurting
  • When it’s time to deploy a new version of their app, enable the vendor’s indexes again first, drop your own indexes, run their deployment scripts, and then set things back the way you had ‘em (this way you minimize the chance of their deployment scripts breaking)
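
Here's roughly what that workflow looks like — a sketch only, with made-up table and index names standing in for the vendor's schema:

/* Disable (don't drop) a vendor index that isn't earning its keep */
ALTER INDEX IX_Vendor_CustomerRegion ON dbo.Customers DISABLE;

/* Add your own, prefixed with your company's initials */
CREATE NONCLUSTERED INDEX BGO_CustomerNumber_CustomerName
    ON dbo.Customers (CustomerNumber)
    INCLUDE (CustomerName);

/* Before a vendor upgrade: a rebuild re-enables their index, and
   dropping yours puts the schema back the way their scripts expect */
ALTER INDEX IX_Vendor_CustomerRegion ON dbo.Customers REBUILD;
DROP INDEX BGO_CustomerNumber_CustomerName ON dbo.Customers;
GO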

There’s no right or wrong choice – for example, Sharepoint customers aren’t allowed to touch the database, whereas SAP customers are encouraged to do their own index tuning through the SAP UI. The key is knowing which method your vendor supports right from the start. Otherwise, both sides just assume the other side is taking care of the problem.


In this video, Doug talks about how changing a column’s data type in SQL Server can be dangerous when Access clients are using it, and shares a reference table to help avoid type mismatches when planning data type changes.

Click here to see the list of SQL Server data types according to Access.
