
Monitoring Snapshot Isolation with Perfmon in SQL Server (video)

Snapshot Isolation can be a great way to reduce lock waits and speed up your SQL Server, but long running transactions could still slow you down. Join Kendra Little to learn how to monitor for these sneaky killers using performance monitor.

Brent says: if you’re a developer and you’re getting started building a new SQL Server application, you should totally check out RCSI as a default setting. Learn more about isolation levels here.
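
Brent's suggestion is easy to act on: turning on RCSI for a new application database is a single ALTER DATABASE command. Here's a minimal sketch (the database name is a placeholder; WITH ROLLBACK IMMEDIATE kicks out other sessions, which is a non-issue on a brand new database):

-- Enable Read Committed Snapshot Isolation (RCSI) on a new application database.
-- WITH ROLLBACK IMMEDIATE rolls back any other open sessions in the database.
ALTER DATABASE YourAppDatabase
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;
GO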

Quiz: Are You the Next Brent Ozar Unlimited Consultant?

Hey SQL Server DBAs — we’re hiring!

Here’s a quick five question quiz. Give yourself one point for every ‘yes’:

  1. Have you been a database administrator for a few years?
  2. Does planning how to keep an application reliable and safe from disasters sound like fun?
  3. Do you love helping people make SQL Server go faster?
  4. Would you like to have a cartoon character of yourself?
  5. Can you see yourself dropping into our Office Hours sessions occasionally to help people for free?

If you got five points, you just might be the next Brent Ozar Unlimited employee!

Here’s what to do next:

Read more about the job here to find out what we’re looking for and what benefits we offer.

Then tell us about you!

We plan to accept applications through Friday, March 6, but don’t wait too long to submit — life moves fast sometimes.

Extended Events Sessions: Messing Up Filters

Recently, I wanted to play around with the auto_stats event against a test system running SQL Server 2014. I ran through the session setup GUI and added the auto_stats event. I configured it with a filter (predicate) to only show me auto_stats events in a database named AutoStatsTest. There’s a cost to events that may fire frequently and a cost to filters, but this is my test box and I was just using this to learn– so no biggie, right?

The wizard worked just fine. It created a session which scripted out as this:

CREATE EVENT SESSION [AutoStatsTest] ON SERVER 
ADD EVENT sqlserver.auto_stats(
    WHERE ([database_name]=N'AutoStatsTest')) 
ADD TARGET package0.event_file(SET filename=N'S:\XEvents\Traces\AutoStatsTest')
WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY=5 SECONDS, MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,
    TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF)
GO

I started the session and ran some code to create the AutoStatsTest database, populate data in a table, and then run some SELECT queries. I left the auto_create_statistics property enabled on the database so that the SELECT queries would trigger automatic creation of stats. I then deleted data and ran the SELECTs again.
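
If you want to play along, here's a rough sketch of a test script along those lines (these object names are just for illustration, not from my original script):

-- Create the test database and a table with no index on the filtered column
CREATE DATABASE AutoStatsTest;
GO
USE AutoStatsTest;
GO
CREATE TABLE dbo.StatsBait (
    Id INT IDENTITY PRIMARY KEY,
    Filler CHAR(100) NOT NULL DEFAULT 'x');
GO
INSERT dbo.StatsBait DEFAULT VALUES;
GO 10000
-- auto_create_statistics is on by default, so filtering on Filler
-- should trigger automatic creation of a column statistic
SELECT COUNT(*) FROM dbo.StatsBait WHERE Filler = 'x';
GO
DELETE FROM dbo.StatsBait WHERE Id % 2 = 0;
GO
SELECT COUNT(*) FROM dbo.StatsBait WHERE Filler = 'x';
GO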

After running my code, I opened up my trace file and saw… nothing.

Extended Events None Found

Nothing to see here, folks

Hmm. I thought maybe it was a latency issue. I waited and re-checked the file. I reran the code. No change. I set the session up again from scratch; the same thing happened. I drank more coffee.

Then I realized I had probably screwed up my filter.

I had set up a filter on my “Filter (predicate)” tab, but I hadn’t actually told it to collect database_name. I went back in and checked that off.

Whoops, I didn’t ask for that!

Now the session scripts out a bit differently — we’ve got SET collect_database_name=(1) in there:

CREATE EVENT SESSION [AutoStatsTest] ON SERVER 
ADD EVENT sqlserver.auto_stats(SET collect_database_name=(1)
    WHERE ([database_name]=N'AutoStatsTest')) 
ADD TARGET package0.event_file(SET filename=N'S:\XEvents\Traces\AutoStatsTest')
WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY=5 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE,
    TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF)
GO

After recreating the trace with the new definition and saving it off, I re-ran my test script and this time it worked as expected: I got trace data to explore!

Oh, that’s interesting.

The Extended Events wizard can be a little rough around the edges. It didn’t warn me that I was doing something problematic. But it also made it pretty easy to set up my trace, and also test my theory about what was wrong with it.

Summing up: when you set up an Extended Events trace, just because you set a filter (predicate) on a field doesn’t necessarily mean the session can use it. You may also have to specify that the field should be collected.
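
One way to avoid this in the first place: before building the session, check which optional fields an event exposes and whether they're collected by default. Here's a quick sketch of that check for auto_stats, using the Extended Events metadata DMVs:

-- Optional ("customizable") fields for the auto_stats event, and their defaults.
-- A column_value of 0 means the field is NOT collected unless you turn it on.
SELECT c.name, c.column_value, c.description
FROM sys.dm_xe_object_columns AS c
WHERE c.object_name = 'auto_stats'
  AND c.column_type = 'customizable';
GO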

“OR” and “IN” with SQL Server’s Filtered Indexes

You can’t do everything with filtered indexes in SQL Server. For instance, you can’t create the following index:

CREATE INDEX IX_Votes_filter ON dbo.Votes (PostId) 
    WHERE (VoteTypeId = 1 OR VoteTypeId = 2);
GO

If you try, you’ll get the error message:

Msg 156, Level 15, State 1, Line 5
Incorrect syntax near the keyword 'OR'.

Instead, you can use ‘IN’ and create the index this way:

CREATE INDEX IX_Votes_filter ON dbo.Votes (PostId) 
    WHERE (VoteTypeId IN (1,2));
GO

That works– and good news, even queries written with ‘OR’ can use that filtered index, because SQL Server is clever like that. Here’s an execution plan that shows it in action.

Index found!

The filtered index can be used for a subset of the VoteTypeIds, too. Here’s an example where I use it just for VoteTypeId = 1. (Note that I had to force it– the optimizer preferred a different plan.)

Execution plan: the filtered index used for a subset of the IN list
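
The exact query behind that screenshot isn't shown here, but the idea looks something like this hedged sketch, where an index hint forces SQL Server onto the filtered index for just VoteTypeId = 1:

-- Forcing the filtered index for a subset of its filter (a sketch; without
-- the hint, the optimizer preferred a different plan on my copy of the data)
SELECT PostId
FROM dbo.Votes WITH (INDEX (IX_Votes_filter))
WHERE VoteTypeId = 1;
GO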

Takeaway: SQL Server’s filtered indexes can be complicated and limited sometimes, but don’t get too tripped up by the fact that they don’t allow “OR”.

Need to learn more about indexes?

Measuring Performance Can Slow SQL Server Down

I’ve written a terrible query. It might be the worst query in the world. Sure, there are other queries that may be slower, but this query is terrible and elegantly short.

The query is gloriously bad for two reasons:

  1. It’s incredibly inefficient in the way it uses a scalar function
  2. Observing the query’s performance can make it much, much slower

Don’t believe me on #2? Let’s take a look!

Here’s the query:

SELECT TOP 10
    Id, dbo.BadgeCountForUser(Id)
FROM dbo.Users
WHERE dbo.BadgeCountForUser(Id) > 50
ORDER BY dbo.BadgeCountForUser(Id) DESC
GO

The query uses a restored copy of the StackOverflow database, and the following (terrible) function and related index.

IF OBJECT_ID('dbo.BadgeCountForUser') IS NOT NULL
    DROP FUNCTION dbo.BadgeCountForUser;
GO
CREATE FUNCTION dbo.BadgeCountForUser (@UserId INT)
RETURNS INT
AS
BEGIN
    DECLARE @badgecount INT;
    SELECT @badgecount=COUNT(*)
    FROM dbo.Badges
    WHERE UserId=@UserId;

    RETURN @badgecount;
END
GO

CREATE NONCLUSTERED INDEX ix_FunctionAbuser ON [dbo].[Badges] (UserId)
GO

Now, let’s run it and see how long it takes. All my tests are running against a virtual machine on a laptop with 6GB of RAM, SSD storage, and no other activity running. I ran each test a few times and made sure I got consistent results.

SQL Server Dynamic Management Views (DMVs)

The easiest way to measure performance stats for a query using the dynamic management views is to dump the procedure cache, run the query, and then run our free tool, sp_BlitzCache®.

Note that I’m dumping the ENTIRE execution plan cache– I’m running this on a dedicated test instance, so my FREEPROCCACHE can’t harm anyone else.

DBCC FREEPROCCACHE; 
GO
SELECT TOP 10
    Id, dbo.BadgeCountForUser(Id)
FROM dbo.Users
WHERE dbo.BadgeCountForUser(Id) > 50
ORDER BY dbo.BadgeCountForUser(Id) DESC
GO
exec sp_BlitzCache @hide_summary=1;
GO

Here’s the duration and CPU from the output:

sp_BlitzCache® results

This gives me a lot of info — my total duration is over 14 seconds. Of that, 9+ seconds were the SELECT statement from inside the scalar function. And by the way, that executed 2.3 million times. WHOA. 2.3 million times? Yep, that’s how bad my query is.
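
If you don't have sp_BlitzCache® handy, you can pull similar numbers straight out of the plan cache yourself. Here's a rough sketch (not the exact query behind the screenshot):

-- Top statements by CPU since the cache was cleared.
-- total_worker_time and total_elapsed_time are in microseconds.
SELECT TOP 10
    qs.execution_count,
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.total_elapsed_time / 1000 AS total_duration_ms,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
GO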

Extended Events (XEvents)

I can easily run an Extended Events trace against my session using our tool, sp_BlitzTrace™ on SQL Server 2012 and higher. Here’s the syntax to do that:

exec sp_BlitzTrace @Action='start', 
    @SessionId=@@SPID,
    @TargetPath='S:\XEvents\Traces\';
GO
SELECT TOP 10
    Id, dbo.BadgeCountForUser(Id)
FROM dbo.Users
WHERE dbo.BadgeCountForUser(Id) > 50
ORDER BY dbo.BadgeCountForUser(Id) DESC
GO
exec sp_BlitzTrace @Action='stop';
GO
exec sp_BlitzTrace @Action='read';
GO

According to the trace, the query’s duration and CPU time are in the same ballpark as we saw before:

sp_BlitzTrace™ output

But I only have one line here– that’s because by default sp_BlitzTrace™ only collects sql_batch_completed and does not collect statement level completion events. You can turn that on, but remember how the dynamic management views told us the scalar function was executed 2.3 million times?

If you capture sp_statement_completed, that means capturing 2.3 million events. Which means your trace starts churning large amounts of data to the file system. And yes, if you collect the statement level stuff, you’ll slow the query down — and besides, querying the trace files is just awful.
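
If you want to see that flood for yourself without sp_BlitzTrace™, a hand-rolled session like this sketch will do it (the session name, SPID, and file path are placeholders):

-- Capture statement-level completion events for a single session.
-- Warning: against the terrible query above, this generates a LOT of data.
CREATE EVENT SESSION [StatementLevelTrace] ON SERVER
ADD EVENT sqlserver.sp_statement_completed(
    WHERE ([sqlserver].[session_id]=(53)))   -- replace 53 with your SPID
ADD TARGET package0.event_file(SET filename=N'S:\XEvents\Traces\StatementLevelTrace')
WITH (EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=5 SECONDS);
GO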

I generated these trace files pretty quickly by collecting statement level events for this terrible query.

How to generate 1GB of XEvents trace files with a single SELECT statement… oops!

STATISTICS TIME AND STATISTICS IO

Another very common way to measure query performance is to turn on STATISTICS TIME and STATISTICS IO. I’m lazy, so I usually do these in a single statement, like this:

SET STATISTICS TIME, IO ON;
GO
SELECT TOP 10
    Id, dbo.BadgeCountForUser(Id)
FROM dbo.Users
WHERE dbo.BadgeCountForUser(Id) > 50
ORDER BY dbo.BadgeCountForUser(Id) DESC
GO
SET STATISTICS TIME, IO OFF;
GO

Here’s how that performed:

STATISTICS TIME and STATISTICS IO output

This performance is consistent across multiple executions. I am not making this up.

Wait just a second– those times were different!

Yep, there’s something weird about these numbers. Let’s sum up the CPU time measurements:

  • Dynamic Management Views via sp_BlitzCache®: ~14 seconds
  • XEvents sql_batch_completed via sp_BlitzTrace™: ~14 seconds
  • SET STATISTICS IO, TIME ON: 70+ seconds
  • XEvents sp_statement_completed: gave up after one query generated 1GB of trace data

Dang, this stuff is complicated.

I still like STATISTICS IO and STATISTICS TIME, and I do like looking at statement level trace data sometimes, too! But with any method of measuring performance, I try to be aware that watching what’s going on can impact the results I’m seeing.

This post was inspired by Grant Fritchey’s recent post. If you’ve got your own story about “the observer effect” slowing things down, I’d love to hear it, too!

Brent Says: Whenever I’m there to watch a client’s server, it seems to go faster, and they’re mad because the problem is gone. Hmm.

Dropping an Offline Database in SQL Server

When you drop a database in SQL Server, the files are normally removed from the file system right away. POOF! Hope you had a backup if you ran DROP DATABASE against the wrong instance.

However, things are a little different if you take the database offline before you drop it. Consider the following code:

use master;
GO

CREATE DATABASE DropMeOffline
 ON  PRIMARY 
    ( NAME = N'DropMeOffline', 
    FILENAME = N'S:\MSSQL\Data\DropMeOffline.mdf' , 
    SIZE = 256MB , 
    FILEGROWTH = 256MB )
 LOG ON 
    ( NAME = N'DropMeOffline_log', 
    FILENAME = N'S:\MSSQL\Data\DropMeOffline_log.ldf', 
    SIZE = 128MB, 
    FILEGROWTH = 128MB )
GO

ALTER DATABASE DropMeOffline
    SET OFFLINE WITH ROLLBACK IMMEDIATE;
GO

DROP DATABASE DropMeOffline;
GO

Here’s what S:\MSSQL\Data looks like after I run the whole script, including DROP DATABASE…

Offline Database Dropped

The files are still there!

This is by design and documented in Books Online — any database with one or more offline files won’t have the files removed from the filesystem when a database is dropped.

Why Take Databases Offline?

Taking databases offline is a very useful tool. When you’re retiring or migrating a database off of a server, it’s often convenient to make the database unusable for a period before really deleting it, because sometimes it turns out that a database is in use when we thought it wasn’t.

If you do choose to take databases offline before dropping them:

  • Make sure you have enough copies of the last full backup for your policies
  • Test that the backup restores properly elsewhere and isn’t corrupt
  • When it comes time to drop the database, remember that you’ll need to clean up the files manually if you drop it while it’s offline

But whatever you do, don’t skip that backup. The fact that the files aren’t automatically cleaned up doesn’t replace a good backup!
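
Putting that together, here's a minimal sketch of the safer sequence (the backup path is a placeholder, and RESTORE VERIFYONLY is no substitute for an actual test restore on another server):

BACKUP DATABASE DropMeOffline
    TO DISK = N'S:\MSSQL\Backup\DropMeOffline_final.bak'
    WITH CHECKSUM, INIT;
GO
RESTORE VERIFYONLY
    FROM DISK = N'S:\MSSQL\Backup\DropMeOffline_final.bak'
    WITH CHECKSUM;
GO
ALTER DATABASE DropMeOffline SET OFFLINE WITH ROLLBACK IMMEDIATE;
GO
-- ...wait out your grace period, then:
DROP DATABASE DropMeOffline;
GO
-- Remember: because the database was offline, the .mdf and .ldf files
-- are still on disk and have to be deleted by hand.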

Brent Says: Wow, I didn’t know that.

The Hard Truth About Patching SQL Server Availability Groups (Hotfixes, Cumulative Updates, and Service Packs)

Whoa, be careful with that fix

As a DBA, you’re responsible for identifying necessary updates to keep your SQL Servers healthy. Your business may have some mandates about the frequency of patches, but even if they don’t, you have a duty to look out for Cumulative Updates, Service Packs, and out-of-band hotfixes that can prevent downtime and performance problems. It’s your task to test and apply them as needed.

If you use SQL Server Availability Groups (AGs), this is a tough responsibility. There are a lot of updates released for critical issues. Sometimes there are updates to the updates, because… oops. And sometimes the updates themselves may cause you downtime.

Here’s the lose-lose situation that the Availability Group DBA ends up in:

  • If you don’t apply updates, you could hit a critical performance issue that brings down your environment– one that was already fixed in an existing cumulative update or hotfix. An update you should have known about.
  • If you apply updates regularly, you can cause yourself downtime through issues introduced in the updates themselves. And here’s the worst part: testing reduces your risk of this, but doesn’t prevent it.

You can’t just ignore the situation. After all, you’re responsible.

High Availability is Supposed to Reduce Downtime– Including Downtime for Patching

One of the big reasons DBAs like High Availability solutions is that we want to reduce downtime during patching. If you’ve ever done much patching, you know that restarting servers can take a while. Sometimes a server hangs on restart and you have to connect to a special server management card (iLO, DRAC, etc.) to nudge it along.

If you’re using Database Mirroring, a Failover Cluster, or Availability Groups, you can reduce downtime in patching: you can fail your SQL Server resources around so that there are only short downtimes and your customers aren’t dependent on the server to come back. You want these features so you have less to worry about, not more.

Patching an Availability Group Can Cause Unplanned Downtime

Check out this recent critical problem impacting multiple cumulative updates across SQL Server 2012 and 2014. If you apply these updates, you may run into a critical blocking issue which you can only fix by:

  • Turning off automatic failover (see the T-SQL sketch after this list)
  • Restarting the SQL Server instance on the primary replica. Yep, that’s an outage.
  • Turning on automatic failover
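
For the failover-mode flips in that list, the T-SQL itself is simple; here's a minimal sketch (the AG and replica names are placeholders, and you'd run it for each replica currently set to automatic failover):

-- Before restarting the instance under the primary replica: switch to manual failover
ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'SQLNODE2'
    WITH (FAILOVER_MODE = MANUAL);
GO
-- ...restart the instance hosting the primary replica (that's the outage), then:
ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'SQLNODE2'
    WITH (FAILOVER_MODE = AUTOMATIC);
GO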

Update: KB 3033492 has now been published, which gives more information on options to avoid the issue (spoiler: it requires an outage) and other ways you can attempt to correct it by killing sessions (which may not work).

You might wonder, “How many people could this have impacted?” I wondered too. The night I learned about this issue, I went to my local user group meeting and heard that a person there had been impacted by the bug. They said their production environment was down for more than 9 hours while they worked with Microsoft to resolve the issue.

Which of These Downtime Scenarios is Worse?

Pick your poison. And it will feel like poison. Is it worse to:

A) Be an unlucky optimist. Follow the steps in the KB. Hope that if the problem occurs, you detect it during your patching window. There’s a chance the issue might be detected after your planned patching window has ended (and you’ve gone back to sleep), and then you’ve got an unplanned downtime on your hands and unhappy customers.

B) Be a beleaguered pessimist. Proactively take the steps above when applying patches to prevent the bug from occurring. This is more work for you, more steps for you to carry out and validate (you did put that back in automatic failover, right?), and more outages for your customer. You’re now not just failing back and forth between nodes, you’re doing an extra restart. And restarting the instance underneath the primary replica with no failover is exactly the type of thing we expect High Availability to save us from.

This Couldn’t Happen Again, Right?

Remember SQL Server 2012 SP1? It had an issue that impacted some installs where processes named msiexec.exe would keep running and use tons of CPU. It bloated out the registry. It was pretty nasty, and it raised the issue, “Should you trust Service Packs?” (Spoiler: no.)

That issue wasn’t specific to AGs. But we now have a pattern where supported, released updates (even service packs) can cause major problems on a portion of installs and they are released anyway. Odds are, it will happen again.

But Microsoft Said I Should Only Install Cumulative Updates if I Need Them, Right?

Yep– that’s what they used to say. It was dangerous advice to follow, because Service Packs are released very slowly these days.

But this changed with KB 2964518 – “Recommended updates and configuration options for SQL Server 2012 and SQL Server 2014 used with high-performance workloads.” This article specifically recommends using the latest cumulative update, and gives a dramatic list of issues fixed in those updates.

Cumulative Updates ARE Especially Critical for AGs. Look at these Fixes…

It’s possible you might still be thinking that it’s safer to just not patch an availability group and hope for the best. That’s not a good idea. Take a look at these fixes:

I know that some folks are going to ask for a comprehensive bug list specific to AGs. I’m afraid I don’t have time to maintain that list– and that is kinda telling.

Nobody likes it when the doctor makes this face.

So What Should You Do?

How much time and effort do you have for reading about fixes, testing them carefully, watching blogs for out of release fixes and known issues, and establishing complex patching and production processes?

If you don’t have at least an hour a week to keep on top of available fixes, and another eight hours a month (minimum) to devote to testing and deploying fixes, you don’t have time for an AG.

Got more than two replicas? You need more time. Running Windows Server 2008 R2? Well, you need a lot more time, because you should be moving off of it.
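
Whatever you decide, at least know exactly what build every replica is running. A quick check:

-- Run on each replica: build number, service pack level, and edition
SELECT SERVERPROPERTY('ProductVersion') AS build,
       SERVERPROPERTY('ProductLevel')   AS service_pack_level,
       SERVERPROPERTY('Edition')        AS edition;
GO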

It was hard for me to write this post. I’ve met some of the folks who make SQL Server. They’re incredibly smart, very dedicated, super helpful, and they work very hard to make a great product. But right now Availability Groups are a very complex and time consuming feature to manage — and many teams don’t have the right staffing and time to take on that challenge and succeed.

Brent Says: I have a hunch that maybe Kendra was a little angry when she wrote this, but even so, I’m glad I’m a consultant these days and not a full time DBA. Patching blows.

Failing over an Asynchronous Mirror in SQL Server (video)

Will a long running open transaction block you from changing to synchronous mode and failing over to your SQL Server database mirror? Join Kendra to test performing a planned failover from an asynchronous database mirror.
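
For reference, a planned failover from an asynchronous mirror boils down to two ALTER DATABASE commands run on the principal; here's a minimal sketch (the database name is a placeholder):

-- Step 1: switch the mirror from asynchronous to synchronous (high safety)
ALTER DATABASE [YourMirroredDb] SET PARTNER SAFETY FULL;
GO
-- Wait until sys.database_mirroring shows mirroring_state_desc = 'SYNCHRONIZED', then:
ALTER DATABASE [YourMirroredDb] SET PARTNER FAILOVER;
GO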

Brent Says: Often, when I’m working with developers, they’re totally surprised that SQL Server’s failover methods all break in-flight transactions. No matter whether you use mirroring, AlwaysOn Availability Groups, or failover clustering, you’ve gotta build in retry logic in your applications if you want seamless failovers.

Reporting From a Log Shipping Secondary in STANDBY mode

You can build a lot with simple tools. Carefully.

I’m a fan of SQL Server’s transaction log shipping. It works in Standard Edition, it’s relatively simple to set up, and you can even make your log shipping secondary readable using STANDBY mode.

I’ve worked with some pretty cool, complex log shipping environments over the years. In one case, we had multiple log shipping secondaries and a load balancer involved to support a full fledged reporting application. It worked pretty well– with a lot of careful scripting and support.

But there are a few things you should know before you decide to implement this yourself.

“Readable” Log Shipping Secondaries Are Just a Point in Time

Although a log shipping secondary can be made readable using “STANDBY” mode, it’s just readable to a specific point in time. If I bring the secondary online at 2 pm, users can only read data through the last committed transaction in the last log file I restored. And…

Everyone’s Kicked Out When You Restore Transaction Logs

The database can’t be read when you’re refreshing data. You must kick out any users (or not refresh the data).

Logs Restore More Slowly if You Use STANDBY (Readable) Mode

If you’re reading from the secondary, you usually want those periods where it’s unavailable to be as short as possible. If you have a lot of log files to restore, you will probably find that performance of the restores is better if you’re in “NORECOVERY” mode while you’re doing the restores, and then switch back to “STANDBY” at the end of the process so the data can be read.

This can be done; it just takes some fancy scripting.
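
The pattern looks roughly like this sketch: restore the backlog of logs WITH NORECOVERY (faster), then finish with one final restore WITH STANDBY so the database becomes readable again (the file names and paths are placeholders):

RESTORE LOG ReportingDB FROM DISK = N'S:\LogShip\ReportingDB_1.trn' WITH NORECOVERY;
RESTORE LOG ReportingDB FROM DISK = N'S:\LogShip\ReportingDB_2.trn' WITH NORECOVERY;
-- Last file: STANDBY brings the database back online in read-only, readable mode
RESTORE LOG ReportingDB FROM DISK = N'S:\LogShip\ReportingDB_3.trn'
    WITH STANDBY = N'S:\LogShip\ReportingDB_undo.tuf';
GO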

You will also need to add monitoring for slow restores if getting the database online is critical. If your log shipping secondaries get behind, it’s possible to use differential backups to catch them up.

You can’t use “STANDBY” Mode if your Secondary is a Higher Version

I was a little sad when I learned this one years ago. I had a SQL Server 2005 instance that was the log shipping primary. The log shipping secondary was made readable so that developers could check out production data if needed without having access to production.

Our upgrade plan was to get SQL Server 2008 on the least critical servers first — and that developer access server was one of them. But I couldn’t use STANDBY mode on the log shipping secondary with it as a higher version: it failed with an error like this…

This backup cannot be restored using WITH STANDBY because a database upgrade is needed. Reissue the RESTORE without WITH STANDBY.

When SQL Server brings a database online in a higher version, it needs to make some modifications to that database. This breaks with STANDBY mode because the database is read only.

An aside: unfortunately, the Enterprise Edition feature of a database snapshot doesn’t overcome this limitation if you try to use it against a database mirror to make it readable to a point in time. You get a different error, but it’s the same theme:

Msg 946, Level 14, State 1, Line 1
Cannot open database 'MirrorMePlease_snap' version 661. Upgrade the database to the latest version.
Msg 1823, Level 16, State 7, Line 1
A database snapshot cannot be created because it failed to start.

Auto-Create Statistics Doesn’t Work in the User Database in STANDBY mode

If you’re using your log shipping secondary for reporting, SQL Server can’t automatically create statistics in the database, regardless of your settings. (That read only thing keeps coming up.)

In SQL Server 2012 and higher, this isn’t a huge problem because temporary statistics can get created in tempdb. This new feature was added when Availability Groups came along, but it also works for log shipping secondaries, which is pretty awesome.

Security is a Hoot. And by that, I Mean a Problem. (Less So in SQL Server 2014)

In many scenarios, you only want to grant read access to a user on the log shipping secondary database. You do not want that user to be able to access the primary. This is tricky.

To read from a database, you need a login with an associated database user. To grant reads to a log shipping secondary database, you can create all the logins you want– but the log shipping secondary database is read only, so you can’t create a user in it.

Up through SQL Server 2012, here are the two main options I’ve seen people use (there are a couple more in the comments):

  1. Grant access via stored procedures or views in another database. This will require enabling cross database ownership chaining, which can be a treacherous security road. And a lot of things could go wrong over time as tables are added, modified, and dropped.
  2. Create the login on the log shipping primary instance with the associated database user, and disable the login on the log shipping primary instance. If you’re using SQL authentication, you may have to use a special script to transfer the SID to the log shipping secondary to get it all to work.

Option 2 isn’t terrible, it’s just awkward to have a bunch of disabled logins. Someone can misunderstand and accidentally enable them or delete them, and then… oops.

Using CONNECT ANY DATABASE and SELECT ALL USER SECURABLES in SQL Server 2014

As of SQL Server 2014, you have a new option because of a couple of new security features. You can do this:

CREATE SERVER ROLE readeverything;
GO
GRANT CONNECT ANY DATABASE TO readeverything;
GO
GRANT SELECT ALL USER SECURABLES TO readeverything;
GO

When you add a member to this role, they can read anything. I tested it, and it worked just fine for me against a log shipping secondary in STANDBY mode.
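
Adding a member is one more statement (the login name here is just a placeholder):

-- Any member of this server role can now read every database on the instance
ALTER SERVER ROLE readeverything ADD MEMBER [YOURDOMAIN\ReportingReaders];
GO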

CONNECT ANY DATABASE is designed to allow access to connect to databases that exist now and databases that exist in the future, so this really IS permissions to read everything, including system databases– only use this if that’s appropriate.

It’s Complicated, but It’s Still Cool

If you’re clever and dedicated, you can work around these issues and use log shipping to provide read access to either applications or users who shouldn’t be reading from the live production copy of data.

Brent Says: Even with AlwaysOn AGs out, I think I’ve said the words “use log shipping” more in the last two years than I’ve said in the rest of my career. It’s useful, cheap, flexible, and nearly bulletproof.

How to Use SQL Server’s Dedicated Admin Connection (DAC) – Video

You may have heard that there’s a secret back door into your SQL Server– but have you enabled access to it properly? Do you know when you might need to use it, and how you can connect? Kendra Little explains why the Dedicated Admin Connection (or “DAC”) is so important and how to configure remote access.

Don’t have 11 minutes / don’t have YouTube access? Head on over to our blog post on the DAC.
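
If you just want the configuration step, enabling remote access to the DAC is a single sp_configure setting; here's a minimal sketch:

-- Allow remote connections to the Dedicated Admin Connection
EXEC sp_configure 'remote admin connections', 1;
RECONFIGURE;
GO
-- Then connect with the ADMIN: prefix, for example: sqlcmd -S ADMIN:YourServerName -E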

Jeremiah says: I had to use this once. I ran into a low memory situation on a SQL Server and Windows decided to page out SQL Server in the middle of the business day. The system ground to a halt, it was even unresponsive over RDP, and we had to hard power cycle the server (hold down the power button in the server room). I set up the DAC and the next time the issue happened, I was able to jump into the SQL Server, collect information, and then issue a graceful shutdown. This made it easier to diagnose the problem, too.

Brent says: Using the DAC is like knowing how to read the basics of DBCC CHECKDB output. You’re probably never going to need it, but when you do, just ten minutes of preparation will make a world of difference.

Jes says: What Brent said. A few minutes of practice every six months can save minutes or even hours of frustration – and potentially downtime – when faced with a real emergency.
