
Why Zero Data Loss in SQL Server is Harder Than You Think

But Kendra, it can’t be that hard… after all, we have synchronous modes in Database Mirroring and Availability Groups, right?

Synchronous Commit doesn’t mean “zero data loss”

When we think about limiting data loss, the first thing we think of is a technology that lets us reduce points of failure. If every transaction must be written to two separate storage systems, we have a pretty good chance of having no data loss, right?

Maybe.

Let’s say you’re using a SQL Server Availability Group in SQL Server 2014 with a synchronous replica to do this. The secondary replica fails and is offline, but you don’t lose quorum. If you want zero data loss, the primary needs to stop accepting writes immediately, right?

It doesn’t do that. The primary replica keeps going and writes can continue. Here’s what that looks like:

Tabloid headline: Synchronous AG Running Exposed! Scandal!

You could write custom scripts to detect the situation and stop the primary replica, but there are a couple of problems with that. First, you’re offline, and you probably don’t want that. And second, it’s going to take some time to get that done, which means you don’t have zero data loss: you could lose anything written in the meantime. You could add another synchronous commit replica, but there are obvious cost and support impacts, and you still aren’t guaranteed zero data loss.
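For illustration, a detection script like that might start from the AG DMVs. Here’s a minimal sketch that flags any synchronous-commit replica whose databases aren’t currently synchronized; you’d still need something watching the output and reacting, which is exactly the delay problem described above:

/* Minimal sketch: find synchronous replicas that aren't synchronized,
   meaning the primary may be running "exposed". */
SELECT ar.replica_server_name,
       ar.availability_mode_desc,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
WHERE ar.availability_mode_desc = N'SYNCHRONOUS_COMMIT'
  AND drs.synchronization_state_desc <> N'SYNCHRONIZED';
GO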

Synchronous writes don’t necessarily guarantee zero data loss; you’ve got to dig into the details.

This stuff isn’t obvious

I’ll be straight up: I’ve been working with high availability and disaster recovery for a long time, and I hadn’t actually thought very critically about this until a recent chat room conversation with Brent about why it’s not super easy for cloud hosting providers to offer zero data loss in return for a lot of dollar signs.

Crazy facts: you can learn things from chat rooms and from the cloud. Who knew?

Need to Protect Your Data? You Need a Full-Time Employee Who Is Responsible for That

If data loss is important to you, don’t just assume that you’ve got it under control because you’re paying a vendor to take care of it. If you look closely, you may find that nothing’s working like you think! When your data is important, you need to make someone responsible for ensuring that you’re meeting your RPO and RTO, and have them prove that it works on a scheduled basis. Their job title doesn’t have to be “Database Administrator,” but they need to work for you, and they need to take their responsibility seriously.

Want to Learn More About High Availability and Disaster Recovery?

We just launched our new DBA’s Guide to SQL Server High Availability and Disaster Recovery – a 6-hour online video course that teaches you about clustering, AlwaysOn AGs, quorum, database mirroring, log shipping, and more.

New Quiz! SQL Server High Availability and Disaster Recovery

Think you know everything there is to know about high availability and disaster recovery?

Head on over to our new free quiz to find out! Take the SQL Server High Availability and Disaster Recovery Quiz.

The Nine Circles of Developer Hell

Commit message: Duck!

Everyone’s code is terrible. But exactly how terrible is yours?

First Circle: Code Limbo

We can’t say this is good code, but for the most part nobody notices it.

Second Circle: Code Lust

Cool function, bro!

Third Circle: Gluttonous Code

There’s always one simple piece of logic that’s been written hundreds of times, all over the codebase, but in slightly different ways. And then, one day, you need to change it.

Fourth Circle: Angry Code

Your application has periodic timeouts, which disappear as soon as users report them.

Fifth Circle: Greedy Code

This code was only supposed to run once when a user logged in and their profile was loaded. But sp_BlitzCache® says it runs 4,000 times per minute when you’re hardly doing anything.

Sixth Circle: Code Heresy

This code looks incredibly bad. It violates common sense, but it works right now. You’re afraid to touch it, for fear it will burst into flames.

Seventh Circle: Violent Code

Your application is deadlocking, and literally killing itself.

Eighth Circle: Code Fraud

That day you find out that large portions of your source code don’t actually belong to your company. And now you need to fix it.

Ninth Circle: Treacherous Code

When your database corrupts itself.

What Permissions does QUERYTRACEON Need?

The QUERYTRACEON hint can be particularly useful for testing the new cardinality estimator in SQL Server 2014 on individual queries using Trace Flag 2312, if your database is still using compatibility level 110:

SELECT ClosedDate, Id
FROM dbo.Posts
WHERE LastActivityDate > '2013-09-05 11:57:38.690' 
OPTION (QUERYTRACEON 2312);
GO

Conversely, if you’re using the new estimator everywhere by having database compatibility set to 120, you can use the old estimator for an individual query by using QUERYTRACEON 9481:

SELECT ClosedDate, Id
FROM dbo.Posts
WHERE LastActivityDate > '2013-09-05 11:57:38.690' 
OPTION (QUERYTRACEON 9481);
GO

But… Permissions?!?!

There’s a little problem: turning on a trace flag requires membership in the sysadmin server role. Let’s say that I’m attempting to use this hint from an account which doesn’t have superpower permissions. Here we create a login and user for app_account, add it to db_datareader, and grant it execute on the dbo schema:

USE [master]
GO
CREATE LOGIN [app_account] WITH PASSWORD=N'DontBeLikeMeUseWindowsAuth', 
  DEFAULT_DATABASE=StackOverflow, CHECK_EXPIRATION=ON, CHECK_POLICY=ON
GO

USE StackOverflow;
GO

CREATE USER [app_account] FOR LOGIN [app_account];
GO

ALTER ROLE [db_datareader] ADD MEMBER [app_account];
GO

GRANT EXECUTE ON SCHEMA::dbo TO [app_account];
GO

We can now impersonate app_account to run some tests!

EXECUTE AS LOGIN='app_account';
GO
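If you want to double-check whose security context you’re now running under, here’s a quick sanity check:

SELECT SUSER_SNAME() AS login_name, USER_NAME() AS user_name;
GO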

Meet Error 2571, User Does Not Have Permission to Run DBCC TRACEON

When I try to run the query, things screech to a halt:

Error 2571 - no permission to run DBCC TRACEON

Oh, sad panda

Even if I try to get tricky, I can’t sneak this into a temporary procedure as app_account, either:

cannot sneak this into a temp procedure, either

Deferred name resolution doesn’t help me here. I guess it’s not a name.

Stored Procedures to the Rescue

We need a little help from a stored procedure. We’re going to revert back to our sysadmin login and create a procedure around our query:

REVERT
GO

CREATE PROCEDURE dbo.RecentPosts
    @DateVal DATETIME
AS
    SELECT ClosedDate, Id
    FROM dbo.Posts
    WHERE LastActivityDate > @DateVal OPTION (QUERYTRACEON 2312, RECOMPILE);
GO

/* Now we go back to running as app_account again */
EXECUTE AS LOGIN='app_account';
GO

Due to the magic of ownership chaining, we can now run the query as app_account:

No errors to be seen
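With the procedure in place, app_account just calls it like any other procedure (the date literal here is the same sample value from earlier):

EXEC dbo.RecentPosts @DateVal = '2013-09-05 11:57:38.690';
GO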


This Means You Don’t Have to Give Your Applications Sysadmin Rights

… but you may have problems if those applications need to run a lot of ad hoc queries.

Want to learn more about statistics and the new cardinality estimator? Join me and Jeremiah Peschka for our Advanced Querying and Indexing course!


New Free Quizzes: Indexing, Query Writing, and More

Quiz time!

Want to test your knowledge on indexes and query tuning in SQL Server?

Curious how your query tuning style compares to others?

Looking to have a little fun?

Interested in a coupon code?

Your Monday just got better. Check out our new free online quizzes right now:

  1. Test your SQL Server indexing prowess
  2. See how your query writing skills compare to others
  3. Take the SQL Server query tuner personality test

psst– New Free Webcast Next Friday on Query Execution Plans!

Classic anti-pattern: coffee’s going to end up all over that keyboard.

Find and Fix Your Worst Query Patterns (Sponsored by Idera Software)

Friday, June 12, at 11:00AM Central

You’d love to make your queries faster, but you’re not sure what to change. Kendra Little will teach you how to identify bad patterns in the execution plans of the biggest, baddest queries running against your SQL Server. You’ll see Kendra triage bad queries in action and get an easy-to-use reference you can use to diagnose problems in your query execution plans back at work. Register here!

Missed the Webcast, or Want More Free Training?

Hit up our events page to register for more free upcoming events on virtualization, Availability Groups, index tuning, database mirroring, and more.

When does a Query Get Trivial Optimization?

We had some great questions about trivial execution plans in SQL Server in our Advanced Querying and Indexing class a few weeks ago. Here’s a little glimpse into what we talked about.

For really simple queries, SQL Server can use “trivial optimization”. If there’s only a very limited number of ways to run the query, why do a bunch of fancy, CPU-burning cost-based optimization? Just chuck the plan at the query and let it go!

Downside: Trivial plans don’t ask for indexes

With a trivial plan, SQL Server never gets to the phase of optimization where it would ask for an index. This means you might have a really frequently executing query burning lots of CPU against your SQL Server instance that could be vastly improved by an index, but nothing ever registers in the missing index dynamic management views.

How Trivial Does A Query Need to be for this Optimization?

I find trivial plans frequently, but it’s also a little trickier than you’d think. Let’s look at some examples:

I’m starting off with a simple query, running against a large table:

SELECT Id
FROM dbo.Posts
WHERE ParentId=3;
GO

The Posts table has only a clustered primary key on the Id column. This query gets FULL optimization, and SQL Server asks for an index.

full optimization with an index request

I’d like this to go, with an index please.

Why wasn’t this plan trivial? Well, even though SQL Server didn’t have its choice of indexes to use, notice the parallel lines in there. SQL Server did have to decide whether or not to use more than one processor!

Let’s Take Away SQL Server’s Choices

Sorry SQL Server, I’m writing this blog post, and I know that I can remove this query’s chance to go parallel by raising one setting: Cost Threshold for Parallelism.  This setting is the “estimated cost bar” for who gets to use multiple cores. I’m going to pump this setting up so my query doesn’t qualify.

EXEC sp_configure 'cost threshold for parallelism', 500;
GO
RECONFIGURE;
GO

Be careful: this setting impacts all queries on the instance, and MAXDOP hints don’t override it.
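You can confirm the value that’s actually in effect with a quick peek at sys.configurations:

SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = N'cost threshold for parallelism';
GO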

Now, rerunning my query:

trivial plan, no missing index request

I’ll just scan the clustered index and keep my mouth shut.

This time I got trivial optimization – the query’s estimated cost for running a single threaded plan doesn’t even qualify for parallelism, so that choice doesn’t exist.

No missing index shows up, even though SQL Server did 346,913 logical reads for this query.

What if I Add an Unrelated Nonclustered Index?

Now that we have a trivial plan, let’s give SQL Server a choice– but not a very good choice. We’re going to create an unrelated non-clustered index:

CREATE NONCLUSTERED INDEX ix_Posts_LastActivityDate ON dbo.Posts
(LastActivityDate)
GO

Our query doesn’t reference the LastActivityDate column at all.

Rerunning our query…

full optimization due to an NC index

I had to think about things this time.

Full optimization is back! Even though the LastActivityDate index seems really unrelated to what we’re doing, just adding it puts us back in FULL optimization.

Let’s Clean Up That Cost Threshold

Otherwise I’ll forget about it and not understand the weird plans on my test machine later. I’m just setting it back to the default here:

EXEC sp_configure 'cost threshold for parallelism', 5;
GO
RECONFIGURE;
GO

Takeaways: Beware the Creeping Trivial Plan

While most of your tables may be indexed well, it’s easy for code changes to creep into production without the indexes to support them. And while SQL Server’s default setting of 5 for “Cost Threshold for Parallelism” is generally far too low for modern processors, understand that raising it may increase your chances of getting trivial execution plans, which won’t ever ask for indexing help.

Want to find out if you’ve got trivial plans among your top queries? Our free sp_BlitzCache™ procedure sniffs around in your query XML and warns you right away.
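If you’d rather dig through the plan cache yourself, here’s a minimal sketch along the same lines. It leans on the StatementOptmLevel attribute in the cached showplan XML, and shredding that XML can be expensive on a server with a big plan cache:

/* Minimal sketch: find cached plans that got TRIVIAL optimization. */
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
    qs.execution_count,
    qs.total_worker_time,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//StmtSimple[@StatementOptmLevel="TRIVIAL"]') = 1
ORDER BY qs.total_worker_time DESC;
GO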

Who’s Backing Up That Database?

A question came up on Twitter recently: how can you figure out where a rogue backup is coming from?

I’ve run into this question with clients before. In an enterprise environment where backup tools come and go, it’s not unusual to find backups being run by some mysterious account.

By the way, when’s the last time you changed your service account password?

Uncomfortable discussion time: should something unknown out there have permission to do this?

Lecture over, let’s track down the culprit

If the backup command is failing, it’s probably generating a level 16 error much like this:

Msg 911, Level 16, State 11, Line 1
Database 'idontexist' does not exist. Make sure that the name is entered correctly.
Msg 3013, Level 16, State 1, Line 1
BACKUP DATABASE is terminating abnormally.

The good news is that you can easily set up an alert to notify you right away when these problems happen.
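Here’s a rough sketch of what that alert could look like. It assumes an operator named ‘DBA Team’ already exists and Database Mail is configured; also note that SQL Agent alerts only fire for errors written to the Windows application log, so error 3013 needs to be flagged as logged first:

/* Flag error 3013 as logged (this is an instance-wide change). */
EXEC sys.sp_altermessage @message_id = 3013, @parameter = 'WITH_LOG', @parameter_value = 'true';
GO

/* Create the alert and notify the (assumed) 'DBA Team' operator by email. */
EXEC msdb.dbo.sp_add_alert
    @name = N'Backup failed (error 3013)',
    @message_id = 3013,
    @severity = 0,
    @include_event_description_in = 1;
GO

EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Backup failed (error 3013)',
    @operator_name = N'DBA Team',
    @notification_method = 1; /* 1 = email */
GO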

The bad news is that the alert doesn’t tell you who tried to run the backup. Neither does the SQL Server error log; it just says:

BACKUP failed to complete the command BACKUP DATABASE idontexist. Check the backup application log for detailed messages.

But, uh, who exactly is the backup application?

If the Backup is Failing, You Can Trace Based On the Error

SQL Server doesn’t capture the details for severity 16 errors by default, but you can trace failing commands in your SQL Server like I show in this earlier post. In this case, we want to customize the Extended Events trace so we can get more information about who is trying to run the backup.

Here’s an example script to set up a trace (SQL Server 2012 and higher only):

CREATE EVENT SESSION [Sev 16 and Higher Errors] ON SERVER

--Just using the error_reported event
ADD EVENT sqlserver.error_reported(
    ACTION(sqlserver.client_app_name, sqlserver.client_hostname,
        sqlserver.database_name, sqlserver.nt_username, sqlserver.session_id,
        sqlserver.sql_text, sqlserver.username)
    WHERE ([severity] >= (16))
)

--Customize the target
ADD TARGET package0.event_file(SET filename = N'S:\XEvents\Traces\Sev16AndHigher.xel')

--This is just using the defaults
--Note that startup state is off
WITH (
    MAX_MEMORY = 4096 KB,
    EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 30 SECONDS,
    MAX_EVENT_SIZE = 0 KB,
    MEMORY_PARTITION_MODE = NONE,
    TRACK_CAUSALITY = OFF,
    STARTUP_STATE = OFF
)
GO
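Since STARTUP_STATE is off, the session exists but won’t collect anything until you start it:

ALTER EVENT SESSION [Sev 16 and Higher Errors] ON SERVER STATE = START;
GO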

Now, when my backup fails, I can open up my trace file and see info like this:

backup being run from the sa account

It’s in the walls!

If the Backup Is Succeeding, You Can Try Another Trick

You might have a situation where the backup isn’t failing; you just aren’t sure who’s running it and what permissions they’re using. Maybe you notice a message like this in your SQL Server Error Log:

Database backed up. Database: master, creation date(time): 2015/05/02(10:43:10), pages dumped: 59538, first LSN: 1098:34688:120, last LSN: 1098:34768:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'NUL:'}). This is an informational message only. No user action is required.

Someone’s backing up my master database, but writing it out to NUL: and throwing it away? That’s crazy! Why would someone do that? I need to track them down right away. I think a little user action is required.
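For reference, this is the kind of command that produces that log entry; the backup is written to the NUL device and thrown away (don’t do this to databases you care about):

BACKUP DATABASE master TO DISK = N'NUL:';
GO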

In this case, since the backup command succeeded, I can get more information right away from SQL Server’s default trace. It saw the name of the host I was running from (“UNLIMITED”), the fact that I was running under the SA account, and even the fact that I was dumping the data to NUL:

backup event in the default trace

The default trace sees the terrible thing I did
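If you’d rather query the default trace directly instead of clicking through the GUI, here’s a minimal sketch; event class 115 is the Audit Backup/Restore event:

/* Minimal sketch: read backup/restore events from the default trace. */
DECLARE @path NVARCHAR(260);
SELECT @path = [path] FROM sys.traces WHERE is_default = 1;

SELECT StartTime, HostName, LoginName, ApplicationName, DatabaseName, TextData
FROM sys.fn_trace_gettable(@path, DEFAULT)
WHERE EventClass = 115 /* Audit Backup/Restore */
ORDER BY StartTime DESC;
GO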

While we’re on the subject of backups…

Brent says: you’d be surprised how often we hear, “wait a minute, why is IO freezing every fifteen minutes on this server? Who’s taking all these snapshot backups of this VM?”

Contest: Write the Scariest DBA Sentence (10 words or less)

Warm up your keyboard, it’s time for a contest.

The challenge: Write the creepiest sentence you can come up with. You get 10 words max.

An example to get you started:

Oh, I’m sure someone set up the backups.

The rules:

  • Enter as often as you like (here on the blog, in the comments for this post)
  • The contest will close on Thursday, June 4, 2015

The prize:

  • Bragging rights. That’s sublime all by itself.
  • One free license for Goat Simulator from Steam – we’ll send the gift certificate to the email address of your choice. ($9.99 value USD.) http://store.steampowered.com/app/265930/

Update: the winner has been selected! We had tons of great entrants, but I had to pick a winner.

Honorable mentions:

5. “Microsoft has released details of new SQL licensing model.” … Dorsey

4. “I wrote a trigger to prevent duplicate inserts.” … Chris Bergin

3. “We just make everyone a sysadmin so it’s easier.” … John Nelson #2

2. “The server room is directly below the men’s room?” … ScalabilityDoug

And the winner of the bragging rights AND the coveted free license of Goat Simulator is:

1. “Please keep this pager on you at all times.” … Tim Cockerham

Tim, we’ve emailed you all the info to claim your prize.

Not Everything Should be Automated

How did I ever get this to work????

Some people hate repetition. They want to avoid manual tasks at all costs.

I used to be part of the cult of automation. I thought that everything should be made efficient. Why should I do something twice, if I can figure out how to do it once and then just run a script or push a button?

I don’t feel that way anymore about a lot of things.

One thing I do often is write scripts to reproduce problems or demonstrate how something works. Very often, there are pretty similar setup steps in these scripts. I’ve written the following code hundreds of times:

/* Recreate the test database with prejudice */
USE master;
GO
IF DB_ID ('IAMNotAProductionDatabase') IS NOT NULL
BEGIN 
     ALTER DATABASE IAMNotAProductionDatabase SET SINGLE_USER 
       WITH ROLLBACK IMMEDIATE;
     DROP DATABASE IAMNotAProductionDatabase;
END
GO

CREATE DATABASE IAMNotAProductionDatabase;
GO

It’s often useful to back up a demo database and restore it to avoid a long setup process. So I also frequently retype commands like this:

BACKUP DATABASE IAMNotAProductionDatabase TO DISK=N'S:\Backup\IAMNotAProductionDatabase.bak' WITH COMPRESSION, INIT;
GO
RESTORE DATABASE IAMNotAProductionDatabase FROM DISK=N'S:\Backup\IAMNotAProductionDatabase.bak' WITH NORECOVERY, REPLACE;
GO

/* I like to do this in a separate step because I am neurotic like that */
RESTORE DATABASE IAMNotAProductionDatabase WITH RECOVERY;
GO

I could use snippets in SQL Server Management Studio to reduce the typing. I could use a third party tool which auto-completes statements.

But I don’t. I re-type it every time. I was slow at first, but I’m much faster at it now.

These are commands I care about. Sure, they’re simple, they’re basic, they’re nothing fancy. But they’re really core to my profession and my job. I like being able to type these commands quickly and not wonder for too long about the syntax. I like not having to worry if I remembered to set up snippets in whatever test environment I’m using, or if it has third party tools installed.

I’m terrible at memorizing things, and this helps.

I’m not against automation. For production monitoring and responding to incidents, setting up tools and processes that help you work faster and avoid having one person as a single point of failure is awesome. Just don’t apply that principle to everything in your work.

There are some tasks that are fine to repeat, if you want to get to know them by heart.

Brent says: to learn more, check out our list of tasks you probably shouldn’t automate, and read the comments too – lively discussion in there.

Jeremiah says: I’m a huge fan of automation when you understand what you’re automating, why you’re automating it, and what can go wrong when you’re not watching. Otherwise, you need to be really careful.

Erik says: Just about everything I’ve automated has stemmed from a .sql file full of trial, error, and documentation. Know your processes, and make sure someone is made aware if they start failing.
