Announcing the Dell DBA Days: August 25-28

If you want to work on the cutting edge, you can’t just read best practices – you have to go create them. Somebody’s gotta teach you how to use SQL Server 2016, right?

We asked Dell if we could fly out to their headquarters in Round Rock, Texas and borrow a bunch of hardware and storage gear for a week. Doug, Erik, Jeremiah, Kendra, and I will set up shop to do lab tests, experiments, and learning with the latest hardware, shared storage gear, and SQL Server 2016.

And we bet you wanna watch online.

So we’re bringing you Dell DBA Days. Twice a day, we’ll take a break from our work to do a one-hour webcast with you, talking about how we’re testing and what we’re learning.

Here’s the webcast lineup:

  • Tuesday morning – How Do You Fix a Slow TempDB?
  • Tuesday afternoon – TempDB Load Testing with SQL 2012, 2014, 2016, and SSDs
  • Wednesday morning – Introducing This Week’s Lab
  • Wednesday afternoon –  Advanced Shared Storage Features
  • Thursday morning – Find Your Slow SQL Server’s Bottlenecks Fast
  • Thursday afternoon – How to Prove Hardware is a Problem
  • Friday morning – SQL Server High Availability Options Explained
  • Friday afternoon – Watch SQL Server Break and Explode

Register now to watch us live.

PASS Summit #FreeCon2015: What We’re Talking About

FreeCon SQLSaturday Chicago 2012

FreeCon SQLSaturday Chicago 2012

On the Monday before the PASS Summit in Seattle this year, we’re putting on another one of our FreeCons. The FreeCon is different from a typical conference pre-con – it’s free, and it’s about the SQL Server community, presenting, marketing, branding, networking, and yes, some technical stuff too.

In the past, they’ve been invitation-only, but this year we opened up applications to the public. One of the application questions asked, “What’s the one thing you want to learn from the Brent Ozar Unlimited team?”

The answers were really enlightening, and we figured you’d like to hear ’em too, so here’s some of the highlights:

Business / Branding / Marketing Answers

How some of the brilliant marketing initiatives come about (brainstorming, creativity, etc) .

Tips on protecting IP and what works for you.

How to know when you are really ready to become a consultant.

Shameless self promotion :-) I’ve got skills and I want to market them better. I know about many different aspects of IT and SQL Server, but aimed at the small to medium shops. A more personal relationship with my customers is key.

How to build brand. I’m not a consultant or self-employed, but I am interested in expanding beyond the user group core of my local community. I don’t really know where to start or have a very solid idea of what it would entail–aside, of course, from the limos and champagne…

Presenting Answers

How to build demos that don’t suck.

How to build an plan to move from being a known speaker to one of the best speakers. I have had some great success but would love some help getting to the next level.

How to approach New technical initiatives at work and how to approach upper management correctly to gain buy-in.

Tips about presenting and stage-presence.

Technical Answers

Performance tuning especially query optimization and hardware optimization.

SQL 2014 in-memory tables; service broker; pushing the limits of availability groups, interesting stuff with larger/distributed setups

Best arguments to convince my employer that my unofficial, very-time-limited accidental DBA-ing isn’t going to cut it (especially since I got primarily other responsibilities), we need a dedicated DBA.

Community / Volunteering Answers

Finding the right mix of technical ‘up-keep’ and community leadership/involvement.

What you define as community success

Why public speaking at SQL events is so important.

I would love to learn how you guys structure your days and manage everything that you do (working, speaking, blogging, training). What are your personal stories, how did you get to where you are today, and what have you learned along the way?

Your Answers Help Us Build the Agenda – and the Community

As I wrote five years ago in my post Rock Stars, Normal People, and You, building a successful community means reaching out to those around you, giving back, and propelling the community forward. It’s been so exciting to hold these events and see the graduates go on to do amazing things for themselves and the community.

This year’s FreeCon is about finding 50 people who are excited to keep pushing the boundaries of community, and then us all helping each other fill in the gaps to multiply our efforts.

New sp_AskBrent® Reports on Waits, Files, Perfmon Since Startup

When your SQL Server is slow right now, you can run sp_AskBrent® to find out what’s going on. It checks for things like long-running queries, backups, blocking, files growing, and will tell you the top wait stats right now.

You can use the @Seconds parameter to control the sample length – by default, it samples for 5 seconds.

In the newest version, you can use @Seconds = 0 to skip the sampling altogether, and turn on @ExpertMode = 1 to see all the dirty details:

sp_AskBrent with @Seconds = 0 - click to enlarge

sp_AskBrent with @Seconds = 0 – click to enlarge

The top result set is about urgent warnings, and you can skip those if you’re doing over-time analysis. The wait stats and physical reads/writes are useful, though – these show you the top files by reads and writes for your instance.

Also in the news, when you fill out our EULA to download it (along with all of our goodies), there’s a checkbox to get alerted whenever we update our First Responder Kit. If you check that, we now send you a direct link to the zip file so you don’t have to hassle with the EULA again.

Enjoy, and hope this helps make your life easier.

Faster queries using narrow indexes and CROSS APPLY

It’s common to try to build the perfect index for a query. This index only uses the exact right columns for the key columns and other, presentational, data is pushed to the INCLUDE section of the index. While you can create a narrow key this way, the entire index row can become wide.

A Sample Report

For this blog post, we’re going to work with the StackOverflow dataset. For more information on getting started with the StackOverflow database, check out How to Query the StackExchange database.

We want to find interesting, unanswered questions that have been asked in the last 90 days.

We’ll start by creating some indexes:

CREATE INDEX IX_UnansweredQuestions
    ON dbo.Posts (CreationDate) 
    INCLUDE (Id, Score, OwnerUserId, Title) 
    WHERE (AcceptedAnswerId = 0) ;

CREATE INDEX IX_ScoredCommentsByParent
    ON dbo.Posts (ParentId, Id)
    WHERE (Score > 0);

    ON dbo.Votes (PostId) ;

    ON dbo.Votes (VoteTypeId, CreationDate)
    INCLUDE (PostId, UserId) ;

And here’s our sweet query:

        p.Id ,
        p.Score ,
        p.OwnerUserId ,
        + (SUM(CASE WHEN v.VoteTypeId = 2 THEN 1
                 WHEN v.VoteTypeId = 3 THEN -1
                 ELSE 0 END )
            / COUNT(*))
        + SUM(COALESCE(uv.Reputation, 0)) AS SortWeight
FROM    dbo.Posts AS p
        JOIN dbo.Votes AS v ON p.Id = v.PostId
                               AND v.VoteTypeId IN (2,3)
        LEFT JOIN dbo.Users AS uv ON v.UserId = uv.Id
WHERE   p.CreationDate > '2013-06-08'
        AND v.CreationDate > '2013-06-08'
        AND p.AcceptedAnswerId = 0
        AND p.PostTypeId = 1
                    FROM    dbo.Posts a
                    WHERE   a.ParentId = p.Id
                            AND a.Score > 0)
GROUP BY p.Id, p.Score, p.Title, 

This query uses a lot of CPU and uses nearly 5 seconds of CPU time on my machine between parsing, compilation, and execution. Overall it does around 1 million logical reads. That’s a few too many reads for my liking. I think we can make it better.

Simple, Small, Indexes and APPLY

We need a way to make this query faster and lighter. First, we get rid of the wide index we created before.

DROP INDEX dbo.Posts.IX_UnansweredQuestions ;

We add two more indexes:

CREATE INDEX IX_UnansweredQuestions
    ON dbo.Posts (PostTypeId, AcceptedAnswerId, CreationDate) 
    INCLUDE (Id, Score) ;

CREATE NONCLUSTERED INDEX IX_Posts_AcceptedAnswerId_PostTypeId_CreationDate
    ON dbo.Posts (PostTypeId, CreationDate)
    INCLUDE (Id, Score);


And then we re-write the query with magic:

WITH recent_votes AS (
    SELECT  v.PostId,
            (SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 
                     WHEN v.VoteTypeId = 3 THEN -1
                     ELSE 0 END ) 
                / COUNT(*)) AS VoteWeight
    FROM    dbo.Votes AS v
    WHERE   v.CreationDate > '2013-06-08'
            AND v.VoteTypeId IN (2,3)
    GROUP BY v.PostId, v.UserId
posts AS (
    SELECT  p.Id ,
    FROM    dbo.Posts AS p 
            LEFT JOIN dbo.Posts AS answers ON answers.ParentId = p.Id
                                              AND answers.Score > 0 
    WHERE   p.CreationDate > '2013-06-08'
            AND p.AcceptedAnswerId = 0
            AND p.PostTypeId = 1
            AND answers.Id IS NOT NULL
        p.Id ,
        p.Score ,
        _.Title, _.CreationDate, _.OwnerUserId ,
        + SUM(v.VoteWeight)
        + SUM(COALESCE(uv.Reputation, 0)) AS SortWeight
FROM    posts AS p
        JOIN recent_votes AS v ON p.Id = v.PostId
        LEFT JOIN dbo.Users AS uv ON v.UserId = uv.Id
        CROSS APPLY (   SELECT  Title, CreationDate, OwnerUserId
                        FROM    dbo.Posts p2
                        WHERE   p2.Id = p.Id ) _
        _.Title, _.CreationDate, _.OwnerUserId

The new query performs 800,000 reads but it only takes 1.2 seconds to execute and it goes parallel (1.9 total CPU seconds are used). This is a big improvement over the first query, and I’ll explain how we got there.

What makes the CROSS APPLY faster?

This query has a lot of differences from the first query. Let’s break down what’s going on in here.

We’ve split out the core of the query into two CTEs. In this case, we’re using the CTEs as optimization fences. Rather than let SQL Server do its own thing with optimization, we’re tricking it into providing the optimization path that we want – recent_votes and posts will be optimized separately. Because of this optimization fence, we’ve been able to construct very narrow indexes that bring back the data that we need for each.

After tricking SQL Server into optimizing our CTEs separately, we use CROSS APPLY to pull back the details we want based on a primary key lookup. This helps our query run faster – we need fewer pages allocated for our indexes which means that SQL Server estimates that less memory will be used. The results speak for themselves, too – the second query performs 1/4 the I/O of the first query.

What’s the Result?

The end result is that we have query that’s fast. Our indexes are free of bloat caused by wide INCLUDE columns.

If you like this sort of thing, you might be interested in our Advanced Querying & Indexing class this August in Portland, OR.

Careful Testing the 2014 Cardinality Estimator with Trace Flags

When you migrate to SQL Server 2014, rather than turning on the new cardinality estimator right away, I recommend sticking with the old cardinality estimator initially by leaving your database in SQL Server 2012 compatibility mode. Let the database settle down for a few weeks, let folks air out their grievances, and give yourself some time to make sure the system is working at least as well as the old system.

Once you’re on 2014, you can check out the new CE’s execution plans by playing around with trace flag 2312. That flag turns on the new CE in one of two ways:

  1. At the query level using OPTION (QUERYTRACEON 2312), or
  2. At the session level using DBCC TRACEON(2312)

The latter is easier if you have to test a lot of queries at once, or if you need to test a stored procedure. Just turn on the trace flag, and it’s on for only your session.*

* – I lied. It’s not.

If you set trace flag 2312 for your session, and then run a stored procedure whose execution plan isn’t already cached, that stored procedure gets compiled using the new cardinality estimator, and the resulting plan is put into cache.

Subsequent sessions will use that plan even if they don’t have trace flag 2312 enabled.

Here’s how to prove it on a SQL Server 2014 server in a 2012 (or earlier) compat mode database:

  1. Pick a rarely-executed stored procedure. (If you’re doing this in production, don’t use a stored procedure that will cripple your server if it happens to get compiled with the 2014 CE.)
  2. Recompile it using sp_recompile ‘myprocname’.
  3. In a new SSMS window, run DBCC TRACEON(2312) to enable the new CE in your session.
  4. Click Query, Include Actual Query Plans.
  5. Execute the stored procedure.
  6. Right-click on the query plan and click Show Execution Plan XML.
  7. In the first few lines, look for the string “CardinalityEstimationModelVersion” – if it’s 120, that’s the new CE. 70 is the old one. You’ll see 120 as shown in the example below.
  8. Close that query window, and start a new one. Run DBCC TRACESTATUS to verify that you don’t get a line for 2312, indicating you’re still on the old CE.
  9. Make sure Query, Include Actual Query Plans is on.
  10. Execute the stored procedure again.
  11. Right-click on the query plan and click Show Execution Plan XML. Even though you’re in the old compat mode, you’ll be looking at a 120-level CE plan.
  12. Recompile that stored proc using sp_recompile ‘myprocname’ just to make sure it gets out of cache.
Execution plan XML - note the CardinalityEstimationModelVersion at the top right.

Execution plan XML – note the CardinalityEstimationModelVersion at the top right.

So how do you test stored procedures with the new CE?

You could test on a different server with the databases set to the new 2014 compatibility level, but of course when you’re testing execution plans and query performance, the servers need to be identical. That’s not always possible.

If you have to test both the old & new CEs on the same server, you’ll need to create new stored procedures (like myprocname_2014) with OPTION (QUERYTRACEON 2312) on every single query in the stored proc – or, at least, all of the queries where you’re concerned about performance variances.

Kendra says: Well, there’s always plan guides. Kidding! I’M JUST KIDDING!

Features SQL Server Needs to Add (And Drop)

When you work with the same database server day in and day out, you can get a bit blinded by your assumptions. You might just assume that the way SQL Server does it is how every database does it.

But check out these features from other database platforms:

Cache query results – SQL Server only caches raw data pages. If you join ten tables together and do an ORDER BY on the results, SQL Server doesn’t cache any of its work tables or the final results. It rebuilds that wheel again every time the query runs – even if the data hasn’t changed, and even if the data is read-only. Oracle’s SQL Result Cache does just what it says on the tin.

Transactional DDL – When you need to change a bunch of database objects as part of a deployment, better have your prayer beads handy. If any change goes wrong, especially in a long, complex script with no error handling, you can find yourself with a mess of a database where some changes worked and others didn’t. PostgreSQL’s transactional DDL support lets you start a transaction, make your table changes, and then only commit if they all worked.

Warm up the cache automatically – Restart a SQL Server or fail it over to another cluster node, and performance starts over from scratch: all queries will hit disk because nothing’s in the buffer pool yet. PostgreSQL’s pg_prewarm and pg_hibernator let you write the buffer pool to disk during a graceful shutdown, and warm up the cache on other nodes before you fail over so that they’re ready to fly.

Refresh indexed views on your terms – SQL Server’s indexed views are persisted to disk, but they’re always updated in real time with every delete/update/insert of the underlying tables. While that sounds like a good thing, it’s not a great idea for data warehouses where we want the fastest possible loads, or reporting tables for an OLTP system. Oracle’s materialized views can be refreshed on a scheduled basis. Need more up-to-date data? Just refresh ’em. Need to hold off while you run a database-intensive Black Friday sale? No problem, leave the views as-is.

Like these? I’ve got more. In a fun, relaxed webcast on Tuesday, June 23rd, I’ll will show you a few features from other database platforms, plus give you a tour through my favorite requests. I’ll also discuss features that just didn’t catch on – and probably need to be taken out to the farm. We’ll talk through the whole thing live over Twitter, too.

Register for the webcast now.

How to Set Up Standard Edition AlwaysOn Availability Groups in SQL Server 2016

SQL Server 2016 adds Availability Groups in Standard Edition. While the licensing details can obviously change right up to release date, here’s what was announced at the Ignite conference:

  • Limited to two nodes only (a primary and a secondary)
  • Like mirroring, you can’t read from the secondary, nor take backups of it
  • But like database mirroring, you can take snapshots of the secondary for a static reporting copy
  • Each database is in its own Availability Group

In CTP2, this is referred to as a Basic Availability Group.

Yes, a BAG. I can’t make this stuff up people. But here’s what I can make up:

  • If you use HammerDB for load testing of a BAG, that’s a BAG of Hammers
  • If you have ten databases, each in its own BAG, that’s a Dime BAG
  • It could be worse – it could be a Database Basic Availability Group, or a D-BAG

Enough laughing – let’s implement it. For this scenario, I’m already in production with one SQL Server 2016 machine, and I’d like to add a second one to give me some high availability or disaster recovery protection.

I’m setting up a simple AG with two standalone machines:

  • SQL2016A – this will be my primary instance where my databases will normally run.
  • SQL2016B – this will be my secondary, with less horsepower. Could be in the same data center, or in a different one. (To keep things simple for this explanation, I’m going to stick with the same data center and the same IP subnets.)

This checklist will focus on the Availability Groups portion, not the SQL Server installation, so go ahead and set up your second SQL Server using our SQL Server Setup Checklist. Make sure the second one uses the same user database data/log drive letters and paths as your primary server. If you use different paths, then whenever you add a data or log file to an existing database, it will succeed on the primary but then fail on the replica where the drive letters are different. (Only the user database paths have to match – not TempDB, system databases, binaries, etc.)

In theory, you might be able to do this in production without a major outage.

In practice, I wouldn’t bother trying – you’re probably behind on Windows Updates, and you need to apply these prerequisite Windows patches. Because of that, I’d schedule a separate (long) maintenance window to get up to speed on Windows patches, the latest SQL Server updates, and do the first few steps. Even when you rehearse it in a lab, it’s not unusual to run into several restarts-and-check-again-for-updates in the wild, or brief outages to

Let’s get started!


“But wait, Brent – I don’t want to build a cluster!” Unfortunately, SQL Server’s AlwaysOn Availability Groups is built atop Windows Failover Clustering Services, and you have to join your AG replicas into a single Windows cluster.

Fear not – this is nowhere near as scary as it sounds, and it can be done after-the-fact to an already-in-production SQL Server. (I’d recommend you set up a lab or staging environment first to get used to it, though.)

After installing those clustering prerequisites, open Server Manager and click Manage, Add Roles and Features, select your server, and on the Features list, check the Failover Clustering box:

Adding the clustering feature

Adding the clustering feature

You can move on to the next step after this feature has been installed on all of the servers that will be involved in the Availability Group.

Validate your cluster candidates.

Open Server Manager and click Tools, Failover Cluster Manager. (I know – you thought Failover Cluster Manager would be under Manage instead of Tools, but it’s these clever switcheroos that keep us all employed as IT professionals.) Click Validate Configuration on the right hand side to start the wizard:

Validate a Configuration Wizard

Validate a Configuration Wizard

Put in the names of all the servers that will be participating in the Availability Group, and hit Next. It’s going to ask if you want to run all tests – this is another one of those gotchas where, in theory, you could do this in production without an outage if you chose to run specific tests. I don’t recommend that – just run all tests. If SQL Server is running on any of these nodes, it’ll likely stop while this wizard runs, so do this during a maintenance window.

The validation wizard can take several minutes depending on the number of replicas, environment complexity, and phase of the moon.

Your cluster will probably fail, and that’s okay. This isn’t like school. Nobody has to know if you fail several times. (Chemistry was hard for all of us.) Even if you think you passed, you probably failed a portion of it. (The wizard, not chemistry.)

Click the View Report button, and Internet Explorer will launch because why wouldn’t we take every opportunity to open a browser on a production server:

1998 called, and they want their web developer back

1998 called, and they want their web developer back

You’ll be tempted to scroll down through it because that’s how most web pages work. This is not most web pages. Click on the first error – in my case, Network – and then click on the first warning it takes you to – in my case, Validate Network Communication – and then read the warnings in yellow:

Validation warnings

Validation warnings

This is where things get a little weird.

Some errors are completely okay to ignore. The most common two are:

  • Single network interface warnings – if you’re doing network teaming or virtualization, you’re probably protecting your network through means that the clustering wizard can’t detect.
  • No disks found for cluster validation – if you’re building an AlwaysOn configuration with no shared disk, like failover clustering on a UNC path or a pair of Availability Group replicas with local disk, this isn’t a problem.

But otherwise, read this report really carefully and fix the warnings before you go on to the next step. For example, if the two nodes have different Windows patch levels, get the missing updates installed first. (If you’re having difficulty getting some of the patches to apply, there’s your first sign that this isn’t a good candidate for a reliable AlwaysOn implementation – more like OftenOn, or OffAndOn.)

After you fix the issues, run the validation wizard again. This is important because the last validation report is saved in Failover Cluster Manager, and you always want that report to look good. When you call Microsoft (or a team of highly skilled consultants) for Availability Groups help, they’re going to start by looking at this validation report, and if it didn’t pass last time, that’s going to be a bad start.

Create the Cluster.

After a successful validation report, leave the “Create the cluster now” checkbox checked, and click Finish – your cluster will be created:

Creating the cluster

Creating the cluster

This checkbox is particularly diabolical – it will be checked by default as long as you didn’t have any heinously ugly showstopper errors, but even with mildly serious errors, it will still let you create the cluster. Make sure you fix the errors first.

You will be prompted for a cluster name – again, a little diabolical here, because this is not how you will address SQL Server. This name is how you will control the cluster itself:

Picking a cluster name

Picking a cluster name

Think of your cluster like a kitchen – a restaurant can have lots of cooks, but it will have only one head chef. Your head chef knows who’s doing what at all times, and bosses people around. The head chef never actually cooks anything – he just tells other people what to do.

Your cluster can have lots of SQL Server instances in it, and they can have lots of Availability Groups. We need one controller to know what’s going on, and that’s the cluster name.

In my case, I’m using SQL2016CLUSTER. You’ll notice that in my screenshot, it says “One or more DHCP IPv4 addresses were configured automatically” – don’t do that in your production environment. Use static IP addresses. You always want to know where the head chef is.

The next screen will prompt you to add all eligible storage to the cluster – uncheck that unless you’re building an AlwaysOn Failover Clustered Instance.

In a perfect world, the wizard finishes, your cluster is completed successfully, and your Failover Cluster Manager looks something like this:

Cluster Mangler

Cluster Mangler

Pay particular attention to the “Witness: None” line. Just like any good preacher, your cluster needs a witness, so right-click on the cluster name on the left side of the screen, click More Actions, and Configure Cluster Quorum Settings. In the next wizard, choose “Select the quorum witness” and choose either a disk (shared drive) or file share (UNC path) witness.

Choosing a quorum method deserves its own webcast, and luckily, Kendra Little has recorded a video on that very topic.

For my simple two-server configuration, I need a really reliable tiebreaker, so I’m going with a file share witness. I need it hosted on my most reliable server possible – in my simple lab, that’s one of my domain controllers, so I type in the server name and then click the Show Shared Folders button:

Picking a file share for quorum

Picking a file share for quorum

Click the New Shared Folder button, configure your file share, and exit the wizard. What I just did there is right for my lab, but only my lab – you’ll want to make some better-informed decisions in your production environment. For example, if all of my SQL Servers lived on virtual machines backed by NetApp storage, I’d probably use a file share right on the NetApp itself, because if that thing goes down, all my SQL Servers are hosed anyway.

Enable AlwaysOn Availability Groups on Each Server

On each replica, launch SQL Server Configuration Manager, then go into SQL Server Services, right-click on the SQL Server service, and click Properties.

Service Properties

Service Properties

On the AlwaysOn High Availability tab, check the box to Enable, and click OK. This won’t take effect until the SQL Server service is restarted. Repeat this on all of the SQL Servers.

Now let’s take a moment to reflect: we’ve got a pair of Windows servers that are in the same cluster, but at this point, there are no dependencies between the SQL Server services running. They can be hosting different databases, doing log shipping or mirroring between them, or have absolutely nothing in common. If you’re planning your first AG in production, the work done up to this point is a good amount of work to tackle in one maintenance window. I would schedule all of this for the first outage, and then let things simmer down for the next week or two before assembling the Availability Group components.

Create the Availability Group

Whew! We’re finally done with the Windows portion of setup. Let’s fire up SQL Server Management Studio and have some fun.

On my SQL Server, I have a copy of the database that I want to copy over to my secondary replica. It’s already in full recovery mode, and I’ve been taking full backups, so it already meets the lengthy AlwaysOn Availability Group prerequisites.

In Object Explorer, right-click on AlwaysOn High Availability and click New Availability Group Wizard. Starting in SQL Server 2016, we’ve got a few new options:

Availability Group Wizard

Availability Group Wizard

As of this writing, the AG documentation hasn’t caught up with the product yet, so here’s an explanation of each option:

Basic Availability Group – terminology can always change leading up to release, but as of CTP2, this functionality is called a Basic AG. I would just like to take another second to point out that the acronym is BAG.

Database Level Health Detection – in Super Availability Groups (as opposed to Basic – look, if they can make up stupid acronyms, so can I) with multiple databases, the SAG doesn’t drop if one database happens to go offline, become corrupt, or take the day off. New in SQL Server 2016 is the ability to force that failover if any one database goes down, so the SAG stays up.

Per Database DTC Support – The Distributed Transaction Coordinator (DTC) enables simultaneous commits across multiple SQL Servers hosted on different hardware. Prior to SQL Server 2016, AG databases didn’t support DTC participation, but now, just check this box.*

* – You’ll probably have to do more than that, but hey, the infrastructure is here.

Since we’re doing Standard Edition, we’ll just check the Basic Availability Group box and move on. Check the box to pick which database you want in your BAG, and then it’s time for replica selections. On this screen, click the Add Replica button and connect to your new soon-to-be secondary:

Choosing your replicas

Choosing your replicas

Here’s what the choices mean, and I’m going to explain them a little out of order:

Synchronous Commit – say SQL2016A is the primary, and a user does a delete/update/insert there. If I check the Synchronous Commit boxes for SQL2016A and SQL2016B, then SQL2016A will write the user’s transaction locally, send the transaction over the network to SQL2016B, write it to disk there, and then wait for confirmation back before telling the user that the transaction is committed. (Most of the time. If SQL2016B drops offline, that’s another story.)

If I don’t check the Synchronous Commit boxes, then I get asynchronous transactions – the SQL2016A primary transaction will succeed right away, and SQL2016A just sends the transactions over to SQL2016B later. SQL2016B can be seconds (or minutes, or hours) behind, so I’ll lose data on failover, but the primary’s transactions won’t be subject to that killer latency.

Automatic Failover – if the Synchronous Commit boxes are checked, then you can do automatic failover between two of the replicas. (This is where the cluster witness starts to come into play – in my two-node setup, SQL Server needs a tiebreaker to know the difference between single node failures and a split-brain scenario.)

For my evil purposes, I only need a disaster recovery copy of my databases, and I don’t want to fail over to DR if my primary just happens to restart or hiccup. I’m going to leave the Automatic Failover and Synchronous Commit boxes unchecked.

Readable Secondary – in Super Availability Groups, you can offload reads and backups to your secondaries. Since I’m dealing with a BAG here, the Readable Secondary dropdown only offers the “No” option.

The tabs for Endpoints, Backup Preferences, and Listener don’t really matter here. SQL Server automatically creates the endpoints for you, and you can’t offload backups or create a listener for BAGs.

The next step in the wizard syncs the data between replicas:

Data synchronization options

Data synchronization options

The Full option performs one full and one transaction log backup on the primary, then restores them over to the secondary to start the replication. (Keep in mind that if your databases are large, you’ll want to set database backup compression on by default in order to speed this process – you don’t get to choose compression settings in the wizard.)

The other two options are useful if you performed the backups yourself manually, or if you’re using storage replication technologies to get it started.

If you choose the Full option, you can track backup and restore status with sp_WhoIsActive’s percent-completed column.

After the AG wizard finishes, you can right-click on your Availability Group in Object Explorer and click Show Dashboard:

Availability Group Dashboard

Availability Group Dashboard

Voila! Papa’s got a brand new BAG.

Announcing Live Query Execution Plans


Live query execution plans

Live query execution plans

This is new in the preview of SQL Server Management Studio 2016, available for download now, and available for playing-around at the TechNet Virtual Labs. It even works when you’re connected to SQL Server 2014 SP1 – but not 2012, because it relies on DMVs that were only shipped with 2014 SP1.

It’s so cute to watch it working on multiple indexes at once:

A couple of operators

A couple of operators

And the bigger your plans get, the more fun it is to watch. I’m only showing 600-pixel wide images here though to make your blog reading easier, but here it is working with multiple statements in a batch:


Works with multiple statements in a batch

It’s not perfect – if SQL Server believes only a few rows will come back, but its estimates are wrong, then the operator will keep showing 100%. Here’s an example:

Live plan with incorrect statistics

Live plan with incorrect statistics

The top right operator starts first, and SQL Server only expects 1 row for this clustered index scan. As it turns out, there’s more data – note that the line leaving the scan keeps moving, indicating we’re still getting data.

Interestingly, the clustered index seek immediately underneath it keeps changing its time numbers. It’s as if SQL Server Management Studio is saying, “Okay, we started this now and – wait, no, hang on, not yet, now. Okay now we’re – no, maybe not yet.”

The catch (because you knew there was a catch)

Great news – it even works when connected to a server running SQL Server 2014 SP1, as long as you’re using the CTP version of SQL Server Management Studio 2016!

Bad news – you can’t just view someone else’s live query plan. The live query stats plumbing has to be turned on before the query starts.

Here’s how to start that plumbing – choose any one of the following:

  • In SSMS vNext, before you run a query, click Query, Include Live Query Statistics
  • In any SSMS, before you run a query, run SET STATISTICS XML ON or SET STATISTICS PROFILE ON
  • Start an Extended Events session and enable the query_post_execution_showplan extended event. For example, use sp_BlitzTrace™ with @TraceExecutionPlansAndKillMyPerformance = 1. Danger: this enables live query stats on all sessions, and as you can guess by how Kendra named the parameter, there’s going to be a big performance hit for that on a busy server. (There’s a Connect request about that already.)

I know. You want the capability to get there from something like sp_WhoIsActive, but it’s not there. Look, it can’t be all candy and roses. They have to save something for the next version.

How to Troubleshoot SQL Server Connection Timeouts

If your application can sometimes connect to SQL Server – and sometimes not – it can be really tough to troubleshoot. If SQL Server doesn’t even hear the call, it can’t log any errors.

Here’s the questions I ask to get to the root cause:

When it happens, does it happen to all applications? For example, do you have monitoring tools pointed at the SQL Server, and are they able to consistently connect to SQL Server even when the problem is happening?

Does it happen to all application servers? If you have several app or web servers, are they all affected? (If you’ve only got one, now is an excellent time to set up another one for troubleshooting, and balance the load between them.)

Are all queries in the application affected, or just some queries? Sometimes I see long-running queries keep right on going, but only new connections are affected.

Sometimes, your SQL Server just goes on vacation.

Sometimes, your SQL Server just goes on vacation.

Are there any errors logged in the SQL Server or application servers? In one case, we saw that all of the application servers lost network connectivity at the same time, on a regular basis. Turns out there was a bad switch involved.

Is there a pattern to the days/times of the timeouts? Start writing them down or documenting when they happen. For example, in one case, we saw that the days/times were exactly correlated to the security team’s regularly scheduled port scans.

During the timeouts, is the app server able to ping the SQL Server? When all else failed with one troubleshooting triage, we put a free network monitoring tool on the app server to ping the SQL Server every 10 seconds. Sure enough, the next time the app had query timeouts, we were able to prove that even pings weren’t working – thereby ruling out a SQL problem.

Ask those questions, and sometimes you don’t even have to troubleshoot SQL Server at all – the answers tell the whole story.

Do Foreign Keys Matter for Insert Speed

Do you have the key?

Do you have the key?

Do foreign keys hurt the performance of inserts? Okay, we all know that foreign keys do require some work, but the crazy people of the internet have wildly varying guesses as to just how much work is involved. Estimates varied from “barely any” to “as much as 50% slower”.

I figured that you were going to do the work, so I might as well do it.

How did the test work?

I created two tables – parent and child. The parent table has an int primary key and a fixed width filler column. I varied the size of the filler column across multiple test runs, just to make sure I wasn’t biasing the test by using an artificially large parent table. The testing happened with a parent table that had 2 rows per page, 4 rows per page, and 8 rows per page.

You can find the test code in fk.sql.

Testing SQL Server insert speed with foreign keys

The test code inserts 1,000,000 rows in batches of 5,000 rows. Although this isn’t strictly realistic, it’s better than timing a single batch of 1,000,000 inserts.

What happens during the test?

It turns out that no matter what the page size it takes approximately 40ms to insert 5,000 rows with the foreign key in place. When we remove the foreign key, inserting 5,000 rows takes ~26ms.

Although the difference looks big at 5,000 row batch sizes, each insert is taking, on average, 0.0072ms with the foreign key and 0.0052ms without the foreign key. These differences are hardly worth noting. And, let’s face it, 2 microseconds is a pretty minor price to pay for data integrity.

How much do foreign keys affect single row inserts?

I ran the test one last time with a batch size of 1 row, just to see how much difference there was in insert performance.

FK present? duration
yes 253,896 ms
no 241,195 ms

When it comes down to single row performance, the difference is neglible. We’re spending all of our time waiting for other parts of the system.

How real is this test?

It’s not terribly real, but it’s real enough to say “Yes, foreign keys have overhead” and follow that up with “No, you shouldn’t care, just add the keys.” The tests were done in VMware and the hardware in question is mid-2012 Retina MacBook Pro.

In the real world, we’ve got a lot more than inserts happening, but it’s worth quantifying the cost of a foreign key lookup and realizing that it’s worth having foreign keys.

Kendra says: Wanna know if your foreign keys are ‘trusted’? Check out our script here.