I’m going to teach the world about SQL Server indexes, one event at a time. I’d love to have you join me at these three upcoming events — they’re all different, hope to see you there!

Take the SQL Server Index Quiz!

Thursday, September 17, 2015, 5 PM Pacific
Part of the free 24 Hours of PASS event
How well versed are you on SQL Server indexes? Join this free session to take a quiz on SQL Server indexes to test your knowledge on clustered indexes, nonclustered indexes, filtered indexes, and more! Register over at the 24 Hours of PASS website.

And by the way, if you haven’t taken our free quiz on SQL Server indexing, now’s the time. Take the quiz here, it won’t spoil the webcast.

Top (3) SQL Server Index Tuning Questions

Monday, September 28th, 10 am Pacific
Free webcast hosted by the clever robots that keep Brent Ozar Unlimited® running

Got questions about how to design indexes for performance in SQL Server? You’re not alone! Join Kendra to get answers to the top questions she gets about indexes. You’ll learn how to decide on key column order, how to tell if you should drop an index, whether it’s bad to use the clustered index key in your nonclustered index definition, and more. Bring your best index questions for Q&A and discussion! Reserve your spot now.

SQL Server Index Therapy Session – One Day Pre-Conference Event at the SQL PASS Summit, 2015

Tuesday, October 27, 2015 in Seattle, WA
You want to design indexes to improve SQL Server’s performance, but you’re concerned that you might create too many, not enough, or just plain “wrong” indexes. In this session, you will learn how to design effective clustered and nonclustered indexes to speed up your queries. You’ll tackle index design challenges to learn how to choose the right key and included columns and how to consolidate duplicate indexes. You’ll get practical tips to recognize when to use filtered indexes and indexed views and how to avoid the gotchas and pitfalls that make indexes slow down your queries. If you have two years of experience writing queries for SQL Server, this session will put you in control of your next index-design project.

This one-day session is $495, and full conference registration isn’t required. Reserve your spot on the PASS Summit website.

Need video training, or in-depth training?

  • Brent teaches indexes in his “How to think like the Engine” 90 minute online video ($29)
  • Kendra teaches you “How to Tune Indexes in SQL Server” in 6+ hours of online content ($299)
  • Kendra and Jeremiah teach a full week of “Advanced Querying and Indexing” ($3995)
0 comments ↑ Back to top

In theory, when you configure AlwaysOn Availability Groups with synchronous replication between multiple replicas, you won’t lose data. When any transaction is committed, it’s saved across multiple replicas.

Except when it’s not. Read the manual carefully:

If primary’s session-timeout period is exceeded by a secondary replica, the primary replica temporarily shifts into asynchronous-commit mode for that secondary replica.

That’s right – your sync secondary becomes asynchronous.

Automatically. Without warning you. And you can’t control it. You can’t say, “I only want you to accept transactions as long as data is being copied to another replica.” For more details on what causes a sync replica to drop into async, check out the BOL page on synchronous commits.

Bottom line – you can’t actually guarantee zero data loss with AlwaysOn Availability Groups. I love AGs, but they’re much, much more complex than they look at first glance.

5 comments ↑ Back to top

Enter Query Store

Query Store, in short, is a way to track query performance over time. In Microsoft’s words, “The feature automatically captures a history of queries, plans, and runtime statistics, and retains these for your review”. It’s like sys.dm_exec_query_stats but it persists across reboots! And it has execution plans!

The Query Store is a great feature for on premises SQL Server backed applications, but it adds significant value in Azure SQL Database. Instead of potentially losing execution history and execution plans, developers can see the history of query performance and execution plans across reboots.

Enabling Query Store in Azure SQL Database

Point SSMS at your Azure SQL Database instance and run:


No, really, it’s that easy.

How Do I Really Know Query Store is Enabled?

Figuring out if Query Store is enabled is pretty easy, too. All you need to do is look at sys.database_query_store_options. The following query should give you exactly what you need to know about the Query Store:

SELECT  desired_state_desc ,
        actual_state_desc ,
        current_storage_size_mb , 
        max_storage_size_mb ,
FROM    sys.database_query_store_options ;

As long as things are running correctly, the desired and actual states should read either ALL or AUTO. For more information about using Query Store, check out Monitoring Performance by Using the Query Store.

The Takeaway

Azure SQL Database instances can be rebooted at any time. It’s important to keep this in mind and plan accordingly, not just in the application but also in how you approach tuning your T-SQL. You should enable the Query Store in Azure SQL Database instances where you care about performance.

3 comments ↑ Back to top

It’s tempting to think that table partitioning will improve query performance. After all, it’s an Enterprise Edition feature– it must have a lot of magic, right?

Table partitioning does have magic for the right situations. It shines when you want to add a large amount of data to a table or remove a large amount of data in chunks! The “partition switching” feature can essentially help you avoid doing inserts and deletes against a live production table, and it pairs beautifully with cool new features like ColumnStore indexes. For things like fact tables in a data warehouse, it’s drool-worthy.

But for other uses, table partitioning can be a lot of work to implement and it can slow your queries down. And it may not always be obvious to you why your queries are slower. Let’s take a look at a simple example.

Let’s partition the Posts table

We’re using the StackOverflow sample sample database. The Posts table contains questions, answers, and other types of posts, each of which has a different PostTypeId.

Let’s say that our workload does many point lookups on a post based on an integer Id column in the table. We have some aggregate queries that run looking for top scoring questions, recent answers, and things like that, and most of those queries specify PostTypeId. We might be tempted to partition the table on PostTypeId.

For comparison purposes, we create two tables with the exact same columns, data types, and rowcounts:

  1. dbo.PostsNarrow has a unique clustered index on the Id column
  2. dbo.PostsPartitioned is partitioned on PostTypeId and has a unique clustered index on Id, PostTypeId

For queries that do specify PostTypeId, we’re hoping that we can get partition elimination. (This can be tricky sometimes due to bugs with min/max syntax, or weird implicit conversion issues, or even optimizer issues with joins). But performance can also be tricky for simple queries which don’t specify PostTypeId, even when partition elimination isn’t an issue.

Our example query

Our example query pulls the people who are most active– they write the most questions and answers. Here’s the version of the query that runs against PostsNarrow:

WITH freq AS (
    COUNT(*) as PostCount
FROM dbo.PostsNarrow 
GROUP BY OwnerUserId
FROM freq
JOIN dbo.Users u on u.Id=freq.OwnerUserId;


We have the following index on both PostsNarrow and PostsPartitioned. (The included columns are to support other queries.)

CREATE INDEX ix_Posts_OwnerUserId_INCLUDES on dbo.PostsPartitioned (OwnerUserId)
    INCLUDE (PostTypeId, AcceptedAnswerId, CreationDate, LastEditorUserId)

Run against the non-partitioned table, this query uses 3344 ms of CPU time and has an elapsed time of 1039 ms.

Just changing the query to run against dbo.PostsPartitioned, the query goes to 5672 ms of CPU time and the elapsed time is 2437 ms– it’s twice as slow!

The secret is in the execution plans

Comparing the plans for the two queries, we can see that SQL Server is doing something different inside the CTE where it groups by OwnerUserId.

Non-Partitioned Plan Excerpt (showing actual row counts):

Query 1 excerpt

Partitioned Plan Excerpt (showing actual row counts):

Query 2 excerpt

The Non-Partitioned plan is able to stream the data from the index scan directly into a stream aggregate operator to do the group by OwnerUserId. The partitioned plan has to repartition the streams, then it has to put all the data in a Hash Match aggregate and compare all the buckets.

The hash match aggregate is more work

The hash match operator is what’s slowing this query down– it requires a larger memory grant and it has to do more work. But SQL Server has to use it for our query because our non-clustered index on OwnerUserId is partitioned.

In other words, the data in PostsPartitioned’s OwnerUserId index is like this:

  • PostTypeId=1
    • All rows ordered by OwnerUserId for this PostTypeId with included columns
  • PostTypeId=2
    • All rows ordered by OwnerUserId for this PostTypeId with included columns
  • PostTypeId=3
    • All rows ordered by OwnerUserId for this PostTypeId with included columns

Any given OwnerUserId can be in multiple partitions, and we have to find them all. That’s why we can’t use the stream aggregate operator.

Can’t we create a non-aligned index?

We can! We have the option to create a non-aligned (non-partitioned) index on dbo.PostsPartitioned’s OwnerUserId column. Since it’s not partitioned, everything is purely sorted by OwnerUserId and we can get the stream aggregate back.

But, wait a second. Let’s pause and think about this.

The whole point of using table partitioning was to partition things. Creating a bunch of non-aligned indexes is tantamount to admitting defeat: we can’t get partition elimination against those indexes for queries that DO use the partitioning key. (There are other downsides, too, like preventing partition switching.) And we will get into a whole different set of problems if we have to keep partitioned and non-partitioned copies of the same index.

This is just a simple example. What if it were more complex?

I’m going to let you in on a little secret: it took me a bit of time looking at this example to figure out why query 2 was slower. There were several red herrings that I fixed before writing this post regarding statistics differences between the tables that distracted me for a few minutes more time than I’m willing to admit.

And I had to think back to some of Jeremiah’s modules in our Advanced Querying and Indexing course to remember the difference between those two operators.

And the first time I created a non-aligned index on the partitioned table to verify I could get the stream aggregate operator back, I accidentally created it as aligned at first and got more confused.

Admittedly, I enjoy this kind of thing and am saving another example of a query that goes from 1 ms to 11 seconds for a treat to dig into and write a fun demo with, but most people would rather just keep the 1ms query and not worry about it!

Let’s sum it up

Here’s what to take away from all of this:

  • Table partitioning can make some queries slower
  • Partitioning adds complexity to troubleshooting and tuning queries– it’s time consuming and takes longer
  • Non-aligned indexes aren’t an easy answer: if we need to resort to a lot of them, how much are we actually benefitting from table partitioning?

For these reasons, table partitioning is typically not a great fit for SQL Servers with an OLTP pattern where slow queries are the biggest pain point. Traditional index tuning and query rewrites will usually get you better performance with less hassle.

Wanna learn more about indexing?

3 comments ↑ Back to top

After reading through yesterday’s announcements about VMware vCloud Air SQL, here’s a quick boiled-down version for busy people with jobs:

  • It’s Microsoft SQL Server 2008R2 and 2012 Enterprise Edition hosted in VMware’s data centers (not on-premises.)
  • Only 3 instance sizes are available: small (2vCPU, 8GB RAM), medium (4vCPU, 16GB RAM) and large (8 vCPU).
  • Drive space is limited to 150GB today (including data, log, and tempdb) but will be 2TB at official launch.
  • Licensing is not included yet – you have to bring your own EEs. (Unfortunately, due to the way MS License Mobility works, that means you have to move your licenses to vCloud Air SQL for a minimum of 90 days. Not many of us have spare licenses for this type of thing.)
  • The instance will need a public IP address if you want to access it from outside of VMware’s data centers. (See the “secure access” section on page 3 of the Solutions Brief PDF.)
  • The instances appear to be single virtual machines – the only high availability is the VMware host protection. (I can’t find mention of a SLA for data loss.)
  • You get a dedicated VM (not a dedicated host), but you can’t access the VM. You can only access SQL Server through your chosen port (typically 1433).
  • To get your data into vCloud Air, you’ll be pushing it through ETL products or insert statements. Restores are not supported.

In summary, if it was announced three years ago, it still wouldn’t have even competed with Amazon RDS for SQL Server back then – here’s our initial 2012 post about the RDS release. Amazon has more options for instance sizing, drives, licensing, connectivity, high availability – heck, even database platforms, because AWS supports MySQL, Oracle, SQL Server, and even their own flavor, Aurora.

I don’t get why VMware would enter this market, at this time, with this product.

2 comments ↑ Back to top

Some of my consulting lines are all about context. Let’s just jump right into a project kickoff meeting with an architect and a few developers:

The Conversation

Me: “So, what pains is this SQL Server having?”

Phil I. Buster: “When I first started out here, we had SQL Server 2005. The app only had a couple of users back then, so as we added functionality, we decided to keep the business logic in the database. It wasn’t too bad at first – ”

Me: “Ballpark, when was this?”

(The rest of the employees in the room give subtle winks at each other and elbow each other, knowing that Phil is getting ready to tell the application’s life story, and you can tell they’ve heard it a million times.)

Phil: “2005. And about six months into the project, we lost the key architect who’d initially designed the stored procedures. Here’s how the database looked in 2006 before we decided to rewrite it.” (Phil walks over to the wall and points at a Visio diagram taped together from tens of yellowing pieces of paper.)

Developer: “I’m not sure this is really the best use of Brent’s time…”

Me: “No no, Phil, keep going.”

What That Line Does

Things to keep in mind when using this consulting line

Things to keep in mind when using this consulting line

Put your pen down, because otherwise this line will make you want to jam it into your ears.

It’s really tempting to remind Phil of your hourly rate, or try to aggressively steer him back toward your original question, or playfully ask if everyone else wants to take a bio break while Phil brings you up to speed.

But here’s the thing – there are some people who really need to share their scarred emotional history on a project before they can face the current problems. Sometimes they want to make sure it’s clear that they’re not to blame for the issues, or that they were doing the best job they could, or that they’ve tried everything in the book over the last few years.

If you try to cut Phil off right away during the beginning of the relationship, it sets the tone that you don’t care about Phil’s work, input, or feelings. You have to let him get his story out and trust that eventually, his story will arrive at the present day.

What Happens Next

You listen attentively and patiently, and you never roll your eyes at the other attendees. (You don’t have to – they’re going to understand that you’re a patient saint, even if you don’t give them the slightest verbal clue that you’re just as frustrated as they are.) Whenever Phil mentions something that might be relevant to today’s pains, you write that down, but otherwise, encourage him to keep right on going.

Eventually, Phil’s going to arrive at the present day.

And your psychiatry work probably won’t be over, either. During your first couple of meetings, as you open up various parts of the system’s problems, Phil’s going to see something that sparks his memory and makes him want to tell you another story. At that point, you can take a slightly different tactic and ask that we put this story in the parking lot.

Another common variation of Phil is The Guy Who Wants to Demo Every Piece of the Software Before He’ll Show You The Problem. He has to show you his application because you’ve surely never seen anything like it, and he’s doing something that no other data professional could ever have conceived. Be patient with this guy too – and sometimes, he’s actually doing something cool.

If you enjoyed this technique, check out more of my favorite consulting lines.

6 comments ↑ Back to top

SQL Server 2016 brings support for JSON. For those of you who don’t know, JSON is JavaScript Object Notation – it’s a way of representing application level objects as a string. Imagine XML without anywhere near as much line noise, and you’re pretty much there.

Our Sample Document

In this example, to retrieve orders and their line items from the AdventureWorks database. We need to get the data out in this format for fast processing. In the end, our data should look something like:

{ "orders" : 
    [ /* an array of orders */
        { "order" : 
            { "number" : "123456" ,
                /* more stuff goes here */
                "items" : 
                        { "item": { /* item info goes here */ } }

Holy cow, that’s a lot of stuff just to represent and order and the line items.

Here’s our starter query:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        d.ProductID AS [item.product_id],
        d.OrderQty AS [item.quantity],
        d.UnitPrice AS [item.unit_price],
        d.UnitPriceDiscount AS []
FROM    Sales.SalesOrderHeader h
        JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
WHERE   h.SalesOrderNumber = 'SO43659'

First Attempt

The documentation hints that we might just be able to use `FOR JSON AUTO` option to automatically format our JSON. What could go wrong?

Syntax is hard

Syntax is hard

Our results aren’t what we want at all! The structure is close, but not quite what we’re looking for.

Getting the Structure Right

We are getting all the data we want, but we’re not getting the structure that we want. Our requirements say that we need to have and `orders` array and then each `order` should be a separate item in that array. An orders line items should also be stored in a separate array.

The square brackets make an array in JSON. The curly brackets denote an object.

Using the root() function will nest our results in a root object. In this case, we can use root('orders') to create a basic orders object to hold our list of orders.

Making each item part of an array inside the order is going to be trickier. We can still use FOR JSON AUTO to get the job done:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        (SELECT d.ProductID AS [item.product_id],
                d.OrderQty AS [item.quantity],
                d.UnitPrice AS [item.unit_price],
                d.UnitPriceDiscount AS []
         FROM   Sales.SalesOrderDetail d 
         WHERE  h.SalesOrderID = d.SalesOrderID 
         FOR    JSON AUTO) AS items
FROM    Sales.SalesOrderHeader h
WHERE   h.SalesOrderNumber = 'SO43659'

We’re almost right!

So wrong I can taste it.

So wrong I can taste it.

Well, that’s … something. To get the output we want, we’re going to have to use a correlated subquery – fans of generating XML in SQL Server may remember jumping through hoops to get the right structure out of SQL Server.

We’re on the right track, but what we really need is better formatting.

The Best Solution, For Now

Since the FOR JSON AUTO isn’t generating the JSON output that we want, we can go down a manual route and use FOR JSON PATH. If this is starting to sound like XML, you’re absolute right.

After attaching a debugger to SQL Server, Paul White shows that the JSON writer is using the old XML code under the hood:

Anyway, the SQL to generate the right JSON document:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        (SELECT d.ProductID AS [item.product_id],
                d.OrderQty AS [item.quantity],
                d.UnitPrice AS [item.unit_price],
                d.UnitPriceDiscount AS []
         FROM Sales.SalesOrderDetail d 
         WHERE h.SalesOrderID = d.SalesOrderID
         FOR JSON PATH
        ) AS [order.items]
FROM    Sales.SalesOrderHeader h
WHERE   h.SalesOrderNumber = 'SO43659'

Check out the correct (and formatted) results!

Getting JSON out of SQL Server

It’s easy to get incorrectly formatted JSON out of SQL Server. If you do need to get well formatted JSON out of SQL Server be prepared to put in a lot of work coercing your query into the right shape. You may find that you need to do a large number of nested sub-queries, too.

6 comments ↑ Back to top

Must be Dell DBA Days:

Join us live as we performance tune – and break – SQL Servers:

  • Thursday Morning: Finding Your Slow SQL Server’s Bottlenecks in Record Time with Wait Stats
  • Thursday Afternoon: How to Prove Hardware is the Problem
  • Friday Morning: SQL Server High Availability, Disaster Recovery, and Licensing Explained
  • Friday Afternoon: Watch SQL Server Break and Explode

Register now and join in.

6 comments ↑ Back to top

“Throughout history, slow queries are the normal condition of man. Indexes which permit this norm to be exceeded — here and there, now and then — are the work of an extremely small minority, frequently despised, often condemned, and almost always opposed by all right-thinking people who don’t think bit columns are selective enough to lead index keys. Whenever this tiny minority is kept from creating indexes, or (as sometimes happens) is driven out of a SCRUM meeting, the end users then slip back into abject query performance.

This is known as “Business Intelligence.”

–Bobby Q. Heinekens

Fake quotes and people aside

Let’s look at a scenario where you have a BIT column that’s fairly selective, and perhaps the rest of your predicates are ranged. This isn’t so out of the ordinary, especially because people like to know when stuff happened and how many times it happened.

“How many lunches have I eaten today?”

“Where did that bear learn to drive?”

“Am I being slowly disassembled on a molecular level by millions of tiny black holes?”

Yes, China or Florida, and Probably!

So let’s figure this out quick

;WITH E1(N) AS (
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
        IDENTITY (BIGINT, 1,1) AS [ID] , 
    ABS(CHECKSUM(NEWID()) / 100000000) + 1 AS [CustomerID],
        ISNULL(CONVERT(DATE, DATEADD(MINUTE, -[N].[N], GETDATE())),     '1900-01-01') AS [OrderDate] ,
        CASE WHEN [N].[N] % 19 = 0 THEN 1 ELSE 0 END AS [isBit]
INTO [NotAshleyMadisonData]
FROM    [Numbers] [N]
ORDER BY [OrderDate]


([isBit], [OrderDate])

([OrderDate], [isBit])

Here’s one table with one million rows in it. Since it’s random, if you run this on your own it may turn out a little different for you, but I’m sure you can adapt. You are, after all, wearing the largest available diaper size.

I’ve also gone ahead and created two indexes (neither one filtered!) to avoid the appearance of impropriety. The first one goes against the oft-chanted mantra of not leading your index with a BIT column. The other complies to your thumb-addled rules of index creation where your more unique column comes first, though not to an opposing rule to lead your index with equality predicates and then range predicates.

Only 52,631 rows out of a million have a BIT value of 1. And with the exception of the first and last date values, each date has 75 or 76 BIT = 1 columns.

If you had to do this in your head, which would you do first? Find all the BIT = 1 rows, and then only count occurrences from the desired range? Or would you Find your start and end dates and then count all the BIT = 1 values?

(Hint: it doesn’t matter, you’re not the query engine. Unless you’re Paul White. Then maybe you are. Has anyone seen them in the same room together?)

Images of Query Plans

FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[isBit] = 1
AND [namd].[OrderDate] BETWEEN '2013-09-25'	AND	 '2015-08-20' --All dates

FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[isBit] = 1
AND [namd].[OrderDate] BETWEEN '2013-09-25'	AND	 '2014-09-01' --About half the dates

FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[OrderDate] BETWEEN '2013-09-25'	AND	 '2014-09-01'
AND [namd].[isBit] = 1 -- Flipping them doesn't change anything

FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[OrderDate] = '2013-09-26' --It's not until here that the other index gets used
AND [namd].[isBit] = 1
Peter Godwin will never play my birthday party.

Peter Godwin will never play my birthday party.

Put on your recap cap

This is another case where knowing the data, and knowing the query patterns in your environment is important. It’s easy to overlook or reject obvious things when you’re bogged down by dogma.

The stamp of approval for an idea shouldn’t come from blogs, forums, white papers, or hash tags. It should come from how much something helps in your environment.

Thanks for reading!

5 comments ↑ Back to top

5. Turning on auto-shrink

It might make disk space alerts go away for a minute, but it can cause lots of problems.

4. Enabling xp_cmdshell and giving the SQL Server account administrative rights in Windows

Young DBAs often do this to try to get a quick fix in place to manage backup files. Later on, they realize they can manage all those things in PowerShell or cmd SQL Agent job steps and give more limited permissions. But now they’re not 100% sure of what will break they disable the option and they live in fear of a security audit.

3. Running transaction log backups every 30 minutes, only during the day

“How often should we run these backups?” (Does internet search) “30 minutes seems popular.”

Your backup frequency should be driven by RPO and RTO. And don’t turn them off at night– usually people can still make changes in the system, and there’s no reason to have a giant log backup in the morning.

2. Looking for one magical feature that will fix all performance problems

When you first start out, it seems like all those expensive enterprise features must make everything blazing fast. Later on, you realize that humble things like indexes, memory, and carefully written TSQL are worth all the effort.

1. Being Rude to All the Developers

As a beginner DBA, it’s easy to work long hours, get stressed out, and start to feel like you’re the only one protecting the SQL Server. You start to say “no” a lot. And sometimes you lose your respect for other people.

But you can’t let that keep happening: if you go down that road, it’ll ruin your job. You’ve got to regularly step back, see things from other people’s perspective, and be part of the team.

16 comments ↑ Back to top