Common SQL Server Clustering, AlwaysOn, and High Availability Answers

Our live webcast topic this week was Q&A – you could bring any HA/DR questions, and we could avoid your answers. Just kidding. Here’s the webcast – we apologize for the audio, WebEx is still getting their act together. It cleans up after the first couple of minutes.

And here’s the questions we couldn’t get to during the webcast:

SQL Server Licensing Questions

Q: If using virtual machines and clustering / failing over at that level (not sql server) is there any reason that SQL Server Standard Edition won’t work? Someone once told us in a sql class that Enterprise Edition was necessary for this.

Answer from Brent: don’t you just love those “someone once told us” things? You’ll want to get them to tell you why. Standard Edition works fine in virtual machines. It may not be cost-effective once you start stacking multiple virtual machines on the same host, though, because you have to pay for Standard Edition for every guest.

Q: Hi, with mirroring being deprecated and AlwaysOn AG only available with Enterprise Edition – what are our HA options going to be with Standard Edition in the future? Any ideas if AlwaysOn synchronous will make it into Standard?

Answer from Jeremiah: You have a few HA choices with SQL Server 2012 Standard Edition and beyond. Even though mirroring is deprecated, you could feasibly use mirroring in the hope that something new will come out. Obviously, this isn’t a viable option. The other HA option is to use clustering. SQL Server Standard Edition supports 2 node clusters, so you can always use it for HA.

Training for Clustering, AlwaysOn

Q: Is there a good resource on setting up a lab environment for a clustering setup?

Answer from Kendra: I’m so glad you asked!

How to Manage AlwaysOn Availability Groups

Q: Did you experience or know “split brain scenario” in AlwaysOn Availability Groups that when secondary node is up to take over primary role, the transaction becomes inconsistent? And how to avoid it?

Answer from Brent: Ooo, there’s several questions in here. First, there’s the concept of split brained clusters – when two different database servers both believe they’re the master. Windows Server Failover Clustering (WSFC) has a lot of plumbing built in to avoid that scenario. When you design a cluster, you set up quorum voting so that the nodes work together to elect a leader. In theory, you can’t run into a split brain scenario automatically – but, you can most definitely run into it manually if you go behind the scenes and change cluster settings. The simple answer here: education. Learn about how the quorum process works, learn the right quorum settings for the number of servers you have, and prepare for disaster ahead of time. Know how you’ll need to react when a server (or an entire data center) goes down. Plan and script those tasks, and then you can better avoid split brain scenarios.

Q: Can you recommend any custom policies for monitoring AlwaysOn?  Or do the system policies provide thorough coverage?  Thank you!

Answer from Brent: I was a pretty hard-core early adopter of AlwaysOn Availability Groups because I had some clients who needed it right away. In that situation, you have to go to production with the monitoring you have, not the monitoring you want. The built-in stuff just wasn’t anywhere near enough, so most of my early adopters ended up rolling their own. StackOverflow’s about to share some really fun stuff there, so I’d keep an eye on You should also evaluate SQL Sentry 7.5’s new AlwaysOn monitoring – it’s the only production monitoring I’m aware of, although I know all the other developers are coming out with updates to their tools for monitoring too.

Q: Is it wise to have primary availability groups in one server of the nodes and have primary groups on another of the servers that form the cluster. Or is it better to have all primary groups on server 1 and secondary on server 2?

Answer from Brent: If you split the primaries onto two different nodes, then you can do some load balancing.

Q: Would you consider Always-ON AG Read-Only replicas as a method to offload or load balance reporting? Looks like the Read Intent option acts like a load balancer for reading off of those DBs, right?

Answer from Brent: Offload yes, load balance no. The read intent options give you the ability to push read-only queries to a different replica, but there’s no load balancing. Your clients just hit the first server in the list. If you need true load balancing, you’ll want to put all of the read-only replicas behind a real load balancing appliance.

Windows Clustering Setup and Management

Q: Where can I find a good list of cluster hotfixes SQL 2008 R2 and perhaps the OS as well? 

Jes here. You can go to the Update Center for Microsoft SQL Server to find the latest CU and hotfixes. Check the Support pages for Windows Server 2008R2. Updates aren’t released as cluster-specific. This is why it’s really important to have a test or QA environment that is also set up as a cluster, so you know if the cluster services are affected at all.

Q: What is the recommended order/procedure when you have to do Windows updates to servers in a cluster?

Answer from Kendra: Microsoft knew you were gonna ask this! Check out their “SQL Server failover cluster rolling patch and service pack process” KB here. But do yourself a favor and always deploy patches to a non-production test cluster first and let them burn in a bit.

Q: From your previous answers, it sounded like you don’t recommend use Windows 2008 R2 for AlwaysOn. Can you elaborate bit more on why Windows 2012 is better suited for this? I need more persuasive power to talk the rest of folks of my company to use it.

Answer from Brent: Sure, check out the AlwaysOn Availability Groups Lessons Learned video at the bottom of that page.

Q: Would you have a single DTC group or multiple groups configured for a 4 instance cluster?

Answer from Kendra: There’s no shortcut here: you have to decide on an instance by instance basis. For each instance you gotta determine how much it uses distributed transactions, and how impacted it might be if DTC were to temporarily be offline. Review Cindy Gross’ information on DTC to find out pros and cons of different approaches to configuring DTC.

SQL Server Clustering with VMware and Hyper-V

Q: Is VMWare HA a good alternative to use instead of a Microsoft Cluster?

Answer from Jeremiah: The HA choice comes down to where you want your HA to be managed. VMware HA pushes the high availability question out of the SQL Server realm and into the VMware infrastructure. More than anything else, this is a business decision – just be sure you’re happy with the decision of which team is managing your uptime.

Q: When using a virtualized active/passive 2008R2 cluster with underlying iSCSI storage can the nodes by on different hosts or is FoE needed to have nodes on different hosts? 

Answer from Brent: Check out VMware’s knowledge base article on Microsoft cluster support. It lays out your options for iSCSI, FC, FCoE, and more, and separates them by shared-disk clustering versus non-shared-disk (AlwaysOn Availability Groups).

Q: Any thoughts on implementing AlwaysOn in conjunction with a virtual SQL environment using VMWare HA/ Site Recovery Manager (SRM)?

Answer from Kendra:  With this level of complexity, when things get tricky it’s incredi-hard to sort out. You gotta have a rockstar team with great processes and communication skills to handle problems as they arise– and you are going to hit problems.

Even if you have the rockstar team, you want to first ask if there’s a simpler way to meet your requirements with a less risky cocktail of technologies. If you rush into what you describe, you’ll find that your high availability solution becomes your primary cause of downtime.

Shared Storage for Clusters

Q: Was reading a great article from Brent on SQLIO. How does this work on a SQL Cluster?

Answer from Kendra: You run SQLIO against the storage (not the SQL Server instance) so it works the exact same way.

Q: After Setting up The Cluster and adding the various CLUSTER DATA Drives how can I add additional Drives after gaining new internal storage?

Answer from Kendra: Before you touch production, make sure you’ve got a lab environment. If you don’t, check out the link above on how to build one.  The exact steps to do this are going to vary depending on your version of Windows, your version of SQL Server, and exactly what storage you’re talking about.

For new shared storage on Sever 2008 or later, the basic process is presenting the storage to all of the nodes, bringing the drive online on one node, creating a volume, adding the disk in the failover cluster, and then adjusting dependencies in the cluster as needed. (Dependencies can be adjusted online in SQL Server 2008 and later).

If you have new non-shared storage that you want to use under tempdb (such as SSDs), you’ve got to make sure that every node in the cluster has the drives for tempdb online / volumed/ formatted/ and configured identically, and then you can move tempdb files over to it. You will need to restart SQL Sever to make modified tempdb files recognize the new paths.

Sharding and Mirroring Questions

Q: I have a peer to peer replication with 3 nodes (all bidirectional). Very beneficial but a big pain to maintain. Is that what the industry feels?

Answer from Jeremiah: SQL Server peer-to-peer replication solves a very specific need – the ability to have multiple active SQL Servers where writes can occur and where you can have near real-time updates to the other servers. While peer-to-peer replication meets that need, it has a relatively heavy price tag in terms of DBA expertise, support, and licensing costs. Even experienced teams want to have multiple DBAs on staff to deal with on call rotations and, let’s face it, while peer-to-peer replication hasn’t been deprecated, it’s a difficult feature to work with.

Q: I’ve implemented db sharding on Oracle in several environments. Is there an applicable tech in SQL Server?

Answer from Jeremiah: Sharding is just a buzzword for horizontal partitioning. In a sharded database, either the application or a load balancing router/reverse proxy is aware of the sharding scheme and sends reads and writes to the appropriate server. This can be accomplished with SQL Server, Oracle, MySQL, or even Access. There are no technologies from Microsoft and I’d be wary of anyone attempting to sell something that Just Works® – database sharding is time consuming, requires deep domain knowledge, and adds additional database overhead.

Q: Currently using SQL 2008 Mirroring.  Planning a move to 2012.  Your thoughts about jumping 2012 and going straight to 2014 Always On technologies? 

Jes here. There were no major changes to Database Mirroring in SQL Server 2012, and I don’t foresee any coming in 2014. Eventually – we don’t have a specific version yet – Mirroring will be deprecated. Read our AlwaysOn Availability Groups Checklist to get an idea of the work involved in setting these up – it’s much more complicated than Mirroring – before you decide to jump in.

More Free AlwaysOn and Clustering Training Resources

And if your business needs help deciding which one is right for you, check out our SQL Critical Care® methodology. In just 4 days, we get to the root cause of your availability and performance pains, and we train you how to fix it. Learn more about our SQL Critical Care® now.

The Database Decision Diagram

Download the Database Decision Diagram

Download the Database Decision Diagram

Picking a database doesn’t have to be as one sided as “Use SQL Server”. Although SQL Server is a great general purpose database, it is a general purpose database. If there are other properties to your database, you might need to start looking at different database platforms to meet your needs.

Don’t believe me? Take a look at the video from “Nobody Got Fired for Buying SQL Server”. In it I cover a list of questions that you can use to help figure out the best database for getting the job done. You might wind up at SQL Server a lot, depending on your needs. That’s not a bad thing. Heck, it’s a good thing because SQL Server is solid.

Scale Out

Let’s face it, AlwaysOn Availability Groups solve the scale out read problem, but they don’t make it possible to do scale out writes in SQL Server. Whether you have a lot of data, need a lot of memory, or just want to push writes faster than a single server can handle – there’s no clear cut SQL Server solution right now.

Popular scale out databases:

Document Databases

Maybe you don’t need the rich semantics of the relational model, but you need something else. Document databases are designed around the idea of a document as a loose collection of properties formatted in a standardized way (XML, JSON, YAML). The idea behind a document database is that you pull back an entire document all at once. Document databases have query languages and indexes, both of which make it possible to provide rich query semantics while moving away from the relational model.

Popular document databases:

Graph Databases

Graph databases are designed to track the relationships between objects. If most important part of your application is tracking asymmetrical relationships, then a graph database may be what you need. What’s an asymmetrical relationship? When it’s important to know that Brent and I are friends on Twitter and Brent and I both follow Wolf Blitzer, who only follows me, that’s something you’d want to track in a graph database.

Popular graph databases:

Failure Scenarios

Just as querying functionality defines database choice, so does failure scenario. SQL Server responds well to a host of failure scenarios, depending on your SLAs, but it might not be everything that you’re looking for.

Make sure that you thoroughly understand the possible failure scenarios for your application. Understanding the requirements of how your application responds to failure will make your database choices easier. An application that only needs to survive single server failure within one datacenter has very different requirements than an application that has to survive entire datacenter failure.

Getting Started

Picking a database isn’t an easy choice. There are a lot of factors to consider from administration, development, feature set, and failure conditions. You can’t just pick something because you read about it on Hacker News or heard about it in a conference. (Well, you could, but you’ll probably be in a lot of pain.) Making the right database decision means that you won’t be scrambling to solve a lot of difficult problems long after you’ve made some questionable starting choices.

Download the database decision diagram or sound off in the comments if you’ve got more questions.

Query Plans: Memory Grants and High Row Estimates

SQL Server needs to have a reasonably correct estimate of how many rows your query is going to handle. If it doesn’t, things can go terribly, terribly wrong.

Most often, the problem is that SQL Server under estimates its work. In this case, the query optimizer thinks it has just a little bit of work to do, but in actuality it has much more. This results in “undersized” execution plans for the task– frequently a tiny little nested loop that’s run through over and over again while a larger join would do much better for the workload. This problem is often found in relation to the Ascending Key Problem and out of date statistics in SQL Server.

But sometimes we have the opposite problem: SQL Server thinks it has a LOT of work to do, but really it only a little bit.

This pattern creates a totally different set of issues and has a very different impact on the workload of your SQL Server.

A simple trick to inflate row estimates

I’ve run into problems with workloads where overestimates are happening several times over the years, each time with completely different root causes. Recently I’ve found a simple way to reproduce the phenomena in AdventureWorks that makes it easy to see some of the impacts of overestimating a query’s work.

Let’s start by looking at the execution plan for the query behind AdventureWorks2012‘s [HumanResources].[vEmployee] view. We’ll run this and get an actual execution plan as well as IO statistics.

SELECT e.[BusinessEntityID],
pnt.[Name] AS [PhoneNumberType],
sp.[Name] AS [StateProvinceName],
cr.[Name] AS [CountryRegionName],
FROM [HumanResources].[Employee] AS e
INNER JOIN [Person].[Person] AS p
ON p.[BusinessEntityID] = e.[BusinessEntityID]
INNER JOIN [Person].[BusinessEntityAddress] AS bea
ON bea.[BusinessEntityID] = e.[BusinessEntityID]
INNER JOIN [Person].[Address] AS a
ON a.[AddressID] = bea.[AddressID]
INNER JOIN [Person].[StateProvince] AS sp
ON sp.[StateProvinceID] = a.[StateProvinceID]
INNER JOIN [Person].[CountryRegion] AS cr
ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
LEFT OUTER JOIN [Person].[PersonPhone] AS pp
ON pp.BusinessEntityID = p.[BusinessEntityID]
LEFT OUTER JOIN [Person].[PhoneNumberType] AS pnt
ON pp.[PhoneNumberTypeID] = pnt.[PhoneNumberTypeID]
LEFT OUTER JOIN [Person].[EmailAddress] AS ea
ON p.[BusinessEntityID] = ea.[BusinessEntityID]

Here’s the general shape of our plan. Just note that the bars connecting operators are narrow– SQL Server has pretty low (and accurate) estimates of how much data will be flowing through these pipes:
Plan 1

Now, this query is already a little troubled. It has a lot of joins, including outer joins. It shows an optimization timeout in the plan. The plan estimated cost is 1.1 and it estimates it’ll bring back 229 rows. In my test environment it takes 16 ms to run and does several thousand logical reads across the many tables it joins. This means the query optimizer didn’t consider every possible plan, but hey, our runtime was pretty darn good anyway.

Plan 1 Timeout

Now let’s cause some real trouble. We’re going to make only one type change to this query– we’re going to add functions to the joins. These functions mimic what a user might do if they were concerned about leading/trailing characters. This is going to do several things:

  • Force SQL Server to do an implicit conversion to the columns in order to apply the functions
  • Effectively apply the functions row-by-row to do the comparison at runtime
  • Together this makes it incredibly hard for it to properly estimate the number of rows that come out of the joins.

Here’s our revised query:

SELECT e.[BusinessEntityID],
pnt.[Name] AS [PhoneNumberType],
sp.[Name] AS [StateProvinceName],
cr.[Name] AS [CountryRegionName],
FROM [HumanResources].[Employee] AS e
INNER JOIN [Person].[Person] AS p
ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID]))
INNER JOIN [Person].[BusinessEntityAddress] AS bea
ON RTRIM(LTRIM(bea.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID]))
INNER JOIN [Person].[Address] AS a
ON RTRIM(LTRIM(a.[AddressID])) = RTRIM(LTRIM(bea.[AddressID]))
INNER JOIN [Person].[StateProvince] AS sp
ON RTRIM(LTRIM(sp.[StateProvinceID])) = RTRIM(LTRIM(a.[StateProvinceID]))
INNER JOIN [Person].[CountryRegion] AS cr
ON RTRIM(LTRIM(cr.[CountryRegionCode])) = RTRIM(LTRIM(sp.[CountryRegionCode]))
LEFT OUTER JOIN [Person].[PersonPhone] AS pp
ON RTRIM(LTRIM(pp.BusinessEntityID)) = RTRIM(LTRIM(p.[BusinessEntityID]))
LEFT OUTER JOIN [Person].[PhoneNumberType] AS pnt
ON RTRIM(LTRIM(pp.[PhoneNumberTypeID])) = RTRIM(LTRIM(pnt.[PhoneNumberTypeID]))
LEFT OUTER JOIN [Person].[EmailAddress] AS ea
ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(ea.[BusinessEntityID]))

There have been a LOT of changes to the shape of this plan, as well as to the size of those pipes!

Plan 2

Instead of being able to use column or index statistics to properly estimate how many rows that come out of the joins, instead we end up with estimates that are sized as if every row might join with every row (a Cartesian product). That requires some heavy lifting!

Our second plan has an estimated cost of 645 trillion. SQL Server thinks it might return 4,598,570,000,000,000,000,000 rows (however many that is– I’m not sure how you even say that). The return data set is estimated as a HUGE amount of megabytes.

In my test environment it takes 122,046 ms to run (as compared to 16 ms) and does more than 4.8 million logical reads (as compared to several thousand). It chews up lots of CPU on three of my four virtual CPUs for the whole time it runs, even though it’s not parallel. Of course, it returns the same 290 rows of data at the end as the original query. (It just takes a LOT more work to do it!)

We can learn a lot from examining why this query has to use 7,600 times more CPU. And we can use this query to reproduce some pernicious problems that sometimes attack production environments.

Big queries need big memory reservations

Let’s compare some information from our two queries. We’re using SQL Server 2012 so we get some very rich information in our execution plans regarding our “Query Workspace Memory Grant”. This type of memory is used for query execution specifically. When SQL Server starts running a query it needs to figure out a minimum amount of memory that’s appropriate for all the sorts, spools, joins, and other things it may need to do in memory in the query. The more work it needs to do, the more of a query workspace memory it may need to ask for.

This memory is totally separate from the memory SQL Server uses to cache data in memory (the “buffer pool”) and the memory SQL Server uses to cache execution plans. In fact, this memory is “stolen” from the memory used for caching data pages. (Hey, it’s gotta come from somewhere, right?)

Our revised query (overly high estimates due to the functions in joins) is on the left. Our original query is on the right.

Memory Grant Comparison

That’s a big difference, right?

If you run a lot of big queries like this at the same time, those bigger query workspace memory grants mean that you may run low on the available amount of workspace memory for those types of queries. When this happens, new queries that come in may have to wait for a query workspace memory grant to run. This shows up as a RESOURCE_SEMAPHORE wait, which is shown here with Adam Machanic‘s nifty sp_whoisactive stored procedure.

sp_whoisactive - Resource Semaphore

When this waits start happening, queries experiencing it literally can’t get out of the gate. They need a size of memory grant that just isn’t available, and they’ve got to wait around until their minimum grant can be met. (That’s fair, right? What wants to start running if there’s not enough memory to manage the query operators they need?)

Aside: Curious if you’re experiencing this problem in production? Our sp_blitz® procedure helps detect if you’re experiencing this type of wait.

Big queries may tear up tempdb

Another thing we can see with this experiment is that some large-sized operators may be super-expensive, even on small row sets.

Exactly what made our runtime so bad? Sure we thought a ton of rows were possibly going to come out of those joins, but in actuality they didn’t. It turns out that the second query actually really DID have to do a lot more IO than the first query. Here’s our STATISTICS IO output:

[code](290 row(s) affected)
Table ‘Worktable’. Scan count 583, logical reads 4864543, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘PhoneNumberType’. Scan count 1, logical reads 581, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘CountryRegion’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘StateProvince’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Address’. Scan count 1, logical reads 216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘BusinessEntityAddress’. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Employee’. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Person’. Scan count 1, logical reads 3820, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘PersonPhone’. Scan count 1, logical reads 122, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘EmailAddress’. Scan count 1, logical reads 186, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Look at that first line. Why the heck did we have to do almost 5 million reads against ‘Worktable’? And what exactly is worktable?

Let’s go back into that plan again. Look at the spool operators– they are where all those big pipes start. First we’ll hover over the line coming out of the spool.

Plan 2 Spool View 1

These “spools” are worktables being built in memory (and tempdb) in SQL Server. Not only are there a high estimated number of rows coming out of these, but there’s a high actual amount of rows coming out of them as well.

If we look at the spool operator itself, we can see that each spool had to be accessed over and over again at runtime in our system to feed data back up into the nested loop join:

Plan 2 Spool View 2

Because SQL Server had to do a lot of heavy row-by-row comparisons and wasn’t sure exactly how many might feed out of each of them, it decided to build “spool” temporary tables behind the scenes. It loaded the the temp tables up in memory turn by turn and then had to go through a long, painful process of reading from them. It did this three times, in sequence.

You don’t have to have big data to have big query problems

The SQL Server optimizer is really clever, but often in production it gets into a bad situation. This can be caused by complex query patterns, poor data statistics, or TSQL anti-patterns.

This can be really tricky to identify, diagnose and fix in production if you’ve never seen it before. If you’re used to working with small databases, don’t worry– you can use techniques like I’ve outlined here to help you easily recreate in a small test environment what production might look like with big “beefy” queries running all at once.

Learn More in Our Execution Plan Training

Our How to Read Execution Plans Training explains how to get an execution plan, how to read the operators, and learn solutions to common query problems.

What is the CXPACKET Wait Type, and How Do You Reduce It?

When you query sys.dm_os_wait_stats, or check your waits with sp_Blitz® or sp_BlitzFirst®, one of your biggest wait types is probably CXPACKET. Let’s talk about what’s really going on. Here’s a 15-minute video explaining it, or you can keep scrolling down to read:

Our SQL Server Stand-In: a Class.

Who wants to help me scan the sales table?

Who wants to help me scan the sales table?

Pretend that you’re a schoolteacher in a classroom with a couple dozen students. You’ve got a class project that you need to accomplish as quickly as possible. The project is to build a list of all of the distinct words in the book Moby Dick, along with the frequency of each word.

There’s three ways you could do it:

1. Hand the entire book to a single student and let them work on it until it’s done. This student is going to be miserable, and it’s going to take a heck of a long time for this project to finish. However, this leaves the rest of the students to hang out and have fun playing Words With Friends.

2. Divide the pages of the book out between all of the students. Tear the book into a bunch of stacks of pages, distribute that work, and let them start working simultaneously. As they turn in their work to you, they can each get freed up to do other stuff. You (the teacher) will combine their work together into one master list. Each student’s paperwork will include the word “whale”, for example, and you’ll need to add up all of those “whale” references on one master list.

3. Divide the pages between a group of students (but not all of them.) Instead of putting all 24 students to work on the same project, you might tear the book into 8 sections and assign that work to 8 of your students. The other 16 students would be free to veg out in case another assignment comes in.

How This Relates to SQL Server

You’ve probably already figured out that the work involved is a query. The teacher is the master thread running a query. The students are CPU cores. Microsoft SQL Server can break a single query out across multiple cores as long as the work can be parallelized. (Paul White explains what kinds of work isn’t parallelized.)

When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).

Read that sentence again, because it’s really important. We’re going to come back to that.

The data warehouse server

The data warehouse server

The more students we get involved, the more work we have to do to coordinate their efforts and combine their results. The students, being rowdy goons, get frustrated when they get done with their work quickly and have to sit around waiting for other students to finish. This isn’t really a bottleneck per se – the students could go off and do other work – but they like to complain about how they had to wait around for the slow kids.

That complaining is CXPACKET – Class eXchange Packets. The class is turning in their packets, and complaining about the slow kids.

Just like kids complaining, CXPACKET by itself isn’t a problem. However, it’s an indication that something might be going wrong in this important sentence. Let’s read it again:

When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).

How to Reduce CXPACKET by Setting MAXDOP

The default Maximum Degree of Parallelism – the number of students we’ll get involved in a project – is zero, which means unlimited. That made sense back in the days when our SQL Servers only had a couple of cores, but today, we’ve got huge, ginormous processors. (I’m not bragging, I’m just saying.)

Microsoft KB 2806535 gives MAXDOP recommendations, and the short story is that you should set it to the number of cores in one processor, up to 8. (I’m simplifying here.)

The virtual machine has a little less power.

The virtual machine has a little less power.

Microsoft is saying that in most cases, as you go beyond 8 students working on a single project, there’s a diminishing amount of returns. You’re better off leaving the rest of the students free to work on other projects.

There’s a ton of bad advice on the web that goes a lot further and says you should set MAXDOP to 1 for OLTP-oriented workloads. They’re saying that if your application’s database needs are mostly very small inserts, updates, and deletes, then you should constrain SQL Server to only use one student at a time for any project, no matter how big that project appears to be.

That advice is usually no good because it assumes your server will never, ever run a big query. Sure, in some environments like Microsoft SharePoint, MAXDOP = 1 does make sense – but the key is really knowing the query workloads well. In most of the OLTP environments I see, people still want to run big, ugly reports in production, and we can’t stop them. When they do hit the server hard with those queries, I want to throw a few cores at the problem to make it finish faster. The key is knowing that there’s another knob we can tweak.

Defining Big Queries: Cost Threshold for Parallelism

Remember that really important sentence? Let’s read it again:

When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).

What exactly is a big query? Turns out we can actually tell SQL Server by setting Cost Threshold for Parallelism – the minimum cost of a query before SQL Server will consider dividing the work across multiple cores. It’s set in exactly the same place as MAXDOP – in SSMS, right-click on the server name and click Properties, Advanced.

The default Cost Threshold for Parallelism is 5 – but 5 what? Books Online says:

The cost refers to an estimated elapsed time in seconds required to run the serial plan on a specific hardware configuration.

That’s technically correct, but the words “specific hardware configuration” are a little tricky. They refer to an imaginary machine with an imaginary amount of CPU power and storage throughput. It’s like how we measure cars in horsepower. If you put together a tug-of-war between a Jaguar XFR-S and 550 horses, it’s not like it would be a dead heat. These are just numbers.

The real thing to know is that a query cost of 5 is a lot like 5 horsepower – it’s pretty darned small. With the defaults, SQL Server will aggressively try to parallelize everything.

The Real Way to Reduce CXPACKET Waits

Set MAXDOP per Microsoft’s recommendations in KB 2806535.

Set Cost Threshold for Parallelism in a way that lets your small queries stay single-threaded, but still gets more students involved when there’s a really big project. To do that, you have to…

Learn your workload. Use SQL Server’s plan cache to find the most resource-intensive queries on your system, and then analyze the costs of those queries. On one of my servers, for example, I know that we get hit with a lot of OLTP queries with a cost of around 30-40, and I don’t want those to go parallel. However, we also get a lot of reporting queries that scan the entire sales detail table, and those always have a cost of around 200 or higher.

This is a big topic, and tuning is different on a server by server basis. To get trained on learning your workload and taking the right steps for it, check out our Developer’s Guide to SQL Server Performance video.

What If Performance Gets Worse?

After setting Cost Threshold and MAXDOP, it’s possible that performance could suddenly get worse. You might have queries with a cost in the range of 5 to 50 that used to go parallel, and now suddenly go single-threaded.

Use sp_BlitzCache® to identify the top 10 most resource-intensive queries, making sure to run it with @sort_order = cpu, then reads, then duration, looking at your SQL Server’s plan cache overall. Look for queries with a cost between 5 and 50, and do query tuning or index tuning on those queries.

It’s also possible that 50 might not be the right number: say you’ve got a query with an estimated cost of 40, and it’s actually a wildly underestimated cost. SQL Server is making a plan that doesn’t take into account how much data actually comes back. In that case, you can either tune the query, or lower Cost Threshold to 39 to let that query go parallel.

Wanna learn more?

Learn more about our SQL Server Performance Troubleshooting class.

My Five Favorite That Conference Memories

Last week, I went to summer camp for geeks, and I had a great time! That Conference is a three-day, developer-focused, family-friendly technology conference. Camp was held for the first time in 2012. I heard such great things about it from last year’s campers that I decided to go this year!

The sessions focused on mobile, web, and cloud development. Being a SQL Server geek, I was a bit of a stranger in a strange land. I confess I couldn’t answer any of the .NET Rocks! trivia questions during the prize giveaway at the end, but I still took a ton away from this conference. Here are my five favorites!


I love running, and I try to find fellow runners at SQL Saturdays, PASS Summit, and even at our training classes. I saw that there would be a group running each morning of That Conference, so I decided to join them. I’m glad I did! I got to know a couple fellow runners and geeks better. The second day, one person told us it was the first time he’d ever run 3 miles straight, and another person said it was the first time in his life he’d ever woken up with the intent to exercise. Those comments made the early wake-ups worth it. Your physical health ties directly into your mental health and ability to enjoy life – don’t neglect it.


My primary focus is SQL Server, but when I work with clients, I’m not working with just the DBA. It’s frequently a group – a server admin, a manager, and a handful of developers that work with the system. They may not speak my language, and I’m not fluent in theirs. Most conferences focus on one topic – SQL Server, SharePoint, .NET. Not this one – it was a mix of .NET, PHP, Ruby, Java, mobile development, and more. Thus, this was the perfect conference for me to attend – learning what’s important to developers and speaking their language.

I attended sessions on PHP, Git, DevOps, website development and optimization, PaaS in the cloud, and Azure HA/DR, among other topics. I learned a ton. I also learned what developers care about – and it’s not how long the database backups take each night!

Fellow Campers

I met great people. Everyone that I sat down next to at breakfast or lunch was friendly and willing to chat. The speakers were top-notch.  It’s a great community-focused event where we were encouraged to network and share – and everyone did.


Arduino is “an open-source electronics prototyping platform based on flexible, easy-to-use hardware and software.” Let me rephrase that: Arduino lets you make dancing robots.


I had the distinct pleasure of watching Sharon Cichelli give a talk about Arduino. Rather than a dry, boring presentation, she showed us how she learned electronics and this programming language to make a dancing robot. Having never owned an electronics kit or touched a breadboard before, I wasn’t sure I would understand a word. However, in 10 minutes, I was captivated and understanding electronics diagrams. When she wrote simple code to make her robot fret and wiggle, I was in love. The best talks are given by people who are passionate about a topic, and she clearly was. I’m tempted to get a “100 Electronics Projects for Kids” kit and let my inner geek out now!

Open Campfires

The difficulty with every conference is that there are always topics I want to talk about or learn more about that aren’t on the schedule. There are times I want to find a group of people sitting around a table, chatting, and join them. The Open Spaces concept fills that void perfectly. There was a large hall set aside specifically for this. Half of it was tables that anyone could use at any time – to work, to chat, to program. The other half was set up in four circles of chairs. If you wanted to lead a session at any time – whether you had a question you wanted to ask other people, or toss ideas around – all you had to do was put a topic and your name on a post-it note on the board.

The board was filled with topics from “Bitcoin” to “Unblocking Creativity” to “Improving the Workplace”. I sat in a couple of them my first day at lunch and was hooked. People were sharing ideas, confidently talking about what worked for them and what didn’t, and teaching me a ton.

I decided to throw my own in, and put down “Performance Tuning: What and How” on the second day after lunch. I wanted to find out what steps developers took to tune their code, and how they would get to the point of saying, “It’s a database problem.” I also wanted to know what questions they had about SQL Server performance tuning. The worst part about this? It was only an hour. It flew by. Ten or so of us discussed everything from how indexes work to caching, execution plans to baselines.

My Open Space discussion

My Open Space discussion

What really impressed me was people’s willingness to talk. Everyone at that conference could have taught me something. Most people are scared of talking in front of people, so they never submit a session. But when you put them in a small group and let them talk about a topic they care about, they will open up. I learned most from those people, sharing their day-to-day challenges. It was inspiring!

Let’s Have S’mores!

I didn’t think anything could top the sessions and networking, but the conference ended with a s’mores and bacon bar. Perfect!

I can’t wait to go camping at That Conference next year!

Introduction to SQL Server Failover Clusters (video)

You’d love to know how to make your SQL Servers more highly available, but you’re just not sure what “clustering” is all about. We can help! In this 30 minute video Kendra Little shows you how failover clusters make your SQL Servers easier to manage. She uses diagrams and drawings to teach you the strengths and weaknesses of failover clusters and shows why clustering is becoming even more critical for high availability and disaster recovery.

This webcast was sponsored by Idera Software— and they gave away a free trip to the SQL PASS 2013 Summit to one lucky attendee. How cool is that???

Got Questions? Scroll on down and check out Q&A from the live webcast below the video, or read more about clustering SQL Server.

Webcast SQL Server Clustering Q&A

For failover cluster instances, does Windows Clustering needs to be enabled before you configure SQL Server Clustering? Or is it enabled by default? You need to enable the Failover Clustering feature on all nodes and create and validate the Windows Failover Cluster. You then install SQL Server onto the cluster (you tell the SQL Server installation GUI you’re installing it on the cluster and it walks you through a wizard).

So there is only one active node at any given time with one or more possible passive nodes. Right? Yes, exactly– any given clustered SQL Server Instance can only be active on one node (physical server) at any given time. (You can have more than one instance on each cluster, so you could potentially have multiple instances active on a single node, or multiple instances active across multiple nodes.)

Can we have two nodes on the same physical hardware? A “node” IS a physical server, so no.

Must applications be coded as cluster aware? No, apps have no idea that they’re connecting to a database that lives on a cluster. They just need to be able to handle the brief downtime during a failover so that if their connection is interrupted, they retry/keep on going.

Is there any test or lab microsoft provides to try to test clustering? No, you’ll want to build your own lab for that.

What about a SQL Server cluster with one physical node and one node that’s a virtual machine– Is this recommended? We don’t recommend it. Read why it gives us the heebie jeebies here.

Would a virtualized failover cluster avoid the downtime during the failover? No.

How do you configure and Manage MSDTC? Several people asked this one. Cindy Gross answers everything you need to know about configuring DTC on a failover cluster.

How many IP addresses/names do you need? We had a couple of versions of this question– and I don’t mention every name and IP address involved in clustering in the video. Here’s a list:

  • Windows Failover Cluster Name and IP: The windows cluster itself gets a name, but you won’t use that to access the SQL Server. (It’s just for cluster management purposes.)
  • SQL Server Network Name and IP: Each SQL Server instance gets its own name and IP that can move around the cluster. This is a good thing. This *is* what you use to talk to the SQL Server.
  • DTC Name(s) and IP(s): The number you need depends on how your cluster and DTC are configured. See the link on the MSDTC question above.
  • Physical Server Names and IPs: Just like any other server, each physical node gets a name and IP so you can manage it.

Do we need to configure a heartbeat network? Great question. Check out Brent’s article, “Everything You Know About Clustering is Wrong“. (Great title, eh?)

What manages which node is up or down? The Windows Failover Clustering feature does this. It’s a set of services built into Windows that work for all kinds of clustering, like Exchange or file/print servers.

Are there advantages to a cluster with shared storage vs a cluster running availability groups without shared storage? Yes – say your database is 1TB. With AlwaysOn AGs, you need at least 2TB of storage. There are also scenarios with very high amounts of data modifications or high numbers of individual databases that (could) work (relatively) easily on a failover cluster, but push the limits of Availability Groups.

Doesn’t Microsoft discourage using the terms “active/passive” and “active/active”? They sure do! But the phrases “single instance failover cluster” and “multi instance failover cluster” just don’t roll off the tongue, so they’re rarely used in the real world. When we say “multi instance failover cluster”, inevitably people ask, “oh, you mean active-active?”  So you can either use the “approved” terms, or the terms most people understand— whichever you prefer.

In a two node cluster with one SQL Server Instance (“Active/Passive”), what exactly is the status of the passive node? Is it something like the SQL Service needs to start when the active node fails?  Yes! You got it. And if you look at the SQL Server services on the “idle” node, you’ll see they’re in a stopped state and set to manual start. That’s by design– it’s the failover cluster that will start the services when it needs to happen.

Would it be possible for each node to host its own SQL instance which fails over to the other one respectively? – Our goal would be to host one 2008 R2 and one 2012 instance of SQL server. Technically speaking, this is called a “side by side deployment”, but I don’t know anyone who recommends it. This question reminds me of the Meatloaf song, “I’d do anything for love, but I won’t do that.” Let’s break down the problems:

  • Installation: You’ve got to install everything in a very specific order (lowest to highest) on every node. Get anything out of order and you’ve gotta start over.
  • Maintenance: More updates, more patches, more failovers
  • Troubleshooting performance or availability problems: Oh, that gets ugly. You’d be running in a scenario that very few people use. Let’s imagine that you start getting unexplained failovers. It’s now much more hairy to troubleshoot.

So I urge you to consider other options. If you do go this route, work extra hard to make sure you have a very similar setup (ideally identical) in a non-production environment to mitigate risk and heartache.

To avoid confusion: For service packs and cumulative updates, you can use a rolling upgrade process on the cluster. That’s still totally cool.

For instant file initialization, you need to set that up on each node or is it “cluster” aware? And do you recommend it for clusters? You have to set it up on each node, and yes, we’d still recommend it.

Other than checking from SQL Server logs, Windows EventViewer logs, can we find the last failover duration? Nope, not outside of those places.

Is clustering available on the express edition? Technically, no, but you can cluster it manually yourself. If you’re considering this one, step back and take a good look at the problem you’re trying to solve– we’re a little worried.

Can I set up fake instance in my DNS that points to myServer\ThisInstance to start porting over to what will be in place after the cluster is in place?  Setting up before hardware order is placed we get to tweak existing sprocs?  Or is this a BAD IDEA?  Yes, you can create a pointer in DNS to do this. I call it a “DNS friendly name”, some people call it a “DNS pointer”, some people just say “CNAME”. This can be very useful.

More Microsoft SQL Server Clustering Resources

Whether you want help choosing between an active/passive and an active/active cluster, or if you’re the kind of DBA who knows that’s not even the right name for failover clustered instances anymore, check out our SQL Server clustering training page.

The Stages of Performance Tuning

Performance tuning is a weird art to learn, because you do it in public. Sure, you may puzzle over a technical problem at your desk or revise code on your own– but when it comes to really tuning hard problems, 99% of the time you’re doing it with other people. That’s the nature of real performance tuning.

Just like learning to dance, it can be awkward to get into performance tuning. What if you miss something obvious? What if the idea you suggest is totally stupid? What if you try to do things that are, frankly, impossible?

You’re worried that they’re all gonna laugh at you.

Everyone has these fears at first. If you keep at it, you’ll get through it.

Stage 1 – The Kid’s Recital: Everyone starts out here. You’ve read some blog posts, you’ve done a little practicing, and now when things get tough, you say, “Hey, what about this!?!?”

People can tell that you’re a little new at this, and maybe you’re a little overeager, but you probably make up for it in excitement. Don’t get stage fright. You know that sometimes people don’t think you’re a total pro, but you’ve got to just keep practicing and getting out there and performing. Go out there and try to do your thing.

You secret power at this stage: Listen to other people. Learn from them as much as you can.

Stage 2 – The Rebecca Black: Eventually, you learn to sing a song that’s pretty catchy. You find a few ways to make something blazing faster. People start to listen to you more and more, and they come to you to ask your opinion more often.

At this stage you just know a few tricks, but they seem powerful. You feel like “This stuff wasn’t so hard! I know everything I need to know!”

The rush is blissful, but usually brief. Suddenly you’re rushing into trouble and trying to tackle too much alone. It turns out that you really know a few tricks, and soon enough you start to make mistakes. Hopefully, it humbles you. (If it doesn’t, well, bad news. Nobody’s gonna wanna work with you if you get stuck in this phase.)

You wonder if you’ll ever be more than a one-hit-wonder.

Stage 3 – The True Karaoke Star: If you keep at it, you’ll get comfortable, and you’ll learn to embrace what you don’t know. You’re used to solving performance problems. You’ve got not just one or two tricks in your bag, but many different techniques, and you know their limits. You’ve got good sources to go to, and you love to question your assumptions.

More importantly, you’ve made it to that golden place where you’re no longer embarrassed, but you’re not over-confident, either. You have your voice and you also listen to people. You don’t panic at the first sign of trouble, and you also know that the show doesn’t have to be all about you– other people can get on stage, too.

Sometimes it Takes a While. Finding the right job and lifestyle where you can grow and learn is the hard part. That probably won’t fall into your lap. It’s OK. Everyone has to start somewhere– and for some of us it takes volunteering, working nights, or taking risks on our career to try something new.

Whichever way you get there, don’t get discouraged. You can be a rock star, too.

Success in the Cloud

Right now, someone in your company is thinking about moving your operations up into the cloud. If your company is like 95% of companies out there, “the cloud” is really a synonym for Amazon Web Services (AWS). Thinking about a move to AWS is happening across many companies, and it’s not something that you need to be scared of – nobody is going to lose their job. However, it’s important to understand what you and your company can do to make this transition successful, painless, and an opportunity for growth.

Performance Baseline

How fast is your application right now? Do you know when you’ve hit peak load or even which metrics signify peak load?

Having a performance baseline is the only way to make sure that changes are helping. A general feeling that things are getting better isn’t enough, you need to know. During our engagements with customers, we evaluate how applications are performing today. We gather metrics around disk and CPU performance and look for signs of existing and potential bottlenecks. Once we know where things are, it’s easy to figure out where things are going.

As you’re looking to move to AWS, take a look at your current performance baseline. Are you happy with those numbers? Things probably won’t get better after the move; virtualization has overhead and everything in AWS is virtualized. Once you’ve got a baseline of your performance, it’s easy to test your theories about performance and make decisions about how aggressively you need to tune your application to meet your promises to your customers.

Service Level Agreements

SLAs are so important that near the beginning of every engagement I ask, “Do you have an SLA with your customers? What about internal SLAs between your teams?”

An SLA can be as simple as responding to outages within a certain window or as complex as measuring the performance of specific actions in the application. Having an SLA in place early can make it easy to evaluate your ability to move up to AWS and still maintain acceptable performance. If you know that a specific set of web pages must return faster than an average of 250ms during peak load, you have an easy measurement to determine if performance is acceptable.

Distilling performance down to a set of metrics makes it easy to make decisions, spot problems, and design for the future. In addition to a performance baseline, SLAs also give guarantees. If query performance drifts outside of agreed upon norms, the SLA can describe who is going to work on improving performance and how quickly that work gets scheduled.

SLAs aren’t just performance related – they relate to how fast you can bring a system online in the event of an outage, how you will respond to potential performance issues, and what the shape of that response looks like. Everyone aspires to five 9s of uptime, but what’s important is how you handle unforeseen outages.

An SLA shouldn’t be a detailed, painful document that resembles of a software license. A well designed SLA will serve to drive customer interaction and push teams to take responsibility for making the application perform well.

Proactive Monitoring

What does your monitoring environment look like today? If you have monitoring in place, I suspect that you’re looking for problems as they’re happening. To maintain good performance in AWS, you need look for performance problems before they occur.

You can use your existing monitoring, or you can use Amazon’s CloudWatch to create alerts around trends in resource utilization. If you know your performance baselines, you can configure alerts to notify you when things are out of the ordinary. If CPU utilization has never gone higher than 55%, even under peak load, it’s helpful to set up an alarm to fire when CPU utilization has been higher than 60% over a period of 15 minutes.

It’s better to be aware of potential problems than to respond to a fire. Make sure performance warnings are different from alerts about actual problems, but also make sure that you’re doing something about warnings as they arrive. Being proactive about monitoring performance does more than help you keep things running smoothly; proactive monitoring gives you insight into where you can tune your applications and make things better in the long run.


What are your thoughts around AWS? What’s the big picture? Having a vision around AWS is going to be critical to a successful transition.

A vision around AWS can’t be a simple statement like “We’re going to move our operations into the cloud and save a lot of money.” While saving money is a nice goal, the vision around AWS needs to be something more than fork lifting your existing infrastructure into the cloud.

Having a strong vision about how AWS can help your organization meets its goals is critical. This doesn’t have to be highly detailed and include specific features, but make sure that the vision includes an understanding about how you will be deploying and refining your application over time. Moving to AWS is not a quick fix for any problem. Your vision needs to include what today looks like, what your goals look like, and how you’ll be working toward them over the next 3, 6, 9, and 12 months.

Taking full advantage of the rich feature set in AWS takes time, and your vision should reflect how you will make the move and monitor application behavior in order to make good decisions about direction and functionality.

Buy In

Most important is buy in. Everyone involved needs to buy into the idea that a move to AWS makes sense for the organization. This may be the hardest item to accomplish, but it’s worth making sure that your team is on the same page.

Having a team that’s accepting of the move will make the transition easier. Part of this acceptance is the realization that things will not be the same as they were before. Gone are the days of giant back end servers with hundreds of gigabytes of memory and multiple network cards. Performance is no longer a purchase order away. Your team needs to accept these statements as facts; you can’t easily upgrade your way out of performance problems in AWS.

Instead of buying their way out of problems, the team needs to be committed to investigating new ways to solve their problems. With AWS it’s simple to design a rapid prototype and direct load to the new prototype system. It’s important that everyone is on board with testing changes at any level – from a single function to the entire infrastructure. In AWS these sweeping changes are easy to do, but you need buy in from everyone involved that things don’t need to be the same as they were before.

Drifting Away Into Cloudy Success

If you don’t have all of these factors in place, are you going to fail? Probably not. These are distinct traits that I’ve found in companies who have successfully moved their infrastructure into AWS. The more of these that a company has possessed, the happier I’ve found them to be with AWS. As you’re considering a move into AWS remember that there are more than technical challenges to moving into AWS – there are human and organizational challenges that need to be met in order to ensure success.

SSD RAID Load Testing Results from a Dell PowerEdge R720

We’ve got a client that does big batch jobs every day, loading hundreds of gigabytes of data or more in short bursts. They were frustrated with slow performance on the batch jobs, and after we performed our SQL Critical Care® with ’em, it was really clear that their hardware was the bottleneck. They were using a virtual server backed by an iSCSI SAN, and they were getting bottlenecked on reads and writes. We could put some more memory in it to cache more data, preventing the read problem, but we would still get bottlenecked trying to write lots of data quickly to the shared storage.

We recommended two things: first, switch to a standalone bare metal SQL Server (instead of a virtual machine), and second, switch to cheap commodity-grade local solid state storage. Both of those suggestions were a little controversial at the client, but the results were amazing.

Why We Switched from VMware to Bare Metal

Theoretically, virtualization makes for easier high availability and disaster recovery. In practice, there are some situations – like this one – where it doesn’t make sense.

In the event of a failure, 15-30 minutes of downtime were acceptable. The server was important, but not mission-critical. In the event of an outage, they didn’t mind manually failing over to a secondary server. This meant we could avoid the complexity of a failover cluster and shared storage.

Slow performance was not acceptable during normal production. They wanted to put the pedal to the metal and make an order-of-magnitude improvement in their processing speeds with as few code changes as possible.

They weren’t going to pay a lot for this muffler server. They’re a small company with no full time DBA and no glut of servers laying around. Buying a server was a big deal – we only had one shot to buy a server and get it right the first time. In this move, we were able to free up VMware licensing for other guests, too.

What We Designed: Dell R720 with Local SSDs

The Dell R720 is a 2-processor, 2-rack-unit server with room for 16 2.5″ drives across the front of the server, and two RAID controllers. It’s got room for up to 768GB of memory. It’s my favorite 2-processor SQL Server box at the moment.

I’m not against shared storage – I love it – but when I’m dealing with large batch jobs, a limited budget, and no clustering requirement, it’s tough to beat local SSDs. The R720 lets us use a big stack of 2.5″ solid state drives with two RAID controllers for processing data. Quantity is important here since SSDs tend to be relatively small – 512GB or less. Some larger drives exist, like the Crucial M4 960GB, but performance doesn’t quite match the 512GB-class yet.

The Dell R720XD is a similar server, but it’s absolutely slathered with drive bays, handling up to 26 2.5″ drives. While that sounds better – especially with today’s fastest SSD drives still being a little size-constrained – the R720XD only has one RAID controller instead of the R720’s two.

For our Plan B – where we’d fail over if the primary server died – we actually stuck with a virtual server. We built a small 2-vCPU, 8GB RAM guest with SQL Server. We keep it current using the database backups from the primary server. Remember, this application is batch-oriented, so we just need to run backups once a day after the batch completes, and then restore them on the secondary server. When disaster strikes, they can shut down the VMware guest, add more CPU and memory power to it, and it’s off and running as the new primary while they troubleshoot the physical box. It’s not as speedy as the primary physical box, but that’s a business decision – if they want full speed, they can easily add a second physical box later.

Picking Solid State Drives for the Database Server

When picking drives to populate the R720’s 16 bays, that’s where the tough decision comes in. You’ve got three options:

1. Use Dell-approved, Dell-sold drives. These are ridiculously, laughably, mind-bogglingly expensive given the size and performance:

Dell PowerEdge R720 Drive Choices

Dell PowerEdge R720 Drive Choices

A 400GB MLC drive is $1,200 rack, so filling all 16 bays would cost $19,200. To put things in perspective, the server itself is about $10k with 2 blazing fast quad-core CPUs, 384GB of memory, and spinners on the fans, so buying Dell’s SSDs triples the cost of the server.

2. Use commodity off-the-shelf SSD drives. In the latest Tom’s Hardware SSD Hierarchy (scroll halfway down that page to see the tiers), the Samsung 840 512GB drive is in the top tier. Anandtech’s review called it “the fastest consumer SSD we’ve ever tested.” It’s available on Amazon for under $500, less than half the cost of the Dell drive, meaning we could fill the R720 with 8TB of smokin’ fast storage for under $8k, plus leave a couple of hot spares on the shelf.

There are risks with this approach – Dell won’t guarantee that their controller and their software will work correctly with this combination. For example, during our load testing, the DSM SA Data Manager service repeatedly stopped, and we couldn’t always use the Dell OpenManage GUI to build RAID arrays.

3. Ignore the drive bays, and use PCI Express cards. Drives from Fusion-IO, OCZ, and Intel bypass the RAID controller altogether and can deliver even faster performance – but at the cost of higher prices, smaller space, and tougher management. You can’t take four of these drives and RAID 10 them together for more space, for example. (Although that’s starting to change with Windows 2012’s Storage Spaces, and I’m starting to see that deployed in the wild.)

For our design, we ended up with:

  • Dell PowerEdge R720 with 2 quad-core CPUs, 384GB memory – $10k
  • 16 Samsung 840 Pro 512GB SSDs – $8k
  • Hardware total: under $20k
  • SQL Server Enterprise Edition licensing for 8 cores – $56k

Kinda keeps things in perspective, doesn’t it? The hardware seems insanely overpowered until you look at how much licensing costs. At that point, why wouldn’t you buy this kind of hardware?

Why I Load Test SSDs in RAID Arrays

The R720 has two separate RAID controllers, each of which can see 8 of the Samsung drives. The drawback of this server design is that you can’t make one big 16-drive RAID 10 array. That’s totally okay, though, because even just a couple of these race car drives can actually saturate one RAID controller.

I wanted to find out:

How few drives can we get away with? For future client projects, if we didn’t need to fill up the drive bays in order to get capacity, could we saturate the controllers with just, say, 4 drives instead of 8? Can we leave enough space to have hot spare drives? I run the performance tests with 2, 4, 6, and 8 SSD drives.

How much of a performance penalty do we pay for RAID 5? RAID 10 splits your drive capacity in half by storing two copies of everything. RAID 5 lets you store more data – especially important on limited-capacity solid state drives – but is notoriously slow on writes. (Thus, the Battle Against Any Raid Five.) But what if the drives are so fast that the controller is the bottleneck anyway?

Should we turn the controller caching on or off? RAID controllers have a very limited amount of memory (in our case, 1GB) that can be used to cache reads, writes, or both. In the past, I’ve seen SSD-equipped servers actually perform slower with the caching enabled because the caching logic wasn’t fast enough to keep up with the SSDs. Dell’s recent PowerEdge controllers are supposed to be able to keep up with today’s SSDs, but what’s the real story?

Does NTFS allocation unit size still matter? In my SQL Server setup checklist, I note that for most storage subsystems, drives should be formatted with 64K NTFS allocation units for maximum performance. Unfortunately, often we get called into client engagements where the drives are already formatted and the database server is live in production – but the NTFS allocation unit is just 4K, the default. To fix that, you have to reformat the drives – but how much of a difference will it make, and is it worth the downtime?

The answers to these questions change fast, and I need to check again about once a quarter. When I need to double-check again, and I’m working with a client on a new server build with all SSDs, I offer them a big discount if I can get remote access to the server for a couple of days. During that time, I use the storage testing tools described in my Best Free SQL Server Downloads video, and I tweak and tune the storage like I show in my storage training class.

Load Test Result Highlights

Turning off read caching didn’t affect performance. The controller’s small cache (1GB) just isn’t enough to help SQL Servers, which tend to cache most of their data in memory anyway. When we need to hit disk, especially for long sustained sequential reads, the controller’s minimal cache didn’t help – even with just 4 SSDs involved.

The controller’s write caching, however, did help. Write throughput almost tripled as opposed to having caching disabled. Interestingly, as long as write caching at the controller was enabled, it didn’t matter whether read caching was enabled or not – we saw the same benefit. I would expect higher write throughput if all of the 1GB of cache was available to cache writes, but that doesn’t appear to be the case with the R720’s controllers at least.

NTFS allocation unit size made no difference. This combination of RAID controller and drives is the honey badger of storage – it just don’t care. You can leave the default caching settings AND the default NTFS allocation unit size, and it’s still crazy fast.

In RAID 10, adding drives didn’t improve random performance. We got roughly the same random performance with 4 drives and 8 drives. Sequential read throughput improved about 35% – good, but maybe not worth the financial cost of doubling the number of drives. Sequential writes saw a big boost of about 60%, but keep in mind that sustained sequential writes is a fairly rare case for a database server. It’s unusual that we’re not doing *any* reads, and we’re writing our brains out in only one file.

SSD speeds still can’t beat a RAMdrive. With SQL Server Standard Edition being confined to just 64GB of memory, some folks are choosing to install RAMdrive software to leverage that extra cheap memory left in the server. If your queries are spilling to TempDB because they need memory for sorts & joins, this approach might sound tempting. Microsoft’s even got an old knowledge base article about it. The dark side is that you’re installing another software driver on your system, and I always hate doing that on production systems. Just for giggles, I installed DataRam’s RAMdisk for comparison. The SSDs are on the left, RAMdisk on the right, and pay particular attention to the bottom row of results:

RAMdisk vs RAID 10 SSDs

RAID 10 SSDs vs RAMdisk

The bottom row, 4K operations with a queue depth of 32, is vaguely similar to heavy activity on multiple TempDB data files. This particular RAMdrive software manages about 4x more write throughput (and IOPs as well) than the RAID 10 array of 8 drives. (For the record, a RAID 0 array of 8 drives doesn’t beat the RAMdrive on random writes either.)

And finally, here’s the performance penalty for RAID 5. RAID 10 is on the left, 5 on the right. Same number of drives, same cache settings, same allocation unit settings:

RAID 10 versus RAID 5

RAID 10 versus RAID 5

It’s not a surprise that RAID 5 is faster for reads, but in this round of testing, it was even faster for sequential and large writes. The only place where RAID 5 takes a hit: the bottom right, 4K writes with queue depth 32.

If you’re running a Dell PowerEdge R720 loaded with Samsung 840 Pro SSDs, you’re probably better off with RAID 5 than RAID 10 – but why?

Load Test Lowlights: The Controller

We’re not just testing the drives here – we’re also testing the RAID controller. To get a true picture, we have to run another test. On the left is a RAID 10 array with 8 drives. On the right, just one drive by itself:

RAID 10 versus just one drive

RAID 10 versus just one drive

The top two lines are sequential performance, and the RAID array helps out there as you would expect. Having more drives means more performance.

The jawdropper hits in the bottom half of the results – when dealing with small random operations, more drives may not be faster. In fact, the more drives you add, the slower writes get, because the controller has to manage a whole lot of writes across a whole bunch of drives.

See, we’re not just testing the drives – we’re testing the RAID controller too. It’s a little computer with its own processor, and it has to be able to keep up with the data we’re throwing at it. In the wrong conditions, when it’s sitting between a fast server and a fast set of solid state drives, this component becomes the bottleneck. This is why Dell recommends that if you’re going to fill the server with SSDs, and you want maximum performance, you need to use the R720 instead of the R720xd. Yes, the R720xd has more slots – but it only has one RAID controller, so you’re going to hit the controller’s performance ceiling fairly quickly. (In fact, the controller can’t even keep up with a single drive when doing random writes.)

This is why, when we’re building a new SQL Server, we want to test the bejeezus out of the drive configurations before we go live. In this particular scenario, for example, we did additional testing to find out whether we’d be better off having multiple different RAID arrays inside the same controller. Would two four-drive RAID 10 arrays striped together be faster than one eight-drive RAID 10 array? Would we be better off with a single RAID 1 for TempDB, and a big RAID 5 for everything else? Should we make four mirrored pairs, and then stripe them together in Windows?

You can’t take anything for granted, and you have to redo this testing frequently as new RAID controllers and new SSD controllers come out. In testing this particular server, for TempDB-style write patterns, a RAID 0 stripe of two drives was actually slower than a single drive by itself!

So as Jezza says on Top Gear, and on that bombshell, it’s time to end. Good night!

Upcoming Contests & Training from Vendors

First things first, I’m not getting paid to write this.

You’re going to read some of this stuff and say to yourself, “Brent, you’re just pimping vendor events because you’re in bed with vendors.” Nope – I’m very proud to say that we don’t run ads here, and I only write about things that I personally care about.

I’m telling you about these free events because the vendors tell me about them, and if I was you, I’d wanna know. When I was a DBA, I never got the chance to attend conferences for free. I love how the webernets have changed career education.

Red Gate’s SQL in the City: Free Local Conferences

Me at SQL in the City 2012

Me at SQL in the City 2012

Red Gate puts on free one-day conferences across the US:

These are all-day events broken up into 60-90 minute sessions, just like a SQLSaturday or a conference, with a few different tracks for developers and DBAs. They don’t have the exact agenda for this year’s events yet, but you should sign up anyway, and I’ll explain why.

The conference is entirely sponsored by Red Gate, so the sessions are a mix of SQL Server training plus Red Gate product education. For example, in one session I attended last year, the presenter explained the need for source control in databases, talked about the pains of doing it the native way, and then showed how Red Gate makes it easier.

If this conference comes to your area, you need to attend – whether you’re a DBA or a developer. I know, it’s got vendor stuff in it, but you need to go anyway, because the non-Red-Gate-specific stuff is still really valuable. The two Red Gate evangelists, Steve Jones and Grant Fritchey, are both great speakers, plus Red Gate brings in good local speakers from the community.

You can also watch videos from last year’s SQL in the City, including my Six Scary SQL Surprises video.

Idera Webcast: Learn Clustering, Maybe Win a Free #SQLPASS Ticket

SQL Server Failover Clustering

It really *is* nice.

This Wednesday, August 21, Kendra and Idera are doing a free webcast together: A Gentle Introduction to Clustering SQL Server. Here’s the abstract:

You’d love to know how to make your SQL Servers more highly available, but you’re just not sure what “clustering” is all about. We can help! Microsoft Certified Master Kendra Little will introduce you to failover clusters and explain how clusters can make SQL Servers easier to manage. She’ll use diagrams and drawings to teach you the strengths and weaknesses of failover clusters for SQL Server, and show why clustering is becoming even more critical to designing solutions for high availability and disaster recovery in SQL Server.

And here’s the fun part: register and attend the webcast, and you get entered into a drawing to win a full conference ticket to the 2013 PASS Summit in Charlotte.

Read full promotion terms and conditions. Then register here! Don’t forget to attend – must be present to win.

SQLSentry.TV: Like YouTube for SQL Server Tuning

The fine folks over at SQLSentry have a new project – SQLSentry.TV. The first 6-part set of videos feature Aaron Bertrand and Kevin Kline go really far in-depth on writing and tuning queries. It’s a TON of material – several hours altogether.

I always recommend that my clients do a weekly lunch & learn session. Every Friday (or whatever day), the company should bring in lunch (pizza, sandwiches, etc) and pile the developers and database users into a conference room. Everybody eats and watches a 30-45 minute long video, and then afterward talk about what you learned that’s relevant to your own environment. The SQLSentry.TV videos are a great fit for that format.

If you also happen to own SQLSentry’s products, you should also check out the Product Features and Product Demos tabs at the top of I often run into “shelfware” – people who buy monitoring products and then never do anything with ’em. Learning about the products in these videos helps you avoid the shelfware problem.

And Of Course, Ours

Every Tuesday, we host a free 30-minute webcast covering different topics. Here’s what we’ve got coming up:

  • 8/20 – Getting Your Manager to Pay For Our Training
  • 8/27 – High Availability & Disaster Recovery Q&A
  • 9/3 – 5 Things to Love About SQL Server Standard Edition
  • 9/10 – How to Prove Your SQL Server Needs More Memory
  • 9/17 – Why Index Fragmentation Doesn’t Matter

Register at See you there!