Introduction to SQL Server Failover Clusters (video)

Last Updated 7 years ago

You’d love to know how to make your SQL Servers more highly available, but you’re just not sure what “clustering” is all about. We can help! In this 30 minute video Kendra Little shows you how failover clusters make your SQL Servers easier to manage. She uses diagrams and drawings to teach you the strengths and weaknesses of failover clusters and shows why clustering is becoming even more critical for high availability and disaster recovery.

This webcast was sponsored by Idera Software— and they gave away a free trip to the SQL PASS 2013 Summit to one lucky attendee. How cool is that???

Got Questions? Scroll on down and check out Q&A from the live webcast below the video, or read more about clustering SQL Server.

Webcast SQL Server Clustering Q&A

For failover cluster instances, does Windows Clustering needs to be enabled before you configure SQL Server Clustering? Or is it enabled by default? You need to enable the Failover Clustering feature on all nodes and create and validate the Windows Failover Cluster. You then install SQL Server onto the cluster (you tell the SQL Server installation GUI you’re installing it on the cluster and it walks you through a wizard).

So there is only one active node at any given time with one or more possible passive nodes. Right? Yes, exactly– any given clustered SQL Server Instance can only be active on one node (physical server) at any given time. (You can have more than one instance on each cluster, so you could potentially have multiple instances active on a single node, or multiple instances active across multiple nodes.)

Can we have two nodes on the same physical hardware? A “node” IS a physical server, so no.

Must applications be coded as cluster aware? No, apps have no idea that they’re connecting to a database that lives on a cluster. They just need to be able to handle the brief downtime during a failover so that if their connection is interrupted, they retry/keep on going.

Is there any test or lab microsoft provides to try to test clustering? No, you’ll want to build your own lab for that.

What about a SQL Server cluster with one physical node and one node that’s a virtual machine– Is this recommended? We don’t recommend it. Read why it gives us the heebie jeebies here.

Would a virtualized failover cluster avoid the downtime during the failover? No.

How do you configure and Manage MSDTC? Several people asked this one. Cindy Gross answers everything you need to know about configuring DTC on a failover cluster.

How many IP addresses/names do you need? We had a couple of versions of this question– and I don’t mention every name and IP address involved in clustering in the video. Here’s a list:

Windows Failover Cluster Name and IP: The windows cluster itself gets a name, but you won’t use that to access the SQL Server. (It’s just for cluster management purposes.)
SQL Server Network Name and IP: Each SQL Server instance gets its own name and IP that can move around the cluster. This is a good thing. This *is* what you use to talk to the SQL Server.
DTC Name(s) and IP(s): The number you need depends on how your cluster and DTC are configured. See the link on the MSDTC question above.
Physical Server Names and IPs: Just like any other server, each physical node gets a name and IP so you can manage it.

Do we need to configure a heartbeat network? Great question. Check out Brent’s article, “Everything You Know About Clustering is Wrong“. (Great title, eh?)

What manages which node is up or down? The Windows Failover Clustering feature does this. It’s a set of services built into Windows that work for all kinds of clustering, like Exchange or file/print servers.

Are there advantages to a cluster with shared storage vs a cluster running availability groups without shared storage? Yes – say your database is 1TB. With AlwaysOn AGs, you need at least 2TB of storage. There are also scenarios with very high amounts of data modifications or high numbers of individual databases that (could) work (relatively) easily on a failover cluster, but push the limits of Availability Groups.

Doesn’t Microsoft discourage using the terms “active/passive” and “active/active”? They sure do! But the phrases “single instance failover cluster” and “multi instance failover cluster” just don’t roll off the tongue, so they’re rarely used in the real world. When we say “multi instance failover cluster”, inevitably people ask, “oh, you mean active-active?” So you can either use the “approved” terms, or the terms most people understand— whichever you prefer.

In a two node cluster with one SQL Server Instance (“Active/Passive”), what exactly is the status of the passive node? Is it something like the SQL Service needs to start when the active node fails? Yes! You got it. And if you look at the SQL Server services on the “idle” node, you’ll see they’re in a stopped state and set to manual start. That’s by design– it’s the failover cluster that will start the services when it needs to happen.

Would it be possible for each node to host its own SQL instance which fails over to the other one respectively? – Our goal would be to host one 2008 R2 and one 2012 instance of SQL server. Technically speaking, this is called a “side by side deployment”, but I don’t know anyone who recommends it. This question reminds me of the Meatloaf song, “I’d do anything for love, but I won’t do that.” Let’s break down the problems:

Installation: You’ve got to install everything in a very specific order (lowest to highest) on every node. Get anything out of order and you’ve gotta start over.
Maintenance: More updates, more patches, more failovers
Troubleshooting performance or availability problems: Oh, that gets ugly. You’d be running in a scenario that very few people use. Let’s imagine that you start getting unexplained failovers. It’s now much more hairy to troubleshoot.

So I urge you to consider other options. If you do go this route, work extra hard to make sure you have a very similar setup (ideally identical) in a non-production environment to mitigate risk and heartache.

To avoid confusion: For service packs and cumulative updates, you can use a rolling upgrade process on the cluster. That’s still totally cool.

For instant file initialization, you need to set that up on each node or is it “cluster” aware? And do you recommend it for clusters? You have to set it up on each node, and yes, we’d still recommend it.

Other than checking from SQL Server logs, Windows EventViewer logs, can we find the last failover duration? Nope, not outside of those places.

Is clustering available on the express edition? Technically, no, but you can cluster it manually yourself. If you’re considering this one, step back and take a good look at the problem you’re trying to solve– we’re a little worried.

Can I set up fake instance in my DNS that points to myServer\ThisInstance to start porting over to what will be in place after the cluster is in place? Setting up before hardware order is placed we get to tweak existing sprocs? Or is this a BAD IDEA? Yes, you can create a pointer in DNS to do this. I call it a “DNS friendly name”, some people call it a “DNS pointer”, some people just say “CNAME”. This can be very useful.

More Microsoft SQL Server Clustering Resources

Whether you want help choosing between an active/passive and an active/active cluster, or if you’re the kind of DBA who knows that’s not even the right name for failover clustered instances anymore, check out our SQL Server clustering training page.

The Stages of Performance Tuning

My Five Favorite That Conference Memories

55 Comments. Leave new

John
August 26, 2013 8:17 am

I never went through a better explanation of SQL Server clusters every, in books there was this boring description, here it became so vivid with examples and illustrations.

Reply
Hans
September 16, 2013 5:35 am

Excellent explanation, easy to understand, thank you for this.

I still have a question about this: when the failover mechanism detects that it needs to failover the sql services to the other node, does it have to reboot the node that fails, during the failover process?

regards,

Reply
- Kendra Little
  September 16, 2013 8:26 am
  
  Hi Hans,
  
  Great question! No, it does not have to reboot windows.
  
  Reply
Tadej Bajc
October 29, 2013 10:11 am

Hello,

Let say that I have SQL Failover cluster with two nodes and the third node which is not in SQL Failover cluster but it is a member of WFC. I know that it is possible to have AlwaysON feature on this configuration, but it is also possible to have automatic failover.
I am curious how to minimize downtime when failover happening. Can be done this with AlwaysON or the downtime will always be?

Best regards,

Reply
- Kendra Little
  October 29, 2013 10:30 am
  
  Hi there!
  
  I’m afraid I don’t completely understand what your question is. If you are asking if there is always a downtime during failover, then the answer is yes– when SQL Server fails over from one node to another, there is always a downtime. (The duration varies).
  
  If you have always on with a Windows Failover Cluster plus an Availability Group Secondary, automatic failover is only within the Windows Failover Cluster Instances. (FCI).
  
  Hope this helps– and if I misunderstood your question, please clarify.
  
  Kendra
  
  Reply
Josh Haley
December 13, 2013 12:51 pm

Hello,

Is there any compelling reason to have the exact same hardware on the failover node? For instance, do you ever hear of a failover node being slightly less robust than the main server? Or do they really ideally need to be as identical as possible?

Thanks!

Josh

Reply
- Brent Ozar
  December 13, 2013 12:54 pm
  
  Usually when you take in all the factors, the hardware is the least expensive part of the whole thing. You’re talking about shared storage and Microsoft licensing, both of which are more expensive than saving a few bucks on the node sizes.
  
  Plus, if you do patching, you usually want to patch the standby node to prep it ahead of time, then fail over during an outage window, and patch the other node. If both nodes are equally sized, you only have to take one outage – you can take your time patching the other node, and you don’t have to worry about failing back.
  
  Reply
Josh Haley
December 20, 2013 12:12 pm

First, thank you all for your responses (on this and other posts). You are helping immensely.

If the cluster is holding MDF/LDF on the SAN and backups need to be elsewhere, what is recommended? Local wouldn’t make sense as the failover node would not have access to the failed JBOD, etc, so it would need to be a shared resource. What are some good options that still allow good throughput for all of those backups and validations?

Thanks!

Reply
- Brent Ozar
  December 21, 2013 6:18 am
  
  Josh – typically this is done with a standalone file server or a server on a different SAN.
  
  Reply
vikas
February 15, 2014 9:22 am

Cool Stuff, Well Explained.

Reply
Deb
March 1, 2014 4:30 am

Hello , I am in process of creating a cluster environment .
can I create 2 two node cluster with help of 3 physical machine . some thing like A ,B & C .
B Machine will be part of both A & C cluster .
the cluster is like A–B (Active & Passive ) B — C (Active and Passive) .

Reply
- Brent Ozar
  March 1, 2014 7:35 am
  
  Deb – each node can only be part of one cluster, but you can have one cluster with 3 nodes and 2 different services (like 2 instances of SQL Server.)
  
  Reply
Deb
March 1, 2014 10:02 am

Thank you !! One more scenario ,if I have two node cluster (A–B) with Active Passive . Can I add a third node (C) with out disturbing the existing service (Sql Instalation ). That mean making this 2 node cluster to 3 node cluster .

That mean can I have a liberation to add or remove nodes to the existing cluster with out disturbing existing Sql installation .

some thing like I can create/remove nodes dynamically whenever I needed .

Reply
- Francis Apel
  April 15, 2014 10:16 am
  
  Deb,
  
  You can add nodes and even remove (evict) nodes easily (easily being a relative term of course) all without affecting the uptime of your cluster. Just one more reason I love working with clusters.
  
  Reply
Laurence pesdan
April 1, 2014 3:13 pm

Brent, I work on application that is measured in milseconds for DB peformance. Can you share your insight on if Always on is slower in terms of insert/update/deletes to the DB versus Windows failover clustering. Since were are talking a shared SAN versus some form or replication stream my mind tends to think the shared SAN has less overhead and will be faster. we will setup a test to prove this out this theory but i wanted to get your thoughts on the matter.

Reply
- Kendra Little
  April 1, 2014 3:19 pm
  
  Hi Laurence,
  
  I’m not Brent, but I did write this post, so I’m guessing it’s fair game to answer.
  
  With a Windows failover cluster instance there’s one copy of the data. If you’re using an AlwaysOn Availability Group and streaming data across to secondaries, that’s obviously more overhead. Even if you’re using an asynchronous mechanism to stream the data to the AG secondary, it’s more work and of course there could be a performance impact depending on your implementation.
  
  Kendra
  
  Reply
Allan
April 28, 2014 9:20 pm

Hi Brent,
Great read, thank you. Question for you.
Our current setup uses a very high number of small databases versus one big database.
We are looking into setting up a cluster, but we are getting mixes opinions on whether to convert all our individual databases (about 5000) to one big individual database, or keeping our current setup.
Some say that it would be easier to manage one large database, and ensure high availability, while some say that managing a huge number of small databases would not be an issue.
What is your take on this?
Thanks
A.

Reply
- Brent Ozar
  April 28, 2014 9:25 pm
  
  Hi! We’ve got a post about that here: https://www.brentozar.com/archive/2011/06/how-design-multiclient-databases/
  
  Reply
  - Allan
    April 28, 2014 10:03 pm
    
    Great, that looks exactly like what I was looking for, you just made my day 😉
    
    Reply
Miles Schor
July 2, 2014 11:31 am

Hi Kendra,

Great explanation of HA and SQL Clustering. Now for my question. I am confused on how to best setup SQL HA in a Virtual environment. Recently I built out a 2 node test SQL cluster on a VMware Host. But this would not afford me true HA for SQL Server the same way you have explained here. If there is a failure of the VM host, the SQL cluster is useless. HA would have to be done leveraging vSphere and v Motion. Then there would have to be a cold migration of SQL to the new host. And the SQL services would not be monitored in the same fashion they are in a SQL cluster. So I am looking for advice and direction on what the best way is to getting the same results you show here but in a virtual SQL environment.

Thanks for your help.

Your Old Friend, Miles (I remember you when you were just a baby DBA)

Reply
- Kendra Little
  July 2, 2014 11:39 am
  
  Hi Miles! Hope you’re doing well.
  
  Yeah, having more than one VM in a cluster on the same host kind of defeats the point. Most hypervisors have ways for you to set up rules that make sure that the VMs will always be on different points to limit your exposure to single points of failure like that.
  
  However, things get tricky with virtualized clusters. Network communication is incredibly important to keeping the failover cluster online — failures or even planned migrations of vms from host to host can cause interruptions in network communication and you could potentially lose quorum and have an outage.
  
  Vmware and hyper-v both have some good documentation for best practices if this is something that you absolutely need to do. If you have the option I would definitely start with a non-virtualized cluster and since it’s less complex. Brent writes more about that here: https://www.brentozar.com/archive/2012/09/why-your-sql-server-cluster-shouldnt-be-virtualized/
  
  Great to hear from you!
  Kendra
  
  Reply
Mohan Chhapadia
July 4, 2014 10:06 am

Hi Kendra,

Excellent tutorial video for a high level clustering.

We are currently on virtual environment where we have 10-15 number of virtual servers running in our organization and we are looking to move to clustering.

From your video I understand it’s not much of a wise choice going for clustering on top of a visualized environment due to potential network connectivity etc.

Could you advise the best option in case management doesn’t work to spend much money?

Best regards,
Mohan

Reply
- Brent Ozar
  July 4, 2014 10:09 am
  
  Mohan – sure, check out our High Availability and Disaster Recovery Planning Worksheet. It lets management pick how much money they want to spend and how much data they’re prepared to lose, and then shows what your options are:
  
  https://www.brentozar.com/archive/2014/05/new-high-availability-planning-worksheet/
  
  Reply
Daisuke
November 14, 2014 10:46 am

Hi,
Thank you for the video and Q&A. Totally great and helpful.

I get a little confused regarding this question’s answer:”Would it be possible for each node to host its own SQL instance which fails over to the other one respectively? – Our goal would be to host one 2008 R2 and one 2012 instance of SQL server.”… (Not because of the song reference; that song I know).

Confusion arises because I’ve read that you would rather use a multi instance failover cluster (i.e to benefit from upgrade testing). So the difference here is that with your proposed multi instance failover cluster, only the critical/most important database fails to the other node while the “lesser-important databases” will not fail over the other one?

Thanks in advance!

Reply
- Kendra Little
  November 14, 2014 1:35 pm
  
  Hi there,
  
  If I have a choice and performance and availability are important, I’d always rather have just one instance of SQL Server on its own cluster. That’s because there’s no risk of multiple instances having to run on a single node and impacting each other’s performance. To mitigate the risk for testing, you want to have a non-production cluster that you can burn in patches / firmware updates / configuration changes on.
  
  Does this cover your question? I’m not 100% sure I understood everything you’re asking, so just let me know if I answered the wrong question.
  
  Kendra
  
  Reply
  - Daisuke
    November 25, 2014 4:14 pm
    
    Yes, thank you very much again.
    
    Reply
    - Sergio Diogo
      January 29, 2015 5:33 am
      
      Hi Kendra,
      
      I also have a question on regards to active-active configuration (in our case we are considering to double the resources on each node to accomnodate cases where both instances are running on the same node).
      
      Can versions of SQL Server be different or they need to be the same, on this active/active configuration?
      
      Thanks and Regards,
      Sergio
      
      Reply
      - Kendra Little
        January 29, 2015 9:31 am
        
        Great question. Search the post for the phrase “side by side deployment” – that’s what this is called. In the example I answer a question about mixing SQL Server 2008R2 and SQL Server 2012, but the answer goes for mixing other versions as well. (It’s a long post, took me a minute to find this section myself, and I knew I wrote it!)
Raja Adepu
January 29, 2015 7:52 am

Excellent video and explanation. Good one to start learning SQL Cluster.

Reply
Ganesan
April 29, 2015 11:01 pm

I am looking for some installation/configuration steps for a active/active/passive cluster ,

2 active nodes which has one failover node.

Let me know if you have any links/docs to follow

Reply
- Brent Ozar
  April 30, 2015 6:22 am
  
  Ganesan – once you get past a single-instance (active/passive) cluster, every cluster is different and needs its own installation/configuration steps. There’s no simple checklists for those, unfortunately.
  
  Reply
KL
May 7, 2015 8:56 am

Hi Kendra,

I want to ask about the failback option for cluster resources. When looking at the failover cluster manager, I do not see any failback option set. We had an incident where something triggered a failover. Then, about a week later, that resource failed over again (back to its original preferred node). The explanation we were given was that it had simply failed back to its preferred node like it is expected to. I had read about what preferred nodes really mean so I have my doubts… Is that possible without setting a failback option? Is there somewhere else where this can be set?

Reply
- Kendra Little
  May 7, 2015 10:04 am
  
  It shouldn’t fail back unless that’s specifically configured. Who gave you that explanation?
  
  Reply
  - KL
    May 7, 2015 10:08 am
    
    Thanks for your comment.
    
    The Windows team gave the analysis.
    Is there a way I can go back to see why the instance failed over? Looking at the critical events in the failover manager, it simply says that the resource failed.
    
    Reply
    - Kendra Little
      May 7, 2015 10:11 am
      
      The Windows team as in Microsoft support? Or do you mean a sysadmin team internally?
      
      You can dig into the Windows clustering logs specifically, but do allocate a bunch of time for it– it can take a bit of elbow grease to slog through the details and separate out messages that just look weird (but are normal) from those actually related to the problem.
      
      Reply
KL
May 7, 2015 10:28 am

An internal sysadmin team.

The Windows clustering logs can be found in Application and Services Logs -> Microsoft -> Windows -> FailoverClustering -> Operational, correct?

I am very curious to find the actual root cause so I think I will investigate further.

One other SQL instance in the same cluster has the Priority Boost option enabled, so I’m not sure if the strangeness we see with around this cluster has something to do with it? Could that be possible?

Thank you.

Reply
- Kendra Little
  May 7, 2015 10:38 am
  
  I would start with the plain old system and application event logs, but yes, you can go further in event viewer. You might want to also use get-clusterlog and dump information to a file you can copy off and work through: https://technet.microsoft.com/en-us/library/ee461045.aspx
  
  Priority Boost is definitely something that could cause unexpected failures on a cluster because it could impact the network for the entire node. Check out the limitations and warnings in Books Online: https://msdn.microsoft.com/en-us/library/ms188709.aspx
  
  Reply
KL
May 7, 2015 10:57 am

Thanks for all that advice. Extremely helpful!

I’ve warned about the Priority Boost but having the system available 24×7 is keeping us from turning it off. Need to find a time window for it.

Get-ClusterLog is not a recognizable cmdlet on the server. I’ve tried “cluster log” and that seemed to have generated something. However, it did not generate enough logs to go far enough to the date when the incidents occurred. I think it only went back as far as the last time the server was rebooted. Am I missing an option, maybe?

The Windows Event logs though, go back to the date when it happened.

Thanks again!

Reply
danx
August 13, 2015 5:39 am

awesome! thanks
never thought id enjoy clustering as much 😀
cheers
dan

Reply
Andrew
October 14, 2015 3:44 pm

I have an existing FCI inside of a 2012 Availability Group and have a need to make an additional secondary into another FCI of its own but must remain as a secondary in the AG to run additional queries against without hitting the main SQL server. I am curious if this is something that can be done with or is it something where they must be standalone instances as secondaries.

Thank you,

Andrew

Reply
- Kendra Little
  October 14, 2015 3:54 pm
  
  You can have more than one failover cluster in an AG. You only get automatic failover within the failover cluster for the primary replica, and they all have to be part of the same windows cluster/ same domain.
  
  Reply
  - Andrew
    October 14, 2015 4:01 pm
    
    Thank you for the reply!
    
    Just so I have it right I can have a second instance of an FCI in the AG but that second FCI will not automatically fail over between hosts?
    
    Reply
    - Kendra Little
      October 14, 2015 4:09 pm
      
      Let’s say you have a 2 node failover cluster in Dallas, and a 2 node failover cluster in Houston, and they’re part of the same AG. The AG syncs data between the two “clusters”. (At this point the word cluster gets super confusing, because they ALL have to be part of the same Windows cluster.)
      
      Right now databases in the AG are primary (read-write) on DAL01. You only get automatic failover to DAL02.
      
      If you manually fail the AG over to HOU01 so it is now the primary replica, after that point automatic failover is only possible to HOU02.
      
      I think I have that written down properly, I’m a little cross eyed now. This is easier with pictures.
      
      Reply
Wayne
November 18, 2015 2:38 pm

Hi,

I am trying to understand the cause of recent cluster failovers in our environment. We use 2 X Cisco UCS blades creating a cluster with fiber shared storage running sql2014 on Windows Server 2012 R2 STD. When we failover the UCS for maintenance on the chassis the cluster instances fail. The logs show disk loss and network loss. We have checked the configuration on the ucs side and its all redundant. we don’t have windows teaming on the nics but cisco network failover should take care of moving the network traffic and mpio should take care of moving the shared storage. Have you had any experience with what I am describing here that could help me, even if it leads me to search I the right place to start me on the path to finding the cause ?

Thanks,
Wayne

Reply
- Brent Ozar
  November 19, 2015 7:26 am
  
  Wayne – troubleshooting a broken production cluster is a little outside of what we can do in a blog post comment, but if this is a production cluster, I’d start with a support call to Microsoft. It’s $500, and they work the problem with you until it’s done.
  
  Reply
Zahidul Hassan
December 21, 2015 7:49 pm

Thanks guys for the post and valuable comments. Can anyone share any resource on SQL 2012 De-clustering.

Reply
- Brent Ozar
  December 22, 2015 7:52 am
  
  Zahidul – what do you mean de-clustering?
  
  Reply
  - Zahidul
    July 4, 2016 6:48 pm
    
    Hi Brent, I have one SQL 2012 cluster. Now I like to break the cluster and run separate instances. This is a Windows Server 2008 r2 cluster with SQL Server 2012.
    
    Reply
Gert
July 4, 2016 4:53 am

Hi

Thank you for this explanation.

I was wondering how (SQL)instance options are handled. Say you change the memory-settings, will that propagate through to the other node?

Reply
- Brent Ozar
  July 4, 2016 7:03 am
  
  Yep!
  
  Reply
Ulvi Babashov
October 18, 2016 8:07 am

Hello

I have 2 windows failover cluster and each of them consist of 2 nodes (virtual machines). And on each cluster has its own ms sql role which uses seperate shared discs :
1. wincluster1 (wsfcnode1 and wsfcnode2) – sqlclust1
2. wincluster2 (wsfcnode3 and wsfcnode4) – sqlclust2

For example on the sqlclust1 side I have Testdb database.

Question : Can I setup alwayson available group for this database between sql failover clusters?

Reply
- Brent Ozar
  October 18, 2016 11:21 am
  
  Ulvi – for Q&A, head over to a Q&A site like http://dba.stackexchange.com.
  
  Reply
Derek
December 8, 2016 2:57 am

Hi Kendra
We are migrating from a mirrored sql environment to a cluster.
With an ‘active/active’ set up what is the best practice for the tempdb files in the SAN for each intstance.
Can they be in the same location ( that would be confusing) or separate data file and log file location?

Is there any best practices how to organise SAN ?

Reply
Albert Wienckoski
January 31, 2017 5:34 pm

HI Kendra or Brent,

I already have sql 2008 instance already installed on a server and now want to install sql cluster fail over on that instance. I want to know is it possible to install sql cluster on an already existing sql instance and if so what is the proper way to do it?

Or do I need to install a new sql cluster instance and the import my other instance into the new sql cluster.

Reply
- Brent Ozar
  February 1, 2017 6:59 am
  
  Albert – get the book Pro SQL Server 2008 Clustering by Allan Hirt:
  
  http://amzn.to/2kR1igA
  
  Reply