Need High Availability for SQL Server? Use Windows Server 2012.

SQL Server DBAs are slow to adopt Windows Server 2012 and Server 2012 R2. I frequently find teams planning new implementations of SQL Server 2012 on Windows Server 2008 R2.

There seem to be two main reasons for this:

  1. DBAs want to change a minimal number of factors, and are slow to trust a new version of the OS. Windows 2008R2 feels “safe.”
  2. Windows Server 2012 was hard to love when it first came out. With no Start menu, Windows 2012 was more baffling than exciting to most production DBAs.

But these reasons don’t hold up– and staying on an old OS isn’t doing you any favors. If you’re planning a SQL Server 2012 (or upcoming 2014) installation, you should also be using Windows Server 2012 or Server 2012 R2.

Windows Server 2008R2 isn’t “Safer” or “Simpler” than Server 2012 R2

Let’s break down that first concern above. The essense of it is that staying on Windows Server 2008R2 seems less risky than using a new OS.

But let’s look at that closely. The most recent service pack for Windows 2008R2 was released in March 2011— that was Service Pack 1. (I bet I’m not the only person who has searched in vain for SP2 repeatedly, thinking I must have just missed it.)

Lots of fixes have been released since Service Pack 1. Here’s a few highlights:

Which patches you need varies by your configuration, and this is just the tip of the iceberg.

When you’re using high availability solutions, the list of patches you need to consider just gets bigger– and the risks get higher. In this post Brent describes a cluster-specific bug that was released after Stack Exchange experienced Availability Group outages due to a clustering issue.

Although Windows 2008R2 has been out for a while, managing an Availability Group or a Windows Failover Cluster on it isn’t simpler– it’s actually more complicated! You’ve got more patches you need to be aware of and manage, and you still need to keep an eye out for new fixes as they’re released.

Failover Clustering has huge new features in quorum

Windows Failover Clustering is increasingly critical to high availability in SQL Server– it’s the underpinning for clustered SQL Server instances and for AlwaysOn Availability Groups.

Don’t get me wrong– Windows 2008 R2 has some pretty nice features when it comes to setting up a cluster, particularly compared to older versions of Windows. I love the Cluster Validation Wizard. I find the Failover Cluster Manager snap-in to be pretty intuitive and easy to manage.

But Failover Clusters have grown up significantly– particularly when it comes to your options with making sure that your applications stay online. And having more ways to keep applications online is probably the exact reason that you’re using features involving clustering!

Dynamic Quorum and Vote Assignment – Windows Server 2012

Windows Server 2012 introduced “Dynamic Quorum“. Dynamic Quorum lets the cluster add and remove votes for nodes if they shut down or fail. This makes it more likely for the cluster to maintain a majority if you have multiple sequential failures.

Windows Server 2012 also lets you easily customize quorum configuration and remove votes from some nodes if you’d like to. To remove node votes with Windows Server 2008R2, you’d have to know to install KB 2494036 and then reconfigure quorum.

Bonus: as of Windows Server 2012, the Windows Failover Clustering feature is part of Windows Standard Edition.

Dynamic Witness, Tiebreakers – Windows Server 2012 R2

Who's got the votes? Much easier to see in Server 2012 R2.
Who’s got the votes? Much easier to see in Server 2012 R2.

Windows Server 2012 R2 introduces even more quorum features. The Dynamic Witness feature lets the cluster decide when it will or won’t give a vote to the witness you have configured.

This means that if you have multiple sequential failures, the cluster has more options and is smarter about keeping the right number of votes available. It also means that the guidance about creating a witness is simpler: since the vote can be managed dynamically, you should always create a witness.

Windows 2012 R2 has more features, too– the cluster is smarter about what happens when things start to come back online after you force quorum, and it has options for specifying a tiebreaker if you have multiple failures at the same time.

And it’s a simple thing, but I love it the most: in 2012 R2 you can see what’s happening with dynamic votes for nodes by using the Failover Cluster Manager (no PowerShell commands required). This makes testing different failure scenarios much more straightfoward as you can quickly and easily see how many votes there are, and who’s got them.

Oh, and they fixed that Start menu thing, too.

You can’t easily migrate your Windows 2008R2 Cluster to Windows 2012+

Right clicking on the Start Menu in Windows Server 2012 R2
Right clicking on the Start Menu in Windows Server 2012 R2

The final reason I wouldn’t plan a new highly available environment on Windows 2008R2 is simply that I don’t want to be stuck there. You might think, “We’ll start out on a known OS and then upgrade soon” — but that’s easier said than done.

If you’re on a Windows 2008R2 cluster and you want to upgrade, you will need to create a new, separate failover cluster using Windows Server 2012, and then migrate to it. That requires separate storage, separate servers, and a careful migration plan. It’s absolutely doable, but it’s not simple, cheap, or easy. (Note that the built-in “Migrate a Cluster” wizard doesn’t support SQL Server installs for Windows 2008R2 -> Windows 2012.)

What about SQL Server 2008? Or 2005?

If you’re doing a new server buildout, you may decide to install SQL Server 2008 R2 on a Windows 2012 / 2012 R2 failover cluster. That’s supported, but to get installation to work you may need to use a slipstream installation and take a few other steps. (I think slipstreaming is great, but budget extra time to get everything tested and working.)

If you’re thinking about running SQL Server 2005 on Windows 2012 or Windows 2012 R2, that’s just not supported by Microsoft.

It’s Time to Embrace Our New OS Overlord

I really like Windows 2008 R2. I’ve used it for years, it’s very familiar, and I feel like I know its quirks. But I also recognize that it’s got a lot of baggage. It’s not really simpler or easier to manage.

The improvements to Windows Failover Clustering features in later versions of the operating system should be very compelling to anyone considering clustering SQL Server or using Availability Groups. It’s not always perfect and it’s certainly complicated, but it’s much better than what you’ve got in Windows 2008R2.

It’s time to start planning your new SQL Server implementations with Windows 2012 and higher.

Previous Post
Top Classic Country Songs for SQL Server DBAs
Next Post
The Worst Database User

15 Comments. Leave new

  • We had SQL 2012 with AlwaysOn Availability Groups on Windows 2008 R2 powering Stack Overflow for a time. I seriously cannot stress how bad of an idea that is, take Kendra’s advice here and upgrade. Windows 2008 R2 clustering and its interaction with SQL 2012 has caused us more downtime than any other piece of infrastructure, by far.

    There are many problems with Windows 2008 R2 clustering that will cause your SQL nodes to fail when they shouldn’t – and in SQL 2012 that means nodes effectively going offline not read-only as they will in 2014.

    Just don’t even think about it, unless you really like pain and low availability.

    Reply
  • Totally agree, since moving to Windows Server 2012, fail over goes a lot faster and smoother too when doing planned patching or anything else, overall a lot easier to manage.

    Reply
  • worst part about windows 2012 is you need recent hardware. we use HP Proliant servers and Generation 5 servers from 2007 are still very usable in production. but you can’t put windows 2012 on them.

    if we wanted to install windows 2012 and sql 2012, other than the licensing costs you are looking at $50,000 or so for two new servers. if you don’t have any Proliant G7’s that are free then you need to buy more for your DR site

    Reply
    • Kendra Little
      March 6, 2014 3:05 pm

      Alen,

      Will it actually not install due to a driver issue, or is the issue that it’s not supported by HP?

      For many people hardware that old is going to be considered end of warrantee/end of life, and not suitable for production. However, it could still be used in a staging or pre-production environment– in which case if it installs it might be just fine even if it’s not technically supported by the manufacturer. So I’m curious.

      Kendra

      Reply
      • I’ve always used HP Smartstart and from what i read it’s not an option when you go through their startup script. might work with a Windows 2012 dvd or on a flash drive

        you can still buy support for G5’s from HP and third parties and the servers are pretty good and powerful enough for some production work. each one can take around 2TB of usable storage if you max out all 8 drive bays. we have a cluster of them running great plains and a few other SQL servers.

        i’m waiting for a new G8 to arrive today or tomorrow, but most of the workload that we are going to put on it can easily be done on a G5. Same with another G8 i set up yesterday.

        going to have a few free ones soon after we consolidate 10 different SQL servers onto 2 new ones and will use those for testing. for Windows 2012 i’ll see if i can put on 2008 R2 and then 2012 onto a hyper-v instance

        Reply
    • Victor McElwain
      September 23, 2014 8:50 am

      I would simply install VMWare 5.1 or 5.5 and then install windows 2012 as a VM even if it’s dedicated to one VM. It’s an easy and cost effective work around if your essentially trying to get more life out of your legacy hardware. Even if you use the free license your going to be better off.

      Reply
      • Victor – can you elaborate on how that protects you from a Windows patch problem, SQL patch problem, C drive filling up, BSOD, etc? Those are issues that are considered SQL Server high availability needs.

        Reply
  • KB 2553549 is a great Windows 2008 bug, especially when it happens across a customer’s whole estate at the same time (because their servers were all installed on nearly the same date and they don’t patch).

    Reply
    • I have a HP proliant DL160G6 running 2008 standard 64bits, anyone know if KB2468345 aplies? We have serious disk performace issues.
      I updated Raid P410 Driver from HP today, but still having bad disk performance.

      Thkns

      Reply
  • I was about to buy some memory for one my main production clusters, taking the nodes from 256GB per node to 768GB per node, when I saw this article.

    Does anyone know if the hot fix for KB 2566205 (Scalability of the I/O performance decreases in Windows Server 2008 R2 if more than 256 GB of physical memory is used) resolves the issue or should I just stick with 256GB?

    Reply
    • Paul – I haven’t seen that as an issue myself, but typically when we see environments that need over 256GB of RAM, we also recommend that they get on the current versions of Windows for other reasons, so we’re probably not the best folks to ask.

      Reply
  • Hi Brent

    I have a 2 Node cluster on nearly same location . I am using windows 2012 r2 OS and SQL 2014.

    I am bit confused over which quorum mode I should choose.
    Also, when can i use dynamic quorum.Please if you can help!
    Thanks

    Reply
  • Thanks for it Brent. Just now saw the video. I don’t see the dynamic quorum configuration covered on this.
    Actually wanne know if I need to configure dynamic quorum explicitly in cluster or will it takes automatically.
    As I have 2 node cluster with file share on.third server. So shall I chose dynamic quorum setting and choose file share over there.

    Reply
  • I have two node cluster ( win 2012 ) + SQL Server 2016 (std edition) Alwayon (primary = node1 and secondary = node2 ) ( Active -passive).
    I have below issue faced,
    (1) Some reason my primary server ( node1) restart than control goes to Secondary (node2) becomes primary
    Some reason after restart take more time to up , In this time we are unable access query and taking more time from node2 (currently primary ) . What was issue ?
    After node1 up than after query processing fast .
    (2) Some reason my primary reason shutdown and control in node2 becomes primary but we are unable to access query .
    How to shortout this type of issue ?
    (3) Which command for failover primary to secondary ?
    can you send me script for failover ?
    So i will create SQl agent Job for failover .

    Thanks

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.