Blog

The Second Concern for SQL Server Virtualization: Recoverability

Your first concern is licensing, and once you’ve got that under control, it’s time to talk about backups, restores, and uptime.

The business begins by defining the RPO and RTO requirements for high availability, disaster recovery, and “oops” deletions. Use our High Availability Planning Worksheet and fill in the “Current” column on page 1:

High Availability Worksheet Page 1

High Availability Worksheet Page 1

You need to give the business a rough idea of the current state – how much data you’ll lose under each of those scenarios, and how long the SQL Server will be down. Be honest – don’t overstate your capabilities, because that just means you’ll get less budget money to build the system the business wants.

Then let the business users fill in the “Biz Goal” column for each scenario. Of course, by default, everyone wants to pick zero data loss and zero downtime, but that’s where page 2 of the worksheet comes in:

High Availability Worksheet Page 2

High Availability Worksheet Page 2

Give both page 1 & 2 of the worksheet to the business users and let them pick the right availability requirements for their budget. Technical managers will want to leap to specific methods (“we have to use VMware replication”) but keep steering the conversation back to the big picture – what does the database need?

The costs aren’t meant to be exact estimates, just rough ballpark numbers of the hardware, software, installation, and ongoing maintenance costs for a couple/few years. To learn more about the technologies in each box, check out our High Availability and Disaster Recovery Resources page.

If your business checks different boxes for each of the three scenarios (HA, DR, and OOPS), then you’ll probably end up with a mix of different SQL Server technologies. For example, the OOPS scenario is usually handled by transaction log backups and a separate standby restore server so that you can pull out just the data you need, but that isn’t going to help you for high availability if you need 1-minute automatic failover with zero data loss.

After the business checks boxes for high availability, disaster recovery, and “oops” deletions, it’s time to pick the right recoverability option. There are virtualization-friendly options in every box, but it’s important to note what isn’t in these boxes.

VMware, Hyper-V, and Xen Alone Aren’t High Availability.

Virtualization admins often think, “If something goes wrong with hardware, the virtual server will just start up on another host.” VMware, Hyper-V, and Xen all do a great job of recovering from these kinds of failures, but those aren’t the only kinds of failures we need to avoid. Note the list of failure types in our High Availability scenario.

The not-so-fine print

The not-so-fine print

Sure, hypervisors protect you fairly well from Windows crashes, RAID controller failures, bad memory chips, or somebody unplugging the wrong box.

But what about patching and OS-drive-full type problems?

I’ve heard admins say, “No problem – before we do any patching, we’ll shut SQL Server down, take a snapshot of the VM, and then start the patch. If anything goes wrong, we’ll just roll back to the snapshot. No data loss, not much downtime.”

Oh, I wish. Let me tell you a story.

One of my clients was preparing a new SQL Server for production. As part of their prep, they needed to apply SQL updates to it, so they started the installation process, and …

A few minutes later the phone calls started pouring in.

Because they were patching the wrong box. They were patching the production SQL Server VM, not the new VM. To make matters worse, they ran into the SSISDB bug, and their production server was down for hours while they figured it out.

How Recoverability Influences Virtualization Design

When you’re designing solutions for HA, DR, and OOPS, read the scenarios described in this simple worksheet. Expect that sooner or later, at a time you can’t predict or control, every one of these is going to happen to you. (Well, maybe not zombies in the data center.) Your technical solution is driven by the business’s requirements for RPO/RTO in each scenario. Understand what virtualization alone can give you, and when you’re going to have to add SQL-Server-level solutions.

Your design then needs to take into account one more challenge: capacity. How many databases will you have, how large will they be, and how fast will the data change? For example, a solution involving log shipping, a 1TB database, a 10% change rate per day due to batch load jobs, and a 1Gb Ethernet pipe probably isn’t going to be a 1-minute-data-loss solution.

It all boils down to one simple question:
can you back up, DBCC, and restore the databases fast enough?

Not only do you have to answer the question for today, but for the life expectancy of the server as well. In my next virtualization post, I’ll explore capacity planning as it relates to RPO/RTO, and then we’ll be able to put the whole picture together of what our VM environment will look like.

The Very First Concern with SQL Virtualization: Licensing

Every single thing you do with a virtual SQL Server starts with one very important concern: how are you going to license it?

The 3 Common Ways to License Virtual SQL Servers

The software licensing section of my local library

The software licensing section of my local library

Developer Edition, by the guest – Developer Edition has all the power of Enterprise Edition, but it’s way cheaper (like in the $50/server range) and you’re not allowed to use it for production purposes. Because the licensing is so cheap, just mix these VMs in with the rest of your guests on a shared pool of hosts. Developer Edition licensing is handled differently than other SQL Server editions, typically bundled with your developers’ MSDN subscriptions.

Enterprise Edition, by the host – When you have 5 or more guests, it usually makes sense to buy a dedicated pair of host servers in an active/passive configuration. For example:

  • Dell R720 Primary – 2 sockets, 4 cores each, 768GB memory – ~$25k USD
  • Dell R720 Secondary – same hardware config – ~$25k USD
  • 8 physical cores of SQL Server Enterprise Edition licensing and Software Assurance – ~$55k USD (you only have to pay licensing on the physical cores regardless of whether you enable hyperthreading)
  • Hypervisor, Windows, management software licensing, shared storage – left as an exercise for the reader

You can configure your hypervisor (VMware, Hyper-V, Xen) to use one of the hosts as the primary, and when it fails, automatically boot up the guests on the secondary R720. This way, you only have to license one of the two hosts, and you can run an unlimited* number of SQL Servers on this host. (* – Not really.) Even better, even though they may be small VMs, you can still use Enterprise Edition on these guests and get access to cool EE-only features like online index rebuilds.

As your SQL Server needs grow, you can license the Secondary and balance the VM load across both, and even start to add additional licensed hosts. The key, though, is to avoid running any non-SQL-Server loads here because these are your utmost expensive host cores.

Standard Edition, by the guest – When you have 4 or fewer SQL Server instances, and they don’t need enough hardware horsepower to merit a dedicated host pool, then you’re usually best off licensing each guest with Standard Edition. This way, you can intermingle them on the same hardware that the rest of your virtual guests use.

At around $2k USD per core, it’s roughly 1/4 the price of Enterprise Edition, but you have to keep an eye on the number of virtual cores you’re using. You may read training material that says “just add vCPUs when you have performance pressure,” but this is expensive advice when you’re licensing each guest by the core. As you approach 16 vCPUs of Standard Edition licensing, you need to build a plan to migrate over to the host-based Enterprise Edition licensing approach. 28 vCPUs of Standard costs about the same as 8 vCPUs of Enterprise, but only the latter lets you run an unlimited number of guests, use over 64GB of memory on SQL 2012 and prior, do online index rebuilds, etc.

How Licensing Affects Your Administration

If you need SSRS, SSIS, or SSAS, and you’re licensing hosts with Enterprise Edition, you simply deploy these other services on separate virtual servers. You never make these other services fight with the core SQL Server engine for the same memory. However, if you’re on Standard, you saved money on licensing, but you spend extra performance troubleshooting time. You’ll need to install these services on the same VM as the engine to keep costs down, but you’ll need to understand how to triage performance.

When you design a high availability solution, the edition determines your choices as well. If you’re on Standard, and you need automatic failover for OS & SQL problems, you’ve backed yourself into a corner of either failover clustered instances or synchronous database mirroring. These solutions add complexity that shops without a DBA often can’t afford, so they end up going without OS/SQL protection and sticking with the simpler hypervisor-based failovers. On the other hand, once you’ve decided to license the hosts with Enterprise Edition, you can take advantage of AlwaysOn Availability Groups and asynchronous database mirroring without extra licensing costs – even for your smallest VMs.

Knowing your licensing means faster decision-making on architecture, and that’s why you want to start your virtualization projects here. Licensing is usually the most expensive and most restrictive part of the project – get it right first.

To learn more about licensing SQL Server, check out the 2014 Licensing Guide PDF.

SQL Server Virtualization Q&A Session [Video]

The PASS Virtualization Virtual Chapter hosted a Q&A session with me last week. We talked about storage configuation options like VMDK/VHD vs RDM, how licensing works, what’s the biggest SQL Server I’m comfortable virtualizing, and much more:

For more tips, check out our virtualization resources page.

Brent Answers Your Virtualization Questions

I’m doing a 1-hour open Q&A session on May 8th for the PASS Virtualization Virtual Chapter.  Bring your VMware and Hyper-V questions about setup, performance, management, monitoring, or whatever, and I’ll answer ‘em.

You can even get a head start here – post your questions in the comments below, and I’ll build slides to answer ‘em ahead of time.  That way you can make sure you get the best answer possible.  (Well, from me anyway, ha ha ho ho.)

Then come join us on the webcast and hear the answers. See you there!

VMware HA is Not Database Server High Availability

I served with High Availability. I knew High Availability. High Availability was a friend of mine.  VMware HA, you’re no High Availability.

See, for us database administrators, high availability means protection when:

  • The system drive fills up because some potato decided to download a bunch of files
  • An operating system or database server update goes horribly awry
  • Or even when an OS or SQL update goes right – because the beauty of real high availability solutions is that they let you patch the standby node first, make sure it works, and then fail over to it so you can patch the other node.

Don’t get me wrong – I love VMware, and I love using VMware HA for database servers.  It’s a fantastic way to get higher availability for those old dinosaur database servers running SQL Server 2000 that we just can’t kill, yet still run important apps.  But in systems where uptime really matters, a single virtual machine isn’t the answer to high availability.  That’s where solutions like clustering, database mirroring, replication, and AlwaysOn Availability Groups come into play.

Thankfully, there’s good news: when VMware HA is paired with SQL Server technologies, they can both work even better.  Two standalone physical database servers running AlwaysOn Availability Groups are more reliable than just one server, but two virtual machines doing the same thing are even more reliable.  They’re protected from hardware failures because they can be spun up on any VMware host in the datacenter.  They’re more flexible because we can add CPU power or memory quickly based on demand.

I’ve blogged about why your SQL Server cluster shouldn’t be virtualized, and that still holds true.  If you need to build a hybrid AlwaysOn solution involving both failover clustered instances (FCIs) and standalone instances, I would rather not put the FCIs in VMware first.  But if you’re under pressure from management to cut costs and cut your datacenter footprint, put the rest of the instances in virtual machines.  You’ll gain the power and comfort you want from physical machines while getting even higher availability from the virtual machines.  Everybody wins, and the future will be better tomorrow.

Wanna learn more? Check out our VMware, SANs, and Hardware training videos. It’s a 5-hour training video series explaining how to buy the right hardware, configure it, and set up SQL Server for the best performance and reliability. Here’s a preview:

Buy it now.

Big Changes for Big Virtual Machines in VMware vSphere 5

Even if your SQL Server is the only guest on a host, it still might not be as fast as bare metal.

One of the reasons is NUMA, which stands for Not Ur Momma’s Architecture.  Okay, no, smart reader, you caught me – it actually stands for Non-Uniform Memory Access.  In your momma’s architecture (Symmetric Multi-Processing), any CPU could access any memory all at the same low price.  In today’s NUMA servers, a single motherboard with two CPUs and 128GB of memory can actually be divided into different nodes.

Exhibit A1 – Mmmm, steak sauce.

When a process running on CPU #1 wants to access memory that’s directly connected to it, that’s local access, and it’s fast.  However, when that same process wants to grab data stored in CPU #2’s memory, that’s remote access, and it’s not as fast.

The performance penalty of remote memory access varies greatly from system to system, and you can measure it with Coreinfo from Sysinternals.  (That Russinovich knows everything.)  Blogger Linchi Shea went so far as to test the overhead of local versus remote access on one particular system, and he saw about 5% performance reduction.  He considered that the worst case scenario for the server hardware he was using, but keep in mind that the situation will be much worse on servers with higher costs for remote memory access like IBM’s x3850 and x3950.

Windows exposes NUMA configuration details to applications, and it’s up to the app to tune itself appropriately.  SQL Server has been NUMA-aware since 2005, and Microsoft’s continued to add improvements for it through 2008 and 2012.  To learn more about how SQL Server handles NUMA, check out Gavin Payne’s SQLbits presentation, The NUMA Internals of SQL Server 2012.

How Virtualization Screws Things Up

The good thing about virtualization is that it abstracts away the hardware.  You can run any virtual SQL Server on any server in the datacenter without a reinstall.  You can even move virtual machines from one host to another, live, without a service restart – even if the underlying hardware is completely different.  You can use multiple VMware hosts with completely different NUMA architectures – different numbers of cores per NUMA node, different amounts of memory per node, etc.

In order to pull this off, virtualization just presents a lump of CPUs and memory to our guest.  Our virtual machine has no idea what the underlying NUMA configuration is – and it can’t, because it could change at any time when we’re moved from one host to another.  This isn’t a performance problem for most apps because they don’t need to know anything about NUMA.  They just want a lump of CPUs and memory.

Unfortunately, this is a performance problem for SQL Server because it actually wants to know the underlying configuration – and wants to tune itself for it.  This is why when even running on a host with no other guests involved, performance still won’t match bare metal.

How vSphere 5’s Virtual NUMA Fixed Things Up Again

There are three key decisions that will make your life easier (and possibly your performance better).

First, isolate your virtual SQL Servers onto their own hosts.  With SQL Server 2012’s licensing, when you buy Enterprise Edition for the host’s CPUs, you get unlimited virtual machines on that host.  For a while, this wasn’t easily doable in VMware because of their incredibly stupid memory limits with licensing, but thank goodness they fixed that license stupidity recently.  I can’t imagine a software vendor being dumb enough to limit their product to 64GB of memory in this day and age. <cough>sqlserverstandardedition</cough>  I’m so glad VMware listened to their end users and fixed that limitation.  <cough>microsoftpayattention</cough>  Restricting a database server to just $500 worth of memory, why, that’d be like releasing a tablet with 4 hours of battery life.  <cough>mylastpostasanmvp</cough>

Second, in that pool of hosts, use identical hardware running vSphere 5.  All of the hosts need to have the same NUMA architecture.  This does come with a drawback: it’s harder to do hardware refreshes.  Most shops just buy new hardware as it becomes available, throw it into the VMware cluster, and let VMware DRS automatically rebalance the load.  Unfortunately, the SQL Servers won’t be able to vMotion onto this hardware if it has a different NUMA configuration.  The guests will need to be shut down at the next maintenance window, get a different NUMA config, and then be booted on the appropriate hosts.

Finally, configure vSphere 5’s Virtual NUMA on your guests.  This is done automatically for guests with more than 8 vCPUs, but at 8 or less, you’ll need to enable it manually.  Presto, SQL Server will see the underlying architecture and tune itself appropriately.  (Well, not entirely appropriately – now SQL Server just solves the easy problems for you, and creates new hard problems.)

To enable virtual NUMA on VMs with 8 or less vCPUs, follow the instructions on page 41 of the Performance Best Practices for VMware vSphere 5.1 PDF.  And hey, while you’re in there, get your learn on – it’s an excellent resource for SQL Server DBAs who want to know if their shop is doing things right.

Wanna learn more? Check out our VMware, SANs, and Hardware training videos. It’s a 5-hour training video series explaining how to buy the right hardware, configure it, and set up SQL Server for the best performance and reliability. Here’s a preview:

Buy it now.

Reader Letter from Kelly J.

Hi Brent –

I would imagine you get many of these notes of appreciation for what you do but I just couldn’t leave the office for the day without say “Thank you so much”.  I would have never have thought that my personal investment would have paid such dividends.

Today, armed with the awesome information, knowledge and suggestions you provided in your SQL Server for VMware training,  I headed into what I thought would be a rather contentious meeting as we have been experiencing some serious performance issues for almost a year and the discussions just never went anywhere.

To make a long story short – it was amazing, I was able to bring them on board, got my reservations, got access to the ESX Hosts for checking performance and an admission that maybe the SAN could have been better configured for SQL Server.

I have been a fan for years – thanks again and I look forward to reading and hearing more from you, as always.

Best Regards,
Kelly J.

Wanna know what Kelly was raving about?  My 3-hour training session is half off until October 25. SOLD OUT!

css.php