I’m doing a 1-hour open Q&A session on May 8th for the PASS Virtualization Virtual Chapter. Bring your VMware and Hyper-V questions about setup, performance, management, monitoring, or whatever, and I’ll answer ‘em.
You can even get a head start here – post your questions in the comments below, and I’ll build slides to answer ‘em ahead of time. That way you can make sure you get the best answer possible. (Well, from me anyway, ha ha ho ho.)
Then come join us on the webcast and hear the answers. See you there!
Even if your SQL Server is the only guest on a host, it still might not be as fast as bare metal.
One of the reasons is NUMA, which stands for Not Ur Momma’s Architecture. Okay, no, smart reader, you caught me – it actually stands for Non-Uniform Memory Access. In your momma’s architecture (Symmetric Multi-Processing), any CPU could access any memory all at the same low price. In today’s NUMA servers, a single motherboard with two CPUs and 128GB of memory can actually be divided into different nodes.
When a process running on CPU #1 wants to access memory that’s directly connected to it, that’s local access, and it’s fast. However, when that same process wants to grab data stored in CPU #2′s memory, that’s remote access, and it’s not as fast.
The performance penalty of remote memory access varies greatly from system to system, and you can measure it with Coreinfo from Sysinternals. (That Russinovich knows everything.) Blogger Linchi Shea went so far as to test the overhead of local versus remote access on one particular system, and he saw about 5% performance reduction. He considered that the worst case scenario for the server hardware he was using, but keep in mind that the situation will be much worse on servers with higher costs for remote memory access like IBM’s x3850 and x3950.
Windows exposes NUMA configuration details to applications, and it’s up to the app to tune itself appropriately. SQL Server has been NUMA-aware since 2005, and Microsoft’s continued to add improvements for it through 2008 and 2012. To learn more about how SQL Server handles NUMA, check out Gavin Payne’s SQLbits presentation, The NUMA Internals of SQL Server 2012.
How Virtualization Screws Things Up
The good thing about virtualization is that it abstracts away the hardware. You can run any virtual SQL Server on any server in the datacenter without a reinstall. You can even move virtual machines from one host to another, live, without a service restart – even if the underlying hardware is completely different. You can use multiple VMware hosts with completely different NUMA architectures – different numbers of cores per NUMA node, different amounts of memory per node, etc.
In order to pull this off, virtualization just presents a lump of CPUs and memory to our guest. Our virtual machine has no idea what the underlying NUMA configuration is – and it can’t, because it could change at any time when we’re moved from one host to another. This isn’t a performance problem for most apps because they don’t need to know anything about NUMA. They just want a lump of CPUs and memory.
Unfortunately, this is a performance problem for SQL Server because it actually wants to know the underlying configuration – and wants to tune itself for it. This is why when even running on a host with no other guests involved, performance still won’t match bare metal.
How vSphere 5′s Virtual NUMA Fixed Things Up Again
There are three key decisions that will make your life easier (and possibly your performance better).
First, isolate your virtual SQL Servers onto their own hosts. With SQL Server 2012′s licensing, when you buy Enterprise Edition for the host’s CPUs, you get unlimited virtual machines on that host. For a while, this wasn’t easily doable in VMware because of their incredibly stupid memory limits with licensing, but thank goodness they fixed that license stupidity recently. I can’t imagine a software vendor being dumb enough to limit their product to 64GB of memory in this day and age. <cough>sqlserverstandardedition</cough> I’m so glad VMware listened to their end users and fixed that limitation. <cough>microsoftpayattention</cough> Restricting a database server to just $500 worth of memory, why, that’d be like releasing a tablet with 4 hours of battery life. <cough>mylastpostasanmvp</cough>
Second, in that pool of hosts, use identical hardware running vSphere 5. All of the hosts need to have the same NUMA architecture. This does come with a drawback: it’s harder to do hardware refreshes. Most shops just buy new hardware as it becomes available, throw it into the VMware cluster, and let VMware DRS automatically rebalance the load. Unfortunately, the SQL Servers won’t be able to vMotion onto this hardware if it has a different NUMA configuration. The guests will need to be shut down at the next maintenance window, get a different NUMA config, and then be booted on the appropriate hosts.
Finally, configure vSphere 5′s Virtual NUMA on your guests. This is done automatically for guests with more than 8 vCPUs, but at 8 or less, you’ll need to enable it manually. Presto, SQL Server will see the underlying architecture and tune itself appropriately. (Well, not entirely appropriately – now SQL Server just solves the easy problems for you, and creates new hard problems.)
To enable virtual NUMA on VMs with 8 or less vCPUs, follow the instructions on page 41 of the Performance Best Practices for VMware vSphere 5.1 PDF. And hey, while you’re in there, get your learn on – it’s an excellent resource for SQL Server DBAs who want to know if their shop is doing things right.
Wanna learn more? Check out our VMware, SANs, and Hardware training videos. It’s a 5-hour training video series explaining how to buy the right hardware, configure it, and set up SQL Server for the best performance and reliability. Here’s a preview:
When people buy my virtualization training video, one of the followup questions I get most often via email is, “Can I build SQL Server clusters in VMware and Hyper-V?”
In theory, yes. Microsoft’s knowledge base article on SQL Server virtualization support says they’ll support you as long as you’re using configurations listed in the Server Virtualization Validation Program (SVVP).
But the real question for me isn’t whether or not Microsoft supports virtual SQL Server clusters.
The question is about whether you can support it.
Us geeks usually implement clusters because the business wants higher availability. Higher availability means faster troubleshooting when the system is down. We need to be able to get the system back up and running as quickly as possible. Getting there usually means reducing the amount of complexity; complex systems take longer to troubleshoot.
Adding virtualization (which also means shared storage) makes things much tougher to troubleshoot. If the business wants a highly available SQL Server, ask yourself these questions before virtualizing a SQL Server cluster:
- Do you have a great relationship between the SQL Server, storage, and network teams?
- Do all of the teams have read-only access to each others’ tools to speed up troubleshooting?
- Do all of the teams have access to the on-call list for all other teams, and feel comfortable calling them?
- Do you have a well-practiced, well-documented troubleshooting checklist for SQL Server outages?
- Does your company have a good change control process to avoid surprises?
- Do you have an identical environment to test configuration changes and patches before going live?
If the answer to any of those questions is no, consider honing your processes before adding complexity.
But the Business is Making Me Do It!
They’re making you do it because you haven’t clearly laid out your concerns about the business risk. Show the business managers this same list of questions. Talk to them about what each answer means for the business. Was there a recent outage with a lot of finger-pointing between teams? Bring that up, and remind the business about how painful that troubleshooting session was. Things will only get worse under virtualization.
To really drive the point home, I like whiteboarding out the troubleshooting process for a physical cluster versus a virtual cluster. Show all of the parts involved in the infrastructure, and designate which teams own which parts. Every additional team involved means longer troubleshooting time.
Once the business signs off on that increased risk, then everyone’s on the same page. They’re comfortable with the additional risk you’re taking, and you’re comfortable that you’re not to blame when things go wrong. And when they do go wrong – and they will – do a post-mortem meeting explaining the outage and the time spent on troubleshooting. If the finger-pointing between the app team, SQL Server DBAs, network admins, virtualization admins, and sysadmins was a problem, document it and share it (in a friendly way) with management. They might change their mind when it’s time to deploy the next SQL Server cluster.
Want to learn more? We’ve got video training. Our VMware, SANs, and Hardware for SQL Server DBAs Training Video is a 5-hour training video series explaining how to buy the right hardware, configure it, and set up SQL Server for the best performance and reliability. Here’s a preview: