The PASS Virtualization Virtual Chapter hosted a Q&A session with me last week. We talked about storage configuration options like VMDK/VHD vs RDM, how licensing works, what’s the biggest SQL Server I’m comfortable virtualizing, and much more.
For more tips, check out our virtualization resources page.
I’m doing a 1-hour open Q&A session on May 8th for the PASS Virtualization Virtual Chapter. Bring your VMware and Hyper-V questions about setup, performance, management, monitoring, or whatever, and I’ll answer ‘em.
You can even get a head start here – post your questions in the comments below, and I’ll build slides to answer ‘em ahead of time. That way you can make sure you get the best answer possible. (Well, from me anyway, ha ha ho ho.)
Then come join us on the webcast and hear the answers. See you there!
I served with High Availability. I knew High Availability. High Availability was a friend of mine. VMware HA, you’re no High Availability.
See, for us database administrators, high availability means protection when:
- The system drive fills up because some potato decided to download a bunch of files
- An operating system or database server update goes horribly awry
- Or even when an OS or SQL update goes right – because the beauty of real high availability solutions is that they let you patch the standby node first, make sure it works, and then fail over to it so you can patch the other node.
Don’t get me wrong – I love VMware, and I love using VMware HA for database servers. It’s a fantastic way to get higher availability for those old dinosaur SQL Server 2000 boxes that we just can’t kill because they still run important apps. But in systems where uptime really matters, a single virtual machine isn’t the answer to high availability. That’s where solutions like clustering, database mirroring, replication, and AlwaysOn Availability Groups come into play.
Thankfully, there’s good news: when VMware HA is paired with SQL Server technologies, they can both work even better. Two standalone physical database servers running AlwaysOn Availability Groups are more reliable than just one server, but two virtual machines doing the same thing are even more reliable. They’re protected from hardware failures because they can be spun up on any VMware host in the datacenter. They’re more flexible because we can add CPU power or memory quickly based on demand.
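To put rough numbers on "two servers beat one," here’s a back-of-the-envelope sketch in Python. The availability figures are made up for illustration, and the math assumes the two replicas fail independently, which is optimistic: shared storage, a shared network, or the same bad patch applied to both nodes can take them down together.

```python
# Back-of-the-envelope availability math. The inputs below are hypothetical,
# not measurements, and independence of failures is an assumption.

HOURS_PER_YEAR = 365 * 24


def pair_availability(single: float) -> float:
    """Chance that at least one of two replicas is up, assuming independent failures."""
    return 1 - (1 - single) ** 2


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per year that the service is unavailable."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    one_server = 0.99  # hypothetical single-node availability
    print(f"one server:  {downtime_hours_per_year(one_server):.1f} hours down per year")
    print(f"two servers: {downtime_hours_per_year(pair_availability(one_server)):.2f} hours down per year")
```

Wrapping each replica in a VM with VMware HA shows up in this model as a higher per-node availability, which the pairing then compounds.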
I’ve blogged about why your SQL Server cluster shouldn’t be virtualized, and that still holds true. If you need to build a hybrid AlwaysOn solution involving both failover clustered instances (FCIs) and standalone instances, I’d keep the FCIs on physical hardware. But if you’re under pressure from management to cut costs and shrink your datacenter footprint, put the rest of the instances in virtual machines. You’ll keep the power and comfort of physical machines for the FCIs while getting even higher availability for the virtualized instances. Everybody wins, and the future will be better tomorrow.
Even if your SQL Server is the only guest on a host, it still might not be as fast as bare metal.
One of the reasons is NUMA, which stands for Not Ur Momma’s Architecture. Okay, no, smart reader, you caught me – it actually stands for Non-Uniform Memory Access. In your momma’s architecture (Symmetric Multi-Processing), any CPU could access any memory all at the same low price. In today’s NUMA servers, each CPU has its own directly attached memory, so a single motherboard with two CPUs and 128GB of memory is typically divided into two nodes, each made up of one CPU and its local share of the memory.
When a process running on CPU #1 wants to access memory that’s directly connected to it, that’s local access, and it’s fast. However, when that same process wants to grab data stored in CPU #2’s memory, that’s remote access, and it’s not as fast.
The performance penalty of remote memory access varies greatly from system to system, and you can measure it with Coreinfo from Sysinternals. (That Russinovich knows everything.) Blogger Linchi Shea went so far as to test the overhead of local versus remote access on one particular system, and he saw about a 5% performance reduction. He considered that the worst-case scenario for the server hardware he was using, but keep in mind that the penalty can be much worse on servers with higher costs for remote memory access, like IBM’s x3850 and x3950.
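Why the mix of local and remote access matters can be sketched as a weighted average. The latencies in this Python example are hypothetical placeholders (plug in what Coreinfo reports for your own box), and the model only covers time spent waiting on memory. Real workloads also spend time on CPU work, locks, and I/O, so the end-to-end slowdown is usually smaller than what this number suggests.

```python
# Simple weighted-average model of memory latency on a NUMA box.
# local_ns / remote_ns are hypothetical inputs, not measured values.

def effective_latency_ns(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Average latency when remote_fraction of accesses go to another node."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


def memory_slowdown_pct(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Percent increase in average memory latency versus all-local access."""
    avg = effective_latency_ns(local_ns, remote_ns, remote_fraction)
    return (avg / local_ns - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical: remote access is 1.5x local, and half the accesses are remote.
    print(memory_slowdown_pct(local_ns=100, remote_ns=150, remote_fraction=0.5))
```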
Windows exposes NUMA configuration details to applications, and it’s up to each app to tune itself appropriately. SQL Server has been NUMA-aware since 2005, and Microsoft has continued to improve that support in 2008 and 2012. To learn more about how SQL Server handles NUMA, check out Gavin Payne’s SQLbits presentation, The NUMA Internals of SQL Server 2012.
How Virtualization Screws Things Up
The good thing about virtualization is that it abstracts away the hardware. You can run any virtual SQL Server on any server in the datacenter without a reinstall. You can even move virtual machines from one host to another, live, without a service restart – even if the underlying hardware is completely different. You can use multiple VMware hosts with completely different NUMA architectures – different numbers of cores per NUMA node, different amounts of memory per node, etc.
In order to pull this off, virtualization just presents a lump of CPUs and memory to our guest. Our virtual machine has no idea what the underlying NUMA configuration is – and it can’t, because it could change at any time when we’re moved from one host to another. This isn’t a performance problem for most apps because they don’t need to know anything about NUMA. They just want a lump of CPUs and memory.
Unfortunately, this is a performance problem for SQL Server because it actually wants to know the underlying configuration – and wants to tune itself for it. This is why, even when a virtual SQL Server runs on a host with no other guests, performance still won’t match bare metal.
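Here’s a toy Python simulation of what "a lump of CPUs and memory" costs a wide guest. It assumes the VM’s vCPUs are split evenly across the host’s NUMA nodes, and it models a guest that can’t see node boundaries by interleaving its memory pages across nodes. That’s a deliberate simplification: real guest OS allocators and the ESXi NUMA scheduler behave differently. A guest that does know the layout is modeled as always allocating from its local node.

```python
# Toy model: what fraction of memory pages end up on a remote host node?
# Even vCPU placement and interleaved allocation are simplifying assumptions.

def remote_access_fraction(vcpus: int, host_nodes: int, numa_aware: bool,
                           pages_per_vcpu: int = 1000) -> float:
    """Fraction of a vCPU's pages that live on a different host node than the vCPU."""
    if vcpus % host_nodes != 0:
        raise ValueError("this sketch assumes vCPUs split evenly across host nodes")
    vcpus_per_node = vcpus // host_nodes
    remote = total = 0
    for vcpu in range(vcpus):
        home_node = vcpu // vcpus_per_node  # host node this vCPU runs on
        for page in range(pages_per_vcpu):
            if numa_aware:
                # Guest knows the layout, so it allocates from its local node.
                page_node = home_node
            else:
                # Guest sees one flat lump of memory; interleaving pages across
                # nodes stands in for "placement with no node awareness".
                page_node = (vcpu * pages_per_vcpu + page) % host_nodes
            total += 1
            if page_node != home_node:
                remote += 1
    return remote / total


if __name__ == "__main__":
    print(remote_access_fraction(vcpus=8, host_nodes=2, numa_aware=False))
    print(remote_access_fraction(vcpus=8, host_nodes=2, numa_aware=True))
```

In this model, a node-blind guest on a two-node host sends about half its accesses across the interconnect, and the share grows as the number of nodes goes up.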
How vSphere 5’s Virtual NUMA Fixed Things Up Again
There are three key decisions that will make your life easier (and possibly your performance better).
First, isolate your virtual SQL Servers onto their own hosts. With SQL Server 2012’s licensing, when you buy Enterprise Edition for the host’s CPUs, you get unlimited virtual machines on that host. For a while, this wasn’t easily doable in VMware because of their incredibly stupid memory limits with licensing, but thank goodness they fixed that license stupidity recently. I can’t imagine a software vendor being dumb enough to limit their product to 64GB of memory in this day and age. <cough>sqlserverstandardedition</cough> I’m so glad VMware listened to their end users and fixed that limitation. <cough>microsoftpayattention</cough> Restricting a database server to just $500 worth of memory, why, that’d be like releasing a tablet with 4 hours of battery life. <cough>mylastpostasanmvp</cough>
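To see why dedicated SQL Server hosts pay off, here’s a simplified Python comparison of core licenses for licensing the whole host versus licensing each VM. The rules are assumptions based on my reading of SQL Server 2012 core licensing: a minimum of 4 core licenses per physical processor, a minimum of 4 per VM when licensing VMs individually, and unlimited VMs only when the host is fully licensed (with its own fine print, such as Software Assurance, that this sketch ignores). The 2-core pack rounding isn’t modeled either. Check Microsoft’s licensing guide before spending real money.

```python
# Simplified, hypothetical core-license comparison. The minimums below are
# assumptions to verify against Microsoft's licensing guide.

from typing import Iterable

MIN_CORES_PER_SOCKET = 4  # assumed minimum per physical processor
MIN_CORES_PER_VM = 4      # assumed minimum per VM for per-VM licensing


def host_core_licenses(sockets: int, cores_per_socket: int) -> int:
    """Licenses needed to cover every physical core on the host."""
    return sockets * max(cores_per_socket, MIN_CORES_PER_SOCKET)


def per_vm_core_licenses(vm_vcpus: Iterable[int]) -> int:
    """Licenses needed when each VM is licensed by its own vCPU count."""
    return sum(max(v, MIN_CORES_PER_VM) for v in vm_vcpus)


if __name__ == "__main__":
    vms = [4, 4, 8, 2]  # hypothetical guests packed onto one host
    print("whole host:", host_core_licenses(sockets=2, cores_per_socket=8))
    print("per VM:    ", per_vm_core_licenses(vms))
```

Once enough guests are packed onto a dedicated host, licensing the host’s physical cores can come out cheaper than licensing each guest.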
Second, in that pool of hosts, use identical hardware running vSphere 5. All of the hosts need to have the same NUMA architecture. This does come with a drawback: it’s harder to do hardware refreshes. Most shops just buy new hardware as it becomes available, throw it into the VMware cluster, and let VMware DRS automatically rebalance the load. Unfortunately, the SQL Servers won’t be able to vMotion onto this hardware if it has a different NUMA configuration. Instead, the guests will need to be shut down at the next maintenance window, reconfigured for the new NUMA layout, and then booted on the appropriate hosts.
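Before a hardware refresh lands in the SQL Server pool, it helps to check whether every host still shares one NUMA layout. Here’s a small Python sketch of that check. The host names and topology numbers are hypothetical; in practice you’d fill them in from your own inventory (for example, from Coreinfo output on each box).

```python
# Group hosts by NUMA topology to spot mismatched hardware in a pool.
# Host names and numbers below are hypothetical examples.

from collections import defaultdict
from typing import Dict, List, NamedTuple


class NumaTopology(NamedTuple):
    nodes: int
    cores_per_node: int
    memory_gb_per_node: int


def group_hosts_by_topology(hosts: Dict[str, NumaTopology]) -> Dict[NumaTopology, List[str]]:
    """Map each distinct NUMA layout to the (sorted) hosts that have it."""
    groups: Dict[NumaTopology, List[str]] = defaultdict(list)
    for name in sorted(hosts):
        groups[hosts[name]].append(name)
    return dict(groups)


def pool_is_homogeneous(hosts: Dict[str, NumaTopology]) -> bool:
    """True when every host in the pool has the same NUMA layout."""
    return len(set(hosts.values())) <= 1


inventory = {
    "esx01": NumaTopology(nodes=2, cores_per_node=8, memory_gb_per_node=64),
    "esx02": NumaTopology(nodes=2, cores_per_node=8, memory_gb_per_node=64),
    "esx03": NumaTopology(nodes=2, cores_per_node=10, memory_gb_per_node=128),  # refresh box
}

if __name__ == "__main__":
    for topology, names in group_hosts_by_topology(inventory).items():
        print(topology, names)
    print("homogeneous:", pool_is_homogeneous(inventory))
```

Any host that lands in its own group is a candidate for a separate pool rather than a vMotion target for the SQL Server guests.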
Finally, configure vSphere 5’s Virtual NUMA on your guests. This is done automatically for guests with more than 8 vCPUs, but for guests with 8 or fewer, you’ll need to enable it manually. Presto, SQL Server will see the underlying architecture and tune itself appropriately. (Well, not entirely appropriately – now SQL Server just solves the easy problems for you, and creates new hard problems.)
To enable virtual NUMA on VMs with 8 or fewer vCPUs, follow the instructions on page 41 of the Performance Best Practices for VMware vSphere 5.1 PDF. And hey, while you’re in there, get your learn on – it’s an excellent resource for SQL Server DBAs who want to know if their shop is doing things right.
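Here’s a rough Python model of the "more than 8 vCPUs" rule, handy for sanity-checking what a guest should see. The threshold is modeled as a `vcpu_min` parameter defaulting to 9. My understanding is that this corresponds to VMware’s `numa.vcpu.min` advanced setting, which is what you lower to get vNUMA on smaller guests, but verify that against the PDF. The node count is a simplifying assumption (vCPUs divided by cores per physical node, rounded up). Real vSphere also takes cores-per-socket settings into account.

```python
# Hypothetical model of when a guest gets a virtual NUMA topology.
# vcpu_min=9 mirrors "more than 8 vCPUs"; node sizing is simplified.

import math


def virtual_numa_nodes(vcpus: int, cores_per_physical_node: int, vcpu_min: int = 9) -> int:
    """Number of NUMA nodes the guest would see under this simplified model."""
    if vcpus < 1 or cores_per_physical_node < 1:
        raise ValueError("vcpus and cores_per_physical_node must be positive")
    if vcpus < vcpu_min:
        return 1  # no vNUMA topology exposed: the guest sees one flat node
    return math.ceil(vcpus / cores_per_physical_node)


if __name__ == "__main__":
    print(virtual_numa_nodes(8, cores_per_physical_node=8))               # default threshold
    print(virtual_numa_nodes(16, cores_per_physical_node=8))              # wide guest
    print(virtual_numa_nodes(8, cores_per_physical_node=4, vcpu_min=8))   # threshold lowered
```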
Hi Brent -
I would imagine you get many of these notes of appreciation for what you do, but I just couldn’t leave the office for the day without saying “Thank you so much.” I would never have thought that my personal investment would pay such dividends.
Today, armed with the awesome information, knowledge and suggestions you provided in your SQL Server for VMware training, I headed into what I thought would be a rather contentious meeting, since we had been experiencing some serious performance issues for almost a year and the discussions just never went anywhere.
To make a long story short – it was amazing, I was able to bring them on board, got my reservations, got access to the ESX Hosts for checking performance and an admission that maybe the SAN could have been better configured for SQL Server.
I have been a fan for years – thanks again and I look forward to reading and hearing more from you, as always.
Wanna know what Kelly was raving about?
My 3-hour training session is half off until October 25. SOLD OUT!