Here’s how to tell when you should learn NUMA.
Say you’ve got an IBM 3950, a high-end rack-mount server with 4 sockets, 4 cores each (16 cores total) and 64 gigs of memory. If you’ve got several of these servers, you can connect them together in a daisy chain. IBM makes a special interconnect cable that plugs into the back of these. After wiring them together, you can go into the BIOS of each server, do some tweaking, and presto – you have one ginormous server. If we’ve got four of these identically configured 3950s, that means we now have a server with 64 cores (4 servers x 4 sockets each x 4 cores each) and 256 gigs of memory (4 servers x 64 gigs each). This is what IBM demoed at last year’s PASS Summit keynote – they were running several of these servers daisy-chained together to form one 192-core SQL Server.
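The math on that is easy to sanity-check – here’s a quick sketch using the numbers from the example above:

```python
# Totals for four daisy-chained 3950 nodes (numbers from the example above)
nodes = 4
sockets_per_node = 4
cores_per_socket = 4
memory_gb_per_node = 64

total_cores = nodes * sockets_per_node * cores_per_socket
total_memory_gb = nodes * memory_gb_per_node

print(total_cores, total_memory_gb)  # 64 cores, 256 GB
```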
When you press the power button on the main node, all four servers boot together – but only the keyboard/monitor of one of the servers is active. It really functions as one big server. Data passes back and forth over the interconnect cable to keep everybody marching together. You install Windows just like you normally would.
From a hardware perspective, though, you have to pay attention. Each node has to have exactly the same processors, but what if:
- You don’t have the same amount of memory in each node?
- You have I/O cards (host bus adapters, network cards, etc.) in only one of the nodes?
- You start a program that does CPU work on one node, but needs all 256GB of memory (and that memory lives on other nodes)?
You’re going to pay a performance penalty whenever you have to send data across that interconnect between servers. It’s still quick, don’t get me wrong – we’re not talking Ethernet speeds here – but it’s not as fast as when it’s all on the same motherboard.
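To see why that penalty adds up, here’s a back-of-the-napkin sketch. The latency numbers are made up for illustration – they’re not measurements from a 3950 – but the point holds: the more often a processor has to reach across the interconnect for its memory, the worse its average access time gets.

```python
# Hypothetical latencies - illustration only, not measured values
LOCAL_NS = 100.0    # access to memory on the same node
REMOTE_NS = 300.0   # access across the interconnect to another node

def avg_access_ns(remote_fraction):
    """Blended memory latency when remote_fraction of accesses leave the node."""
    return (1.0 - remote_fraction) * LOCAL_NS + remote_fraction * REMOTE_NS

print(avg_access_ns(0.0))   # all local: 100.0
print(avg_access_ns(0.5))   # half remote: 200.0
```

With those assumed numbers, a workload doing half its memory accesses remotely runs at double the average latency of one that stays local – which is exactly the situation in the third bullet above.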
This is where NUMA comes in – among other things, it helps manage processors and memory to make sure that each processor is using the right memory. If you’re managing a server with more than 2 CPU sockets, you want to learn the basics of NUMA just to make sure you’re not paying a performance penalty that a simple configuration tweak could avoid. These same problems exist for high-end enterprise-class servers even when they’re not built of daisy-chained boxes, but I like using this example because it’s easier to understand. I wish I’d taken more pictures of these when I was working at Southern Wine, because we used ’em there, and it’s easier to understand when you see it visually too.
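If you want to poke at this on your own box, most operating systems will show you the NUMA layout – on Windows, Task Manager and Sysinternals Coreinfo will do it, and on Linux the nodes show up under sysfs. Here’s a minimal Python sketch of the Linux side; it just counts the node directories and assumes a single node if the sysfs path isn’t there:

```python
import glob
import os

def numa_node_count():
    """Count the NUMA nodes the OS exposes; assume 1 if sysfs isn't available."""
    if not os.path.isdir("/sys/devices/system/node"):
        return 1
    return max(1, len(glob.glob("/sys/devices/system/node/node[0-9]*")))

print("NUMA nodes:", numa_node_count())
```

On a single-socket desktop you’ll typically see 1; on a daisy-chained rig like the one above you’d expect one node per box (or more, depending on how the hardware carves things up).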
I’m on a crappy cell modem connection or I’d blog more links, but if you’ve got good intro links for other readers about NUMA, post ’em here in the comments.
6 Comments
So, in the scenario with the IBM 3950’s, what happens if a node fails?
I take it that the nodes use shared storage of some type?
Yeah, the nodes aren’t individual in any way. It’s the same as if you have a motherboard failure in your computer – the computer goes down altogether, unfortunately.
Ouch. So I’d imagine that DB Mirroring would be your best bet for some type of failover… I don’t guess that you could have an active/passive cluster in this type of scenario.
Just think of the daisy chain as one computer, so you can’t mirror between them, since you are only running one instance of the operating system.
There is however nothing preventing you from using shared storage if you have another daisy chained setup (or just another comparable computer) that you can fail over to.
Right, I wouldn’t think that you can mirror between the nodes…but there shouldn’t be anything to prevent you from mirroring to a different instance on a completely separate server.
Better late than never. We’re using vSphere, so this was of interest…
https://itnext.io/vmware-vsphere-why-checking-numa-configuration-is-so-important-9764c16a7e73