In Part 1 of my multipathing series, I talked about what paths are. Today I’m going to talk about multipathing itself. SAN multipathing software has two goals, in this order: failover protection first, and performance a distant second.
Using SAN Multipathing for Failover Protection
Your server absolutely, positively has to be able to access its drives at all times. When servers can’t access their hard drives, horrendous things happen. When hard drives were directly attached to servers, this wasn’t a big risk, but storage area networks bring in a lot of risky factors. Cables get unplugged or get bent beyond repair. Switches fail. Network configurations don’t go according to plan.
(Side note: I think this was one of the biggest reasons SAN administrators didn’t want to go to iSCSI. They saw how our network cables looked, and they didn’t want their precious fiberoptic cables getting that same treatment.)
Multipathing software mitigates this risk by enabling the SAN admin to set up multiple routes between a server and its drives. The multipathing software handles all IO requests, passes them through the best possible path, and takes care of business if one of the paths dies.
In the event of a problem like an unplugged cable, the multipathing software will sense that IO has taken too long, then reset the connections and pass the request over an alternate path. The application (like SQL Server) won’t know anything went wrong, but the IO request will take longer than usual to perform. Sometimes in SQL Server, this shows up as an application-level alert that IO has taken more than 15 seconds to complete.
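The failover behavior above can be sketched in a few lines of Python. This is a toy model, not how any vendor’s multipathing driver actually works: `MultipathDevice`, `PathFailed`, and the two path functions are hypothetical names I made up for illustration, and a real driver notices trouble through IO timeouts rather than a Python exception.

```python
class PathFailed(Exception):
    """Stand-in for an IO timeout or a dead link (hypothetical)."""


class MultipathDevice:
    """Toy model: send each request down the active path, fail over on error."""

    def __init__(self, paths):
        self.paths = list(paths)  # callables, in order of preference
        self.active = 0           # index of the path currently in use

    def submit(self, request):
        # Try the active path first, then each alternate exactly once.
        for attempt in range(len(self.paths)):
            index = (self.active + attempt) % len(self.paths)
            try:
                result = self.paths[index](request)
            except PathFailed:
                continue  # this path is bad; move on to the next one
            self.active = index  # remember the path that worked
            return result
        raise PathFailed("no working path to storage")


def unplugged_cable(request):
    # Simulates a path whose cable has been pulled.
    raise PathFailed("request timed out")


def healthy_path(request):
    return f"completed {request}"


device = MultipathDevice([unplugged_cable, healthy_path])
result = device.submit("read block 42")
print(result, "via path", device.active)
```

The caller only ever sees the result. The retry is hidden inside `submit`, which is why the application notices nothing except the extra latency.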
To make this work, SAN administrators build in redundancy at every possible layer of the SAN infrastructure – multiple HBAs, multiple switch networks, multiple connections from the controllers, and so forth. But most of the time, all this extra connectivity sits around idle. It’s designed to be used for protection, but not necessarily performance: it’s active/passive gear where only one thing is active at a given time. The secondary goal of multipathing is performance, but it’s a far, far second. SAN administrators are so conservative, they make database administrators look like gambling addicts. They’re perfectly comfortable leaving half or more of the infrastructure completely unused.
Do We Really Need More Bandwidth?
Depending on the SAN infrastructure, the theoretical speed limits are around:
- 1Gb Fibre Channel or iSCSI – around 125 MB/second (this is the most commonly deployed iSCSI speed)
- 2Gb Fibre Channel – around 250 MB/second
- 4Gb Fibre Channel – around 500 MB/second (this is the most commonly deployed FC SAN speed)
- 10Gb iSCSI – around 1250 MB/second
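Those numbers come from simple arithmetic: link speeds are quoted in gigabits per second, and there are eight bits in a byte. Here is a minimal sketch of the conversion. It ignores encoding and protocol overhead, so real-world throughput will come in lower, and `link_mb_per_second` is just a name I picked for the example.

```python
BITS_PER_BYTE = 8


def link_mb_per_second(gigabits_per_second):
    """Convert a link speed in gigabits/second to megabytes/second.

    Ignores encoding and protocol overhead, so this is an upper bound.
    """
    return gigabits_per_second * 1000 / BITS_PER_BYTE


links = [
    ("1Gb Fibre Channel or iSCSI", 1),
    ("2Gb Fibre Channel", 2),
    ("4Gb Fibre Channel", 4),
    ("10Gb iSCSI", 10),
]

limits = {name: link_mb_per_second(gbps) for name, gbps in links}
for name, mb_s in limits.items():
    print(f"{name}: about {mb_s:.0f} MB/second")
```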
These limits were fine ten or fifteen years ago when hard drives weren’t all that fast, but here are some sample read speeds from today’s desktop-class SATA drives:
- One drive – around 130 MB/second (from TomsHardware reviews)
- RAID 5 array of five drives – around 300 MB/second (from my home lab)
Forget 15k drives or solid state drives – even just with today’s SATA drives, 4Gb Fibre Channel can get saturated fairly quickly during large sequential read operations, like SQL Server backups or huge table scans on data warehouses. Sadly, I see so many cases where the IT staff bought a SAN with dozens or hundreds of hard drives, hooked it up to a server with just two 4Gb fiber-optic connections, and then can’t understand why their storage isn’t much faster than it was with local disks. Even if they get savvy to the basics of multipathing and try connecting more 4Gb HBAs, their storage speed doesn’t necessarily increase.
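To put rough numbers on that, here is a back-of-the-envelope sketch using the figures quoted above. It assumes drive throughput adds up linearly, which real arrays don’t achieve (my five-drive RAID 5 got 300 MB/second, not 650), so treat the result as a best case for the drives. The function names are mine, for illustration only.

```python
import math

LINK_MB_S = 500   # one 4Gb Fibre Channel path, from the list above
DRIVE_MB_S = 130  # one desktop-class SATA drive, from the list above


def drives_to_saturate(link_mb_s, drive_mb_s):
    """Fewest drives whose combined sequential reads fill one link,
    assuming perfect linear scaling (an optimistic assumption)."""
    return math.ceil(link_mb_s / drive_mb_s)


def usable_bandwidth(link_mb_s, links, active_active):
    """Active/passive uses one link at a time; true active/active uses all."""
    return link_mb_s * links if active_active else link_mb_s


print("Drives needed to fill one 4Gb path:",
      drives_to_saturate(LINK_MB_S, DRIVE_MB_S))
print("Two 4Gb paths, active/passive:",
      usable_bandwidth(LINK_MB_S, 2, active_active=False), "MB/second")
print("Two 4Gb paths, true active/active:",
      usable_bandwidth(LINK_MB_S, 2, active_active=True), "MB/second")
```

With active/passive multipathing, adding a second, third, or fourth HBA never moves the usable number past what a single path can carry.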
Enter Active/Active Multipathing
Active/active multipathing is the ability to configure a server with multiple paths to the storage and simultaneously use all of them to get more storage bandwidth. This type of multipathing software is usually sold by the SAN vendor, not a third party, because it’s a lot more complicated than it looks at first glance. Talk to your SAN vendor and ask how much their active/active multipathing software costs, and what it’s compatible with. EMC’s PowerPath even works with gear from multiple vendors.
But before you plunk down a lot of hard-earned cash – well, it’s not that hard-earned for storage administrators, but I’m talking to database administrators here – you need to ask one very important question: what exactly does this software mean by active/active? In your feeble mind, you probably believe that you can have one array, accessed by one server, and spread the load evenly over two or more Host Bus Adapters. Not so fast – some vendors define active/active as:
- Only one path can be active per array at a given time. If you have four HBAs, you’ll need four arrays in the SAN, and SQL Server will need to spread the data across all four arrays. This means designing your database filegroups and files specifically for the number of HBAs in use on your server.
- All paths work for sending data, but only one can receive. I’ve seen this in iSCSI active/active multipathing solutions. For SQL Server, this means you can insert/update/delete/bulk-load data at breakneck speeds, but your selects still crawl.
- Active/active works, but failover sticks. Say you have two paths to your data, and one of the paths goes bad for some reason. All traffic fails over to the alternate path. When the bad path comes back up (like the cable is plugged back in, the power comes back on, the port is replaced, etc) traffic doesn’t automatically balance back out. It stays on the single path. The only way to find this out is with expensive SAN-monitoring software or by browsing through SAN configuration screens periodically.
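To see how much those definitions matter, here is a simplified model of what a server with four 4Gb paths and one array could expect under each one. The mode names and `effective_bandwidth` are hypothetical labels of mine, not vendor terminology, and the model ignores everything except path count.

```python
def effective_bandwidth(mode, path_mb_s, paths, arrays=1):
    """Return (read_mb_s, write_mb_s) for one server under a simplified model."""
    if mode == "true":
        # Every path carries traffic in both directions.
        total = path_mb_s * paths
        return total, total
    if mode == "one_path_per_array":
        # Each array is pinned to a single path, so extra paths sit idle
        # unless the data is spread across at least as many arrays.
        total = path_mb_s * min(paths, arrays)
        return total, total
    if mode == "send_only":
        # All paths can send (writes), but only one can receive (reads).
        return path_mb_s, path_mb_s * paths
    raise ValueError(f"unknown mode: {mode}")


for mode in ("true", "one_path_per_array", "send_only"):
    reads, writes = effective_bandwidth(mode, path_mb_s=500, paths=4)
    print(f"{mode}: reads {reads} MB/second, writes {writes} MB/second")
```

The sticky-failover case isn’t modeled separately: after an outage it behaves like `paths=1` until somebody notices and rebalances.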
For virtual servers, I’ve got more bad news: the only true active/active SAN multipathing today is in VMware vSphere 4.0 with EMC PowerPath. Stephen Foskett explains the storage changes in vSphere. If you’re on VMware v3.5 or prior, on Windows Hyper-V, or on vSphere 4.0’s lower licensing tiers, you’re stuck with one HBA of throughput per server per LUN (array). This is one reason why you might not want to virtualize your high-end SQL Servers yet: they don’t get quite the same level of throughput that you can get on physical hardware. Don’t let that scare you off virtualization, though – remember, you’re probably reading this article because you don’t have true active/active multipathing set up on your physical SQL Servers, either.
There are a lot of catches here, and the SAN salespeople are always going to smile and nod and say, “Oh yeah, ours does that. That’s good, right?” It’s up to you: you have to ask questions and test, test, test. Get a time-limited evaluation copy of their multipathing software and test your SAN performance with SQLIO, as I explain over at SQLServerPedia. It’s the only way to know for sure that you’re getting the most out of your storage investment.
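One simple way to read your test results: run the same test once with a single path enabled and again with all paths enabled, then compare. Here is a minimal sketch of that comparison. `path_scaling` is a hypothetical helper, and the numbers in the example are made up for illustration, not real measurements.

```python
def path_scaling(single_path_mb_s, all_paths_mb_s, paths):
    """Fraction of ideal linear scaling achieved with all paths enabled.

    1.0 means throughput grew in step with the path count; a value near
    1 / paths means the extra paths are adding nothing.
    """
    return all_paths_mb_s / (single_path_mb_s * paths)


# Made-up example: 380 MB/second on one path, 400 MB/second on two.
efficiency = path_scaling(380, 400, 2)
print(f"{efficiency:.0%} of ideal scaling")
```

If read and write tests scale differently, or the number drops after you pull and replug a cable, you’ve probably found one of the catches listed above.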