In the beginning, computer makers created servers with hard drive bays built in. And it was good.
Built-in drive bays are easy to manage: when you need more storage, you just plug in another hard drive. You don’t have to fork out a lot of money or make room – the server already has the drive bays built in. It’s so easy, a developer could do it.
Built-in drives are reliable because they’re directly connected inside a server with just a few wires. People aren’t going to walk past the server and nudge the cables loose. As long as the server has power, the drives have power, so admins aren’t worried about drives suddenly going offline while the server is up.
They also have drawbacks: when you run out of drive bays, you run out of options. If you have extra drive bays in the server next door, you can’t cable them together and use the extra bays. And perhaps worst of all, when you buy a server, you have to be pretty confident that you’re buying one with the right number of drive bays – not too many, because bigger servers cost money, and not too few, since it’s not easy to add more later.
Later, computer makers designed Direct Attached Storage: shelves of drives that attach directly to a server. One of these DAS units could hold a dozen or more drives, giving sysadmins more storage capacity for a server at a much lower cost than buying a new server.
DAS units introduced two new reliability risks: the unit’s power supply could fail, taking all of its drives offline, or the connection between the DAS and the server could be tugged loose. Hardware makers mitigated these risks by adding redundant power supplies on enterprise-class models, and they’ve started taking steps to reduce the risk of SAS/SATA connectivity problems.
Direct Attached Storage units are usually dedicated to a single server, which means that even if you only need one or two additional drives, you still have to buy a whole 12-bay chassis. The extra bays sit around unused, wasting space in the datacenter.
The Solution: Storage Area Networks
Storage Area Networks (or SANs) put a full-blown network between the servers and the drives. SANs consist of a few parts:
- Drives (hard drives or solid state drives)
- Drive enclosures (shelves with space for a dozen or more drives)
- Controllers (kinda like computers that connect to the drives, and have network ports)
- Network switches (could be fiberoptic networks or conventional Ethernet)
- Host Bus Adapters (the fancy-pants SAN name for network cards that plug into your server, thereby connecting your servers to your drives)
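To keep those moving parts straight, here’s a minimal sketch in Python of how they relate to each other. The class names, field names, and numbers are all made up for illustration – no vendor’s gear actually looks like this:

```python
from dataclasses import dataclass, field

# Hypothetical model of the SAN parts listed above. Names and numbers
# are illustrative only, not any real vendor's API.

@dataclass
class DriveEnclosure:
    name: str
    drive_bays: int          # shelves typically hold a dozen or more drives

@dataclass
class Controller:
    name: str
    enclosures: list = field(default_factory=list)  # loops down to the shelves
    network_ports: int = 2   # uplinks to the SAN network switches

@dataclass
class Server:
    name: str
    hbas: int = 2            # Host Bus Adapters - one per fabric for redundancy

# One controller fronting two 14-bay shelves, serving one database server.
san_controller = Controller("ctrl-a", [DriveEnclosure("shelf-1", 14),
                                       DriveEnclosure("shelf-2", 14)])
db_server = Server("sql01")

total_bays = sum(e.drive_bays for e in san_controller.enclosures)
print(total_bays)  # 28 bays behind a single controller
```

The point of the model: the server never talks to a drive directly anymore. Everything goes server → HBA → switch → controller → enclosure → drive, and each hop is a place to add redundancy.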
Suddenly, this picture starts to get kinda complicated. There’s a lot of parts here connected to other parts with a lot of cables. Servers really need to see their hard drives at all times, so what do engineers do? They build in redundancy: each part has backups, and backup ways to connect to other parts.
In the picture above, we’re showing just two parts of the SAN. The top big black box is a single controller, and there are two drive enclosures underneath it. In the SAN world, this is a really basic SAN, and SAN administrators would say it’s not really redundant. The rest of us would be stunned at how much redundancy is already included, though – check out a picture of the back side of that controller and just one of the drive enclosures.
This picture shows two units: the top half is a controller, and the bottom half is one drive enclosure. The connections are:
A – One pair of fiberoptic cables that connect to a SAN network switch.
B – One pair of fiberoptic cables that start a loop down to the drive enclosures.
C – Another pair of fiberoptic cables that represent the other side of that loop started by B.
D, E, F – pairs of fiberoptic cables that run up to B/C above and to the other drive units.
At any time, the SAN admin can walk to the back of the rack, pull one pair of the B/C/D/E/F cables out that carry communication between the controller and the drive enclosures, and business will keep right on truckin’. In fact, if the SAN has been set up according to best practices, the admin can probably pull more than one cable without taking the system down.
SANs Have Multiple Paths to Access Data
A common misperception is that SAN admins sleep so well at night because their pillows are stuffed with money. While SAN admins do make a fortune, the sad fact is that money-stuffed pillows are surprisingly lumpy and noisy.
Instead, the reason SAN admins sleep so well is because the SAN has so many paths between each server and its drives, and a single failure just won’t stop the SAN. Furthermore, most production SANs have multiple controllers, multiple network switches, and beyond that, two or more completely separate networks (called fabrics). If all hell breaks loose and one defective network switch goes down or broadcasts garbage, there’s a totally independent network that stays up.
If this was your home network, it would be like having a cable modem and a DSL modem, with two separate routers. Your home computer would have two separate network cards, each with a different TCP/IP address, connected to the two different routers. If any one component failed, you could still continue to watch your mission-critical “adult material” without interruption.
Or could you?
In the event of a real failure, like if you were watching Hulu over your cable modem and it started to go down, odds are your movie would start to stutter and cut out. The traffic wouldn’t be automatically and instantaneously switched over to the DSL modem. You would probably have to do something manually, or heaven forbid, reload the movie again. That doesn’t cut it for, say, a SQL Server trying to access its data files over the SAN: it simply can’t go down.
This is where SAN multipathing comes in: the SAN needs to know which paths are available, which paths aren’t working well, and proactively route traffic over the best possible path. In the next part of my series tomorrow, I’ll cover the basics of multipathing, and then the differences between Fibre Channel, iSCSI, and virtualization multipathing.
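The core idea is simple enough to sketch in a few lines of Python. This is purely illustrative – real multipath drivers like Windows MPIO or Linux DM-Multipath live deep in the OS storage stack, and the path names and latency numbers here are invented:

```python
# Illustrative sketch of multipath path selection: track every path to the
# storage, drop paths that fail health checks, and send I/O down the best
# surviving path. All names and numbers are made up for the example.

paths = [
    {"name": "fabric-A via switch-1", "healthy": True, "latency_ms": 0.4},
    {"name": "fabric-A via switch-2", "healthy": True, "latency_ms": 0.6},
    {"name": "fabric-B via switch-3", "healthy": True, "latency_ms": 0.5},
]

def best_path(paths):
    """Pick the fastest healthy path; fail only when every path is dead."""
    alive = [p for p in paths if p["healthy"]]
    if not alive:
        raise RuntimeError("all paths down - now the server notices")
    return min(alive, key=lambda p: p["latency_ms"])

print(best_path(paths)["name"])   # fabric-A via switch-1

# A switch starts broadcasting garbage: mark its path down and reroute.
# The server's I/O keeps flowing over the independent fabric.
paths[0]["healthy"] = False
print(best_path(paths)["name"])   # fabric-B via switch-3
```

Notice what never happens in that sketch: the caller never sees an error until *every* path is gone. That’s the property that lets a SQL Server keep reading its data files while an admin yanks cables out of the rack.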