HP C-Class Blade Chassis Review Part 2: The Cisco/Brocade Interconnect Switches

In my last article about the C-Class chassis, I talked a little about the interconnect switches, and today I’m going to dive deeper.

As with most full-size servers these days, each single-height HP blade like the BL460c includes an onboard dual-port network card. The difference between standalone and blade servers starts here: those two ports connect to two different interconnect bays (bays 1 & 2). They are hard-wired to these bays, each of which must contain a network switch. With standalone servers, a network admin can cable both of the server’s network cards to the same switch (HP, Cisco, etc.), possibly utilizing empty ports on an existing switch. With a blade infrastructure, however, the admins need to buy at least a pair of switches for each blade enclosure.

We use two Cisco switches in those interconnect bays, each of which uplinks to a different core Cisco switch. We tie the whole thing together with the free HP network teaming software included with the blades, which means either of the interconnect bay switches can go down without taking out the blade’s networking. This approach does have weaknesses, but I’ll cover those in an upcoming article about the HP Virtual Connect infrastructure.

A blade cannot live by two ports alone, so the BL460c includes two mezzanine card bays. These bays are HP’s version of internal PCI Express slots designed for the tiny blade form factor, and they accommodate a variety of mezzanine cards including dual-port and quad-port network cards, dual-port SAN HBAs, and even InfiniBand. This makes even the small BL460c well-suited for a variety of low to mid-range database server duties, especially for iSCSI shops. At our shop, some of the database server setups include:

  • Standalone high-performance OLTP server – one mezzanine bay holds a SAN HBA for storage, the other bay is empty
  • Clustered high-performance OLTP server – one bay has a SAN HBA, and the other bay has a dual-port network card
  • Standalone iSCSI OLTP server – one bay has a multipurpose iSCSI network card

The more connectivity a blade needs, the more switchgear needs to be involved. A typical C7000 chassis configuration might look like this:

  • Interconnect bay #1: a Cisco 3020 network switch
  • Interconnect bay #2: a Cisco 3020 network switch (for redundancy)
  • Interconnect bay #3: a Brocade SAN switch
  • Interconnect bay #4: a Brocade SAN switch (for redundancy)
  • Interconnect bay #5: a Cisco 3020 network switch (for 4-port NICs, especially for VMware)
  • Interconnect bay #6: a Cisco 3020 network switch (for redundancy on the 4-port NICs)
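
To keep the hard-wiring straight, it helps to think of this as a fixed lookup table: a given port on a given blade always lands in the same interconnect bay, with no patch cables involved. Here’s a minimal, purely illustrative sketch in Python mirroring the example layout above (it is not an official HP port-mapping reference, and the port names are my own):

    # Port-to-interconnect-bay wiring for a half-height blade, mirroring the
    # example chassis layout above. Illustrative only; not an official HP
    # port-mapping reference.
    PORT_TO_BAY = {
        "onboard NIC port 1":     1,  # Cisco 3020
        "onboard NIC port 2":     2,  # Cisco 3020 (redundant)
        "mezzanine 1 HBA port 1": 3,  # Brocade SAN switch
        "mezzanine 1 HBA port 2": 4,  # Brocade SAN switch (redundant)
        "mezzanine 2 NIC port 1": 5,  # Cisco 3020
        "mezzanine 2 NIC port 2": 6,  # Cisco 3020 (redundant)
    }

    # No per-server cabling decisions: whatever switch model sits in a bay
    # determines what every blade's matching port can talk to.
    for port, bay in PORT_TO_BAY.items():
        print(f"{port} -> interconnect bay {bay}")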

The Problem with Lots of Switches

Having six additional switches for every 16 servers (or even fewer servers, depending on whether the shop uses full-height blades) presents some problems.

To me, the beauty of blades is their reduced complexity: it’s all about making deployments easier, more predictable, and faster. Adding more switchgear doesn’t eliminate that simplicity, but it doesn’t help the cause either. I still have to put in a ticket for our networking staff to set up the Cisco switches, and I can’t double-check their work. The only way I find out the setup wasn’t right is when I install the new blade and it can’t communicate on all of its network cards. I don’t have to put in a ticket for the SAN admin because – well, because it’s me – but the other Windows admins have to wait for me to configure their SAN connections. All told, this can add days of lag time to a new blade setup, and that takes the shine off the simplicity of blades.

This is made more frustrating by the fact that most of our blade configurations fall into just a handful of types: VMware hosts with specific VLAN, iSCSI, and SAN needs, SQL Servers with SAN needs, and plain vanilla servers with one subnet. This should be a cookie-cutter setup job, but because the setup is done by multiple teams, there are lag times, misunderstandings, and finger-pointing when something goes wrong.

The growing army of switches also makes the initial configuration that much more difficult: we have to be extremely explicit with the network staff. Where before we could simply specify Core Switch A or Core Switch B, now we have to specify which blade chassis we’re working with, which bay the switch is in, and so forth. Plus, when we hire new network administrators, they’re not always familiar with blade switches, so we have to walk them through the datacenter to explain how the different switches uplink to the core switches.

More switches also mean more firmware administration headaches. These are another six switches that we have to keep on synchronized firmware versions. For example, we recently ordered a new chassis with Brocade switches, and they arrived with a newer version of firmware. Thankfully we caught that before we plugged them into our infrastructure, because that firmware version was not compatible with the other switch firmware versions in our fabric.

Another problem with this sudden growth of switches is that some management software is licensed by the switch port, regardless of whether that switch port is actively in use. We license our SAN path management software by the switch port, and the instant we plug in another pair of Brocades, we have to license that software for the additional switch ports. In some of our C7000s, only half of the servers are connected to the SAN – meaning we’re paying licensing for more switch ports than we actually use.

The Limits of 2 Mezzanine Cards with Conventional Switches

The two-slot mezzanine limit is the BL460c’s first weakness as a database server: it can’t drive seriously high throughput with conventional switches.

Most midrange Fibre Channel SANs don’t offer true load-balanced multipathing for their arrays. Each LUN (drive letter) goes through a single 4Gb HBA until that HBA path fails, and only then does it switch to the other HBA. For SQL Servers, especially data warehouses, this presents a bandwidth problem.

We did a Microsoft Technology Center lab for our data warehouse in the winter of 2007, and one of the findings was that we were hitting a SAN throughput bottleneck. We were using two 4Gb HBAs with IBM’s RDAC failover multipathing, which does not truly load balance between HBAs. The recommendation was to move to at least four HBAs – something we couldn’t do with the BL460c blades. Granted, we weren’t running a data warehouse on a BL460c, but my point is that it shouldn’t be done, for performance reasons.

The same thing holds true with iSCSI, especially when using just 1Gb switches. Since each pair of network cards is split between two Cisco switches, we’ve been unable to get 2Gb of combined throughput at any given time, even when using the vendor’s multipathing software. We got an eval system from LeftHand Networks hoping it would resolve that issue, but the onsite tech agreed that it just couldn’t be done with the two network cards connected to two different Cisco switches.
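
To put rough numbers on that ceiling, here’s a back-of-the-envelope sketch in Python. The overhead factors and the warehouse scan target are my own illustrative assumptions, not figures from the MTC lab or the LeftHand eval; the point is simply that one active 4Gb FC path or one active 1Gb iSCSI path tops out far below what a scan-heavy workload wants, and failover-only multipathing gives each LUN just one active path.

    import math

    # Rough, assumed overhead factors (encoding plus protocol); not measured values.
    def usable_mb_per_sec(nominal_gbit: float, efficiency: float) -> float:
        """Convert a nominal link rate into approximate usable MB/s."""
        return nominal_gbit * 1000 / 8 * efficiency

    fc_path = usable_mb_per_sec(4.0, 0.80)     # one 4Gb FC HBA path: ~400 MB/s
    iscsi_path = usable_mb_per_sec(1.0, 0.90)  # one 1Gb iSCSI path: ~113 MB/s

    # Hypothetical sequential-scan target for a data warehouse, in MB/s.
    scan_target = 1500

    for name, per_path in [("4Gb FC", fc_path), ("1Gb iSCSI", iscsi_path)]:
        paths_needed = math.ceil(scan_target / per_path)
        print(f"{name}: ~{per_path:.0f} MB/s per active path -> "
              f"{paths_needed} active paths needed for {scan_target} MB/s")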

Summary: A Problem, but There’s a Solution…

These problems haven’t slowed our adoption of C-Class blades with conventional switchgear – the switches are an inconvenience we can work around.

There’s also a solution to most of these problems: the HP Virtual Connect system. More about that in the next article in the series.

Continue to Part 3: HP Virtual Connect Review

11 Comments

  • When I connect one uplink from my network switch to a blade switch, it works fine.

    But when I connect both uplinks simultaneously to both blade switches, it becomes problematic and the whole chassis gets disconnected from the network.

    Please suggest what the reason might be.

  • You need to call HP to get support questions answered.

  • Sumee, on your network switch you will need to trunk those ports before you hook up your blade.

  • Hi,

    you neither need HP nor to trunk. The different Virtual Connect modules don’t show up as a stack. You definitely need to enable RSTP (spanning tree) to avoid Ethernet loops. I assume that you created such loops if you lost most of your connectivity.

    Best regards, Frank

  • Henry Malibiran
    July 7, 2010 8:07 pm

    I have a C7000 and 8 blade servers with 2x GbE2c L2/L3 blade switches. I need to connect to a Cisco switch. The design is that 2x uplink ports (p21 and p22) will connect to 2x ports on the Cisco switch. I configured a trunk on p21 and p22 of the blade switch and an EtherChannel port on the 2x Cisco ports. The problem is I cannot ping/communicate with other servers outside the blade switch. Can anyone help me with this problem? Many thanks.

    • Hi,

      EtherChannel on the Cisco side is the right choice, but make sure that VLAN tagging is properly set on both sides, and make sure that VLAN 1, which remains the default on the GbE side, is the native VLAN on the Cisco side and that it is the one VLAN which is carried untagged.

      Best regards. F.

  • Hi Frank,

    I have a c3000 with 8 blades, connected to two Cisco 3020 switches. From some blades I can only ping the first switch, and from other blades I can only ping the second switch. Is this normal behaviour?

    • Hi Fernando,

      whether this is normal or not depends on two things, which relate to each other.

      Point one is the bonding driver configuration of your blades. If you bond interfaces across both switches and have an active-passive configuration, this might apply if, point two, there is no connection (uplink) to a shared backbone with an appropriate VLAN setting.

      So if you want this to be correct and make everybody communicate with everybody, check the bonding (aka adapter teaming) on the blades and then check your VLANs and connections.

      Regards F.

  • The advantage of the Cisco 3120 or the like is that they can be stacked and managed as one logical switch, with a shared backplane.

    Really, the only advantage I see of the VC is management. You can move configurations around and have a nice GUI that server admins understand.

    The VCs are limited in their link aggregation and the number of VLANs they can actively support, and I am not really impressed with what they can do from a CoS/QoS point of view.

    The 64 million dollar question is: what do the guys that like switches do when they want 10Gb? HP is not offering the Nexus 4K, so if you want switching you are left with pass-throughs to an external switch or the ProCurve 6120.

    From the little that I can find, the 6120 does not stack. I can link-aggregate it to the switch in the corresponding bay, but that is far from stacking; it’s just a link to another switch. If you want four 10Gb ports in each blade (which, believe it or not, some of my VMware customers are asking for), I have to manage four separate switches in each blade enclosure.

    I have customers looking at running View with 125+ desktops per blade. That’s close to 2,000 VMs, and I am going to need more than two 10Gb connections to handle the networking from a client point of view, not even taking into account whether the data is on NFS.

    I understand that I look at it a little differently because I work on the system and networking side, but if you want to work with VMware you have to get up to speed on networking. Data center networking is going to keep changing at a very fast pace. Things like TRILL and OTV are going to change tons of this.

    I think HP is trying to make the VC go up against what the UCS can do. I am not sure that’s the right move given the architecture of the C7000.

  • I have a serious bandwidth problem when using Brocade 4/24 switches as interconnects in the C7000 chassis. The bandwidth I’m getting is about 10Mb/s (or less) for each port across the entire chassis. I have 10 BL460c G1 blades running ESXi 4.1 U1 with Emulex HBAs.

    HP level two support and Brocade cannot seem to find a problem. They are looking for errors and are not finding any.

    It seems you have extensive knowledge of the chassis design and I’m hoping you can point me in the right direction.

    Thanks

    • Hi, Robert. Sorry to hear about that. I don’t have any quick fixes – after all, we’re talking about something the manufacturer’s support can’t even figure out – so this sounds more like a consulting engagement to do troubleshooting. I’m available for that, but I didn’t want to turn this into a sales pitch – I bet you’re looking for easy, free fixes, as I would be doing in your shoes. If you get to the point where you want to engage us for consulting, email us at help@brentozar.com. Good luck!
