The debate on Stackable vs Chassis based switches has a long and proud pedigree. Ever since the first switches arrived on the market, established vendors have produced chassis based switches for the higher end of the market, for performance and reliability. The cheap manufacturers and new entrants to the market have always produced stackable switches and, inevitably, claimed how fantastic they were compared to chassis based switches.
What’s interesting is that those same companies (provided they stay in business for long enough) always make a chassis based switch in the end. Case in point, HP ProCurve.
When customers were confronted with the question of which to choose, the cheap vendor smiled and pointed to the price tag. The chassis vendor would then attempt to educate the customer on the advanced technologies and enhanced features in their device. Mostly, the customer didn’t understand, worked out that stackable switches were less than half the price per Ethernet port, and raised a purchase order.
The next time around, the same customer always bought the chassis based switch. Once bitten, twice shy. Let me try to put some meat onto the discussion.
Well, at least I’m gonna try.
The Zen Master was meditating over his network, at oneness with the Flow. A student approached the master and asked “May I not stack the Cisco C3750 switches to create more connections to the Flow?”
The Master looked carefully at the young apprentice and said “It is believed that the sum of the parts is greater than the whole and that combining many into one creates more Flow.” And the student nodded, because that was his thought.
The Master smiled and then said “But that which is many always remains many and is never truly one.”
And the student was enlightened.
The manufacturing quality of fixed format switches is lower than that of chassis switches.
Chassis based switches are designed, manufactured and built for much higher MTBF and lower MTTR than a fixed format switch, and this is reflected in better software defect ratios and lower hardware failure rates. Hardware performance is also improved due to better airflow, better design, more testing and quality assurance.
In particular, the software on chassis based switches seems to have far fewer defects, bugs, fixes and patches. I currently believe that this is due to the simplicity of building a single OS compared to a distributed operating system. It could also be that the additional cost of the device allows for the development of better software and better testing.
More elements means less overall reliability.
This is counterintuitive and most people don’t get this until it’s pointed out.
Having six power supplies instead of two means a greater chance of failure. Let’s assume that both stackable and chassis power supplies are of the same quality (not true, but let’s assume) and have the same chance of failure. Since there are six units that could fail at any time, there is roughly three times the probability of a failure. That is, six chances versus two chances means it is about three times more likely to have a power failure in a stack of switches.
This impact gets worse when considering the Time To Repair. In a stack, you must replace the entire switch and possibly restore the configuration or the firmware. On a chassis, the configuration is stored centrally and not lost. A replacement line card (provided it’s the same model) will return to service almost immediately.
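The "six chances versus two" argument can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the supplies fail independently and using a made-up per-supply failure probability of 2% over some period; the number is illustrative only and not taken from any vendor data.

```python
def prob_any_failure(p_single: float, units: int) -> float:
    """Probability that at least one of `units` independent supplies fails."""
    return 1.0 - (1.0 - p_single) ** units


# Hypothetical per-supply failure probability over some period (illustrative).
p = 0.02

chassis = prob_any_failure(p, 2)  # chassis with two supplies
stack = prob_any_failure(p, 6)    # stack with six supplies
ratio = stack / chassis

print(f"chassis: {chassis:.4f}  stack: {stack:.4f}  ratio: {ratio:.2f}")
```

For small failure probabilities the ratio comes out just under three, which is where the "three times more likely" rule of thumb comes from.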
Bandwidth Limitations and Shared Bus Architectures
The Cisco StackWise product documentation indicates that the C3750 is only a 32 Gb/s full duplex bus, or 16 Gb/s in a single direction (they claim 32 Gb/s since it is a counter-rotating ring, in a stunning piece of marketing math). Therefore a stack of eight 3750 switches is sharing a total backplane capacity of 16 Gb/s, which isn’t a lot. Calculating the bandwidth of a shared bus is notoriously difficult, but you might say it’s less than 2 Gb/s per switch for a fully loaded stack.
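As a rough back-of-the-envelope check on that per-switch figure, here is a small sketch. It assumes the 16 Gb/s is divided evenly between stack members, which is a best case for a shared bus, and it assumes 48 gigabit access ports per switch purely for illustration.

```python
def per_switch_share_gbps(ring_capacity_gbps: float, switches: int) -> float:
    """Even split of the shared stack capacity between members (best case)."""
    return ring_capacity_gbps / switches


ring_capacity = 16.0   # Gb/s in a single direction, as discussed above
stack_members = 8      # fully loaded stack
access_ports_gbps = 48.0  # assumption: 48 x 1 Gb/s ports per switch

share = per_switch_share_gbps(ring_capacity, stack_members)
oversubscription = access_ports_gbps / share

print(f"share per switch: {share:.1f} Gb/s")
print(f"oversubscription towards the stack: {oversubscription:.0f}:1")
```

Real goodput on a shared bus will be lower than this even split, so the 2 Gb/s figure is an upper bound rather than an estimate.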
To efficiently load balance the traffic, packets are allocated between two logical counter-rotating paths. Each counter-rotating path supports 16 Gbps in both directions, yielding a traffic total of 32 Gbps bidirectionally. The egress queues calculate path usage to help ensure that the traffic load is equally partitioned.
Whenever a frame is ready for transmission onto the path, a calculation is made to see which path has the most available bandwidth. The entire frame is then copied onto this half of the path. Traffic is serviced depending upon its class of service (CoS) or differentiated services code point (DSCP) designation. Low-latency traffic is given priority.
When a break is detected in a cable, the traffic is immediately wrapped back across the single remaining 16-Gbps path to continue forwarding.
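The path selection described in that documentation can be sketched roughly as follows. This is a toy model, not Cisco's implementation: the `RingPath` class, the path names and the idea of treating each frame as a small increment of load are all assumptions made for illustration, and CoS/DSCP prioritisation is left out.

```python
from dataclasses import dataclass


@dataclass
class RingPath:
    """One of the two counter-rotating stack paths (hypothetical model)."""
    name: str
    capacity_gbps: float = 16.0
    load_gbps: float = 0.0
    up: bool = True

    @property
    def available_gbps(self) -> float:
        # A broken path offers no bandwidth at all.
        return self.capacity_gbps - self.load_gbps if self.up else 0.0


def pick_path(paths: list[RingPath]) -> RingPath:
    """Choose the working path with the most available bandwidth."""
    usable = [p for p in paths if p.up]
    if not usable:
        raise RuntimeError("no stack path available")
    return max(usable, key=lambda p: p.available_gbps)


def send(paths: list[RingPath], load_gbps: float) -> str:
    """Place the whole frame on one path and return that path's name."""
    path = pick_path(paths)
    path.load_gbps += load_gbps
    return path.name


paths = [RingPath("clockwise"), RingPath("counter-clockwise")]
first = send(paths, 4.0)    # both paths idle, first one is chosen
second = send(paths, 4.0)   # the other path now has more headroom
paths[0].up = False         # simulate a broken stack cable
third = send(paths, 1.0)    # traffic wraps onto the remaining path
print(first, second, third)
```

The point to notice is the last step: after a cable break, everything is squeezed onto a single 16 Gb/s path.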
Shared Bus designs are also very limited in throughput. The reason vendors like to implement Shared Bus technology is that it is cheap and easy to build and has a high clock speed / headline data rate. But the difference between actual goodput and headline throughput can be large. That loss of bandwidth means a slow network.
If you combine the risk of bus congestion at critical overload events, and the impact of electronic failure (which typically fails the entire bus), I don’t have a positive outlook on Shared Bus Architectures.
Reliability and Availability are NOT the same thing
Conceptually, Cisco claims that the loss of a single switch in a stack will not cause failure of the entire stack and therefore this will provide greater AVAILABILITY. Availability is not the same as reliability. Because failures happen more often, your network needs more High Availability features to compensate, such as redundant uplinks, STP optimisations and fancy tuning, and routing protocol optimisations such as ECMP and BFD.
Software reliability (a failure in a stack often takes the entire stack down)
Taking this one step further, field experience shows that losing one switch in the stack often causes the entire stack to fail or perform badly until the faulty switch is powered down or removed. The level of software integration that occurs when the stack forms seems to be quite challenging. This is my actual experience over the last fifteen years and, no matter what the vendors tell you, that is the actual experience. Every vendor has told me that this isn’t possible and assured me that one unit can never bring the stack down, but that is exactly my experience. Nortel, 3Com, ProCurve, Cisco, whatever.
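One way to see the difference is with the usual steady-state availability formula, MTBF / (MTBF + MTTR). The sketch below uses two made-up devices with hypothetical MTBF and MTTR figures (not measurements of any real product): they have the same availability, yet one of them fails ten times as often.

```python
HOURS_PER_YEAR = 8760.0


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the device is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def failures_per_year(mtbf_hours: float) -> float:
    """Approximate number of failures per year (ignores downtime)."""
    return HOURS_PER_YEAR / mtbf_hours


# Hypothetical figures, chosen only to illustrate the point.
avail_a = availability(mtbf_hours=10_000, mttr_hours=10)  # fails rarely, slow fix
avail_b = availability(mtbf_hours=1_000, mttr_hours=1)    # fails often, quick fix

fails_a = failures_per_year(10_000)
fails_b = failures_per_year(1_000)

print(f"A: availability {avail_a:.6f}, about {fails_a:.2f} failures/year")
print(f"B: availability {avail_b:.6f}, about {fails_b:.2f} failures/year")
```

Both devices look identical on an availability datasheet, but the second one interrupts the network far more often, and each interruption relies on the HA features working as designed.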
The Stack Connector
The Cisco cable that is used to connect the Cisco 3750 hasn’t worked very well in practice. The connector is very large and heavy and seems to cause physical connection problems. My experience suggests that you need to reseat the connector every six months or so to stop mechanical failure of the stacking cables. This requires an outage to the stack (even though it shouldn’t).
The same applies to all other stack vendors. Those physical connectors need to handle a high speed electrical signal within very tight parameters and are easily affected by stray RF or physical degradation such as oxidation. I have always expected chassis backplanes to be affected by the same problems, but that doesn’t seem to be the case. I’m guessing that the better physical environment of a backplane is more conducive to better connections.
There are many features available in the Catalyst 6500 that are not available in other switches. Because chassis switches have bigger processor engines, they are able to handle more features. This includes basic features such as a large number of VLANs, since there is enough CPU to handle STP and BPDU generation, FHRP protocols, and faster timers on routing protocols such as OSPF (250 millisecond hello timer for OSPF on C6500), as well as more advanced and less common features such as MPLS. This experience is the same for, say, the Nortel ERS8600, which has a superior feature set to their stackable switches. (I am not completely up to date with ProCurve and cannot comment on their equipment.)
Conceptually, a single software process that controls the entire SINGLE device is a better technical choice than attempting to form a coherent set of discrete elements into a single LOGICAL device, aka forming multiple switches into a single stack. This software complexity is what leads to feature poverty (and higher failure rates, as discussed previously).
With all of the above, you might think that I don’t like stackable switches. Technically, you are right. A chassis based switch is ALWAYS more reliable (physically, in software and operationally) and has more capability, performance and features. As a designer, I need to be careful of the budget. So if the business criteria mean that money is a crucial factor, then stackable is a viable choice.
But, cheap is as cheap does. Don’t expect an outstanding experience with stackable switches and be happy with what you get.
If you are happy with Stackables
If you are happy with your Stackables, then fine. But I would bet money (and I’m not a gambling man) that you aren’t doing anything challenging.