TL;DR: A recent project bought a low-cost network for the data centre. It cost less than one-third of the market leader & half the cost of a well-known merchant silicon vendor. As a result, it is planned to last for two, maybe three, years before it is replaced. From this project I learned that “fast & cheap networking” could make a big impact on new data centre designs & business attitudes. Plus it was a much more satisfying professional project. I’m now wondering – is networking too expensive?
During a recent consulting engagement I was asked to advise on a high-performance 10GbE network. The target was more than 1000 ports of 10 Gigabit Ethernet at the lowest possible price while maintaining performance within a specific band. Criteria for bandwidth & latency were set, in addition to a maximum contention ratio of 2:1, because traffic volumes are high for this application.
This network is the foundation for a cloud computing application but is connected to the existing network at certain points. It supports a large, high-performance computing platform including more than 5 Petabytes of high-speed IP Storage & more than five hundred x86 servers. The physical servers have dual 10GbE for performance (not redundancy) & each server will run a hypervisor with a high level of compute, memory & network utilisation.
The majority of network switches today are based on the Broadcom Trident II chipset, which has 48x10GbE & 4x40GbE interfaces. Using the 4x40GbE for ECMP uplinks would lead to a contention ratio of 480Gb down to the servers & 160Gb up. This might be suitable for many enterprises but is not suitable for cloud deployments where each physical server can host tens of virtual servers. Combined with the IP Storage on this network, utilisation is much higher than on most networks. As always, it depends™.
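To make the arithmetic explicit, here is a quick sketch of the contention (oversubscription) maths for a Trident II leaf switch, using only the figures in the paragraph above:

```python
# Sketch: contention ratio for a Trident II-based leaf switch.
# Figures from the text: 48 x 10GbE server-facing ports, 4 x 40GbE uplinks.

def contention_ratio(downlink_gbps: float, uplink_gbps: float) -> float:
    """Return the downlink:uplink oversubscription ratio."""
    return downlink_gbps / uplink_gbps

downlink = 48 * 10   # 480 Gb/s towards the servers
uplink = 4 * 40      # 160 Gb/s up into the fabric

print(contention_ratio(downlink, uplink))  # 3.0, i.e. 3:1
```

A 3:1 ratio falls outside the 2:1 requirement set for this project, which is why the stock 48+4 port split was not good enough on its own.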
The Buying Process
In order to compare vendors equally, the bid process had the following structure:
- vendor/reseller was to offer a solution that was fully supported & their own design
- the bid must include all cables, SFPs, accessories and must be vendor certified (no grey or 3rd party)
- the bid must include installation, testing & handover in the price
- the bid must include three years of maintenance & will be purchased in a single order
- the bidder was told that price was a high priority and should influence product selection
- the bid stated that very few features were required but named those that were mandatory
I’ll highlight that most vendors/resellers were badly organised. The bid responses often did not address the requirements & many of the bids left out cables, maintenance & installation in spite of the bid clearly stating that all pricing must be included.
The cost to the customer in time wasted because of poor bid responses was substantial. Resellers & vendors wasted a lot of time asking questions that had little relevance to our requirements (which were clearly stated in the bid documents). We wasted many hours attempting to compare the offered solutions.
When buying a 10 Gigabit network, it became obvious that the largest cost is “vendor supported” cables & interface modules. The final cost of the switch hardware was less than the total cost of modules & cables for this project. This was unexpected.
Variation in pricing on identical cables was as much as 10 times between vendors. A critical requirement was that each vendor must approve the design & bill of materials to ensure a fully supported network, which means that all cables were “genuine” & vendor certified. This variation in pricing was unexpected.
The casual observation is that cables appear to act as a “volume licensing” program for some vendors, possibly taking advantage of the purchasing processes in some companies, but the disparity was shocking.
Vendors & resellers do not know or understand how much power their devices use or how to assign a dollar value to running these devices on a yearly basis. Available documentation on device power consumption is awful. More needs to be done.
The final comparison of the bids showed a wide range of pricing. I’m not permitted to quote exact pricing so I’ll use numbers that show the relative scale.
Market Leading Vendor = $2.8MM
Alternate Vendors (multiple) with merchant silicon switches = $1.6MM
Lesser Known Vendor “B” = $800K
We evaluated the technology of the Alternate Vendors closely & then carefully analysed what functions would be lost with Vendor “B”. There are a number of tradeoffs here, but a saving of $800K (or $2MM against the market leader) is a significant motivator to look at the solution carefully.
Instead of buying from a branded vendor, we bought the majority of the network from the little-known vendor. We also bought some equipment (about 10%) from a second vendor, for two reasons:
- A dual vendor purchasing policy helps to ensure competitive pricing. Previous experience with a single vendor has led to poor pricing & support. Quote: “When a vendor has to be competitive, they stay competitive.”
- There are certain features on the alternate platform that we needed, so we bought the second vendor’s products for those features.
The network is now nearing production readiness & moving through the final stages of acceptance testing. There are a number of small problems with implementation due to the usual poor planning by the vendor & reseller, but nothing that wasn’t expected by a suitably cynical & experienced purchaser.
We have already evaluated the next generation of networking hardware from mid-market network vendors. Because the network cost was so low, the ROI period is less than two years. As a result, the current plan is to replace the network in two years. In practice, it will probably creep to three years, but that timeframe is still half the previous ROI.
The Unseen Impact of Expensive Networking
The previous network ROI of six years was extremely damaging to the data centre. Simply, the network equipment was so “old” that it didn’t support simple but necessary features such as RSTP & PIM. The software on the current devices was not particularly reliable & had a number of quirks that made it hard to use.
Another interesting point was that the cost of networking was so high that they couldn’t afford a network person. The network was installed & effectively unmaintained until a problem emerged. This led to poor quality outcomes for server & storage & also a number of security issues.
Designing For Change
The final choice was an ECMP single-stage Clos network architecture, chosen to make it easier to replace the network devices. The ECMP design offers growth & ease of operation. The current backbone has six spine switches with 32x40GbE interfaces connecting to 19 leaf switches. The next phase of growth will increase the spine to a larger size.
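A rough sketch of the fabric sizing implied by those numbers. It assumes one 40GbE link from each leaf to every spine switch, which is the conventional Clos wiring but is a judgment call, not something stated in the text:

```python
# Sketch of the spine capacity described above.
# Assumption (not stated in the text): each leaf has exactly one
# 40GbE uplink to every spine switch.

SPINE_SWITCHES = 6
SPINE_PORTS = 32          # 32 x 40GbE per spine switch
LEAF_SWITCHES = 19
PORT_SPEED_GBPS = 40

# One uplink per spine gives each leaf 6 x 40GbE = 240 Gb/s of ECMP uplink.
leaf_uplink_gbps = SPINE_SWITCHES * PORT_SPEED_GBPS

# With one link per leaf per spine, the fabric can grow to 32 leaves
# before the spine itself must be widened.
max_leaves = SPINE_PORTS
spare_ports_per_spine = SPINE_PORTS - LEAF_SWITCHES

print(leaf_uplink_gbps, max_leaves, spare_ports_per_spine)  # 240 32 13
```

Under that assumption there is still room for 13 more leaf switches before the “larger spine” phase of growth is needed.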
We know that the current products lack many features that other products have, but these gaps were easily solved using other technologies that cost little or nothing, or by changing operational practice. For example, because the switches were so cheap, we bought separate devices to act as the exit point between the underlay & overlay networks: it was operationally prudent to have the VTEP functions in a different box, & the box was so cheap that it was worth the investment.
But we also know that the next generation of network switches, based on Broadcom merchant silicon, will offer new features & functions that are interesting for this use case. In particular, 100GbE interfaces will provide extra bandwidth in the spine, which is expected to be important as the IP Storage traffic volume increases.
The EtherealMind View
When I first looked at the requirements for this project, it seemed obvious that a mid-tier networking vendor with merchant silicon switches would be the best fit. The final solution was surprising.
The most interesting lessons I learned could be summarised as follows:
- ECMP network designs are awesome. They scale, they are easy to operate & they look easy to upgrade (I say “look” because I haven’t done it yet & maybe it is harder than it looks).
- Cheap hardware changes the way you build networks. Instead of spending hundreds of hours researching & justifying a single expensive purchase, the project was able to make a rapid decision & move into implementation. It was refreshing to move through the decision process quickly.
- Cheap also means replaceable. Having a network at a reasonable cost means that the investment cycle can be radically changed. A faster investment cycle means regular upgrades.
- Low cost got business attention. Saving a couple of million dollars really focussed the business on tradeoffs. We were able to get business acceptance of many new ideas simply because the dollars made it practical & sensible.
- We bought some spare equipment because it was cheap. Instead of counting every tiny element & having endless discussions about components in the bill of materials, we simply bought extras. Instead of hours wasted on my consulting time, we bought a small amount of extra kit & focussed on the installation. The overall cost, including resource time, was reduced.
- Features in the network are “missing” but they can be worked around with the right team & DevOps thinking. Integration between storage, VMware, servers & networking solved these problems.
- Some features will come later. We are working with the vendors to get early versions of code to access certain features. This network design is slightly ahead of the marketplace & the business accepts that some tradeoffs are necessary. We have designed some aspects to allow minimal interruption when the features arrive.
- Go multi-vendor. The multi-vendor strategy worked better than I expected. I haven’t really experienced a multi-vendor network since the late 1990s & was somewhat wary of the idea. Instead of wasting time as I expected, both vendors got something & were willing to work harder, because the prospect of the next big purchase being just a couple of years away kept up the motivation. I didn’t feel discarded like a dirty rag, as has happened in other projects.
I would caution that this approach is not suitable for every company. But I think it shows that it is possible to drastically reduce the capital cost of building a data centre network. There are a number of tradeoffs in the implementation but, let’s face it, saving a million bucks can make those tradeoffs acceptable to the business. At a personal level, this was one of the most satisfying projects I’ve assisted with in a long time, because the managers & engineers came out happy. That doesn’t happen very often.
This whole exercise seems to highlight an issue that I’ve been wrestling with. Is networking too expensive? This project suggests that it is.
- Many projects buy the least amount of hardware needed for the project. Buying a 10GbE switch & cables for a few ports meets the immediate requirements. The overall cost of the switch is massively inflated by costly “vendor certified” cables, but this isn’t visible to ITIL-compliant processes because ITIL considers only incremental costs, not overall costs. ↩