I was going to call this article “Ethernet Switches for Virtualization Engineers” but, really, everyone should have some understanding of the internals of an Ethernet switch. In particular, I want to focus on how multicast and broadcast frames are handled in a high-speed, low-latency environment like a Data Centre Network.
It’s vital to understand that latency is critical to your application performance. It is common for a single transaction to take hundreds of round trips, so a small increase in latency on each round trip has a large impact on the perceived performance. The client will send a chunk of data and wait for acknowledgement. Even setting up the TCP connection takes a few round trips: remember that TCP sessions must be set up first, and each data transfer is confirmed.
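To put rough numbers on this, here is a minimal sketch of the arithmetic. The round-trip count and the latency figures are illustrative assumptions, not measurements from any real application.

```python
def transaction_time_us(round_trips: int, rtt_us: float) -> float:
    """Total time one transaction spends waiting on the network, in microseconds."""
    return round_trips * rtt_us

# Hypothetical figures: 300 round trips per transaction.
ROUND_TRIPS = 300

baseline = transaction_time_us(ROUND_TRIPS, rtt_us=100.0)
slower = transaction_time_us(ROUND_TRIPS, rtt_us=120.0)  # 20 us extra per round trip

print(f"baseline: {baseline / 1000:.1f} ms")
print(f"with +20 us per round trip: {slower / 1000:.1f} ms")
print(f"extra wait per transaction: {(slower - baseline) / 1000:.1f} ms")
```

The added latency is multiplied by the number of round trips, so a few microseconds per hop matters more than it first appears.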
A modern network chassis switch (at time of writing) will have latency around 10 microseconds measured port to port. For example, a Cisco Nexus 7000 is about 8 microseconds and the Brocade VDX 8770 claims less than 5 microseconds. There are many reasons why a switch can be faster or slower, depending on silicon, backplane and architecture, but let’s consider just one.
Remember, the latency interval is the time taken to receive a packet, decode the address, look up the forwarding table, switch the packet (and copy it if needed) and transmit it out of an Ethernet interface. That’s really fast processing. How does an Ethernet switch do this ?
Switch Architecture
Let’s start with a line card from a Nexus 7000 switch. This diagram, from a Cisco Live 2012 presentation, shows the module architecture of a single M1 series line card (N7K-M108X2-12L) and approximates the silicon pathways inside it:
What They Do ?
Each of those blocks is a silicon chip on the board. What does each one do ?
| Switch Element | Description |
|---|---|
| Replication Engine | Frames that must be sent to multiple ports are duplicated and dispatched from this chip as needed (more below) |
| Forwarding Engine | This is the chip that holds the TCAM lookup tables and makes the routing and/or switching decisions. In other words, it holds a table of addresses and output ports, e.g. an Ethernet frame with a destination MAC address 000c:1234:4567 is dispatched to Port 2. |
| VOQs | Virtual Output Queues. These are very high speed memory modules that perform frame queueing in silicon. Queueing is needed to ensure that the fabric is not overrun in the outbound direction. Also, packets arriving from the fabric must not overrun the MAC interfaces. |
| Fabric | Interface chip to the switch fabric. For the NX7K, this is a five interface connection to the fabric modules in a Clos switch design. |
| 10G MAC | Media Access Control for a 10 gigabit Ethernet port. Think of it as the signal encoder for the SFP interface. |
| Linksec | Encryption processor for line rate cryptography if you are using Linksec. |
Most of these functions should be obvious, but virtualization people considering VXLAN in a Multicast environment should have some awareness of the replication engine.
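The Forwarding Engine row in the table above boils down to a lookup from destination address to output port. The following is a minimal sketch of that idea only: real hardware does this in TCAM or hash tables in silicon, not in a Python dictionary, and the table entries and the `lookup_port` name are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical MAC table: destination MAC address -> output port.
mac_table: Dict[str, int] = {
    "000c:1234:4567": 2,
    "000c:1234:89ab": 5,
}

def lookup_port(dst_mac: str) -> Optional[int]:
    """Return the output port for a known MAC address, or None if it is unknown."""
    return mac_table.get(dst_mac.lower())

print(lookup_port("000c:1234:4567"))  # known address
print(lookup_port("000c:ffff:0001"))  # unknown address, no table entry
```

A lookup that returns nothing is the "unknown" case, where the frame has to be flooded instead of sent to a single port.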
Replication Engine
Most likely you have not heard much about replication engines in your switches. But since VXLAN has arrived we are seeing a lot more demand for Multicast in network designs. In simple terms, Multicast is a method for a server to transmit a single packet and for the network to duplicate it to as many clients as needed.
Think about that. Your network switch is duplicating Ethernet frames at wire speed, with a latency of around 5-10 microseconds. It can do this for hundreds of Multicast receivers in the network without you knowing (or caring) how it is done.
The replication engine also handles the Broadcast and Unknown frames so that ARP frames are handled efficiently and MAC flooding during address discovery doesn’t slow down the switch in other areas.
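As a rough illustration of what the replication engine decides, here is a toy model in Python. It is a sketch under stated assumptions: the port numbers, table contents and function names are all hypothetical, VLANs are ignored, and a real switch does this in dedicated silicon rather than software.

```python
from typing import Dict, List, Set, Tuple

BROADCAST = "ffff:ffff:ffff"

# Hypothetical switch state: known unicast addresses, multicast group
# membership, and the full set of ports on the switch.
mac_table: Dict[str, int] = {"000c:1234:4567": 2}
multicast_groups: Dict[str, Set[int]] = {"0100:5e01:0101": {3, 4, 7}}
all_ports: Set[int] = {1, 2, 3, 4, 5, 6, 7, 8}

def output_ports(dst_mac: str, in_port: int) -> List[int]:
    """Decide which ports should receive a copy of the frame."""
    if dst_mac in multicast_groups:
        targets = multicast_groups[dst_mac]
    elif dst_mac == BROADCAST or dst_mac not in mac_table:
        targets = all_ports  # broadcast and unknown unicast are flooded
    else:
        targets = {mac_table[dst_mac]}
    # Never send a frame back out of the port it arrived on.
    return sorted(targets - {in_port})

def replicate(frame: bytes, dst_mac: str, in_port: int) -> List[Tuple[int, bytes]]:
    """Make one copy of the frame for each output port."""
    return [(port, bytes(frame)) for port in output_ports(dst_mac, in_port)]

for port, copy in replicate(b"payload", "0100:5e01:0101", in_port=1):
    print(f"port {port}: {len(copy)} bytes")
```

The point of the sketch is the fan-out: one frame arrives, and the number of copies depends on group membership or on whether the address is known.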
It’s worth noting that cheaper or older switches used to perform the replication functions using a general purpose compute engine, which added a lot of latency. It took a long time to transmit the frame to the CPU, then the processing in the network OS took tens of microseconds. I’ve seen these networks melt down under specific circumstances.
Different Approaches
A word of warning. Don’t get too attached to the details of the technology described above. There are a number of different approaches. For example, the following image shows the engine architecture for one of the Cisco Nexus F2 switching modules:
As you can see, this architecture is significantly different from the previous module, but most of the functions are still in place. The main difference is that the F2 module doesn’t perform routing, only switching. As a network architect, you should understand your switch architecture so that you know where the performance problems might be.
Frame Walk / Packet Flow
Many people are not aware of the complexity of an Ethernet switch. Meeting the performance and latency targets requires a lot of specific features. Here are the steps through the module (and I don’t even mention the fabric switching).
The EtherealMind View
An Ethernet Switch is complex and too many people think you just plug it in and it works. Because networking people are so clever we can do this. And because server guys are so dumb we don’t want to challenge you too much. :0
Be nice to your network, it’s working hard for you even if you don’t appreciate it.
Other posts in the series
- ◎ What's Happening Inside an Ethernet Switch ? ( Or Network Switches for Virtualization People ) (This post)
- Tech Notes: Juniper QFabric - A Perspective on Scaling Up
- Switch Fabrics: Input and Output Queues and Buffers for a Switch Fabric
- Switch Fabrics: Fabric Arbitration and Buffers
- What is an Ethernet Fabric ?
- What is the Definition of a Switch Fabric ?
- Juniper QFabric - My Speculations



