Response: Bare-metal Switches And SDN controllers At Facebook

Facebook has been telling people more about its in-house designed and developed whitebox Ethernet switch, based on Broadcom silicon (not sure if it’s Trident2 or Arad), its own Linux distribution and OpenCompute standards.

An article published at Data Center Knowledge (Bare-metal switches and SDN controller now online in Facebook data centers) had some interesting parts for network engineers on their approach to SDN:

Facebook’s network operating system is a Linux variant. The company has a Software Defined Network controller for centralized network management. “We’re very big believers in SDN,” Ahmad said.

For Facebook, a controller must scale to the known size of their network, which is proof that SDN controllers can scale. But note how Facebook is implementing SDN:

One example where SDN helps is selecting optimal network path for data at the edges of the network around the globe. The primary protocol Facebook uses at the edge is BGP, which is good for setting up sessions, path discovery and policy implementation, but not very good at path selection. BGP selects the shortest path and uses it without considering capacity or congestion, Ahmad explained. Facebook’s SDN controller looks at paths BGP discovers and selects the best path using an algorithm that also takes into consideration the state of the network, ensuring the most efficient content delivery.
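The contrast Ahmad describes can be sketched in a few lines. This is a hypothetical illustration, not Facebook's actual algorithm: plain BGP prefers the shortest AS path regardless of load, while a controller with visibility into network state can score each BGP-discovered path by its available headroom. All names, AS numbers and capacities below are invented for the example.

```python
# Sketch: controller-based path selection over BGP-discovered paths.
# Plain BGP (simplified) picks the shortest AS path; the controller
# instead picks the path with the most available capacity.
from dataclasses import dataclass


@dataclass
class Path:
    as_path: list          # AS numbers the path traverses
    capacity_gbps: float   # provisioned capacity on the path
    load_gbps: float       # currently measured traffic


def bgp_best(paths):
    # Simplified BGP tie-breaker: shortest AS path wins,
    # with no regard for capacity or congestion.
    return min(paths, key=lambda p: len(p.as_path))


def controller_best(paths):
    # Controller view: choose the path with the most headroom
    # (capacity minus current load).
    return max(paths, key=lambda p: p.capacity_gbps - p.load_gbps)


paths = [
    Path(as_path=[65001, 65002], capacity_gbps=100, load_gbps=95),         # short but congested
    Path(as_path=[65001, 65003, 65004], capacity_gbps=100, load_gbps=20),  # longer but idle
]

print(bgp_best(paths).as_path)         # → [65001, 65002]
print(controller_best(paths).as_path)  # → [65001, 65003, 65004]
```

The point of the sketch: both functions choose from the same set of BGP-discovered paths; only the selection criterion changes.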

This suggests that Facebook is using a BGP SDN design similar to what Petr Lapukhov talks about in this IETF Informational RFC and on Show 164 – Cool or Hot? Lapukhov + Nkposong’s BGP SDN, where BGP does most of the basic routing but selected flows are given different paths.
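The "BGP does the basic routing, selected flows get overridden" pattern can be shown with a toy routing table. This is a minimal sketch of the general technique, not Lapukhov's or Facebook's implementation: BGP installs an aggregate route, and the controller steers chosen traffic by injecting a more-specific prefix, relying on ordinary longest-prefix-match forwarding. The prefixes and peer names are made up.

```python
# Sketch of BGP SDN via more-specific route injection.
# Longest-prefix match means the controller's /24 overrides
# the BGP-learned /8 for just that slice of traffic.
import ipaddress

rib = {
    ipaddress.ip_network("10.0.0.0/8"): "peer-A",   # learned via plain BGP
}

# Controller injects a more-specific route for one selected flow/service.
rib[ipaddress.ip_network("10.1.2.0/24")] = "peer-B"


def lookup(dest):
    # Ordinary longest-prefix-match lookup: no special dataplane needed.
    addr = ipaddress.ip_address(dest)
    matches = [net for net in rib if addr in net]
    return rib[max(matches, key=lambda net: net.prefixlen)]


print(lookup("10.9.9.9"))  # → peer-A (default BGP path)
print(lookup("10.1.2.3"))  # → peer-B (controller override)
```

The appeal of this design is that the controller speaks plain BGP to the fabric; if the controller dies, the injected specifics age out and traffic falls back to the BGP-learned routes.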

Increasing network utilisation to 90% saves a lot of bucks:

The network is not making routing decisions at the edge on its own. The decisions are instead made by a central controller and pushed back into the fabric. As a result, Facebook has been able to increase utilization of its network resources to more than 90 percent, while running the application without any packet backlog, Ahmad said.

I would see the benefits as follows: higher network utilisation means fewer switches, which leads to lower capex, less power and lower operational overheads.

The EtherealMind View

There are many types of SDN, and this approach to BGP SDN has been adopted by large-scale deployments like Microsoft Azure and Facebook. I would surmise that the SDN controller is directly managing some percentage of flows while most data follows policy routing implemented over BGP, which is also SDN since the policy is passed to BGP from a route server. This approach is explained in detail in the podcast, where Petr Lapukhov explains the whys and wherefores (I had to listen a couple of times to understand it properly).