The following is a “paper review” of Project Calico following a recent briefing. I’ve conducted a short review of the technology and business issues around the product and conclude that it’s unlikely to be competitive with Docker libnetwork, which was announced a few weeks back.
What is it?
It’s an open-source project sponsored/owned by Metaswitch that promotes a model of programming Open vSwitch on Linux (see below for the response from the Calico team), using BGP as an API to configure Linux iptables for FIB manipulation and forwarding control. This provides a low-quality baseline for an SDN solution for networking in Linux / KVM / OpenStack for cloud services.
“On each compute host, Calico uses the proxy ARP technique to intercept all ARP requests from each workload, returning the MAC address of the compute host as the next hop. As Calico is responding to all ARP requests from a workload, there is no distribution of MAC addresses between compute nodes and, hence, none of the usual proxy ARP scalability issues arise.”
Source: Frequently Asked Questions, Project Calico 0.27 documentation
This is a substantially different approach from most other SDN solutions. In my experience, ARP hijacking does present operational risk to certain types of traffic loads in the enterprise.
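To make the quoted proxy ARP behaviour concrete, here is a minimal Python sketch. It is a toy model, not Calico’s code: the class names `ComputeHost` and `Workload`, the MAC address and the IP addresses are all invented for illustration. The only behaviour taken from the FAQ is that the host answers every ARP request from a workload with its own MAC address.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ComputeHost:
    """Hypothetical host that answers every workload ARP request itself."""
    mac: str
    arp_requests_seen: int = 0

    def handle_arp_request(self, target_ip: str) -> str:
        # Proxy ARP: whatever IP is asked for, reply with the host's own MAC,
        # so the workload sends every packet to the host as its next hop.
        self.arp_requests_seen += 1
        return self.mac


@dataclass
class Workload:
    """Hypothetical VM or container attached to one compute host."""
    ip: str
    host: ComputeHost
    arp_cache: Dict[str, str] = field(default_factory=dict)

    def resolve(self, target_ip: str) -> str:
        # Only ask the host when the answer is not already cached.
        if target_ip not in self.arp_cache:
            self.arp_cache[target_ip] = self.host.handle_arp_request(target_ip)
        return self.arp_cache[target_ip]


host = ComputeHost(mac="02:00:00:00:00:01")
workload = Workload(ip="10.0.0.5", host=host)
for destination in ("10.0.0.9", "192.0.2.1"):
    print(destination, "->", workload.resolve(destination))
```

Every entry in the workload’s ARP cache ends up pointing at the same host MAC, which is why no MAC addresses need to be distributed between compute nodes in this model.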
Why Use It?
- Because BGP is the solution to every networking problem ever.
- Because high quality BGP code exists in open source libraries and reduces development time for Yet Another Network Controller.
- Because you don’t believe in encapsulation or overlays.
- Because you want a network stack that works in OpenStack and Docker.
- Because you only have a data centre LAN.
- Because Metaswitch has a product of interest or you believe that Metaswitch can build a community around Project Calico that is better than Docker, Weave, NSX, Contrail, ACI, Nuage Networks or other SDN solutions for Linux hosts.
Reference: Why Calico? | Project Calico
Licensing
The project is open source under an Apache license, but Metaswitch is the project owner and controls contributions. The contribution terms include some rather odd patent assertions that look onerous and need further investigation. I’m guessing that Metaswitch wants to mainline community contributions into its product for service providers and didn’t keep the lawyers under control or carefully watched.
Reference: Contribution Guidelines
Security
I have concerns around the integrity of Calico as cloud infrastructure. The use of next hop addressing means that spoofing could be a practical attack vector. Calico does configure iptables on hosts, but this doesn’t protect against spoofing. While Calico implements endpoint security as a form of stateless firewall using profiles, it doesn’t seem to address in-network attack surfaces. More research would be needed to confirm this interpretation.
Server Resource
This might already be solved, but the resource cost of holding a large BGP table and metadata in the server OS needs research. Servers are high performance and networking in Linux is really fast, but I would like more data about the performance impact.
Data Centre Only
The Calico solution relies on Linux hosts running all VMs. The documentation discusses gateways and access to the main network, but I could not easily establish how connectivity to external networks is managed. A data centre requires seamless integration with WAN, wireless or Internet/DMZ networks, and anything less incurs substantial technical debt.
Testing These Regressions
BGP
BGP is a fine protocol for distributing state, especially when there is limited CPU or memory (which is not true in 2015). I take the general view that it is not a good protocol for configuration, monitoring or API manipulation. Much better alternatives, such as NETCONF and OpenFlow/OVSDB, exist and are well proven.
The reason is that you cannot query BGP instances to validate or test that data is consistent and coherent across all instances. Using BGP to share state then requires dozens of other protocols to monitor, analyse and operate it. Complexity like this must have a payback elsewhere to be worth the technical debt it builds.
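As an illustration of the kind of coherence check meant here, the following Python sketch compares the sets of prefixes held by several nodes and reports what each one is missing. It assumes the route tables have already been collected by some out-of-band tooling, which is exactly the extra machinery this section objects to. The function name `missing_prefixes`, the node names and the prefixes are hypothetical.

```python
from typing import Dict, Set


def missing_prefixes(tables: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each node, return prefixes that some other node holds but it lacks.

    `tables` maps a node name to the set of prefixes in that node's table.
    Nodes whose table is complete are left out of the result.
    """
    # The union of every table is treated as the expected, coherent state.
    all_prefixes: Set[str] = set().union(*tables.values())
    report: Dict[str, Set[str]] = {}
    for node, prefixes in tables.items():
        gap = all_prefixes - prefixes
        if gap:
            report[node] = gap
    return report


# Hypothetical snapshot gathered from two nodes.
snapshot = {
    "node-a": {"10.0.0.1/32", "10.0.0.2/32"},
    "node-b": {"10.0.0.1/32"},
}
print(missing_prefixes(snapshot))
```

The check itself is trivial. The point of the argument is that BGP offers no built-in way to gather the snapshot, so the data has to come from somewhere else.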
Encapsulations
Networking has been using overlay networks through protocol encapsulation for more than 30 years. VLAN tagging, MPLS tagging, GRE, IPsec and SSL VPN are the most common, and there are dozens of others lost to history. Overlay networking isn’t new either, and overlay networking in SDN is widely implemented for good technical reasons. I totally reject the assertion that overlay networking isn’t the best solution. Issues around tunnelling are directly related to the limited capabilities of path management and decades-old routing protocols with limited functions. Recursive paths are routing problems, not limitations of encapsulation “per se”.
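For readers who have not looked at encapsulation at the byte level, here is a short Python sketch of the simplest case listed above, 802.1Q VLAN tagging. It inserts a four-byte tag (TPID 0x8100 followed by the priority, DEI and 12-bit VLAN ID) after the two MAC addresses of an Ethernet frame and removes it again. The function names and the sample frame are invented for illustration, and the sketch ignores details such as frame check sequences and stacked tags.

```python
import struct
from typing import Tuple

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    tci = (priority << 13) | vlan_id  # DEI bit is left at 0
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]


def strip_vlan_tag(frame: bytes) -> Tuple[bytes, int]:
    """Remove the tag and return the original frame and its VLAN ID."""
    if len(frame) < 18:
        raise ValueError("frame is too short to carry an 802.1Q tag")
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        raise ValueError("frame carries no 802.1Q tag")
    return frame[:12] + frame[16:], tci & 0x0FFF


# Hypothetical frame: zeroed destination MAC, dummy source MAC, IPv4 EtherType.
plain = bytes(6) + bytes([1] * 6) + b"\x08\x00" + b"payload"
tagged = add_vlan_tag(plain, vlan_id=100, priority=3)
print(len(plain), len(tagged), strip_vlan_tag(tagged)[1])
```

The payload is untouched and the overhead is a fixed four bytes. The other encapsulations listed add larger headers but follow the same wrap and unwrap pattern.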
Docker and VM networking
Docker has introduced native networking support, which works with lots of existing products. It’s even an enabler for Project Calico, but mostly people will be using Cisco / VMware / Nuage for running their data centres. There are dozens of SDN solutions that offer this feature; Calico is just another one, with relatively limited features at low cost.
I did some searching on Reddit and Hacker News, and decisions to use Project Calico seem to be based on what developers have “found” rather than on rational and knowledgeable decision making. Signs of adoption are positive for the project, but I’m not convinced that it can achieve momentum now that Docker has announced a native solution.
The technology assertions don’t make sense to me. The lack of overlay networking and the use of BGP as a programming interface are substantial technical debt that isn’t offset by gains in other areas.
It is my perception that Metaswitch is focussed on building solutions for the Service Provider (SP) market and is targeting Calico as an SDN solution to support its NFV products. In this context, the use of BGP and the lack of overlays may be appealing to those customers, but for cloud developers or enterprise customers the product doesn’t seem to be relevant.
Response from Project Calico
The team at Project Calico have posted a useful response which addresses some of the points raised in this piece. In particular, I made an error about Open vSwitch:
“Firstly, Calico does not program, or use Open vSwitch in any way. In fact you can remove Open vSwitch in a Calico network, as its functions are completely bypassed. Instead, Calico programs the native routing function of the Linux kernel. Secondly, we do not use BGP as an API, we use it as it was intended, a routing protocol that tells other Calico nodes (and the rest of the infrastructure) where workloads are at any given point in time. Our API is in the form of an etcd data model. Our use of BGP should be totally opaque to anything trying to program Calico.”
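The role the response gives to BGP, telling each node where workloads currently live, can be pictured as an ordinary longest-prefix-match lookup. The Python sketch below is a toy model under that assumption and is not Calico’s code or its etcd data model: the function name `lookup_next_hop`, the prefixes and the host names are all hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def lookup_next_hop(routes: Dict[str, str], destination: str) -> Optional[str]:
    """Return the next hop of the most specific route covering `destination`.

    `routes` maps a prefix in CIDR notation to a next hop name.
    Returns None when no route matches.
    """
    dest = ipaddress.ip_address(destination)
    best_net = None
    best_hop = None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        # Prefer the matching route with the longest prefix length.
        if dest in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


# Hypothetical table on one node: a block hosted on host-b, a single workload
# that has moved to host-c, and a default route to the rest of the network.
routes = {
    "10.65.0.0/24": "host-b",
    "10.65.0.7/32": "host-c",
    "0.0.0.0/0": "gateway",
}
for address in ("10.65.0.7", "10.65.0.8", "8.8.8.8"):
    print(address, "->", lookup_next_hop(routes, address))
```

In this picture a workload that moves only needs a more specific route to be advertised, and nothing is encapsulated along the way.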