Why does iSCSI use TCP instead of IP ?

I have been designing iSCSI backbones lately, and it struck me that one of the performance problems with iSCSI is that it uses the TCP protocol.

A primary purpose of the TCP protocol is to let the TCP/IP stack in the operating system handle any dropped or lost packets. The application can simply hand off the data to the network stack, which will guarantee the delivery of the packet.

However, the use of TCP is processing intensive because buffering, sequence counting and checksums are performed. And given that data centre networks are highly reliable, it isn’t even necessary (if FCoE can be reliable over Ethernet, then iSCSI can be reliable over IP).
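
To give a feel for the per-packet work involved, here is a minimal sketch of the 16-bit one's-complement checksum described in RFC 1071, which is the algorithm TCP uses. The function name is my own, and the real TCP checksum also covers a pseudo-header that is left out here, so treat this as an illustration rather than a TCP implementation.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # add each big-endian 16-bit word
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # fold any carry bits back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(internet_checksum(sample)))
```

Every byte of every segment has to pass through a loop like this on both sender and receiver, unless the NIC does it in hardware.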

My question is, then, why hasn’t the storage industry developed a new protocol that just uses IP ?

Precedent – DLSW

For those who remember DLSW (Data Link Switching) for the IBM mainframe, there was a specific protocol developed that allowed for very fast transport of SNA data to the mainframe.

FST (Fast Sequenced Transport) was an IP protocol that was stateless and sequenced, for carrying bridged or Layer 2 data over IP networks. Because its header was smaller and it took less CPU and time to generate packets, it reduced overhead and improved response time.

The network was then configured to provide QoS for the FST/IP protocol and thus ensured high performance.
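
As a rough sketch of what "sequenced but otherwise stateless" can mean on the receive side: the receiver remembers only the last sequence number it accepted and silently drops anything older, with no acknowledgements and no retransmission. The class name and the 16-bit sequence width are my assumptions for illustration; this is not the actual FST implementation.

```python
class SequencedReceiver:
    """Accepts packets only if they are newer than the last one accepted."""

    def __init__(self) -> None:
        self.last_seq = None  # no packet seen yet

    def accept(self, seq: int) -> bool:
        if self.last_seq is None:
            self.last_seq = seq
            return True
        # forward distance from the last accepted number, modulo 2**16
        delta = (seq - self.last_seq) & 0xFFFF
        if 0 < delta < 0x8000:
            self.last_seq = seq  # newer packet: accept, even if some were skipped
            return True
        return False  # duplicate or out-of-order packet: drop it


if __name__ == "__main__":
    rx = SequencedReceiver()
    for seq in (1, 2, 4, 3, 5):
        print(seq, rx.accept(seq))
```

Anything that was dropped is left for the upper layer to recover, which is why this approach only makes sense on a network that rarely loses packets.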

Block transport over IP ?

The most common criticism of iSCSI is that it:

  • requires CPU resources from the server and storage array to process the packets
  • adds latency during the TCP checksum calculation, thus reducing throughput

Both lower the overall performance of the storage system when compared to FibreChannel or FCoE.

If a new version of iSCSI were to remove the TCP header and use only the IP header, with a small amount of session information in the payload, it could address these issues.
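
To make the idea concrete, here is a purely hypothetical sketch of what "a small amount of session information in the payload" might look like. No such protocol exists as far as I know; the field names, field sizes and function names are all invented for illustration, and the example only packs and unpacks bytes rather than sending anything on the wire.

```python
import struct

# Hypothetical 10-byte header: session id (32 bits), sequence number (32 bits),
# payload length (16 bits), all in network byte order.
HEADER = struct.Struct("!IIH")


def encapsulate(session_id: int, seq: int, scsi_pdu: bytes) -> bytes:
    """Prefix a SCSI PDU with the hypothetical session header."""
    return HEADER.pack(session_id, seq, len(scsi_pdu)) + scsi_pdu


def decapsulate(packet: bytes):
    """Split a packet back into (session_id, seq, payload)."""
    session_id, seq, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return session_id, seq, payload


if __name__ == "__main__":
    pkt = encapsulate(7, 42, b"\x28" + bytes(9))  # dummy 10-byte command block
    print(HEADER.size, decapsulate(pkt))
```

For comparison, a TCP header is at least 20 bytes. The smaller header is the easy part, though: loss recovery and congestion handling would still have to live somewhere.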

Does anyone know if work has been done in this area ? I’m wondering why the storage industry hasn’t picked up on the idea, because I’m still baffled by the choice to use Ethernet as protocol to transport data. That idea was over in the 1980s, when IBM mainframes lost control of their customers with closed systems.

  • http://blog.ioshints.info Ivan Pepelnjak

    Did you mean “Fiber channel” in the sentence “by the choice to use Ethernet as protocol to transport data”?

    • http://etherealmind.com Greg Ferro

      No. I mean, choosing to transport block storage data over Ethernet (instead of some sort of IPv4 or IPv6 protocol). In this case, FCoE.

  • victor
  • Dmitri

    Hi Greg,

    The SCSI protocol is not designed to handle lost frames or blocks. It expects them to arrive, even if they are damaged (have CRC errors). IP or Ethernet will not deliver a frame that has CRC errors, which is not compatible with what SCSI expects. This is why iSCSI uses TCP.

    The way to improve performance of iSCSI is to use HBAs (Ethernet NICs) with TCP offload, which do all the necessary calculations on an embedded ASIC, significantly reducing the processing latency. The majority of storage arrays with iSCSI already make use of TCP offload, so some of the performance problems that you’re seeing are quite likely caused by a poor choice of Ethernet NIC in the server (or incorrect or badly configured NIC drivers).

    The reliability of FCoE is brought about by the enhanced end-to-end congestion avoidance mechanisms at the Ethernet level (DCE/CEE extensions), which are simply not available everywhere you would use iSCSI, for example over an L3-only link such as an IP VPN.

    • http://etherealmind.com Greg Ferro

      Not exactly correct. First, FC has an acceptable error rate of 1 in 10^12 because it is impossible to build a network that can never drop a packet. This has been deemed acceptable, and affected frames are retransmitted using FC error recovery.

      Second, protocols that guarantee delivery can be built, and FCoE is an example of exactly that. But why guarantee delivery using Ethernet ? Why not use similar techniques for IP ?

      The 802.1aq QoS mechanisms will work equally well for iSCSI as for FCoE, and could easily be reconfigured for an IP storage protocol.

      The use of TOE HBAs is also highly recommended. However, the Storage industry has no vested interest in using cheaper and simpler technology, and has comprehensively dismissed these technologies as toys or not significant.

      • Zed

        “the Storage industry has no vested interest in using cheaper and simpler technology”

        this is so true. Hopefully Sun’s efforts shake things up a little

      • Dmitri Kalintsev

        > …FC has an acceptable error rate…

        iSCSI does not use FC, so there are no FC error recovery mechanisms available.

        > …FCoE is an example of exactly that…

        If I understand it correctly, FCoE does not guarantee delivery by virtue of E. It does it by virtue of FC. There is no FC in iSCSI.

        > Storage industry has no vested interest in using cheaper and simpler technology

        The storage industry (like any other out there) has a vested interest in selling as many units as possible. If a younger competitor starts offering solutions comparable in features and reliability at a much lower price point, larger players usually have no choice but to adapt.

        (All my personal opinion and understanding, of course).

        P.S. Looks like the “Notify me of follow-up comments by email” option on your blog engine does not work – I never got a notification of your reply.

        • http://etherealmind.com Greg Ferro

          iSCSI is a simple protocol that encapsulates SCSI block commands into a TCP/IP packet. FC is not a part of that.

          FCoE assumes that Ethernet does not drop packets. The application will retransmit as needed but if error rates exceed 1 in 10^12 then performance is severely impacted and can lead to loss of service.

          PS – will check the follow-up comments – thanks for the tip.

  • akg

    Why not use the Stream Control Transmission Protocol (SCTP) for block access over data networks? It seems better suited to the task than TCP.

  • Colin

    I agree with Greg. iSCSI over TCP is totally unnecessary. iSCSI would have beaten FC for good if we had the following:

    – Lossless Ethernet
    – A similar error recovery logic like FCP SRR, simple and effective
    – Transport over IP

    Since the lossless Ethernet standard is coming, maybe it is time to consider retiring TCP from the iSCSI family.

  • Dmitri Kalintsev

    Hi Greg,

    Here’s an interesting article (but maybe you’ve seen it, but anyway): http://blog.fosketts.net/2010/01/14/microsoft-intel-push-million-iscsi-iops/

    1 million IOPS using nothing but a software iSCSI initiator on Windows Server 2008 R2 (and, of course, Intel Xeon 5500).

    • http://etherealmind.com Greg Ferro

      It’s not new. Other makers of iSCSI TCP drivers have also performed well. But Microsoft is effectively blessing iSCSI as a viable method, and that is the real news in this article. That is the point that Stephen is making.