Thursday, March 18, 2010

Why Does iSCSI Use TCP Instead of IP ?

November 26, 2009 by Greg Ferro · 12 Comments 

I have been designing iSCSI backbones lately, and it struck me that one of the performance problems with iSCSI is that it uses the TCP protocol.

A primary purpose of the TCP protocol is to let the OS ensure that any dropped or lost packets are handled by the TCP/IP stack in the operating system. The application can simply hand off the data to the network stack, which will guarantee the delivery of the packet.

However, TCP is processing intensive because it has to manage buffers, track packet ordering and compute checksums. And given that data centre networks are highly reliable, it isn’t even necessary (if FCoE can be reliable over Ethernet, then iSCSI can be reliable over IP).

My ques­tion is, then, why hasn’t the stor­age industry developed a new pro­tocol that just uses IP ?

Precedent — DLSW

For those who remem­ber DLSW (Data Link Switching) for the IBM main­frame, there was a spe­cific pro­tocol developed that allowed for very fast trans­port of SNA data to the mainframe.

The FST (Fast Sequenced Transport) was an IP protocol that was stateless and sequenced for carrying bridged or Layer 2 data over IP networks. Because it was smaller and took less CPU and time to generate packets, it reduced overhead and improved response time.

The net­work was then con­figured to provide QoS for the FST/​IP pro­tocol and thus ensured high performance.
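The "stateless and sequenced" idea can be sketched roughly as follows. This is only an illustration of the general technique, not the actual FST wire format: the 16-bit sequence space and the names SequencedReceiver and accept are assumptions made for the example. The receiver keeps nothing but the last sequence number it delivered and silently discards anything that is a duplicate or older, with no acknowledgements or retransmission.

```python
# Hypothetical sketch of a sequenced receiver with no connection state.
# The 16-bit sequence space and all names here are assumptions for illustration.

SEQ_MODULUS = 1 << 16  # assume the sequence number wraps at 16 bits


class SequencedReceiver:
    """Delivers a packet only if it is newer than the last one delivered."""

    def __init__(self):
        self.last_seq = None  # nothing received yet

    def accept(self, seq):
        """Return True to deliver the packet, False to discard it."""
        if self.last_seq is None:
            self.last_seq = seq
            return True
        # Forward distance from the last delivered packet, modulo the sequence space.
        delta = (seq - self.last_seq) % SEQ_MODULUS
        # delta == 0 is a duplicate; the upper half of the space counts as "older".
        if delta == 0 or delta >= SEQ_MODULUS // 2:
            return False
        self.last_seq = seq
        return True


rx = SequencedReceiver()
arrivals = [1, 2, 4, 3, 4, 5]  # packet 3 arrives late, packet 4 is duplicated
delivered = [seq for seq in arrivals if rx.accept(seq)]
print(delivered)  # [1, 2, 4, 5]
```

Note that the late packet 3 is simply dropped, so any recovery has to happen in the protocol being carried.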

Block trans­port over IP ?

The most common criticism of iSCSI is that it:

  • requires CPU resources from the Server and Storage Array to process the packets
  • adds latency during the TCP checksum calculation, thus reducing throughput

both of which lower the overall performance of the storage system when compared to Fibre Channel or FCoE.
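To give a feel for the checksum work involved, here is a minimal sketch of the 16-bit ones' complement checksum described in RFC 1071, which is the algorithm TCP uses. It is a simplification: a real TCP checksum also covers a pseudo-header and the TCP header, and in practice the calculation is usually done in optimised kernel code or offloaded to the NIC. The function name internet_checksum is my own.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    # Every 16-bit word of the data has to be touched once.
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


sample = bytes.fromhex("0001f203f4f5f6f7")  # example bytes from RFC 1071
checksum = internet_checksum(sample)
print(hex(checksum))  # 0x220d

# A receiver sums the data plus the checksum; 0 means no error was detected.
print(internet_checksum(sample + checksum.to_bytes(2, "big")))  # 0
```

The point is that the cost scales with every byte of payload, which is why the calculation is a common candidate for hardware offload.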

If a new version of iSCSI were to remove the TCP header, and use only the IP header with a small amount of session information in the payload, it could address these issues.
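As a rough sketch of what "a small amount of session information in the payload" might look like, the example below packs a made-up 10-byte header in front of a SCSI command. The layout (32-bit session id, 32-bit sequence number, 16-bit length) and the names SESSION_HEADER, encapsulate and decapsulate are purely hypothetical and do not describe any existing protocol. For comparison, the minimum TCP header is 20 bytes.

```python
import struct

# Hypothetical header, network byte order:
# 32-bit session id, 32-bit sequence number, 16-bit payload length.
SESSION_HEADER = struct.Struct("!IIH")


def encapsulate(session_id, seq, scsi_payload):
    """Prefix the SCSI payload with the hypothetical session header."""
    header = SESSION_HEADER.pack(session_id, seq, len(scsi_payload))
    return header + scsi_payload


def decapsulate(packet):
    """Split a packet back into (session_id, seq, payload)."""
    session_id, seq, length = SESSION_HEADER.unpack_from(packet)
    start = SESSION_HEADER.size
    payload = packet[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return session_id, seq, payload


# 10-byte stand-in for a SCSI READ(10) command block (opcode 0x28).
cdb = b"\x28" + bytes(9)
packet = encapsulate(7, 42, cdb)

print(SESSION_HEADER.size)       # 10
print(len(packet))               # 20
print(decapsulate(packet)[:2])   # (7, 42)
```

This only shows the framing. Anything TCP currently provides, such as retransmission and ordering, would have to be handled somewhere else in such a design.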

Does anyone know if work has been done in this area ? I’m wondering why the storage industry hasn’t picked up on the idea, because I’m still baffled by the choice to use Ethernet as protocol to transport data. That idea was over in the 1980s when IBM mainframes lost control of their customers with closed systems.


Comments

12 Responses to “Why Does iSCSI Use TCP Instead of IP ?”
  1. Did you mean “Fiber channel” in the sentence “by the choice to use Ethernet as protocol to transport data”?

  2. Dmitri says:

    Hi Greg,

    SCSI pro­tocol is not designed to handle lost frames, or blocks. It expects them to arrive, no mat­ter if they’re dam­aged (have CRC errors). IP or Ethernet will not deliver a frame which has CRC errors; beha­vior which is not com­pat­ible with what SCSI expects. This is why iSCSI uses TCP.

    The way to improve performance of iSCSI is to use HBAs (Ethernet NICs) with TCP offload, which do all the necessary calculations on an embedded ASIC, significantly reducing the processing latency. The majority of storage arrays with iSCSI already make use of TCP offload, so some of the performance problems that you’re seeing are quite likely caused by a poor choice of Ethernet NIC in the server (or incorrect/badly configured NIC drivers).

    The reliability of FCoE is brought about by the enhanced end-to-end congestion avoidance mechanisms at the Ethernet level (DCE/CEE extensions), which are simply not available everywhere you would use iSCSI, for example over an L3-only link such as an IP VPN.

    • Greg Ferro says:

      Not exactly correct. First, FC has an acceptable error rate of 1 in 10^6 because it is impossible to build a network that can never drop a packet. This has been deemed acceptable, and lost frames will be retransmitted using FC error recovery.

      Second, protocols that guarantee delivery can be built, and FCoE is an example of exactly that. But why guarantee delivery using Ethernet ? Why not use similar techniques for IP ?

      The 802.1aq QoS mechanisms will work equally well for iSCSI as for FCoE, and could easily be reconfigured for an IP Storage protocol.

      The use of TOE HBAs is also highly recommended; however, the Storage industry has no vested interest in using cheaper and simpler technology and has comprehensively dismissed these technologies as toys or not significant.

      • Zed says:

        “the Storage industry has no ves­ted interest in using cheaper and sim­pler technology”

        this is so true. Hopefully Sun’s efforts shake things up a little

      • Dmitri Kalintsev says:

        > …FC has an accept­able error rate…

        iSCSI does not use FC, so there’s no FC error recov­ery mech­an­isms available.

        > …FCoE is an example of exactly that…

        If I under­stand it cor­rectly, FCoE does not guar­an­tee deliv­ery by vir­tue of E. It does it by the vir­tue of FC. There is no FC in iSCSI.

        > Storage industry has no ves­ted interest in using cheaper and sim­pler technology

        The storage industry (like any other out there) has a vested interest in selling as many units as possible. If a younger competitor starts offering solutions comparable in features and reliability at a much lower price point, larger players usually have no choice but to adopt.

        (All my per­sonal opin­ion and under­stand­ing, of course).

        P.S. Looks like the “Notify me of follow-​​up com­ments by email” option on your blog engine does not work — I never got a noti­fic­a­tion of your reply.

        • Greg Ferro says:

          iSCSI is a simple protocol that encapsulates SCSI block commands into a TCP/IP packet. FC is not a part of that.

          FCoE assumes that Ethernet does not drop packets. The application will retransmit as needed, but if error rates exceed 1 in 10^12 then performance is severely impacted and can lead to loss of service.

          PS — will check the follow-​​up com­ments — thanks for the tip.

  3. akg says:

    Why not use the Stream Control Transmission Protocol (SCTP) for block access over data net­works? It seems bet­ter suited to the task than TCP.

  4. Colin says:

    I agree with Greg. iSCSI over TCP is totally unnecessary. iSCSI would have beaten FC for good if we had the following:

    - Lossless Ethernet
    - An error recovery logic similar to FCP SRR, simple and effective
    - Transport over IP

    Since the lossless Ethernet standard is coming, maybe it is time to consider retiring TCP from the iSCSI family.

  5. Dmitri Kalintsev says:

    Hi Greg,

    Here’s an interesting article (maybe you’ve already seen it, but anyway): http://blog.fosketts.net/2010/01/14/microsoft-intel-push-million-iscsi-iops/

    1 Million IOPS using nothing but the software iSCSI initiator on Windows Server 2008 R2 (and, of course, Intel Xeon 5500).

    • Greg Ferro says:

      It’s not new. Other makers of iSCSI TCP drivers have also performed well. But Microsoft is effectively blessing iSCSI as a viable method, and that is the real news in this article. That is the point that Stephen is making in his article.
