Friday, March 12, 2010

Blessay: Autonegotiation on Ethernet — It Works, It Should Be Mandatory!

March 12, 2010 by Greg Ferro · 27 Comments 

I found this quote recently:

“The switch is con­figured to auto­de­tect the speed and duplex set­tings on an inter­face. However, there are sev­eral things that can cause the autone­go­ti­ation pro­cess to fail, res­ult­ing in either speed or duplex mis­matches (and per­form­ance issues). The rule of thumb for key infra­struc­ture is to manu­ally hard-​​code the speed and duplex on each inter­face so there is no chance for error. ”


THIS IS VERY VERY VERY WRONG

Let me say it again: This is myth­in­form­a­tion.

Cisco(1), Sun(2), Dell(3) and all major man­u­fac­tur­ers RECOMMEND auto-​​negotiation.

How did the myth get started ?

Once upon a time (or at least, around 1996/7), there was no Ethernet autonegotiation standard, but everyone agreed this would be a good idea. At that time, Intel and 3Com were the biggest manufacturers of network adapters, and Cisco / 3Com / Nortel were the main switch vendors. So 3Com implemented their version and 3Com adapters worked on 3Com switches. Intel came up with their version and Intel adapters worked with Cisco switches. Nortel switches didn’t really work with anyone consistently. And the OEM equipment didn’t do much at all; it was hard setting or nothing.

This was a massive prob­lem. In fact, I remem­ber entire cor­por­ate net­works had their net­work adapters swapped out of every com­puter. No-​​one really knew which adapters and switches could inter­op­er­ate, and inter-​​switch con­nec­tions were a real concern.

So we all just con­cluded that hard set­ting Ethernet speed and duplex was the best option.

Then the 1998 revision of the 802.3 standard cleaned up the autonegotiation rules introduced with 802.3u, and compliant equipment from all the vendors started to come onto the market. Things got better. But we still had a lot of that older equipment around, and those network adapters were quite expensive back then.

So we all became used to hard set­ting Ethernet speed and duplex as the best option. But all of that prob­lem­atic hard­ware should be gone by now, and we can con­sider chan­ging the way we work.

Why doesn’t every­one know ?

I sus­pect that this is one of the basic top­ics that no one ever researches. Everyone just “knows” that you always hard set duplex and speed. These days it is pretty much a man­tra, espe­cially among telco oper­a­tions and some server teams.

Because we all “know” it seems nobody has taken the time to fix it.

And because many people con­fig­ure this by default, it forces every­one else to do the same.

What’s the problem ?

A major problem is that many people are also hard setting Gigabit Ethernet, and this is causing major problems. Gigabit Ethernet over copper (1000BASE-T) must have auto-negotiation ENABLED to allow negotiation of the master / slave PHY relationship for clocking at the physical layer. Without negotiation the line clock will not establish correctly and physical layer problems can result.

From Sun’s Best Practices on Ethernet Auto-​​negotiation (2):

Disabling autonegotiation can result in physical link issues going undetected. The Fast Link Pulse process does some testing for the physical link properties as well as negotiation on several Ethernet properties.

  • Unable to detect bad cables
  • Unable to detect link failures
  • Unable to check link part­ners capabilities
  • Unable to move sys­tems from one port to another or to another switch or router
  • Unable to determ­ine per­form­ance issues on higher layer applications
  • Unable to imple­ment Pause Frames (Flow Control)(4)

From the IEEE standard:

All 1000BASE-​​T PHYs shall provide sup­port for Auto-​​Negotiation
(Clause 28) and shall be cap­able of oper­at­ing as MASTER or SLAVE.
Auto-​​Negotiation is per­formed as part of the ini­tial set-​​up of the link, and
allows the PHYs at each end to advert­ise their cap­ab­il­it­ies (speed, PHY
type, half or full duplex) and to auto­mat­ic­ally select the oper­at­ing mode
for com­mu­nic­a­tion on the link. Auto-​​negotiation sig­nal­ing is used for the
fol­low­ing two primary pur­poses for 1000BASE-​​T:

a) To nego­ti­ate that the PHY is cap­able of sup­port­ing 1000BASE-​​T half
duplex or full duplex transmission.

b) To determ­ine the MASTER-​​SLAVE rela­tion­ship between the PHYs at
each end of the link. 1000BASE-​​T MASTER PHY is clocked from a local source.
The SLAVE PHY uses loop tim­ing where the clock is recovered from the received data stream.
My emphasis
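
The "advertise capabilities and automatically select the operating mode" step in that quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PRIORITY` list is a simplified version of the standard's priority order (the real table also includes 100BASE-T2 and 100BASE-T4), and the function and port names are made up for the example.

```python
# Simplified priority order, best mode first. Assumption: the real
# IEEE 802.3 priority table also lists 100BASE-T2 and 100BASE-T4,
# which are left out here to keep the sketch short.
PRIORITY = [
    ("1000BASE-T", "full"),
    ("1000BASE-T", "half"),
    ("100BASE-TX", "full"),
    ("100BASE-TX", "half"),
    ("10BASE-T", "full"),
    ("10BASE-T", "half"),
]


def resolve(local, partner):
    """Return the best mode that both ends advertise, or None if none match."""
    common = set(local) & set(partner)
    for mode in PRIORITY:
        if mode in common:
            return mode
    return None


# Hypothetical ports: a gigabit switch port and an older 10/100 server NIC.
switch_port = list(PRIORITY)
old_server = [
    ("100BASE-TX", "full"),
    ("100BASE-TX", "half"),
    ("10BASE-T", "full"),
    ("10BASE-T", "half"),
]

print(resolve(switch_port, old_server))  # ('100BASE-TX', 'full')
```

Both ends run the same selection over the same two lists, so they arrive at the same answer without anyone configuring the port by hand. The MASTER / SLAVE decision in point b) is a separate step and is not modelled here.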

Future Considerations

There are new standards for 10 Gigabit, but the new Converged Enhanced Ethernet (CEE) (Cisco is developing a superset called Data Centre Ethernet©) will require each side of a GigE connection to negotiate capabilities relating to pause, queues and a number of other critical features.

If you haven’t already, you should con­sider chan­ging your stand­ards to use auto-​​negotiation every­where because it is a pre­requis­ite in the new technology.

I make an especial plea to Service Providers to change their standard practice to using Autonegotiation on Ethernet. We have to move with the times.

Operational Benefits

But most of all, set up auto-negotiation so I don’t have to think about configuring ports for each server. I am going nuts here with my staff having to manually configure every fricking port whenever a server moves around. It is a ridiculous waste of resources.

Plus it makes you more suc­cess­ful, since hard set­ting means that more mis­takes will be made.

Comments invited, I would love to dis­cuss this more. Tell all your friends!

Reference

(1) Cisco recommends leaving auto-negotiation on for devices compliant with 802.3u. This would be every device since 1999 or so.

(2) — Sun Ethernet Autonegotiation Best Practices

Wikipedia: The debatable portions of the autonegotiation specifications were eliminated by the 1998 release of 802.3. This was later followed by the release of IEEE 802.3ab in 1999. The new standard specified that gigabit Ethernet over copper wiring requires autonegotiation. Currently, all network equipment manufacturers, including Cisco, recommend to use autonegotiation on all access ports. On rare occasions where autonegotiation may fail, it may still be necessary to force settings.

Cisco: When configuring an interface speed and duplex mode, note these guidelines:

• If both ends of the line support autonegotiation, we highly recommend the default setting of auto negotiation.

(3) Dell — Gigabit Ethernet Auto-​​Negotiation

(4) Petr Lapukhov of Internetwork Expert posted a good article about why 802.3x pause frames should be disabled.


Comments

27 Responses to “Blessay: Autonegotiation on Ethernet — It Works, It Should Be Mandatory!”
  1. I think age is the big thing with autone­go­ti­ate. I’ve seen NICs on serv­ers (usu­ally IBM) with man­u­fac­ture dates of 2002 fail miser­ably at autone­go­ti­at­ing (10/​half to 100/​full makes for a won­der­ful medium). With the eco­nomy dying a hor­rible death, I’m pic­tur­ing a lot of old serv­ers get­ting put back into ser­vice or being used for another few years to save a dollar.

    As you said, a 1G NIC needs auto, so time will heal all wounds.

  2. cciepursuit says:

    Great art­icle. I did not know how GigE uses auto-​​negotiation. Cool information.

    There are 2 reas­ons that my com­pany still requires hard set­ting speed and duplex on server ports (user ports are all auto/​auto):

    1) $$$. Switches with GigE ports cost more, there­fore our port charge for 1000Mbps is higher than the cost for 100Mbps. Nearly all serv­ers now have GigE cap­able NICs. If we didn’t hard set our ports then the vast major­ity of our serv­ers would auto-​​negotiate to GigE. I per­son­ally don’t care, but the bean coun­ters do care.

    2) If the auto-negotiation process fails, it will set the duplex to half. This problem is usually caused when one side is hard set and the other is using auto-negotiation. In our case, our billing requirements usually result in this issue. We hard set our port to 100/Full and the server uses auto/auto. Everything looks good until we get the frantic call from the server owner cursing out the network. A quick ‘show interface gx/x counter error’ will usually show collisions. Then we begin the awkward and painful dance of “I know that you don’t have your server NIC set to 100/Full but we’re going to argue about this for an hour until I set my side to auto/auto to prove you wrong.”
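
The failure mode in point 2 can be sketched as a small Python model. This is an illustration, not a description of any vendor's implementation. It assumes that a port left on auto senses the speed of a hard-set partner from the link signal but falls back to half duplex because it cannot learn the partner's duplex, and that auto ports support 10/100 at both duplexes. All names here are made up for the example.

```python
AUTO = "auto"


def port_outcome(me, partner):
    """Settings one port ends up with, given both ends' configuration.

    Each configuration is either AUTO or a hard-set (speed_mbps, duplex)
    tuple. Assumption: AUTO ports support 10/100 at both duplexes.
    """
    if me != AUTO:
        return me                 # a hard-set port simply uses its own settings
    if partner == AUTO:
        return (100, "full")      # both negotiate and pick the best common mode
    # The partner is hard set and sends no negotiation pulses: speed is
    # sensed from the link signal, duplex falls back to half.
    return (partner[0], "half")


def link_outcome(a, b):
    """Resulting settings for both ends of the link."""
    return port_outcome(a, b), port_outcome(b, a)


def duplex_mismatch(a, b):
    """True if the two ends finish with different duplex settings."""
    side_a, side_b = link_outcome(a, b)
    return side_a[1] != side_b[1]


print(link_outcome((100, "full"), AUTO))     # ((100, 'full'), (100, 'half'))
print(duplex_mismatch((100, "full"), AUTO))  # True
print(duplex_mismatch(AUTO, AUTO))           # False
```

The model ignores speed mismatches between two hard-set ports. It only shows why a switch port at 100/Full facing a server on auto/auto produces the collisions described above.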

    Because of issue 1, we run into a lot of issue 2. It is the bane of my exist­ence that I don’t have access to the NIC set­tings on serv­ers. It would make a good 20% of my troubleshoot­ing go 90% quicker. I could then quickly check for the big three server prob­lems: incor­rect speed/​duplex set­tings, incor­rect sub­net mask, and incor­rect gateway.

    Anyhoo…great art­icle. I agree that auto-​​negotiation should be used if possible.

    • Greg Ferro says:

      What are you going to do when the new Gigabit becomes the standard ? CEE and DCE are coming, and will probably filter down from the Cisco Nexus to Catalyst 6500, 4500 etc. and you will need autoneg then. I can’t see the neg part being optional.

  3. ahenning says:

    I have to agree with the author. I’ve had issues where switch trunk links (3500-2900) were set to 100/full on both sides. Show interface output displays 100/full with no errors/collisions, but a throughput test shows exactly 10/half. Make both auto/auto and throughput goes up to 100/full.

  4. nevot says:

    Hi,
    In eleven years administering networks I’ve seen autonegotiation fail again and again. I won’t run my network on auto-negotiation under any circumstances.

    • Greg Ferro says:

      You might have poor cabling then. If you don’t have qual­ity cabling then autoneg can be a prob­lem. And if you have poor cabling, then autoneg is the least of your problems.

      • Anonymous says:

        I dis­agree. All our install­a­tions are CAT-​​6 veri­fied and cer­ti­fied. Try to con­nect a Nokia Firewall to a cisco 2950 with autoneg and you will soon run into trouble. Problems also detec­ted with Avaya Gateways. If you work also with embed­ded equip­ment (such as build­ing alarms, air con­di­tion­ing con­trols, etc) con­nec­ted to your net­work, you will see also problems.

        My criteria are: access ports for users, autoneg. Access ports for servers and other equipment, forced 100-full. In gigabit networks I run autoneg.

        • Greg Ferro says:

          No, the prob­lem is the Nokia fire­wall. They are infam­ous for being stu­pendously awful at doing this. A not­able excep­tion to the rule.

          You are actu­ally fol­low­ing my recom­mend­a­tion. Use autoneg where you can, and hard set where you must. Embedded equip­ment typ­ic­ally use really old and cheap chip­sets so I would expect them to be a problem.

          I would bet from your descrip­tion that more than 90% of your ports are auto ?

  5. shef says:

    I saw problems with a new Avaya IP PBX box and a new 2960.

    • Greg Ferro says:

      One prob­lem in how many ports in your network ?

      You need to be care­ful not to let that over­whelm your per­cep­tion of the prob­lem. yes ?

  6. nevot says:

    Nokia Firewall, Avaya PBX, toshiba tecra s3… in fact, I only trust in auto­sensing when same man­u­fac­turer equip­ment is in both ends (and not much trust­ing because, as you said, cabling is not always as we expect).

    yes 90% of the net­work is auto­sensing, because 90% of the net­work is users. But the point is not how much % is in auto­sensing. Not quant­ity, but qual­ity. Just as I poin­ted: for infra­struc­ture, forced. For end users, auto­sensing. Having prob­lems on one end point affects only one user. Having prob­lems on one server port or one trunk affects lots of users.

    Worst case: I’ve seen problems connecting a PC to a Juniper firewall, just because they both have auto-MDIX interfaces. Yeah, I know, it’s not part of this standard, but there was no link between these devices, even with a perfect cable, without disconnecting and reconnecting several times until one of them picked the correct MDIX. Just disabling auto-MDIX on the PC solved the problem. My poor opinion of anything ‘auto’ is based on many problems found again and again. This pain will surely lessen with time.

    One note: Have you reviewed LLDP? One of its features is exchanging information between switches and end equipment (such as Avaya IP phones) about, among other things, the configured speed and duplex. LLDP covers media selection not only at the PHY layer but at the network layer. I think LLDP is also a must in future for all network equipment. Why did everybody designing LLDP (or CDP, or NDP or any of the variants that converged on LLDP) think it necessary to advertise the media capabilities available and in use in an upper protocol if autoneg is so wonderful? Surely (I think) because they have seen people running into trouble again and again.

  7. nullrouter says:

    Good art­icle on autonegotiation

    I abso­lutely agree with using the inher­ent autone­go­ti­ation on 1000baseT links.

    However, with Cisco optic links, I’ve had to rely on the ‘speed none­go­ti­ate’, which dis­ables the link nego­ti­ation, on gig optic ports to get them to come up. I also seem to recall Cisco pub­lish­ing a bug on optic ports without this com­mand being enabled.

    I work in a car­rier IP/​MPLS/​MAN envir­on­ment, where we man­age con­nec­tions between our Cisco switches and cli­ent equip­ment of vari­ous types. We hard code any­thing less than 1000baseT, as most cli­ents are run­ning Fast Ethernet ports on their equip­ment still. I’ve seen autone­go­ti­ation fail, and devices come up in half duplex after a power hit at a cli­ent site, caus­ing us to get unne­ces­sary packet loss/​throughput faults.

    Personally from exper­i­ence, I would recom­mend only run­ning autone­go­ti­ation on any­thing less than 1000BaseT, where you have admin­is­trat­ive con­trol on both ends of the link.

  8. Dan says:

    Hi

    I don’t agree with this article, mainly because my experience, even recently with HP desktops connected to either 3560 or 3750 switches, has been painful when using auto-negotiate on either Fast or Gig Ethernet. Basically either duplex or speed is nearly always incorrectly set. So now I don’t even bother relying on auto-negotiate; I just manually set the speed and duplex.

  9. Josh Horton says:

    Greg,

    I recently ran into an issue where this art­icle was a HUGE help. Thanks.

    I had a vendor’s router con­nec­ted to a customer’s switch. Both were hard coded 100/​full but for some reason, the con­nec­tion was hor­rible. The strange thing is that only the vendor’s router dis­played any errors. Anyways…

    After setting both sides to auto the problem was solved. Had I not read this article, I would have stared at the problem for a month or so saying, “Well, both sides are hard-coded… I don’t know what to tell you. Your router must be junk …” :)

    I guess old habits die hard. Thanks again!

    Josh

  10. zakki a says:

    Greg,
    Great article. In terms of the standard, yes, auto negotiation is the first thing we have to try. But auto negotiation failure is not a myth. It is true.

  11. Scott says:

    I work at a major vendor of server application software that pretty much requires GigE throughput, and while I agree with you in theory, in practice it doesn’t work that way in the real world. There are two hardware vendors that make 90% or more of server NICs and only a handful of vendors who make 90% or more of the switches out there. Several combinations of these simply do not work when set to full Auto on both sides. If there was better enforcement of the IEEE standards, we could all play in your ideal world. But the truth is that many of the vendors in question need serious help. I don’t know how they can get away with charging server NIC prices for NICs that have the same bugs as (or worse than) simple desktop NICs.

  12. Dmitri says:

    Flow con­trol on GigE ports on 3550 sucks. It starts send­ing pause frames *way* before a link starts approach­ing full capa­city. Not sure about other equip­ment, but the said exper­i­ence with 3550 did not make me happy (and aware).

  13. Dennis says:

    Why do I want to hard set EVERY 10/100 port in the network?

    1) Last week, Cisco 2950 switch and Cisco 2800 router both powered up after an out­age. Cisco did not suc­cess­fully autone­go­ti­ate with Cisco switch. Worked well enough that no one noticed til trad­ing star­ted over that link.

    2) Panasonic Toughbooks, circa 2004/5, and Cisco 4500 10/100 port. Don’t remember the ethernet chipset vendor for the Panasonics offhand. Would not ever, under any circumstances, negotiate properly. We had over 100 of those laptops in our company. PITA!

    As you say, Gig is a dif­fer­ent animal, but hard set­ting speed/​duplex is a good prac­tice, learned by many of us in the school of hard knocks.

    m00tpoint

  14. Tim says:

    If only the world was that per­fect Greg

    I’ve had LOADS of prob­lems with autoneg.
    I’ve NEVER had a prob­lem where both ends are prop­erly nailed.

    No-​​brainer IMHO, why take the chance? Especially if you don’t have access to both ends to check that it auto’d cor­rectly _​this_​ time. (Just ‘cos it got it right last time doesn’t mean that it will again)

    I concede Gig is very different and shouldn’t be tarred with the same brush.

    Tim

    • Greg Ferro says:

      I have had all sorts of prob­lems, but ulti­mately, auto-​​neg is the best default con­di­tion. I move to hard-​​config when there is a prob­lem. That is my recom­mend­a­tion to everyone.

  15. Pushkar Bhatkoti says:

    good research. The point is if any vendor doesn’t fol­low the stand­ard it’s not your fault. Your concept still works e.g. leave auto-​​neg on.

    well writ­ten.

    –Pushkar Bhatkoti

  16. joe says:

    I come from the telco world and school of hard knocks. Yes, if the pair­ing of equip­ment requires auto, or if we are talk­ing Gigabit ether, then you must auto.

    Dennis hits the nail on the head: you have an unplanned reboot, then the sucker goes to some crappy setting, and no one notices until the business day hits. The throughput sucks, and the issue can be transient in nature. Once you figure out auto bit you in the ass one more time, management says “Damn it, turn off auto”. And as management usually has a few more gray hairs, they still remember the bad old days.

    So in the real world, non-​​gigabit will be expli­citly set if you want to keep your job.

    A red herring: some folks will say you can’t manage all the settings without provisioning errors. That is a gap in your server management process, not a byproduct of explicit settings. Are your configuration files in /etc set to “auto”? I don’t think so.

  17. pete bateman says:

    I would suggest that your experience is all quite recent. In the early days of full duplex Ethernet and Fast Ethernet, interoperability between various manufacturers’ implementations of the standard was pretty bad. E.g. a 3Com NIC to a 3Com switch might be fine. Plug a Nortel router into your switch and it would all go south. I have personally seen this many times. Us old hands got bitten by this so often that we always recommend nailing the port config down on interlinks and servers. I.e. if a desktop comes up wrong it’s a minor issue; if your server suddenly decides it’s going to be half-duplex you are in deep whatsits. I have seen this happen. More recently you are right, it generally just works, but don’t tell everyone that it’s infallible, ’cos it ain’t.

    • Greg Ferro says:

      By Crom, you missed the point com­pletely. The first para­graph clearly says it was a prob­lem (when I was a younger engin­eer) then I say it isn’t a prob­lem any more.

      The most likely cause of the duplex mis­match today is faulty cable. Not a hard­ware prob­lem with Nortel, 3Com or any­one else. And I explain the nego­ti­ation pro­cess so that you can under­stand why.

      Of course, you are smarter than Cisco, Sun, Dell and many other people because you don’t believe or even men­tion the evid­ence presen­ted here.

      I don’t have a very high opin­ion of your com­pany and you are con­firm­ing that very concept. Please pay atten­tion and read the art­icle before you criticise.



