Question from Reddit:
So a route goes down, there’s no feasible successor, and the router queries its neighbours, who in turn query their neighbours, and so on until it reaches every EIGRP router in the domain. During this time, the routers will be running timers counting down from 3 minutes, and if they don’t hear a response in that time, they’ll tear down their neighborships
“neighborships” is my mental word/voice for neighbours with routing peers or relationships. I know it’s not a real word. Whatever.
What I don’t understand is… how could this ever happen in real life? I mean, in what world would it take more than 3 minutes for a query to reach the end of an EIGRP domain??? Is there some very important factor that I’m not taking into consideration here – something that could explain how this could ever be something to worry about?
Once upon a time, I worked on a WAN that had a mix of ATM, Frame Relay, ISDN and Satellite links. With ISDN and Satellite links, you have limited bandwidth and long delays. With ISDN, the delay was often due to ISDN dial-up connections as well as low bandwidth (64 or 128 kbps). For Satellite, bandwidth and packet loss varied according to the atmospheric conditions and the number of active users.
(Hopefully I’ve remembered this correctly)
We ‘discovered’ that we had made a grave mistake in choosing EIGRP for this network when nearly all the EIGRP routes went SIA (stuck in active) during the day. All EIGRP routers in an AS must confirm receipt of a route update, and the route is held in active state until it’s confirmed.
- While the delay wasn’t more than 5 seconds, packet loss could cause re-transmissions.
- EIGRP paced the delivery of its messages to avoid congesting user data, which slowed convergence.
- CPU load from generating EIGRP updates was a major factor; some devices ran at 100% CPU just handling updates. (Remember, ISDN dial-up causes route flaps.)
- The longest path across the WAN would cover ten or more hops.
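
To get a feel for how those factors add up, here is a rough back-of-envelope sketch in Python. It is not how EIGRP actually computes anything: the function names, the fixed retransmission timeout, the independent-loss model and the `queue_s` figure (a stand-in for pacing plus CPU backlog) are all assumptions made for illustration. It only estimates how long one query and its reply might take across a chain of hops, compared with the 3-minute timer from the question.

```python
ACTIVE_TIMER_S = 180.0  # the 3-minute timer mentioned in the question


def expected_hop_time(delay_s, loss, rto_s, queue_s):
    """Expected time for one reliably delivered packet to cross one hop.

    Assumptions (illustrative only): packets are lost independently with
    probability `loss`, each loss costs one fixed timeout `rto_s`, and
    `queue_s` lumps together pacing and CPU backlog on the sending router.
    """
    if not 0 <= loss < 1:
        raise ValueError("loss must be in [0, 1)")
    # Mean retransmissions before the first success (geometric distribution).
    expected_retransmits = loss / (1 - loss)
    return queue_s + delay_s + expected_retransmits * rto_s


def query_round_trip(hops, delay_s, loss, rto_s, queue_s):
    """Query travels `hops` hops out; the reply returns over the same path."""
    return 2 * hops * expected_hop_time(delay_s, loss, rto_s, queue_s)


# A healthy ten-hop network versus a loaded ISDN/satellite-style WAN.
healthy = query_round_trip(hops=10, delay_s=0.05, loss=0.0, rto_s=5.0, queue_s=0.1)
stressed = query_round_trip(hops=10, delay_s=5.0, loss=0.3, rto_s=5.0, queue_s=3.0)

for name, total in (("healthy", healthy), ("stressed", stressed)):
    verdict = "exceeds" if total > ACTIVE_TIMER_S else "fits within"
    print(f"{name}: {total:.1f}s {verdict} the {ACTIVE_TIMER_S:.0f}s timer")
```

With these made-up numbers the healthy network finishes in about 3 seconds, while the stressed one takes just over 200 seconds and blows through the timer. The real situation was messier, since many routes were flapping at once, but the arithmetic shows why ten slow, lossy hops are enough to cause trouble.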
When I arrived in London in the early aughts, I interviewed with a ‘big bank’ that had this problem too. Basically, as the network got larger and more complex, there were enough changes in the EIGRP state table that replication and acknowledgement could not be completed. This problem was partly related to overloaded bandwidth (usually printing at 300dpi from head office print servers) preventing route updates, but mostly due to network instability propagating a lot of route updates.
I haven’t kept up with EIGRP, but I think there have been some changes that reduce the problem: something like OSPF stub areas, which reduce the size of the computation domain and the need for updates to reach every device in the AS. Also, modern routers have much more computing power (CPU & RAM), which improves device performance.