The great thing about working on a global network is that it sounds super when people ask what you have been doing this week and you say, “I brought the Australia office live on Tuesday night, did some upgrades in Hong Kong on Wednesday morning, then some design work for London for the rest of the week”; it usually gets a look of astonishment from people wondering how you get about so much. However, when you work on a global network, especially if you are making changes in-band without remote console access or remote power control (for people who do have such infrastructure, I am insanely jealous), you need a few outs to keep you out of trouble. I thought I would share some useful tips that help minimise the risk when making remote changes.
99% of the equipment I work on is Cisco, so all the tips are Cisco-centric.
“Reload In”
This is a belt-and-braces command, but rest assured: if you make a configuration change that kills your connection, locks you out through an authentication mistake, or blocks your access with an access-list change, the scheduled reload will restart the system and bring it back on its saved (startup) configuration, i.e. the configuration from before your changes. This only works if you do not save the running configuration until you have confirmed everything is working.
For anyone moving from lab environments to real-world ones, note that this is not an end-device-friendly command, particularly on switches. If I am ever making a change at a remote location where I feel this precaution is necessary, I make sure the risk is highlighted in the Change Control documentation: the entire switch may need to reboot, effectively powering down phones, killing server connections and disconnecting everything else plugged into it.
Remember to run “reload cancel” to stop the reload once you have completed your work!!!
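A minimal sketch of how the safety net wraps around a change, assuming a 10-minute window is long enough to make and test your changes (pick a value that suits the change):

```
! Schedule the safety-net reload before you start (10 minutes here)
! If IOS asks whether to save a modified configuration, answer no,
! otherwise the reload would bring your changes straight back
reload in 10
! Check the reload is scheduled and how much time is left
show reload
! ...make your changes in configuration mode and test from a new session...
! Only once everything is confirmed working: cancel the reload, then save
reload cancel
copy running-config startup-config
```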
AAA new-model and TACACS
You don’t need to paste in the whole AAA command set, and risk locking yourself out, just to verify that the device has synced up OK with the TACACS server. Here are some basic steps to prove the device-to-TACACS setup is working:
- Define a local username and password
- Set up VTY login access (username and password)
- Connect to the device over SSH or Telnet and test the username and password
- Enable AAA new-model
- Set up the TACACS server
- Switch on debugging if required
- Switch on terminal monitoring
- Enable TACACS authentication for login, falling back to local
- Start a separate session and test
- Once login has been confirmed via TACACS, you can proceed with the other TACACS AAA commands
username john privilege 15 password donttellanyone
line vty 0 15
 login local
#connect to the device and test the local username and password
aaa new-model
tacacs-server host 192.168.1.50 timeout 15 key mykey
#timeout ---- how long the device waits for a reply from the TACACS server; the default is 5 seconds, which can be too short when the server is any reasonable distance away over the WAN. Try 15
#from EXEC mode, if required: debug tacacs, then terminal monitor
aaa authentication login default group tacacs+ local
Now start a new session and verify that you can log on via TACACS. If the TACACS server can’t be reached, the login falls back to the local username (a login the TACACS server actively rejects does not fall back), and either way you still have EXEC access in your current session.
If successful, you can now do your other AAA commands with confidence.
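As a rough sketch of that verification and the next step: “show tacacs” reports whether the device is reaching the server, and “test aaa group tacacs+” tries a TACACS login without opening a new session. The account tacuser/tacpass is hypothetical (it must exist on the TACACS server, not locally), and the authorization and accounting lines are only examples of the “other AAA commands” you might add once logins work.

```
! From EXEC mode: check the device is talking to the TACACS server
show tacacs
! Try a TACACS login from the CLI (hypothetical TACACS account)
test aaa group tacacs+ tacuser tacpass legacy
! Once TACACS logins are confirmed, add the remaining AAA commands, e.g.
configure terminal
aaa authorization exec default group tacacs+ local
aaa accounting exec default start-stop group tacacs+
end
```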
Transport output and vty access-list
Sometimes you can kill access through the primary path but still have access via another device on the site. However, you may still not be able to Telnet or SSH onward from that device: try to Telnet and you get “% telnet connections not permitted from this terminal”. To allow Telnet or SSH from the device you need transport output configured. Also, the device you are trying to reach may have security preventing access through a different interface, so check whether an access-class is applied to the VTY lines or an access-group to the interface. Before starting the change, make sure your alternative path is open.
As always, this should be documented as part of the change process, and the security reapplied after the change.
line vty 0 4
transport output all
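A quick pre-change check of the alternative path might look something like this; the interface name, IP address and username are hypothetical:

```
! On both devices: look at the VTY lines for transport output and any access-class
show running-config | section line vty
! On the device you will hop to: look for an access-group on the interface
! your alternative path arrives on (hypothetical interface)
show ip interface GigabitEthernet0/2 | include access list
! Prove the alternative path works before starting the change (hypothetical address)
ssh -l john 10.1.1.2
```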
Security Access-list changes
I have been burned by ACL changes before, so I always schedule a “reload in”, although I don’t think I have ever needed the reload since. I try not to modify an active ACL: instead I start with a new copy, make the changes, apply the new ACL, then delete the old ACL from the config. Alternatively I remove the ACL from the interface while I work on it, but this is not always possible because of security policies.
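Here is a sketch of that copy-and-swap approach. The ACL names, interface and addresses are hypothetical; the key point is that applying the new ACL with “ip access-group” replaces the old one on the interface in one step, so there is never a half-edited list in place.

```
! Build the new version alongside the live ACL (hypothetical names and addresses)
ip access-list extended MGMT-IN-V2
 permit tcp 10.10.10.0 0.0.0.255 any eq 22
 permit icmp 10.10.10.0 0.0.0.255 any
 deny   ip any any log
! Swap: applying V2 replaces V1 on the interface
interface GigabitEthernet0/1
 ip access-group MGMT-IN-V2 in
exit
! Confirm access from a new session and check the hit counters
do show ip access-lists MGMT-IN-V2
! Only then remove the old ACL
no ip access-list extended MGMT-IN-V1
```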
There are lots of hardware features that allow hot swap and failover of equipment, but it wouldn’t be the first time one of them has thrown a flaky and killed the system, or an engineer has knocked out a power cable or network connection. Software changes also carry risk: recently I saw a SPAN port command take out a core switch “IN THE MIDDLE OF THE WORKING DAY”.
The point is that making changes during working hours increases the impact if something goes wrong. Yes, we could get into a long discussion about 24/7 operations, but you need to understand the environment you are working on and the impact on the business if things don’t go to plan.
I would never have expected configuring a SPAN port to reload a switch, but the impact in the middle of the day was huge. Had it happened in the evening, it would still have been an issue for this customer, but a much smaller one.
Use the working day to plan the changes, and I mean plan: every command that needs to be entered should be prepared beforehand, not high-level stuff like “create VLAN xxx on switch Y”.
Here is part of a typical change I wrote for adding a new VLAN (customer-specific information removed). I also had the back-out commands documented (not included here). When implementing a change I do not want to be “thinking” about what I need to do; I want to follow a script. For more complicated changes I would look to rehearse the script in a test lab.
#logon to 6509_1
configure terminal
spanning-tree vlan 136 root primary
interface vlan 136
#/24 mask assumed here; use the real subnet mask
 ip address 172.22.136.2 255.255.255.0
 standby 1 ip 172.22.136.1
 standby 1 priority 110
 standby 1 preempt
 no shutdown
end
#logon to 6509_2
show vlan id 136 ----confirm 136 is present over VTP
configure terminal
spanning-tree vlan 136 root secondary
interface vlan 136
 ip address 172.22.136.3 255.255.255.0
 standby 1 ip 172.22.136.1
 standby 1 priority 105
 no shutdown
end
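The back-out commands weren’t included above; as a hypothetical sketch of what they might look like for this change, remove the new configuration in reverse order, standby switch first:

```
! Back-out sketch (hypothetical): logon to 6509_2
configure terminal
no interface vlan 136
no spanning-tree vlan 136 root
end
! logon to 6509_1
configure terminal
no interface vlan 136
no spanning-tree vlan 136 root
end
```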
If you are working on a large network, there should be some sort of change control process. Don’t be afraid to highlight the potential for things going wrong and let the business decide whether it should go ahead: if it goes wrong you are covered, and if it goes fine you are also covered. Last year I had the fun job of upgrading Cisco 4006 switches to an SSH-enabled version of CatOS 7.6. I had highlighted that with any software upgrade there is a risk of the device not coming back online after the reboot, and that it may be prudent to have an engineer attend site. The business decided to take the risk (versus the cost of an engineer), and for 33 of the 36 upgrades all was good. On 3 occasions the switches didn’t come back cleanly: 2 needed a power off/on, which was done by a local site contact, and one needed an engineer called out to re-apply the config, which did disrupt the next business day for that site.
This was frustrating, but the process had been followed and the business had taken the risk on board, so there was no backlash.
I hate performing remote software upgrades, but the following minor tips may help a little.
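Do not use TFTP to transfer the image; use a TCP-based method such as FTP. TFTP runs over UDP and sends one small block per round trip, so over a long WAN path it is very slow.
Use the /verify option: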
- copy /verify ftp: flash:
or, on newer 3750s:
- archive download-sw /safe
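Putting the upgrade tips together, a sketch of an FTP-based upgrade might look like this. The FTP username, password, server address and image name are hypothetical, and the MD5 value to compare against comes from the Cisco download page for your image:

```
! FTP credentials for the copy (hypothetical)
configure terminal
ip ftp username upgrade
ip ftp password s3cret
end
! Copy over FTP (TCP) and verify the image once it lands in flash
copy /verify ftp://10.1.1.50/NEW-IMAGE.bin flash:
! Compare the MD5 against the value published by Cisco
verify /md5 flash:NEW-IMAGE.bin
! Point the boot variable at the new image and save
configure terminal
boot system flash:NEW-IMAGE.bin
end
copy running-config startup-config
! Double-check the boot variable before committing to the reload
show running-config | include boot system
```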
Then fingers crossed on the reboot!
When you are used to being within touching distance of the equipment, it is easy not to consider what happens if you lose connectivity during a change, because you are so used to having local access. Moving from that local environment to a more geographically dispersed one means you now have to consider what happens if you lose connectivity. Hopefully these simple tips will help you reduce the risk of cutting off access to the device you are configuring.