I was asked to help upgrade a Vyatta OFR recently and broke a number of cardinal rules for upgrading router software and configuration. Granted, it's been a few years since I was personally responsible for a router upgrade, and I figured it was like riding a bike. Well, as it turns out, I may have to put my training wheels back on!
Here are the rules for upgrading a critical piece of network infrastructure like a router, especially if you want to keep your users happy:
- Upgrade after hours
- Test the software
- Verify the configuration
- Have a fallback plan
I did my upgrade during office hours when users were active on the router (although I did warn the users about the upgrade and told them to expect "a few minutes of downtime" :). I did not test the software upgrade beforehand nor did I verify the configuration. Lastly, of course I did not have a fallback plan. Bad, bad, bad, bad.
So, I took down the OFR and proceeded to load the new packages and software updates. The software came right up and all was well. Feeling good at this point.
The next step was to load the configuration. The plan was to copy the configuration file to the OFR and away we go... The problem was that I had edited the configuration file in MS Notepad, which added a control character (^M, carriage return) to the end of each line. The OFR did not like those characters, and it took me a few minutes to figure that out. Time was ticking away and the good feeling I had was waning. I tried to fix the file using vi on the OFR (and I'm pretty good at vi). For whatever reason I was having terminal console issues (I'll just blame Hyperterm) and the editing was not going well. We were far past the "few minutes of downtime" and I did not have a fallback plan. Users were getting restless and the only plan was to go forward.
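In hindsight, the carriage returns could have been stripped on the workstation before the file ever reached the router. Here is a minimal sketch in Python of what that might look like. The function names and any file names passed to them are hypothetical, and it assumes the configuration is a plain text file whose only problem is Windows-style line endings:

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Return data with CRLF (and any stray CR) converted to a plain LF."""
    # Handle CRLF pairs first so they do not turn into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def clean_config(src: str, dst: str) -> int:
    """Write a cleaned copy of src to dst; return how many CRs were found."""
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(to_unix_line_endings(raw))
    return raw.count(b"\r")
```

Writing to a separate destination file leaves the original untouched, which is a small fallback plan in itself. A nonzero return value is a cheap warning that the file passed through a Windows editor.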
After a few more minutes of struggling to edit the configuration file and tweaking my Hyperterm settings to no avail, panic started to set in. I don't panic especially well (who does, really?!?) and decided that the best plan at this point was to retype my configuration manually. That task seemed a bit daunting, as the OFR configuration was over 400 lines! I got started by entering my network addresses and then immediately fired up DHCP and NAT so the users could get back online. Once I verified that the users could get to google.com again, the panic subsided considerably. Most of the rest of the configuration was for the firewall, so we were a bit exposed for a few minutes as I frantically typed firewall configuration commands.
Nevertheless, the upgrade got done and the users are happy. Next time, I'll make sure I follow the cardinal rules. That is, if anyone ever lets me near an OFR upgrade again :)