Efficient Administration of Network Appliances
Beyond troubleshooting and packet analysis, part of my job is administration of the network monitoring tools we use to perform troubleshooting and packet analysis. One vendor we rely on quite heavily is Endace for their line of packet capture appliances. As part of my work with their gear, I’ve learned how to script things out for efficiency.
For capacity planning purposes, I’ve automated the process to obtain and format file system and packet capture information. Additionally, Endace recently released a hotfix for all of their devices, and I scripted things out to deploy it concurrently.
Overview of Endace Devices
The Endace architecture is relatively simple. Probes distributed throughout the network capture and store packets, and a Central Management Server (CMS) manages all of the probes.
The probes store databases full of packet data within an abstraction they call rotation files. These act like a ring buffer, overwriting the oldest 4GB block when the buffer fills up.
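A toy model can make the ring-buffer behavior concrete. The sketch below is not how Endace implements rotation files; the class name, block labels, and the three-block capacity are all made up for illustration.

```python
from collections import deque


class RotationBuffer:
    """Toy model of a rotation file: a fixed number of blocks, oldest dropped first."""

    def __init__(self, max_blocks):
        # A deque with maxlen silently discards the oldest item once it is full
        self.blocks = deque(maxlen=max_blocks)

    def write_block(self, label):
        self.blocks.append(label)

    def oldest(self):
        return self.blocks[0]


# Hypothetical capacity of three blocks; a real probe holds far more
buf = RotationBuffer(max_blocks=3)
for label in ["block-1", "block-2", "block-3", "block-4"]:
    buf.write_block(label)

print(list(buf.blocks))
```

After the fourth write, block-1 is gone and block-2 is the oldest data still available, which is why capacity planning comes down to how far back the oldest block reaches.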
For most tasks on the system administration side, the CMS can distribute commands to all probes, groups of probes, or individual probes.
However, there are some cases where CMS orchestration doesn’t work, and installation of hotfixes is one of those. Hotfixes require the device to be in maintenance mode, which kills most of the client processes on the system, including the RabbitMQ client, the CMS orchestration mechanism. If I have 20+ devices on which to install a hotfix, requiring 2-3 minutes per device, not including waiting for a reboot to complete, I really don’t want to do that all in serial (as opposed to, and including, over serial) or manually. That’s where Python and netmiko come into play.
Netmiko Support for Endace
For those unfamiliar, netmiko is a Python library for network scripting. It adds a layer of abstraction on top of paramiko, the Python SSH library, to dumb things down enough for network engineers like me to work effectively. It allows us to send commands and view output without worrying about buffering input or output, figuring out when we can send more information, or writing regex to figure out if we're in enable mode or configuration mode.
Since Endace is a relatively small vendor (as compared to Cisco or Juniper), netmiko doesn’t support it by default. Fortunately, Kirk Byers, the creator of netmiko, makes it really easy to add support for new operating systems. It took me a couple of days to build in support (available here). It took me a few more days to figure out how to make threading integrate with it to perform tasks on multiple devices in parallel.
At the end of that journey, I had a system to effectively distribute commands and collect information. For example, I wrote a quick script to extract information about the file systems and rotation files (basically PCAP files in a ring buffer made of 4GB blocks) to CSV files.
Extract Important Information
This script has 3 main parts. After initializing the CSV files it iterates through the devices:
for device in EndaceDevices.prod_probes:
    connection = ConnectHandler(**device)
    fsInfo = getFileSystemInfo(connection)
    rotInfo = getRotFileInfo(connection)
    connection.disconnect()
    print_output_to_csv(fsInfo, fscsvName, fsTemplate)
    print_output_to_csv(rotInfo, rotcsvName, rotTemplate)
First, it connects to the device and runs the relevant show commands.
def getFileSystemInfo(connection):
    connection.enable()  # Note how netmiko provides a method to enter enable mode
    output = connection.find_prompt()  # This is just here to output the device name
    output += "\n"
    output += connection.send_command("show files system")
    return output

def getRotFileInfo(connection):
    connection.enable()
    output = connection.find_prompt()
    output += "\n"
    output += connection.send_command("show erfstream rotation-file")
    return output
Then, it uses TextFSM to parse the output. TextFSM is a Python library developed by Google to throw regular expressions into a template whenever you have a block of text that will appear consistently. For example, the file systems template looks like:
Value Filldown Device (\S+)
Value Required TotalCapacity (\d+\.?\d*)
Value UsedCapacity (\d+\.?\d*)
Value UsableCapacity (\d+\.?\d*)
Value FreeCapacity (\d+\.?\d*)

Start
  ^${Device} #
  ^\s+Bytes Total\s+${TotalCapacity}
  ^\s+Bytes Used\s+${UsedCapacity}
  ^\s+Bytes Usable\s+${UsableCapacity}
  ^\s+Bytes Free\s+${FreeCapacity} -> Record
The top half of the template defines the variables I want to capture in the format Value <optional modifier> <Name> (<regex>).
The modifier Filldown carries the last matched value forward into every later entry. In this case, it isn't very useful since I'm only extracting one partition, but if I wanted every partition, Filldown would copy the device name to the entry of every partition.
Required is an important modifier. An entry is saved only if every Required value was matched; if that value doesn't appear, TextFSM skips the whole thing. Marking at least one value as Required is a good way to keep partial or empty entries out of the results.
The second part of the template describes the text block as a series of regular expression rules, one per line. The -> Record action on the last rule saves the values collected so far as one row.
Once this template is in place, all we need to do to turn a random string into a well formed list is:
with open(templateFilename, "r") as template:
    re_table = textfsm.TextFSM(template)
    data = re_table.ParseText(output)
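To see roughly what the template does without installing TextFSM or touching a device, here is a sketch that imitates it with the standard re module. It is only an approximation of TextFSM's behavior, not its implementation. The sample output is hypothetical, shaped only to fit the template's rules, and parse(), RULES, and FIELDS are names invented for this example.

```python
import re

# Hypothetical device output, shaped only to fit the template's rules
SAMPLE_OUTPUT = """\
probe01 #
Stats for /data:
   Bytes Total   4000
   Bytes Used    1000
   Bytes Usable  3800
   Bytes Free    3000
"""

NUMBER = r"(\d+\.?\d*)"
FIELDS = ["TotalCapacity", "UsedCapacity", "UsableCapacity", "FreeCapacity"]
RULES = [
    ("Device", re.compile(r"^(\S+) #")),
    ("TotalCapacity", re.compile(r"^\s+Bytes Total\s+" + NUMBER)),
    ("UsedCapacity", re.compile(r"^\s+Bytes Used\s+" + NUMBER)),
    ("UsableCapacity", re.compile(r"^\s+Bytes Usable\s+" + NUMBER)),
    ("FreeCapacity", re.compile(r"^\s+Bytes Free\s+" + NUMBER)),
]


def parse(text):
    rows = []
    device = ""    # kept across records, like Filldown
    current = {}   # values collected for the record in progress
    for line in text.splitlines():
        for name, pattern in RULES:
            match = pattern.match(line)
            if match is None:
                continue
            if name == "Device":
                device = match.group(1)
            else:
                current[name] = match.group(1)
            if name == "FreeCapacity":
                # "-> Record": save the row only if the Required value matched
                if "TotalCapacity" in current:
                    rows.append([device] + [current.get(f, "") for f in FIELDS])
                current = {}
            break  # first matching rule wins for this line
    return rows


print(parse(SAMPLE_OUTPUT))
```

For the sample above this prints a single row, [['probe01', '4000', '1000', '3800', '3000']]. Like TextFSM, it returns every value as a string, so any arithmetic on the capacities needs a conversion first.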
Finally, I use csv.writer to push it out to a CSV file. The with block closes the file on exit, so no explicit close() is needed:
with open(filename, 'a', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in data:
        writer.writerow(row)
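Because the file is opened in append mode, the header row has to be handled separately. My script initializes the CSV files up front; an alternative, sketched below, is to write the header only when the file is new or empty. The helper names, the temporary path, and the sample rows are all hypothetical.

```python
import csv
import os
import tempfile


def append_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as csvfile:
        writer = csv.writer(csvfile)
        if needs_header:
            writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    with open(path, newline="") as csvfile:
        return list(csv.reader(csvfile))


# Hypothetical data standing in for two devices' parsed output
demo_path = os.path.join(tempfile.mkdtemp(), "filesystems.csv")
header = ["Device", "TotalCapacity", "UsedCapacity", "UsableCapacity", "FreeCapacity"]
append_rows(demo_path, header, [["probe01", "4000", "1000", "3800", "3000"]])
append_rows(demo_path, header, [["probe02", "8000", "2000", "7600", "6000"]])

print(read_rows(demo_path))
```

The second call finds a non-empty file and skips the header, so the file ends up with one header row followed by one row per device.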
Deploy an Endace Hotfix
There are four basic steps to applying most hotfixes with Endace. First, the hotfix file needs to find its way to the device. Second, the device needs to go into maintenance mode. Next, the hotfix is installed. Finally, the device gets rebooted.
I use pysftp to get the hotfix file to the device:
def transfer_via_sftp(filepath, hostname, user, pw):
    opts = pysftp.CnOpts()
    opts.hostkeys = None  # Disables host key checking; only do this on a trusted network
    conn = pysftp.Connection(host=hostname, username=user, password=pw, cnopts=opts)
    conn.chdir("/endace/packages")
    conn.put(filepath)  # Upload the file
    # exists() returns True if the given remote path is present
    uploaded = conn.exists("OSm6.4.x-CumulativeHotfix-v1.end")
    conn.close()
    if uploaded:
        print("tx completed successfully for host " + hostname)
    else:
        print("tx failed. Upload manually for host " + hostname)
    return uploaded
I didn’t format my code exceptionally well, so the next three steps are a little muddled:
def enter_maintenance_mode(connection):
    connection.config_mode()
    output = connection.send_command("maintenance-mode noconfirm", strip_prompt=False)
    # In maintenance mode, the cli prompt looks like:
    # (maintenance) <device_name> (configuration) #
    if "(maintenance)" in output:
        return True
    else:
        print(output)
        print("issue obtaining maintenance mode")
        return False

def install_hotfix(hotfix_name_full, hfn_installed, device):
    conn = ConnectHandler(**device)
    if not enter_maintenance_mode(conn):  # Step 2: enter maintenance mode
        print(device["host"] + " could not enter maintenance mode. Failing gracefully")
        conn.disconnect()  # netmiko connections are closed with disconnect()
        return False
    conn.send_command("package install " + hotfix_name_full)  # Step 3: install hotfix
    output = conn.send_command("show packages")
    if hfn_installed in output:
        print("successfully installed on host " + device["host"] + ". rebooting")
        conn.save_config()
        # send_command_timing() doesn't wait for the cli prompt to return
        conn.send_command_timing("reload")  # Step 4: reboot
        return True
    else:
        print("possible issue with install. Troubleshoot host " + device["host"])
        conn.disconnect()
        return False
Executing in Parallel
The above code snippets perform all the required tasks to install the hotfix, but I want to go one step further and execute those steps in parallel across many devices. To do this, I use the threading library:
thread_list = []
devices = EndaceDevices.all_devices
for dev in devices:
    thread_list.append(threading.Thread(target=transfer_via_sftp, args=(path, dev["host"], dev["username"], dev["password"])))
for thread in thread_list:
    thread.start()
for thread in thread_list:
    thread.join()
# Threads discard return values, so this prints even if a transfer failed
print("File transferred everywhere")
thread_list = []
for dev in devices:
    thread_list.append(threading.Thread(target=install_hotfix, args=(hf_name_full, hf_name_as_installed, dev)))
for thread in thread_list:
    thread.start()
for thread in thread_list:
    thread.join()
print("Hotfix installed everywhere. Monitor for devices coming back up")
First, I create all the threads. Then, I start them. Finally, I wait for them all to return. And then I can repeat for the next function I want to parallelize.
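One limitation of plain threading.Thread is that the True/False results from the worker functions are thrown away. A possible alternative, sketched here with the standard concurrent.futures module, collects a result per host. This is not what my script does; fake_transfer() is a stand-in for transfer_via_sftp(), and the device list and run_on_all() helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_transfer(device):
    """Stand-in for transfer_via_sftp(); pretends that one host fails."""
    return device["host"] != "probe02"


def run_on_all(task, devices, max_workers=10):
    """Run task(device) for every device in parallel and map host -> result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input devices
        results = list(pool.map(task, devices))
    return {dev["host"]: ok for dev, ok in zip(devices, results)}


# Hypothetical inventory; a real one would carry netmiko connection details
devices = [{"host": "probe01"}, {"host": "probe02"}, {"host": "probe03"}]

results = run_on_all(fake_transfer, devices)
failed = [host for host, ok in results.items() if not ok]
print("Failed hosts:", failed)
```

With the results in hand, the install step could be limited to hosts whose transfer succeeded instead of running everywhere.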
The full script is available on GitHub.
Conclusion
Python, with netmiko, is a worthwhile tool to increase the efficiency of collecting information or pushing patches or configuration. If netmiko doesn’t support your device, it’s very easy to extend it. TextFSM is a very useful tool to process strings into useful, structured information.