Python Web Scraping to Detect PCI Errors

Right now, I am dealing with a little bit of instability within the packet capture environment at work. It turns out that when you completely overload one of these devices, it logs a PCI error and then stops responding to everything. It can be quite high effort to recover the device, requiring in at least one case a complete rebuild of the filesystem. The situation brings up imagery of a clogged toilet. And much like a clogged toilet, even clearing it out can still leave some issues down the line. I want to determine the true root cause.

The method to do that is to force a massive amount of traffic to one of the probes in our development environment, let it choke, and react quickly (within a day) to recover the probe, grab crash data, and send it to the vendor. Crash detection is most accurately performed by monitoring for a PCI error. These are visible in the IPMI console's event log (think iLO). However, I really don't want to spend the couple of minutes every day logging in to check it. So, let's automate it!

Python Web Scraping Options

  • Requests - This is the go-to third-party Python library for HTTP interaction. I will likely be using this to perform the GETs and POSTs I need.
  • Beautiful Soup (BS4) - This module is built for parsing HTML. If the information I need is in a <div> with a specific name, BS4 is the best way to get it.
  • Selenium - This is a bit of a long shot module. If the information is embedded in a Java web applet or something else I can't interact with any other way, Selenium remotely controls a web browser. It's a little annoying to work with for simpler web interaction, so I really want to avoid it.

Investigation to Determine Scraping Strategy

To determine which approach to take, I like to start by performing my normal browsing with the browser's developer mode turned on. As you can see below, this bit of navigation involved 80 HTTP requests/responses.

Sloppily redacted browser window

As I look through all this information, there are a number of things that jump out as very relevant. First, I see a POST to /cgi/login.cgi. Nothing about the headers looks all that important, but the Params tab shows that there's a form with username and password. So, authentication will require me to pass a username and password to /cgi/login.cgi.

Next, I need to see how the information I care about is being delivered. One of the many POSTs to /cgi/ipmi.cgi returns a bunch of XML with timestamps that match up with the event log table. However, the actual values aren't human readable.

<?xml version=\"1.0\"?>  
<IPMI>  
    <SEL_INFO>  
        <SEL TOTAL_NUMBER=\"0011\"/>  
        <SEL TIME=\"2019/04/27 11:44:30\" SEL_RD=\"0100029e40c45c20000405516ff0ffff\" SENSOR_ID=\"Chassis Intru   \" ERTYPE=\"6f\"/>  
        <SEL TIME=\"2019/06/12 19:01:38\" SEL_RD=\"020002124c015d33000413006fa58012\" SENSOR_ID=\"NO Sensor String\" ERTYPE=\"FF\"/>  
        <SEL TIME=\"2019/06/14 18:10:39\" SEL_RD=\"0300021fe3035d200004c8ff6fa0ffff\" SENSOR_ID=\"NO Sensor String\" ERTYPE=\"FF\"/>  
        <SEL TIME=\"2019/06/14 18:14:23\" SEL_RD=\"040002ffe3035d20000405516ff0ffff\" SENSOR_ID=\"Chassis Intru   \" ERTYPE=\"6f\"/>  
        <SEL TIME=\"2019/06/14 19:20:16\" SEL_RD=\"05000270f3035d33000413006fa58012\" SENSOR_ID=\"NO Sensor String\" ERTYPE=\"FF\"/>  
        <SEL TIME=\"2019/06/17 18:30:51\" SEL_RD=\"0600025bdc075d200004c8ff6fa0ffff\" SENSOR_ID=\"NO Sensor String\" ERTYPE=\"FF\"/>  
        <SEL TIME=\"2019/06/17 18:34:34\" SEL_RD=\"0700023add075d20000405516ff0ffff\" SENSOR_ID=\"Chassis Intru   \" ERTYPE=\"6f\"/>  
        <SEL TIME=\"2019/06/17 19:17:55\" SEL_RD=\"08000263e7075d33000413006fa58012\" SENSOR_ID=\"NO Sensor String\" ERTYPE=\"FF\"/>  
        <SEL TIME=\"2019/06/17 19:46:05\" SEL_RD=\"090002fded075d200004c8ff6fa0ffff\" SENSOR_ID=\"NO Sensor String\" ERTYPE=\"FF\"/>  
        <SEL TIME=\"2019/06/17 19:49:53\" SEL_RD=\"0a0002e1ee075d20000405516ff0ffff\" SENSOR_ID=\"Chassis Intru   \" ERTYPE=\"6f\"/>  
        <SEL TIME=\"2019/06/18 13:04:42\" SEL_RD=\"0b00026ae1085d200004c8ff6fa0ffff\" SENSOR_ID=\"NO Sensor String\" ERTYPE=\"FF\"/>  
        <SEL TIME=\"2019/06/18 13:08:28\" SEL_RD=\"0c00024ce2085d20000405516ff0ffff\" SENSOR_ID=\"Chassis Intru   \" ERTYPE=\"6f\"/>  
        <SEL TIME=\"2019/06/18 15:25:47\" SEL_RD=\"0d00027b02095d200004c8ff6fa0ffff\" SENSOR_ID=\"NO Sensor String\" ERTYPE=\"FF\"/>  
        <SEL TIME=\"2019/06/18 15:29:31\" SEL_RD=\"0e00025b03095d20000405516ff0ffff\" SENSOR_ID=\"Chassis Intru   \" ERTYPE=\"6f\"/>  
        <SEL TIME=\"2019/06/18 17:59:54\" SEL_RD=\"0f00029a26095d200004c8ff6fa0ffff\" SENSOR_ID=\"NO Sensor String\" ERTYPE=\"FF\"/>  
        <SEL TIME=\"2019/06/18 18:03:41\" SEL_RD=\"1000027d27095d20000405516ff0ffff\" SENSOR_ID=\"Chassis Intru   \" ERTYPE=\"6f\"/>  
        <SEL TIME=\"2019/06/18 20:38:09\" SEL_RD=\"110002b14b095d33000413006fa58012\" SENSOR_ID=\"NO Sensor String\" ERTYPE=\"FF\"/>  
    </SEL_INFO> 
</IPMI>

It's also important to note that the POST includes a form that requests:

SEL_INFO.XML : (1, c0)

So, now we know how to authenticate ourselves and how to get the data we need, but we don't yet have a way to interpret it. After more searching around, the HTML file returned from a GET to /cgi/url_redirect.cgi?url_name=servh_event includes a block of Javascript code that decodes that XML. While I probably could somehow run the XML through that Javascript code, I'm just going to understand how it works and recreate the parts I need.

Based on this investigation, it looks to me like I can get everything I need using only Requests. No need to involve those other modules.

Interpreting the XML Values

Through careful analysis of the Javascript code (and a lot of Ctrl-Fs), it looks like there are two really relevant pieces of code. The first is a switch on a value called tmp_sensor_type. Apparently, if this variable has the value 0x13 that means there's a PCI error. Also, I don't know who Linda is, but I appreciate that she made this whole thing possible.

case 0x13: //Linda added PCI error

    //alert("PCI ERR message detected! value = "+ sel_traveler[10]);
    var PCI_errtype = sel_traveler[10].substr(7,1);
    PCI_errtype = "0x" + PCI_errtype;
    var bus_id = sel_traveler[10].substr(8,2);
    var fnc_id = sel_traveler[10].substr(10,2);

    sensor_name_str = "Bus" + bus_id.toString(16).toUpperCase();
    sensor_name_str += "(DevFn" + fnc_id.toString(16).toUpperCase() + ")";

    if(PCI_errtype == 0x4)
        tmp_str = "PCI PERR";
    else if(PCI_errtype == 0x5)
        tmp_str = "PCI SERR";
    else if(PCI_errtype == 0x7)
        tmp_str = "PCI-e Correctable Error";
    else if(PCI_errtype == 0x8)
        tmp_str = "PCI-e Non-Fatal Error";
    else if(PCI_errtype == 0xa)
        tmp_str = "PCI-e Fatal Error";
    else
        tmp_str = "PCI ERR";

break;

The second relevant piece of code shows how tmp_sensor_type is determined. If it were in one discrete block, that would be nice. Instead, it's spread across a few hundred lines. Tracing through the variable assignments, I can see that:

tmp_sensor_type = sel_traveler[3];

I can also see:

sel_traveler = sel_buf[j];

It also appears that:

sel_buf[i-1][3] = stype;

And if I look at how we get stype, I find a line with an extremely helpful comment:

stype = parseInt(ch[5],16); // take 11th byte

I don't know anything about Javascript, but I can find the 11th byte of something. And sure enough, if I go back to SEL_RD from my XML output, the 11th byte of all the PCI errors is 0x13.
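Quick sanity check: each byte is rendered as two hex characters, so the 11th byte (index 10) starts at string offset 20. Testing that against the last event in the XML above:

sel_rd = '110002b14b095d33000413006fa58012'  # last SEL_RD from the XML above
print(sel_rd[20:22])  # prints '13', the PCI error sensor type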

Structure of the Python Script

Now that my investigation is done, I can identify what I need my Python script to do:

  1. Get the IPMI URL or IP address from the command line.
  2. Authenticate against /cgi/login.cgi.
  3. POST to /cgi/ipmi.cgi with the appropriate form.
  4. Navigate the resulting XML to find my SEL_RD values.
  5. See if the 11th byte is 0x13.
  6. Do something.

Implementation

Get the URL from the Command Line

This is pretty easy. It doesn't matter whether the argument is a URL or an IP address, and it's the only argument I'm taking.

import sys

url = sys.argv[1]

Authenticate against /cgi/login.cgi

I'm doing a couple noteworthy things here. First, I have my authentication details in a separate file called auth.py. Second, I'm using a Requests Session, which automatically keeps track of cookies and other session information for me.

import requests
from auth import ipmi_username, ipmi_pwd

auth_form = {'name': ipmi_username, 'pwd': ipmi_pwd}

ipmi_sesh = requests.Session()
ipmi_sesh.post(f'http://{url}/cgi/login.cgi', data=auth_form)
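The script doesn't check whether the login actually worked. If you wanted to fail fast, you could capture the response from that same POST; a sketch, assuming this BMC answers a bad request with a non-200 status (firmware behavior varies):

login_response = ipmi_sesh.post(f'http://{url}/cgi/login.cgi', data=auth_form)
# hypothetical fail-fast check; sys is already imported above
if login_response.status_code != 200:
    sys.exit(f'Login to {url} failed with HTTP {login_response.status_code}')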

Request the Event Log Data

I already have my HTTP session set up, so I just need to ask for what I need.

event_log_form = {'SEL_INFO.XML': '(1,c0)'}

response = ipmi_sesh.post(f'http://{url}/cgi/ipmi.cgi', data=event_log_form)

Get SEL_RD from the Returned XML

Python includes some resources that let you interpret XML relatively easily. Each element within the hierarchy can be iterated over like a list of its child elements.

import xml.etree.ElementTree as ET

ipmi_events = ET.fromstring(response.text)
events = []
for SEL_INFO in ipmi_events:
    for SEL in SEL_INFO:
        events.append(SEL.attrib)

At this point, I have a list of dictionaries, where each dictionary includes the attributes of each SEL from the XML.

for event in events[1:]:  # first value is the total number of events, so we can skip it
    print(event.get('SEL_RD'))

I'm not actively printing SEL_RD in the final script, but that's how I isolate it.

Check the 11th Byte

Slicing byte arrays and strings in Python still doesn't come intuitively to me. It took some trial and error to find the [20:22].

if event.get('SEL_RD')[20:22] == '13':
    # do something

Do Something

For this example, my 'do something' is just printing the timestamp, but we can record it in a log, send an email, or do whatever else we want.

print(f'Found PCI SERR at {event.get("TIME")}')

Conclusion

Python web scraping is surprisingly easy, and it would be a whole lot easier if I didn't have to learn to read Javascript to make my example work.

You can check out the full script on Github.

Audit TLS Version of Devices Connecting to Your Gear with a PCAP

I had a request come across my desk one day: we want to disable SSLv3 and TLS 1.0 on a set of 20+ servers, but would rather not rely on a scream test to see who will be affected. Can you tell us who is connecting to our gear with unsafe versions of this protocol?

I did some research to see if that information might be buried in the Windows Event Log somewhere, but it isn’t. I asked if it might be in an application log somewhere, but it isn’t. I took a quick look through our monitoring systems to see if there would be an easy answer, but there isn’t. Unfortunately, the best way to perform this audit, at least with my skillset, is to perform a packet capture on each server and extract the negotiated SSL/TLS version from each session.

So, I set up a packet capture (with the capture filter set to specific TCP ports and headers only) on each one of these servers and hoped that I could come up with an analysis script by the time I stopped those captures. I didn't quite get there.

Main Points of This Post

These are the main things I wanted to do when writing this script, and the main points I want to make in this post:

  1. Using enums to avoid the dreaded “magic number” confusion

  2. Working with Binary Files in Python

  3. Being bold in the interests of high performance

Primary Structure of the Script

To help orient you (and myself), this is a brief description of how I structured the script.

First, I take in the pcap or directory of pcaps, and pass one of them into the meat of the script.

Next, I look into the global header of the file to determine the endianness of the file (thus ensuring my binary interpretations are accurate later) and to ensure the file contains Ethernet frames (to make sure my offsets work right).

I then pull a packet from the file into a byte array. I discovered that it is much faster to work with a byte array in memory than a file pointer in a multi-gigabyte capture file.

At this point, most packet dissection tools would extract all the header information from the packet and store it in some struct format. I can save a little bit of compute time by first checking to see if the packet is a TLS handshake, specifically a Client or Server hello.

If I find out that I do care about the packet, then I'll grab the IP and TCP headers, extract the relevant details (5-tuple and TLS version), and save that in a list, which gets written to a CSV file after I've gone through the entire file.
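A minimal sketch of that early check (my reconstruction, not the script's exact code; it assumes Ethernet framing with no VLAN tag and an IPv4/TCP packet):

def is_tls_hello(packet):
    # IHL and TCP data offset are both counted in 4-byte words
    ip_len = (packet[14] & 0x0f) * 4
    tcp_off = 14 + ip_len
    tcp_len = (packet[tcp_off + 12] >> 4) * 4
    payload = tcp_off + tcp_len
    if len(packet) <= payload + 5:
        return False  # no TLS record here
    # content type 0x16 = handshake; handshake type 0x01/0x02 = client/server hello
    return packet[payload] == 0x16 and packet[payload + 5] in (0x01, 0x02)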

Using Enums in Python

I had an embedded systems professor in college who always stressed "NO MAGIC NUMBERS IN YOUR CODE!" If he or a TA spotted a number in your code anywhere, it was points off. This sort of script, where I'm working at a relatively low level with binary files, seemed like a good application to try avoiding hard-to-read numbers.

The structure in Python to do this seemed to be the Enum. My list for this script is as follows:

from enum import IntEnum

class Constant(IntEnum):
  ETH_HDR = 14
  IHL_MASK = int('00001111',2)
  TCP_DATA_OFFSET_MASK = int('11110000',2)
  WORDS_TO_BYTES = 4
  DATA_OFFSET_BITWISE_SHIFT = 4
  TLS_RCD_LYR_LEN_OFFSET = 3
  TLS_RCD_LYR_LEN = 5
  IP_SRC_OFFSET = 12
  IP_DST_OFFSET = 16
  IP_ADDR_LEN = 4
  TCP_DST_OFFSET = 2
  TCP_PORT_LEN = 2
  TLS_HANDSHAKE_VER_OFFSET = 9
  TLS_VER_LEN = 2

To use these, I just had to reference Constant.ETH_HDR.

It was a good idea, but there were also lines in this script where it didn't contribute that much to readability. Below is one excellent example of that, where I used a ton of Enums to perform array slicing, bitwise comparisons, bitwise shifts, and mathematical operations in a single line. And there are somehow still magic numbers in there.

tcp_len = ((int.from_bytes(packet[Constant.ETH_HDR + ip_len + 12: Constant.ETH_HDR + ip_len + 12 + 1], byteorder = 'big') & 
            Constant.TCP_DATA_OFFSET_MASK) >> Constant.DATA_OFFSET_BITWISE_SHIFT) * Constant.WORDS_TO_BYTES

Binary Files In Python

Python is surprisingly good at dealing with binary files. All you need to do is insert a b in the second argument of the built-in open() function, and things are really nice. Example below:

file_handle = open(<file_name>,'rb+')

Once you do this, you have a pointer that you can move around however you want using the seek() method. The first argument of that method is the desired offset, and the second argument is the place you want to offset from (0 to reference the beginning of the file, 1 to reference the current location, and 2 to reference the end).

A perfect example of how to use this is interpreting the headers in the PCAP file format. The global header (I used this site as a reference) has the following format:

typedef struct pcap_hdr_s {
    guint32 magic_number;   /* magic number */
    guint16 version_major;  /* major version number */
    guint16 version_minor;  /* minor version number */
    gint32  thiszone;       /* GMT to local correction */
    guint32 sigfigs;        /* accuracy of timestamps */
    guint32 snaplen;        /* max length of captured packets, in octets */
    guint32 network;        /* data link type */
} pcap_hdr_t;

From here, I care about the magic number and the data link type. The magic number tells me the endianness of the system, which is required to interpret any further data, and the data link type makes sure I'm dealing with Ethernet frames. If I'm not, all of my offsets are screwed up and I might as well abort now.

To read this information, I use the following code:

def report_global_header(file_ptr):
    # assumes file_ptr is at the head of the pcap file
    # returns the byte order ('big' or 'little') and
    # whether the capture contains Ethernet frames

    magic_num = file_ptr.read(4)
    magic_test = b'\xa1\xb2\xc3\xd4'
    if magic_num == magic_test:
        order = 'big'
    else:
        order = 'little'

    file_ptr.seek(16, 1)
    link_layer_type = file_ptr.read(4)
    ethernet = int.from_bytes(link_layer_type, byteorder=order)
    is_ethernet = ethernet == 1

    return order, is_ethernet

The really important bits here are the read() and seek() methods, because these move the file pointer.

I start by reading the magic number, which moves the pointer 4 bytes forward. Then I seek() 16 bytes ahead of the current pointer location, bringing me to the network variable.

I perform similar operations to interpret the packet header and read the packet to a byte array.
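For reference, the per-packet record header is four 4-byte fields (ts_sec, ts_usec, incl_len, orig_len), so one plausible version of those similar operations looks like this (read_packet is a hypothetical helper, not the script's exact code):

def read_packet(file_ptr, order):
    record_header = file_ptr.read(16)
    if len(record_header) < 16:
        return None  # end of file
    # incl_len (bytes 8-11) is how much of the packet was actually captured
    incl_len = int.from_bytes(record_header[8:12], byteorder=order)
    return file_ptr.read(incl_len)  # the packet itself, as a bytes object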

Being Bold for High Performance

I'll go ahead and qualify this entire section by pointing out that if I was truly trying to be bold for high performance, I would have written this in C.

For this script, I chose not to use an existing Python packet dissection tool like Scapy because of performance. Scapy, in particular, is slow and uses a ton of memory, and I wasn't able to find another pre-built tool that offered a significant increase in speed. I implemented my own packet dissection so I could have more control over what data I was pulling and when. I perform a relatively small number of operations on packets that aren't relevant to my investigation, and I don't interpret a single piece of information that isn't necessary for the goal of the script.

By making small performance improvements over the course of development, I was able to process over 50GB of packet capture data in about 10 minutes with negligible impact on my laptop's CPU or RAM. Here are a few of the tactics I used to increase performance:

  • Pulling the entire packet into a byte array, instead of working with it in the file.
  • Reorganizing the dissection to skip over the IP and TCP headers and pull the TLS content type first.
  • Pushing interpretation as far back in the script as possible. If I can make my comparison against the binary representation of my desired value, I save cycles (sketched below).
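To illustrate that last tactic: the two version bytes from the hello can ride along as raw bytes and only get translated to a name when the row is written out. The mapping and variable names here are my illustration, not structures from the script:

# raw TLS version bytes, translated only at output time
TLS_VERSIONS = {
    b'\x03\x00': 'SSLv3',
    b'\x03\x01': 'TLS 1.0',
    b'\x03\x02': 'TLS 1.1',
    b'\x03\x03': 'TLS 1.2',
}

ver_start = payload_start + Constant.TLS_HANDSHAKE_VER_OFFSET  # payload_start: first byte of the TLS record
version_bytes = bytes(packet[ver_start:ver_start + Constant.TLS_VER_LEN])
row_version = TLS_VERSIONS.get(version_bytes, 'unknown')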

Conclusion

I was able to develop a script (available here) that could interpret a pcap-based packet capture to audit the SSL/TLS version that other systems use to connect to it. This gave my colleague a nice list of application owners to talk into upgrading TLS before turning off support on his side. He successfully disabled support for the unsafe protocol versions on all of his servers without impacting any applications that relied on them.

In developing this script, I was able to mess with binary files and work with binary representations of data in Python, discovering that Python makes it surprisingly easy. I was also able to implement the wise advice of an old college professor and (mostly) avoid magic numbers in my code. It was a fun project!

Efficient Administration of Network Appliances

Beyond troubleshooting and packet analysis, part of my job is administration of the network monitoring tools we use to perform troubleshooting and packet analysis. One vendor we rely on quite heavily is Endace for their line of packet capture appliances. As part of my work with their gear, I’ve learned how to script things out for efficiency.

For capacity planning purposes, I’ve automated the process to obtain and format file system and packet capture information. Additionally, Endace recently released a hotfix for all of their devices, and I scripted things out to deploy it concurrently.

Overview of Endace Devices

The Endace architecture is relatively simple. Probes distributed throughout the network capture and store packets, and they are all managed by a Central Management Server (CMS).

The probes store databases full of packet data within an abstraction they call rotation files. These act like a ring buffer, refreshing the oldest 4GB block when the buffer fills up.

For most tasks on the system administration side, the CMS can distribute commands to all probes, groups of probes, or individual probes.

However, there are some cases where CMS orchestration doesn’t work, and installation of hotfixes is one of those. Hotfixes require the device to be in maintenance mode, which kills most of the client processes on the system, including the RabbitMQ client, the CMS orchestration mechanism. If I have 20+ devices on which to install a hotfix, requiring 2-3 minutes per device, not including waiting for a reboot to complete, I really don’t want to do that all in serial (as opposed to, and including, over serial) or manually. That’s where Python and netmiko come into play.

Netmiko Support for Endace

For those unfamiliar, netmiko is a Python library for network scripting. It adds a layer of abstraction on top of paramiko, the Python ssh library, to dumb things down enough for network engineers like me to work effectively. It allows us to send commands and view output without worrying about buffering input or output, figuring out when we can send more information, or writing regex to figure out if we’re in enable mode or configuration mode.

Since Endace is a relatively small vendor (as compared to Cisco or Juniper), netmiko doesn’t support it by default. Fortunately, Kirk Byers, the creator of netmiko, makes it really easy to add support for new operating systems. It took me a couple of days to build in support (available here). It took me a few more days to figure out how to make threading integrate with it to perform tasks on multiple devices in parallel.

At the end of that journey, I had a system to effectively distribute commands and collect information. For example, I wrote a quick script to extract information about the file systems and rotation files (basically PCAP files in a ring buffer made of 4GB blocks) to CSV files.
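For context, each entry in EndaceDevices.prod_probes is just a netmiko-style device dictionary. A hypothetical example (the values are made up, and the 'endace' device_type assumes the name registered by my netmiko change):

device = {
    'device_type': 'endace',
    'host': 'probe01.example.com',
    'username': 'admin',
    'password': 'secret',
}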

Extract Important Information

This script has 3 main parts. After initializing the CSV files, it iterates through the devices:

from netmiko import ConnectHandler
import EndaceDevices  # local module holding the device definitions

for device in EndaceDevices.prod_probes:
    connection = ConnectHandler(**device)
    fsInfo = getFileSystemInfo(connection)
    rotInfo = getRotFileInfo(connection)
    connection.disconnect()
    print_output_to_csv(fsInfo, fscsvName, fsTemplate)
    print_output_to_csv(rotInfo, rotcsvName, rotTemplate)

First, it connects to the device and runs the relevant show commands.

def getFileSystemInfo(connection):
    connection.enable() # Note how netmiko provides a method to enter enable mode
    output = connection.find_prompt() # This is just here to output the device name
    output += "\n"
    output += connection.send_command("show files system")
    return output

def getRotFileInfo(connection):
    connection.enable()
    output = connection.find_prompt()
    output += "\n"
    output += connection.send_command("show erfstream rotation-file")
    return output

Then, it uses TextFSM to parse the output. TextFSM is a Python library developed by Google that applies a template of regular expressions to blocks of text that appear in a consistent format. For example, the file systems template looks like:

Value Filldown Device (\S+)
Value Required TotalCapacity (\d+.?\d*)
Value UsedCapacity (\d+.?\d*)
Value UsableCapacity (\d+.?\d*)
Value FreeCapacity (\d+.?\d*)

Start
  ^${Device} #
  ^\s+Bytes Total\s+${TotalCapacity}
  ^\s+Bytes Used\s+${UsedCapacity}
  ^\s+Bytes Usable\s+${UsableCapacity}
  ^\s+Bytes Free\s+${FreeCapacity} -> Record

The top half of the template defines the variables I want to capture in the format Value <optional modifier> <Name> (<regex>).

The modifier Filldown copies that value to every entry in that file. In this case, it isn't very useful since I'm only extracting one partition, but if I wanted every partition, Filldown would copy the device name to the entry of every partition.

Required is an important modifier, and it's necessary to designate at least one value as Required. This indicates a valid entry. If that value doesn't appear, then TextFSM will skip the whole thing.

The second part of the template describes the text block as a regular expression.

Once this template is in place, all we need to do to turn a random string into a well formed list is:

import textfsm

with open(templateFilename, "r") as template:
    re_table = textfsm.TextFSM(template)
data = re_table.ParseText(output)
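ParseText() returns the rows as a list of lists, with columns in the order the Values are declared; re_table.header holds the matching column names, which makes a convenient header row for the CSV.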

Finally, I use csv.writer to push it out to a CSV file:

import csv

with open(filename, 'a', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in data:
        writer.writerow(row)
    # no explicit close() needed; the with block closes the file

Deploy an Endace Hotfix

There are four basic steps to applying most hotfixes with Endace. First, the hotfix file needs to find its way to the device. Second, it needs to go into maintenance mode. Next, the hotfix is installed. Finally, it gets rebooted.

I use pysftp to get the hotfix file to the device:

import pysftp

def transfer_via_sftp(filepath, hostname, user, pw):
    opts = pysftp.CnOpts()
    opts.hostkeys = None  # skips host key verification; convenient here, risky in general
    conn = pysftp.Connection(host=hostname, username=user, password=pw, cnopts=opts)
    conn.chdir("/endace/packages")
    conn.put(filepath) # Upload the file
    # pysftp.exists() does an ls and compares the given string against the output #
    if conn.exists("OSm6.4.x-CumulativeHotfix-v1.end"):
        conn.close()
        print("tx completed successfully for host " + hostname)
        return True
    else:
        conn.close()
        print("tx failed. Upload manually for host " + hostname)
        return False

I didn’t format my code exceptionally well, so the next three steps are a little muddled:

def enter_maintenance_mode(connection):
    connection.config_mode()
    output = connection.send_command("maintenance-mode noconfirm", strip_prompt = False)
    # In maintenance mode, the cli prompt looks like:
    # (maintenance) <device_name> (configuration) #
    if "(maintenance)" in output:
        return True
    else:
        print(output)
        print("issue obtaining maintenance mode")
        return False

def install_hotfix(hotfix_name_full, hfn_installed, device):
    conn = ConnectHandler(**device)
    if not enter_maintenance_mode(conn): # Step 2: enter maintenance mode
        print(device["host"] + " could not enter maintenance mode. Failing gracefully")
        conn.close()
        return False
    conn.send_command("package install " + hotfix_name_full) # Step 3: install hotfix
    output = conn.send_command("show packages")
    if hfn_installed in output:
        print("successfully installed on host " + device["host"] + ". rebooting")
        conn.save_config()
        # send_command_timing() doesn't wait for the cli prompt to return #
        conn.send_command_timing("reload") # Step 4: reboot
        return True
    else:
        print("possible issue with install. Troubleshoot host " + device["host"])
        return False

Executing in Parallel

The above code snippets perform all the required tasks to install the hotfix, but I want to go one step farther and execute those steps in parallel across many devices. To do this, I use the threading library:

thread_list = []
devices = EndaceDevices.all_devices

for dev in devices:
    thread_list.append(threading.Thread(target=transfer_via_sftp, args=(path, dev["host"], dev['username'], dev['password'])))

for thread in thread_list:
    thread.start()

for thread in thread_list:
    thread.join()

print("File transferred everywhere")
thread_list = []

for dev in devices:
    thread_list.append(threading.Thread(target=install_hotfix, args=(hf_name_full, hf_name_as_installed,dev)))

for thread in thread_list:
    thread.start()

for thread in thread_list:
    thread.join()

print("Hotfix installed everyone. Monitor for devices coming back up")

First, I create all the threads. Then, I start them. Finally, I wait for them all to return. And then I can repeat for the next function I want to parallelize.
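The same fan-out-then-join pattern maps neatly onto the standard library's thread pool, if you'd rather not manage Thread objects by hand. A sketch of the equivalent (not how my script does it):

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=len(devices)) as pool:
    futures = [pool.submit(install_hotfix, hf_name_full, hf_name_as_installed, dev)
               for dev in devices]
    results = [f.result() for f in futures]  # blocks until every device finishes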

The full script is available on Github.

Conclusion

Python, with netmiko, is a worthwhile tool to increase the efficiency of collecting information or pushing patches or configuration. If netmiko doesn’t support your device, it’s very easy to extend it. TextFSM is a very useful tool to process strings into useful, structured information.

Fast Filtering of Load Balanced or Proxied Connections

A common problem exists when investigating network issues through packet analysis. Web scale architecture provides ways to distribute workloads, and those mechanisms tend to abstract away silly things like the 5-tuple frequently used to identify individual conversations. Both proxies and load balancers can obscure a client’s IP address and source TCP port, and that makes it difficult to isolate a specific conversation across multiple networking devices.

The goal of this post is to describe an automated way to see if traffic is making it past the proxy, which can be extended to isolate the 5-tuple for the part of the conversation behind the proxy (source IP, source port, destination IP, destination port, and protocol, i.e. TCP or UDP).

Architecture for this Scenario

To demonstrate how this can happen, a simple sample architecture can be used that shows a basic load balanced website.

Sample architecture depicting a user, a load balancer, and a pair of web servers

This assumes a few configuration options:

  1. The client connects via HTTPS to the load balancer.
  2. The load balancer performs SSL offload and the traffic within the data center is unencrypted.
  3. The load balancer replaces the client's IP address with its own Virtual IP Address (VIP).

These conditions typically mean that the load balancer will pass along a client's IP address within an X-Forwarded-For field within the HTTP header. This means that if we have firewall or load balancer logs that show incoming connections, or we have customers calling in who can give us their IP address, it is possible to isolate their traffic behind the load balancer.
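For reference, a request arriving at a web server behind the load balancer might look like this (addresses are from the documentation ranges, not real traffic):

GET /index.html HTTP/1.1
Host: www.example.com
Accept: text/html
X-Forwarded-For: 203.0.113.42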

Significance of a Novel Technique

If we can go through and find the traffic, why am I even writing this post?

Most packet analysis tools (including Wireshark, Endace, and some of the various Riverbed tools) are really good at quickly filtering through protocol headers because they’re optimized for parsing that binary data. They are less efficient at filtering through an ASCII-formatted HTTP header. Even Scapy, which is a Python library made for messing with packets, isn’t well optimized for looking through packet payloads.

Proposed Technique

For this task, it turns out that simplicity is key. To begin, we need a known client IP address and a broad packet capture behind the load balancer or proxy (it is really easy for a 5-10 minute packet capture to reach 100+ GB in an enterprise environment).

Step 1: Confirm the traffic exists

It doesn’t help to expend time and compute power searching through the details of a massive packet capture file if we don’t know the traffic under investigation even exists. I’ve found that the fastest way to do so is to ignore all the protocol headers. It takes time to process binary, especially if Scapy is doing it, so we can save a lot of time by reading in the PCAP as a regular text file.

To save on system memory (because I don’t have 100GB of RAM in my computer), we read line-by-line.

with open(pcap_fn, mode="rb") as pcap:
    print("PCAP Loaded")
    for line in pcap:
        iterate_basic_ip_check(line, target_ip)

In the HTTP header, the different fields are delimited by newline characters, so the X-Forwarded-For field we’re looking for appears on its own line using this technique, which allows us to match an IP address with some really simple regex.

import re

def iterate_basic_ip_check(line, target):
    # the file is opened in binary mode, so match against a bytes pattern
    match = re.match(rb'X-Forwarded-For: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', line)
    if match:
        if match.group(1).decode() == target:
            print("Target IP %s found" % target)

All put together, this script (available on Github) runs blazingly fast. I estimated 200+ MB/sec on my machine, and it’s possible to parallelize this workload to take advantage of multiple cores.
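If the capture is split across multiple files, that parallelization is nearly free with a process pool. A sketch, where scan_file() is a hypothetical wrapper around the open-and-iterate loop above and pcap_files is a list of capture paths:

from multiprocessing import Pool

def scan_file(pcap_fn):
    with open(pcap_fn, mode="rb") as pcap:
        for line in pcap:
            iterate_basic_ip_check(line, target_ip)

if __name__ == '__main__':
    with Pool() as pool:  # defaults to one worker per CPU core
        pool.map(scan_file, pcap_files)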

Step 2: Isolate the 5-tuple

Once we know the correct traffic exists, we can re-iterate using Scapy to identify the 5-tuple, or potentially multiple 5-tuples, used. This is left as an exercise for the reader (you can thank my engineering textbooks for teaching me this wonderful and horribly frustrating phrase).

Conclusion

If you find yourself in the position where an expected IP address is disappearing behind a proxy or load balancer, it is possible to process a fairly large amount of data to isolate the conversation in the next segment of the network as long as the HTTP header is exposed.