Bonus Homework: A Network Snarf Tool

Due date: May 6, 2019, 8 am

This is a bonus homework for extra credit. It counts toward the 50% homework course credit as much as any other homework. However, if you have already completed the previous 6 assignments successfully, doing this one will not provide any further credit toward the course grade.

Using our experience with raw packet reception from Homework 6, we build a tool for automatically extracting documents from overheard network traffic.

Basic libpcap setup

For reproducibility, we will capture data with tcpdump into a file, then replay that data using our tool. First, we record a trace:

tcpdump -s 0 -i interface -w file.dat

Then, using the tool you are building, we analyze the trace:

./hw7 -i file.dat --local-ip 192.168.2.100

tcpdump uses libpcap to capture, store and replay packets, so we'll use that too. The --local-ip parameter specifies the IP address of the local host in the trace. When run with no other parameters, have hw7 output simple statistics about the file: for each remote IP address, the number of packets and bytes, sent and received, as well as total number of packets and bytes, sent and received.
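The sent/received accounting can be sketched independently of libpcap. The following is a minimal sketch, assuming Ethernet II frames carrying IPv4; the names `struct counters` and `account_packet` are hypothetical and not part of any provided template. In the real tool the frame bytes and lengths would come from your libpcap packet callback, and you would keep one `struct counters` per remote IP address plus one for the totals.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical running totals; a full tool would keep one per remote IP. */
struct counters {
    unsigned long pkts_sent, bytes_sent;
    unsigned long pkts_recv, bytes_recv;
};

/*
 * Classify one captured frame as sent or received relative to local_ip
 * (network byte order). Assumes an Ethernet II frame carrying IPv4.
 * caplen is the number of bytes captured, wirelen the original length.
 * Returns 1 if the frame was counted, 0 if it was skipped.
 */
int account_packet(const uint8_t *frame, size_t caplen, size_t wirelen,
                   uint32_t local_ip, struct counters *c)
{
    const size_t eth_len = 14;               /* Ethernet II header size */
    uint16_t ethertype;
    uint32_t src, dst;

    if (caplen < eth_len + 20)               /* too short for an IPv4 header */
        return 0;
    memcpy(&ethertype, frame + 12, 2);
    if (ntohs(ethertype) != 0x0800)          /* not IPv4 */
        return 0;

    memcpy(&src, frame + eth_len + 12, 4);   /* IPv4 source address */
    memcpy(&dst, frame + eth_len + 16, 4);   /* IPv4 destination address */

    if (src == local_ip) {
        c->pkts_sent++;
        c->bytes_sent += wirelen;
    } else if (dst == local_ip) {
        c->pkts_recv++;
        c->bytes_recv += wirelen;
    } else {
        return 0;                            /* traffic between other hosts */
    }
    return 1;
}
```

The local address given with --local-ip can be converted once at startup, for example with inet_pton, and passed in as `local_ip`.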

TCP stream reassembly

When run like this:

./hw7 -i file.dat -r

your program should produce, in a subdirectory called "tcpstreams", all of the complete TCP streams present in the trace. Do not write out streams that do not contain all the necessary packets, from SYN to FIN. Name the streams by their TCP flow identifier.
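The assignment does not fix a file name layout for the flow identifier. As one possibility, the sketch below formats the TCP 4-tuple in the address.port style that tcpdump prints; the function name `flow_name` and the exact layout are assumptions, so use another layout if you prefer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>

/*
 * Format a TCP 4-tuple as a file name such as
 * "192.168.2.100.51000-93.184.216.34.80" (this layout is an assumption).
 * Addresses are in network byte order, ports in host byte order.
 * Returns 0 on success, -1 if out is too small or conversion fails.
 */
int flow_name(char *out, size_t outlen,
              uint32_t src_ip, uint16_t src_port,
              uint32_t dst_ip, uint16_t dst_port)
{
    char src[INET_ADDRSTRLEN], dst[INET_ADDRSTRLEN];
    struct in_addr a;
    int n;

    a.s_addr = src_ip;
    if (!inet_ntop(AF_INET, &a, src, sizeof src))
        return -1;
    a.s_addr = dst_ip;
    if (!inet_ntop(AF_INET, &a, dst, sizeof dst))
        return -1;

    n = snprintf(out, outlen, "%s.%u-%s.%u",
                 src, (unsigned)src_port, dst, (unsigned)dst_port);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Since a TCP connection carries data in both directions, you may want to decide up front whether each direction gets its own file; the sketch names one direction only.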

Application level analysis

The intent of this program is to extract interesting data from the packet capture. To do this, we need to go deeper than the transport layer.

images from HTTP

Given the parameter "--images", automatically extract any images from HTTP transfers. Store all images from the trace in a subdirectory called images. Name them by their URL, thus: www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png
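Naming a file by its URL means combining the Host header with the request path. The following is a rough sketch of that step only, assuming a GET request whose header is spelled exactly "Host:" (header names are case-insensitive in HTTP, so a robust tool should compare without regard to case). The function name `image_path` is hypothetical. Deciding whether the response actually is an image, for example from its Content-Type header, is left out.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "images/<host><path>" from the head of an HTTP GET request.
 * Any query string is dropped, and names containing ".." are rejected
 * so that a hostile trace cannot write outside the images directory.
 * Returns 0 on success, -1 if the request cannot be used.
 */
int image_path(const char *req, char *out, size_t outlen)
{
    char path[1024], host[256];
    const char *h;
    int n;

    if (sscanf(req, "GET %1023s", path) != 1 || path[0] != '/')
        return -1;
    path[strcspn(path, "?")] = '\0';          /* drop the query string */

    h = strstr(req, "\r\nHost:");
    if (!h)
        return -1;
    h += 7;                                   /* skip "\r\nHost:" */
    while (*h == ' ' || *h == '\t')           /* optional whitespace */
        h++;
    if (sscanf(h, "%255[^ \t\r\n]", host) != 1)
        return -1;

    if (strstr(path, "..") || strstr(host, "..") || strchr(host, '/'))
        return -1;

    n = snprintf(out, outlen, "images/%s%s", host, path);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Before writing the file you will also need to create the intermediate directories, since the URL path contains slashes.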

hostnames from DNS

Given the parameter "--dns", return a list of hostnames for which a DNS lookup request was captured. You may find the template code from the DNS homework handy in decoding the DNS names in such packets. DNS uses UDP on port 53.
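If you do not have the DNS homework template at hand, name decoding can be sketched as follows. This is a minimal sketch, not the template code: the function name `dns_decode_name` is hypothetical. It walks the length-prefixed labels and follows compression pointers, with a cap on the number of jumps so a malformed packet cannot loop forever. In a query, the first question name starts at offset 12, right after the fixed DNS header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the DNS name starting at msg[off] into out as dotted text.
 * msg must point at the start of the DNS message (the UDP payload) so
 * that compression pointers can be followed. Returns the length of the
 * decoded string, or -1 on malformed input or if out is too small.
 */
int dns_decode_name(const uint8_t *msg, size_t msglen, size_t off,
                    char *out, size_t outlen)
{
    size_t pos = 0;
    int jumps = 0;

    if (outlen == 0)
        return -1;
    for (;;) {
        uint8_t len;

        if (off >= msglen)
            return -1;
        len = msg[off];
        if (len == 0)                            /* root label: done */
            break;
        if ((len & 0xC0) == 0xC0) {              /* compression pointer */
            if (off + 1 >= msglen || ++jumps > 32)
                return -1;
            off = ((size_t)(len & 0x3F) << 8) | msg[off + 1];
            continue;
        }
        if (len & 0xC0)                          /* reserved label type */
            return -1;
        off++;
        if (off + len > msglen)
            return -1;
        if (pos + len + 2 > outlen)              /* '.' + label + NUL */
            return -1;
        if (pos > 0)
            out[pos++] = '.';
        for (uint8_t i = 0; i < len; i++)
            out[pos++] = (char)msg[off + i];
        off += len;
    }
    out[pos] = '\0';
    return (int)pos;
}
```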

emails from SMTP

This one is a bit of a stretch on a client host, but let's not restrict ourselves to life in a virtual machine. Given the parameter "--email", output all captured emails in a subdirectory named "email". SMTP uses port 25. Name the files by date, originator, and recipient, thus:

201901140315-president@whitehouse.gov-jakob@uic.edu

This uses the yyyyMMddhhmm date format. Use a different date format if you prefer. To test this, you will need to generate some plaintext SMTP traffic. Your Homework 1 solution may be handy for this.
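The file name itself is straightforward to build with strftime. The sketch below assumes you have already obtained a broken-down time and the two addresses; where they come from (for example the capture timestamp, or the MAIL FROM and RCPT TO commands of the SMTP dialogue) is your design choice, and the function name `email_filename` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/*
 * Build "<yyyyMMddhhmm>-<from>-<to>" for a captured email.
 * Returns 0 on success, -1 if the date does not format to 12 digits
 * or out is too small.
 */
int email_filename(char *out, size_t outlen, const struct tm *when,
                   const char *from, const char *to)
{
    char stamp[13];                       /* 12 digits plus NUL */
    int n;

    if (strftime(stamp, sizeof stamp, "%Y%m%d%H%M", when) != 12)
        return -1;
    n = snprintf(out, outlen, "%s-%s-%s", stamp, from, to);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```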

passwords from text

Finally, given the "--passwords" command line argument, your tool should output the 30 bytes immediately before and the 30 bytes immediately after any of the words "password", "passwd", or "secret", in any capitalization, in any TCP flow. You will want to use the regular expression functionality in regex.h (libc) for this: https://www.gnu.org/software/libc/manual/html_node/Regular-Expressions.html#Regular-Expressions
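A case-insensitive search with regex.h could look roughly like the sketch below. It assumes the flow data has been copied into a NUL-terminated string, which matters because regexec stops at the first NUL byte and real TCP payloads may contain them. The function name `keyword_context` and the CONTEXT constant are hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define CONTEXT 30   /* bytes of context on each side of the keyword */

/*
 * Find the first keyword at or after text[from]. On a match, *start and
 * *end (exclusive) delimit the keyword plus up to CONTEXT bytes on each
 * side, clipped to the text, and *next is where to resume searching.
 * Returns 1 on a match, 0 if there is none, -1 if the pattern does not
 * compile. The pattern is compiled on every call to keep the sketch
 * short; a real tool would compile it once.
 */
int keyword_context(const char *text, size_t from,
                    size_t *start, size_t *end, size_t *next)
{
    regex_t re;
    regmatch_t m;
    size_t len = strlen(text), ms, me;
    int rc;

    if (from > len)
        return 0;
    if (regcomp(&re, "password|passwd|secret", REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    rc = regexec(&re, text + from, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    ms = from + (size_t)m.rm_so;              /* keyword start in text */
    me = from + (size_t)m.rm_eo;              /* keyword end in text */
    *start = ms > CONTEXT ? ms - CONTEXT : 0;
    *end = me + CONTEXT < len ? me + CONTEXT : len;
    *next = me;
    return 1;
}
```

Calling the function in a loop, passing the previous `*next` as `from` until it returns 0, visits every occurrence in the flow.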

turn-in

Turn in your homework using this GitHub Classroom link. As always, double check using a fresh clone to make sure that you've pushed all the necessary files, so that the command "make" in the turn-in folder produces an executable named "hw7" that performs the functions described above.