Bonus Homework: A Network Snarf Tool
This is a bonus homework for extra credit. It counts toward the homework portion (50%) of the course grade, with the same weight as any other homework. However, if you have already completed the previous six assignments successfully, this one will not add any further credit toward your course grade.
Drawing on our experience with raw packet reception in Homework 6, we will build a tool that automatically extracts documents from overheard network traffic.
Basic libpcap setup
For reproducibility, we will capture data with tcpdump into a file, then replay that data using our tool. First, we record a trace:
tcpdump -s 0 -i interface -w file.dat
Then, using the tool you are building, we analyze the trace:
./hw7 -i file.dat --local-ip 192.168.2.100
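The flags used throughout this assignment (-i, --local-ip, -r, --images, --dns, --email, --passwords) map naturally onto getopt_long() from <getopt.h>, which glibc and the BSDs provide. Here is a minimal sketch. The struct hw7_opts type and its field names are hypothetical, and treating -i as mandatory is an assumption.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for hw7's command line options. */
struct hw7_opts {
    const char *input;     /* -i: trace written by tcpdump -w */
    const char *local_ip;  /* --local-ip: dotted-quad address of the local host */
    bool reassemble;       /* -r */
    bool images, dns, email, passwords;
};

/* Returns 0 on success, -1 on an unknown option or a missing -i. */
int parse_args(int argc, char **argv, struct hw7_opts *o)
{
    int images = 0, dns = 0, email = 0, passwords = 0;
    const struct option longopts[] = {
        { "local-ip",  required_argument, NULL,       'L' },
        { "images",    no_argument,       &images,    1 },
        { "dns",       no_argument,       &dns,       1 },
        { "email",     no_argument,       &email,     1 },
        { "passwords", no_argument,       &passwords, 1 },
        { NULL, 0, NULL, 0 }
    };
    int c;

    *o = (struct hw7_opts){ 0 };
    while ((c = getopt_long(argc, argv, "i:r", longopts, NULL)) != -1) {
        switch (c) {
        case 0:   break;                     /* a flag-only long option set its int */
        case 'i': o->input = optarg; break;
        case 'L': o->local_ip = optarg; break; /* only reachable via --local-ip */
        case 'r': o->reassemble = true; break;
        default:  return -1;                 /* getopt_long already printed a message */
        }
    }
    o->images = images;
    o->dns = dns;
    o->email = email;
    o->passwords = passwords;
    return o->input ? 0 : -1;
}
```

Each mode can then be a simple `if (opts.dns) ...` in main(), and several modes can run in one pass over the trace.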
tcpdump uses libpcap to capture, store, and replay packets, so we will use it too. The --local-ip parameter specifies the IP address of the local host in the trace. When run with no other parameters, hw7 should print simple statistics about the file: for each remote IP address, the number of packets and bytes sent and received, followed by the total number of packets and bytes sent and received.
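One possible shape for the statistics pass is sketched below. Open the trace with pcap_open_offline(), loop over pcap_next_ex(), and hand each frame to a function like the hypothetical account_frame(). The sketch makes these assumptions:

- The link type is Ethernet (pcap_datalink() returns DLT_EN10MB), with no VLAN tags, carrying IPv4.
- The local address has been converted to a host-byte-order integer, for example with inet_pton() followed by ntohl() on the --local-ip string.
- Bytes are counted from the on-the-wire length (the len field of struct pcap_pkthdr), not the captured length.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_HOSTS 1024  /* hypothetical fixed table size for the sketch */

struct host_stats {
    uint32_t ip;                     /* remote IPv4 address, host byte order */
    uint64_t pkts_sent, bytes_sent;  /* local -> remote */
    uint64_t pkts_recv, bytes_recv;  /* remote -> local */
};

struct stats {
    struct host_stats hosts[MAX_HOSTS];
    size_t nhosts;
    uint64_t pkts_sent, bytes_sent, pkts_recv, bytes_recv;  /* totals */
};

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* Linear search is enough for a sketch; a hash table scales better. */
static struct host_stats *lookup(struct stats *s, uint32_t ip)
{
    for (size_t i = 0; i < s->nhosts; i++)
        if (s->hosts[i].ip == ip) return &s->hosts[i];
    if (s->nhosts == MAX_HOSTS) return NULL;
    s->hosts[s->nhosts] = (struct host_stats){ .ip = ip };
    return &s->hosts[s->nhosts++];
}

/* frame/caplen as returned by pcap_next_ex(); wire_len is pcap_pkthdr.len.
   Returns 0 if the packet was counted, -1 if it was skipped. */
int account_frame(struct stats *s, const uint8_t *frame, size_t caplen,
                  uint32_t wire_len, uint32_t local_ip)
{
    /* 14-byte Ethernet header; IPv4 src/dst sit at frame offsets 26 and 30. */
    if (caplen < 34 || frame[12] != 0x08 || frame[13] != 0x00 || (frame[14] >> 4) != 4)
        return -1;
    uint32_t src = rd32(frame + 26), dst = rd32(frame + 30);
    struct host_stats *h;
    if (src == local_ip) {
        if (!(h = lookup(s, dst))) return -1;
        h->pkts_sent++; h->bytes_sent += wire_len;
        s->pkts_sent++; s->bytes_sent += wire_len;
    } else if (dst == local_ip) {
        if (!(h = lookup(s, src))) return -1;
        h->pkts_recv++; h->bytes_recv += wire_len;
        s->pkts_recv++; s->bytes_recv += wire_len;
    } else {
        return -1;  /* neither side is the local host */
    }
    return 0;
}
```

After the loop ends, print one line per entry in hosts[] followed by the totals.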
TCP stream reassembly
When run like this:
./hw7 -i file.dat -r
your program should write all complete TCP streams in the trace to a subdirectory called "tcpstreams". Do not write out streams that are missing any of the necessary packets, from SYN to FIN. Name each stream by its TCP flow identifier (source IP, source port, destination IP, destination port).
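Reassembly works one direction at a time. Each payload byte goes at its offset from the initial sequence number (ISN) carried by the SYN, and the stream is complete once the SYN and FIN have been seen and every byte in between has arrived. A sketch under these assumptions:

- The caller has already parsed the TCP header. The sequence number is in bytes 4 to 7, and the flags are in byte 13, where SYN is 0x02 and FIN is 0x01.
- The caller keeps one half_stream per direction, keyed by the 4-tuple.
- Segments seen before the SYN are ignored.
- The names half_stream, add_segment(), is_complete(), flow_name(), and the file-name format are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STREAM_BYTES ((size_t)64 << 20)   /* arbitrary cap for the sketch */

/* One direction of a TCP connection (hypothetical helper type). */
typedef struct {
    bool     have_syn, have_fin;
    uint32_t isn;      /* sequence number carried by the SYN */
    uint32_t fin_seq;  /* sequence number occupied by the FIN */
    uint8_t *data;     /* payload, indexed by offset from isn + 1 */
    uint8_t *filled;   /* filled[i] != 0 once data[i] has been seen */
    size_t   cap;
} half_stream;

static int ensure_cap(half_stream *s, size_t need)
{
    if (need <= s->cap) return 0;
    size_t ncap = s->cap ? s->cap : 4096;
    while (ncap < need) ncap *= 2;
    uint8_t *d = realloc(s->data, ncap);
    if (!d) return -1;
    s->data = d;
    uint8_t *f = realloc(s->filled, ncap);
    if (!f) return -1;
    s->filled = f;
    memset(s->filled + s->cap, 0, ncap - s->cap);  /* new bytes start unfilled */
    s->cap = ncap;
    return 0;
}

/* Records one segment; seq is the sequence number from the TCP header. */
int add_segment(half_stream *s, uint32_t seq, bool syn, bool fin,
                const uint8_t *payload, size_t len)
{
    if (syn) { s->have_syn = true; s->isn = seq; }
    if (!s->have_syn) return -1;              /* sketch: ignore data seen before the SYN */
    uint32_t data_seq = syn ? seq + 1 : seq;  /* the SYN consumes one sequence number */
    size_t off = (uint32_t)(data_seq - (s->isn + 1));  /* unsigned math survives wraparound */
    if (off > MAX_STREAM_BYTES || len > MAX_STREAM_BYTES - off) return -1;
    if (len > 0) {
        if (ensure_cap(s, off + len) != 0) return -1;
        memcpy(s->data + off, payload, len);  /* retransmissions simply overwrite */
        memset(s->filled + off, 1, len);
    }
    if (fin) { s->have_fin = true; s->fin_seq = data_seq + (uint32_t)len; }
    return 0;
}

/* Complete = SYN and FIN seen, and every byte in between received. */
bool is_complete(const half_stream *s, size_t *out_len)
{
    if (!s->have_syn || !s->have_fin) return false;
    size_t n = (uint32_t)(s->fin_seq - (s->isn + 1));
    if (n > s->cap) return false;
    for (size_t i = 0; i < n; i++)
        if (!s->filled[i]) return false;
    *out_len = n;
    return true;
}

/* Hypothetical file name "srcip.sport-dstip.dport"; addresses in host byte order. */
void flow_name(char *buf, size_t n, uint32_t sip, uint16_t sport,
               uint32_t dip, uint16_t dport)
{
    snprintf(buf, n, "%u.%u.%u.%u.%u-%u.%u.%u.%u.%u",
             (unsigned)(sip >> 24), (unsigned)(sip >> 16 & 255),
             (unsigned)(sip >> 8 & 255), (unsigned)(sip & 255), (unsigned)sport,
             (unsigned)(dip >> 24), (unsigned)(dip >> 16 & 255),
             (unsigned)(dip >> 8 & 255), (unsigned)(dip & 255), (unsigned)dport);
}
```

When is_complete() succeeds, write data[0..n) to tcpstreams/ under the name from flow_name(). Writing each direction to its own file keeps the client's request and the server's response separate, which makes the HTTP and SMTP analyses below easier.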
Application level analysis
The intent of this program is to extract interesting data from the packet capture. To do this, we need to go deeper than the transport layer.
images from HTTP
Given the parameter "--images", automatically extract any images from HTTP transfers. Store all images from the trace in a subdirectory called "images", naming each one by its URL, for example: www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png
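Two helpers for this step are sketched below. They assume that a reassembled HTTP response's headers end at the first blank line ("\r\n\r\n"), that the matching request line supplies the path, and that its Host header supplies the host name.

- The hypothetical is_image_response() checks the Content-Type header case-insensitively, because HTTP header names are case-insensitive.
- The hypothetical image_path() turns host and request target into a file name under images/.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* True if the response headers carry "Content-Type: image/...". */
bool is_image_response(const char *hdrs)
{
    const char *line = hdrs;
    while (line && *line) {
        if (strncasecmp(line, "Content-Type:", 13) == 0) {
            const char *v = line + 13;
            while (*v == ' ' || *v == '\t') v++;
            return strncasecmp(v, "image/", 6) == 0;
        }
        line = strstr(line, "\r\n");  /* advance to the next header line */
        if (line) line += 2;
    }
    return false;
}

/* Builds "images/<host><path>" from the Host header and request target,
   dropping any query string. Returns 0 on success, -1 if rejected. */
int image_path(const char *host, const char *target, char *out, size_t n)
{
    size_t plen = strcspn(target, "?#");
    if (plen == 0 || target[0] != '/' || target[plen - 1] == '/') return -1;
    if (strchr(host, '/') || strstr(host, "..")) return -1;
    for (size_t i = 0; i + 1 < plen; i++)   /* crude guard against ".." escapes */
        if (target[i] == '.' && target[i + 1] == '.') return -1;
    int w = snprintf(out, n, "images/%s%.*s", host, (int)plen, target);
    return (w < 0 || (size_t)w >= n) ? -1 : 0;
}
```

Before writing the file, create each parent directory, for example with mkdir() on every prefix that ends in '/'. This sketch does not handle bodies sent with Transfer-Encoding: chunked or compressed with a Content-Encoding. Those need to be decoded before the image bytes are usable.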
hostnames from DNS
Given the parameter "--dns", output a list of hostnames for which a DNS lookup request was captured. You may find the template code from the DNS homework handy for decoding the DNS names in such packets. DNS uses UDP port 53.
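DNS stores names as a sequence of length-prefixed labels ending in a zero byte. A name may also use a compression pointer, which is two bytes whose top two bits are 11, giving an offset into the message (RFC 1035). A minimal decoder is sketched below. The function names are hypothetical, and msg is assumed to be the UDP payload, after the Ethernet, IPv4, and 8-byte UDP headers. The first question's name starts at offset 12, right after the fixed header. A request has the QR bit (the top bit of byte 2) cleared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if msg looks like a DNS query (QR clear) with at least one question. */
bool dns_is_query(const uint8_t *msg, size_t len)
{
    return len >= 12 && (msg[2] & 0x80) == 0 && ((msg[4] << 8) | msg[5]) > 0;
}

/* Decodes the name at msg[off] into dotted form, following compression
   pointers. Returns 0 on success, -1 on malformed input or a short buffer. */
int dns_read_name(const uint8_t *msg, size_t len, size_t off, char *out, size_t outlen)
{
    size_t o = 0;
    int jumps = 0;
    if (outlen == 0) return -1;
    for (;;) {
        if (off >= len) return -1;
        uint8_t l = msg[off];
        if ((l & 0xC0) == 0xC0) {                 /* compression pointer */
            if (off + 1 >= len || ++jumps > 16) return -1;  /* bound pointer loops */
            off = ((size_t)(l & 0x3F) << 8) | msg[off + 1];
            continue;
        }
        if (l & 0xC0) return -1;                  /* reserved label types */
        if (l == 0) break;                        /* root label ends the name */
        if (off + 1 + l > len) return -1;
        if (o != 0) {
            if (o + 1 >= outlen) return -1;
            out[o++] = '.';
        }
        if (o + l >= outlen) return -1;           /* keep room for the NUL */
        memcpy(out + o, msg + off + 1, l);
        o += l;
        off += 1 + (size_t)l;
    }
    out[o] = '\0';
    return 0;
}
```

Compression is uncommon in queries, but a decoder that follows pointers can also read names out of responses if you ever need them.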
emails from SMTP
This one is a bit of a stretch on a client host, but let's not restrict ourselves to life in a virtual machine. Given the parameter "--email", write all captured emails to a subdirectory named "email". SMTP uses port 25. Name the files by date, originator, and recipient, like this:
201901140315-president@whitehouse.gov-jakob@uic.edu
This example uses the yyyyMMddhhmm date format; feel free to use a different one. To test this, you will need to generate some plaintext SMTP traffic, and your Homework 1 solution may come in handy.
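The envelope addresses come from the MAIL FROM and RCPT TO commands in the client-to-server direction of a port 25 stream. The message itself is the text sent after DATA, up to the line containing a single ".". The sketch below builds the file name. It assumes the timestamp comes from the ts field of the pcap_pkthdr of a packet in the session and is formatted in UTC. The helper names smtp_addr() and email_filename() are hypothetical. strftime()'s %H is the 24-hour clock.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <time.h>

/* Copies the address between '<' and '>' from an SMTP command line such as
   "MAIL FROM:<president@whitehouse.gov>" if it starts with verb. */
int smtp_addr(const char *line, const char *verb, char *out, size_t n)
{
    size_t vl = strlen(verb);
    if (strncasecmp(line, verb, vl) != 0) return -1;   /* SMTP verbs are case-insensitive */
    const char *lt = strchr(line + vl, '<');
    const char *gt = lt ? strchr(lt, '>') : NULL;
    if (!gt) return -1;
    size_t al = (size_t)(gt - lt - 1);
    if (al == 0 || al >= n) return -1;                 /* rejects the empty path "<>" */
    for (size_t i = 0; i < al; i++)                    /* '/' would create subdirectories */
        out[i] = (lt[1 + i] == '/') ? '_' : lt[1 + i];
    out[al] = '\0';
    return 0;
}

/* Builds "email/yyyyMMddHHmm-from-to" from a capture timestamp (UTC). */
int email_filename(time_t t, const char *from, const char *to, char *out, size_t n)
{
    char stamp[16];
    struct tm *tm = gmtime(&t);
    if (!tm || strftime(stamp, sizeof stamp, "%Y%m%d%H%M", tm) == 0) return -1;
    int w = snprintf(out, n, "email/%s-%s-%s", stamp, from, to);
    return (w < 0 || (size_t)w >= n) ? -1 : 0;
}
```

A message may have several RCPT TO lines. This sketch names the file after whichever recipient the caller passes in, for example the first one.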
passwords from text
Finally, given the "--passwords" command line argument, your tool should output the 30 bytes immediately before and the 30 bytes immediately after every occurrence of the words "password", "passwd", or "secret", in any capitalization, in any TCP flow. You will want to use the regular expression functions in regex.h (part of libc) for this: https://www.gnu.org/software/libc/manual/html_node/Regular-Expressions.html#Regular-Expressions
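A sketch of the search, using regcomp() with REG_EXTENDED | REG_ICASE and repeated regexec() calls. Each call resumes after the previous match, and REG_NOTBOL is passed so that a resumed search does not treat its starting point as the beginning of a line. The function name find_secrets() is hypothetical. The sketch also assumes the flow is a NUL-terminated string: regexec() stops at the first NUL byte, so binary payloads need their NULs replaced or the non-standard REG_STARTEND extension.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define CONTEXT 30  /* bytes of context on each side, per the assignment */

/* Stores a [start, end) window around each match of password|passwd|secret,
   ignoring case. Returns the number of windows stored, or -1 on error. */
int find_secrets(const char *text, size_t *starts, size_t *ends, int max)
{
    regex_t re;
    regmatch_t m;
    size_t len = strlen(text), pos = 0;
    int n = 0;

    if (regcomp(&re, "password|passwd|secret", REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    while (n < max && pos < len &&
           regexec(&re, text + pos, 1, &m, pos ? REG_NOTBOL : 0) == 0) {
        size_t so = pos + (size_t)m.rm_so, eo = pos + (size_t)m.rm_eo;
        starts[n] = so > CONTEXT ? so - CONTEXT : 0;     /* clamp at buffer start */
        ends[n] = eo + CONTEXT < len ? eo + CONTEXT : len; /* clamp at buffer end */
        n++;
        pos = eo;  /* resume after this match */
    }
    regfree(&re);
    return n;
}
```

The caller can print each window with fwrite(text + starts[i], 1, ends[i] - starts[i], stdout). In the real tool, compiling the pattern once and reusing it for every flow avoids repeated regcomp() calls.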
turn-in
Turn in your homework using this GitHub Classroom link. As always, double-check using a fresh clone to make sure you have pushed all the necessary files, so that running "make" in the turn-in folder produces an executable named "hw7" that performs the functions described above.