Rebuilding TCP streams with Ruby part 1: fuzzymatch

March 11, 2008

I have undertaken the (not so small) task of attempting to use Ruby to rebuild TCP data streams. I was originally planning on using ruby-libnids, but after running into considerable trouble with dynamic library linking on OSX, I decided it’d be a good experiment to write my own.

This is not a small feat. In fact, I probably won’t ever get it working perfectly (or if I do, it certainly won’t be soon). In a series of posts, I’ll be exploring some of the development decisions, design choices and pitfalls that I run into, sort of a development journal. Why would a tool like this ever be useful? Well, if you want to do analysis on packet payloads, you certainly have to make sure you have a contiguous data segment to work on, otherwise part of the message is lost. I do, however, have a few things going for me:

  • I don’t have to do live reassembly. I can do 2-pass reassembly, because I’m only going to be analyzing pcap files. Perhaps later I’ll add the ability to do live analysis, but for now it would only add complexity to a problem that’s already complex enough.
  • I will be building prototypes of different methods, each with its pros and cons. Instead of having to work towards a final release, I have the flexibility to change designs with every iteration.
  • No matter what, I win. Nothing but learning can come from this project, so it still has benefits even if I never arrive at a final product.

For the first installment, I want to talk about fuzzymatching using sequence and acknowledgement numbers. I have released my proof-of-concept code here, but I’ll be going over it in more detail in this post:

Okay, let’s start by dumping the simplest pcap datastream I could possibly generate (sending the word “test” using netcat):

./sdump.rb pcaps/pSmall.pcap
[1]    [....S.] 128.222.228.89 -> 128.222.228.77    seq=4679659509 ack=0 len=78
[2]    [.A..S.] 128.222.228.77 -> 128.222.228.89    seq=30782357 ack=4679659510 len=74
[3]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=66
[4]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=71
[5]    [.A...F] 128.222.228.89 -> 128.222.228.77    seq=4679659515 ack=30782358 len=66
[6]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
[7]    [.A...F] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
[8]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659516 ack=30782359 len=66

Since this is the simplest possible example, there is only 1 stream to deal with, and the seq/ack numbers are nice enough to be where we want them to be. For the first prototype, I keep a list of Stream objects (each holding sequence and ack numbers). When I get a new packet, I compare its seq/ack numbers to the numbers of the streams already in the list. If they are within a threshold (5 is my value) of a stream’s numbers, the packet probably belongs to that stream and I add the packet to the stream.
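The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the released proof-of-concept code: the `Packet` struct, the `fuzzymatch` method, and the choice to remember only the most recent packet’s numbers are all assumptions made for the example. The threshold of 5 is the value mentioned above.

```ruby
# Hypothetical packet record; real code would fill these fields from a pcap.
Packet = Struct.new(:seq, :ack)

class Stream
  THRESHOLD = 5 # the threshold value mentioned in the post

  attr_reader :packets

  def initialize(packet)
    @packets = [packet]
    @numbers = numbers_of(packet)
  end

  # True if any seq/ack number of the packet is within THRESHOLD of a
  # number this stream currently remembers.
  def matches?(packet)
    numbers_of(packet).any? do |n|
      @numbers.any? { |known| (n - known).abs <= THRESHOLD }
    end
  end

  def add(packet)
    @packets << packet
    @numbers = numbers_of(packet) # assumption: track the latest packet only
  end

  private

  # Zero is ignored so the empty ack field of an initial SYN never matches.
  def numbers_of(packet)
    [packet.seq, packet.ack].reject(&:zero?)
  end
end

# Single pass: add each packet to the first matching stream, or start a new one.
def fuzzymatch(packets)
  packets.each_with_object([]) do |packet, streams|
    stream = streams.find { |s| s.matches?(packet) }
    if stream
      stream.add(packet)
    else
      streams << Stream.new(packet)
    end
  end
end

# The three handshake packets from the dump above.
handshake = [
  Packet.new(4679659509, 0),
  Packet.new(30782357, 4679659510),
  Packet.new(4679659510, 30782358)
]
streams = fuzzymatch(handshake)
puts "Ended up with #{streams.size} stream(s)."
```

Because each packet is only compared against what the stream remembers, the order in which packets arrive within a stream does not need to be correct at this stage.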

One of the really nice things about processing packets this way is that I don’t have to worry about packet order on the first pass: if a packet is close enough, it’s added; if it isn’t, a new stream is created. Fuzzymatcher correctly identifies this pcap file as 1 stream:

./fuzzymatch.rb ../pcaps/pSmall.pcap
[1]    [....S.] 128.222.228.89 -> 128.222.228.77    seq=4679659509 ack=0 len=78
No stream found for packet, starting a new one...
[2]    [.A..S.] 128.222.228.77 -> 128.222.228.89    seq=30782357 ack=4679659510 len=74
ack num: 4679659510 close enough to 4679659509 to add.
[3]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=66
seq num: 4679659510 close enough to 4679659509 to add.
[4]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=71
seq num: 4679659510 close enough to 4679659509 to add.
[5]    [.A...F] 128.222.228.89 -> 128.222.228.77    seq=4679659515 ack=30782358 len=66
ack num: 30782358 close enough to 30782357 to add.
[6]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
seq num: 30782358 close enough to 30782357 to add.
[7]    [.A...F] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
seq num: 30782358 close enough to 30782357 to add.
[8]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659516 ack=30782359 len=66
ack num: 30782359 close enough to 30782357 to add.
Ended up with 1 stream(s).
Stream 1 contains 8 packet(s)

Now, I haven’t added any code to actually order the packets (yet), but this is a good start. Before I continue, how does fuzzymatching handle pcaps with a large amount of data?

./fuzzymatch.rb ../pcaps/data.pcap
... tons of output ...
[1684]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806823238 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1685]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806825958 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1686]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806826815 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1687]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806826815 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1688]    [.A....] 64.12.165.98 -> 192.168.1.136    seq=424631922 ack=282306986 len=54
No stream found for packet, starting a new one...
[1689]    [.A....] 192.168.1.136 -> 64.12.165.98    seq=282306986 ack=424631923 len=54
ack num: 424631923 close enough to 424631922 to add.
Ended up with 53 stream(s).
Stream 1 contains 8 packet(s)
Stream 2 contains 22 packet(s)
Stream 3 contains 8 packet(s)
Stream 4 contains 34 packet(s)
... lots more output, one for all 53 streams ...

Not bad for a start. I think the next goal is probably ordering the packets within each stream. Luckily, I can do this in a second pass (or lazily, when the stream data is accessed, which avoids the computation unless the data is actually needed).
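To illustrate the “order only when accessed” idea, here is a rough sketch under several assumptions: the `Segment` struct, the `LazyStream` class, and its method names are hypothetical, and sorting each direction by raw sequence number ignores retransmissions and sequence-number wraparound. It is not the ordering code this series ends up with.

```ruby
# Hypothetical record: source address, sequence number, payload bytes.
Segment = Struct.new(:src, :seq, :payload)

class LazyStream
  def initialize
    @segments = []
    @ordered = nil
  end

  # First pass: just collect, in whatever order packets arrive.
  def <<(segment)
    @segments << segment
    @ordered = nil # invalidate any cached ordering
    self
  end

  # Second pass, run on first access only: sort each direction by seq
  # and cache the result until another segment is added.
  def ordered
    @ordered ||= @segments.group_by(&:src).transform_values do |segs|
      segs.sort_by(&:seq)
    end
  end

  # Contiguous payload sent by one side of the conversation.
  def data_from(src)
    ordered.fetch(src, []).map(&:payload).join
  end
end

stream = LazyStream.new
stream << Segment.new("128.222.228.89", 20, "st") \
       << Segment.new("128.222.228.89", 10, "te")
puts stream.data_from("128.222.228.89")
```

The sort cost is paid only if someone asks for the data, and only once per stream unless more packets are added afterwards.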

All my POC code can be downloaded from the RSB (Ruby StreamBuilder) project page, which will receive regular updates as I continue development.

Questions? Comments? Flames? Leave me a comment and let me know what you think ;)
