:wq - blog » rebuild (http://writequit.org/blog)
Tu fui, ego eris

Rebuilding TCP streams with Ruby part 2: fuzzysort
http://writequit.org/blog/2008/03/19/rebuilding-tcp-streams-with-ruby-part-2-fuzzysort/
Wed, 19 Mar 2008 23:57:24 +0000

This is part 2 of a series on rebuilding TCP streams using Ruby. For more information, see the previous post, “Rebuilding TCP streams with Ruby part 1: fuzzymatch”.

In my previous post, I talked about using fuzzy sequence/acknowledgement numbers to split a network capture file into streams. Fuzzymatch was pretty successful at cutting streams out, but the packets within each stream were not ordered. This version of the Ruby StreamBuilder library orders the streams by increasing seq/ack numbers, or as I like to call it, “fuzzysort”. To do this, fuzzysort first splits each stream into a “source” stream and a “destination” stream. After splitting, each half is sorted in ascending acknowledgement order; where there are duplicate acks, ascending sequence numbers break the tie. The streams are then printed in order (since this is just a proof of concept).
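The split-then-sort step can be sketched roughly as follows. This is only an illustration and not the actual StreamBuilder code: the `Packet` struct, the `split_stream` and `fuzzysort` method names, and the rule that the first packet's sender counts as the “source” are all assumptions. It also ignores sequence number wrap-around.

```ruby
# Hypothetical packet record; the field names are assumptions for
# illustration, not the real StreamBuilder classes.
Packet = Struct.new(:src, :dst, :seq, :ack, :len)

# Split a stream into packets sent by the "source" and packets sent by the
# "destination". Assumption: whoever sent the first packet is the source.
def split_stream(packets)
  initiator = packets.first.src
  packets.partition { |pkt| pkt.src == initiator }
end

# Order one direction by ascending ack, breaking ties on ascending seq.
def fuzzysort(packets)
  packets.sort_by { |pkt| [pkt.ack, pkt.seq] }
end

# A few packets taken from the sample output in this post, out of order.
stream = [
  Packet.new("128.222.228.89", "128.222.228.77", 638858703, 0, 78),
  Packet.new("128.222.228.77", "128.222.228.89", 2849933258, 638858704, 74),
  Packet.new("128.222.228.89", "128.222.228.77", 638862706, 2849933260, 66),
  Packet.new("128.222.228.89", "128.222.228.77", 638861176, 2849933259, 666),
  Packet.new("128.222.228.89", "128.222.228.77", 638859728, 2849933259, 1514),
]

source, dest = split_stream(stream)
sorted_source = fuzzysort(source)
sorted_source.each { |pkt| puts "seq=#{pkt.seq} ack=#{pkt.ack} len=#{pkt.len}" }
```

Sorting on the `[ack, seq]` pair leans on Ruby's array comparison, so the tie-break on seq comes for free.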

You can download the code for fuzzymatch-sort here.

It’s interesting to note that because the ordered streams are implemented as a hash keyed on the seq/ack numbers, the list cannot hold duplicate packets. I added some logic so that a large data packet is not replaced by a simple ack packet with the same numbers, so the streams should still have the correct data after being ordered.
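A minimal sketch of that “don’t let a bare ack clobber a data packet” rule is below. The `Segment` struct, the `add_segment` method and the exact key layout are assumptions made for illustration, and comparing on packet length is just one plausible way to decide which packet wins.

```ruby
# Hypothetical record type; field names are assumptions for illustration.
Segment = Struct.new(:seq, :ack, :len)

# Store packets in a hash keyed by [ack, seq]. When two packets share a key,
# keep whichever is longer, so a bare ACK never overwrites a data packet.
def add_segment(table, seg)
  key = [seg.ack, seg.seq]
  existing = table[key]
  table[key] = seg if existing.nil? || seg.len > existing.len
  table
end

table = {}
add_segment(table, Segment.new(638858704, 2849933259, 66))    # bare ACK
add_segment(table, Segment.new(638858704, 2849933259, 1090))  # data, same numbers
add_segment(table, Segment.new(638858704, 2849933259, 66))    # duplicate ACK, ignored

# Walking the keys in sorted order yields the ordered stream.
ordered = table.keys.sort.map { |key| table[key] }
puts ordered.map(&:len).inspect
```

The three packets collapse into a single entry holding the 1090-byte packet, which matches the sorted output below, where only the len=1090 packet survives for seq=638858704.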

Here’s an example of running fuzzymatch-sort.rb on a randomized single-stream pcap file:

shell> ./fuzzymatch-sort.rb ../pcaps/pLargeRand.pcap
[1]    [....S.] 128.222.228.89 -> 128.222.228.77    seq=638858703 ack=0 len=78
Starting a new stream...
[2]    [.A..S.] 128.222.228.77 -> 128.222.228.89    seq=2849933258 ack=638858704 len=74
ack num: 638858704 close enough to 638858703 to add. Had to check 1 streams and 2 seq/ack nums
[3]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638862706 ack=2849933260 len=66
ack num: 2849933260 close enough to 2849933258 to add. Had to check 1 streams and 2 seq/ack nums
[4]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=638861176 ack=2849933259 len=666
ack num: 2849933259 close enough to 2849933260 to add. Had to check 1 streams and 1 seq/ack nums
[5]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=66
ack num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 1 seq/ack nums
[6]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861776 len=66
seq num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 1 seq/ack nums
[7]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638859728 ack=2849933259 len=1514
ack num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 2 seq/ack nums
[8]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66
seq num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 3 seq/ack nums
[9]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=1090
ack num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 3 seq/ack nums
[10]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861176 len=66
seq num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 3 seq/ack nums
[11]    [.A...F] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66
seq num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 3 seq/ack nums
Ended up with 1 stream(s).

==> Stream 1 contains 11 packet(s)
--> Sorting stream 1...
[1]    [....S.] 128.222.228.89 -> 128.222.228.77    seq=638858703 ack=0 len=78
[2]    [.A..S.] 128.222.228.77 -> 128.222.228.89    seq=2849933258 ack=638858704 len=74
[3]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638862706 ack=2849933260 len=66
[4]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=638861176 ack=2849933259 len=666
[5]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=66
[6]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861776 len=66
[7]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638859728 ack=2849933259 len=1514
[8]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66
[9]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=1090
[10]   [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861176 len=66
[11]   [.A...F] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66

==== Unsorted Streams ====

Source
-------
[....S.] 128.222.228.89 -> 128.222.228.77    seq=638858703 ack=0 len=78
[.A....] 128.222.228.89 -> 128.222.228.77    seq=638862706 ack=2849933260 len=66
[.AP...] 128.222.228.89 -> 128.222.228.77    seq=638861176 ack=2849933259 len=666
[.A....] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=66
[.A....] 128.222.228.89 -> 128.222.228.77    seq=638859728 ack=2849933259 len=1514
[.AP...] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=1090
Dest
-------
[.A..S.] 128.222.228.77 -> 128.222.228.89    seq=2849933258 ack=638858704 len=74
[.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861776 len=66
[.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66
[.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861176 len=66
[.A...F] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66

==== Sorted Streams ====

Source
-------
[....S.] 128.222.228.89 -> 128.222.228.77    seq=638858703 ack=0 len=78
[.AP...] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=1090
[.A....] 128.222.228.89 -> 128.222.228.77    seq=638859728 ack=2849933259 len=1514
[.AP...] 128.222.228.89 -> 128.222.228.77    seq=638861176 ack=2849933259 len=666
[.A....] 128.222.228.89 -> 128.222.228.77    seq=638862706 ack=2849933260 len=66
Dest
-------
[.A..S.] 128.222.228.77 -> 128.222.228.89    seq=2849933258 ack=638858704 len=74
[.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861176 len=66
[.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861776 len=66
[.A...F] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66

Another addition in this version is that the fuzzymatch algorithm that is used to generate streams has been optimized (mostly because I was doing it stupidly last time) so that seq/ack searches don’t take nearly as long for large pcap files. You can download an example of fuzzysort output from a pcap file with 5 streams here.

While this is a step up from the unsorted fuzzymatch from the last post, this method still has its downsides: not being able to store duplicate seq/ack packets could definitely cause problems. The ability to generate a sorted stream without having to split it into source and destination streams would also be extremely useful (so you could see a full transmission stream from both sides).

The next evolution of the library would be to actually keep a state table and follow TCP streams using the seq/ack numbers (which I attempted for this version, but it was extremely complex, so I scrapped it and did fuzzysort). Hopefully I’ll be able to implement it without any problems. I might take the easy route and use the TCP TSVal to attempt to order streams.

Comments? Criticism? Leave a comment and let me know :)

Rebuilding TCP streams with Ruby part 1: fuzzymatch
http://writequit.org/blog/2008/03/11/rebuilding-tcp-streams-with-ruby-part-1-fuzzymatch/
Wed, 12 Mar 2008 02:45:45 +0000

I have undertaken the (not so small) task of attempting to use Ruby to rebuild TCP data streams. I was originally planning on using ruby-libnids, but after running into considerable trouble with dynamic library linking on OSX, I decided it’d be a good experiment to write my own.

This is not a small feat. In fact, I probably won’t ever get it working perfectly (or if I do, it certainly won’t be soon). In a series of posts, I’ll be exploring some of the development decisions, design choices and pitfalls that I run into, sort of a development journal. Why would a tool like this ever be useful? Well, if you want to do analysis on packet payloads, you have to make sure you have a contiguous data segment to work on; otherwise part of the message is lost. I do, however, have a few things going for me:

  • I don’t have to do live reassembly. I can do 2-pass reassembly, because I’m only going to be analyzing pcap files. Perhaps later I’ll add the ability to do live analysis, but for now it would only add complexity to a problem that’s already complex enough.
  • I will be building prototypes of different methods, each with its pros and cons. Instead of having to work towards a final release, I have the flexibility to change designs with every iteration.
  • No matter what, I win. Nothing but learning can come from this project, so it still has benefits even if I never arrive at a final product.

For the first installment, I want to talk about fuzzymatching using sequence and acknowledgement numbers. I have released my proof-of-concept code here, and I’ll go over it in more detail in this post.

Okay, let’s start by dumping the simplest pcap file datastream I could possibly generate (sending the word “test” using netcat):

./sdump.rb pcaps/pSmall.pcap
[1]    [....S.] 128.222.228.89 -> 128.222.228.77    seq=4679659509 ack=0 len=78
[2]    [.A..S.] 128.222.228.77 -> 128.222.228.89    seq=30782357 ack=4679659510 len=74
[3]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=66
[4]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=71
[5]    [.A...F] 128.222.228.89 -> 128.222.228.77    seq=4679659515 ack=30782358 len=66
[6]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
[7]    [.A...F] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
[8]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659516 ack=30782359 len=66

Since this is the simplest example, you can see that I only have 1 stream to deal with, and that the seq/ack numbers are nice enough to be where we want them to be. For the first prototype, I have created a list of Stream objects (containing sequence and ack numbers). When I get a new packet, I compare its seq/ack numbers to the numbers of the streams already in the list; if one is within a threshold (5 is my value), the packet probably belongs to that stream and I add it to the stream.
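To make the matching rule concrete, here is a rough sketch. It is not the released proof-of-concept code: the `Frame` struct, the `FuzzyStream` class and the `fuzzymatch` method are hypothetical names, and skipping ack=0 (so that initial SYN packets don’t all look alike) is my own assumption. Only the threshold of 5 comes from the description above.

```ruby
THRESHOLD = 5 # the value mentioned above; the rest is an assumed design

# Hypothetical packet record holding just the two numbers we compare.
Frame = Struct.new(:seq, :ack)

# A stream remembers every seq/ack number it has seen so far.
class FuzzyStream
  attr_reader :packets

  def initialize
    @packets = []
    @numbers = []
  end

  # True if either of the packet's numbers is within THRESHOLD of a known one.
  def close_to?(pkt)
    candidates(pkt).any? do |n|
      @numbers.any? { |known| (n - known).abs <= THRESHOLD }
    end
  end

  def add(pkt)
    @packets << pkt
    @numbers.concat(candidates(pkt))
  end

  private

  # Assumption: ignore ack=0, otherwise every initial SYN would match.
  def candidates(pkt)
    [pkt.seq, pkt.ack].reject(&:zero?)
  end
end

# Single pass: add each packet to the first stream that is "close enough",
# or start a new stream when none is.
def fuzzymatch(packets)
  streams = []
  packets.each do |pkt|
    stream = streams.find { |s| s.close_to?(pkt) }
    unless stream
      stream = FuzzyStream.new
      streams << stream
    end
    stream.add(pkt)
  end
  streams
end

packets = [
  Frame.new(4679659509, 0),          # SYN from the dump above
  Frame.new(30782357, 4679659510),   # SYN/ACK
  Frame.new(4679659510, 30782358),   # ACK
  Frame.new(424631922, 282306986),   # unrelated connection
  Frame.new(282306986, 424631923),
]
streams = fuzzymatch(packets)
puts "Ended up with #{streams.size} stream(s)."
```

Checking every known number of every stream is the slow, obvious approach; it keeps the sketch short but would not scale well to large captures.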

One of the really nice things about processing packets this way is that I don’t have to worry about packet order on the first pass: if a packet is close enough, it’s added; if it isn’t, a new stream is created. Fuzzymatcher correctly identifies this pcap file as 1 stream:

./fuzzymatch.rb ../pcaps/pSmall.pcap
[1]    [....S.] 128.222.228.89 -> 128.222.228.77    seq=4679659509 ack=0 len=78
No stream found for packet, starting a new one...
[2]    [.A..S.] 128.222.228.77 -> 128.222.228.89    seq=30782357 ack=4679659510 len=74
ack num: 4679659510 close enough to 4679659509 to add.
[3]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=66
seq num: 4679659510 close enough to 4679659509 to add.
[4]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=71
seq num: 4679659510 close enough to 4679659509 to add.
[5]    [.A...F] 128.222.228.89 -> 128.222.228.77    seq=4679659515 ack=30782358 len=66
ack num: 30782358 close enough to 30782357 to add.
[6]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
seq num: 30782358 close enough to 30782357 to add.
[7]    [.A...F] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
seq num: 30782358 close enough to 30782357 to add.
[8]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659516 ack=30782359 len=66
ack num: 30782359 close enough to 30782357 to add.
Ended up with 1 stream(s).
Stream 1 contains 8 packet(s)

Now, I haven’t added any code to actually order the packets (yet), but this is a good start. Before I continue, how does fuzzymatching handle pcaps with a large amount of data?

./fuzzymatch.rb ../pcaps/data.pcap
... tons of output ...
[1684]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806823238 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1685]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806825958 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1686]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806826815 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1687]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806826815 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1688]    [.A....] 64.12.165.98 -> 192.168.1.136    seq=424631922 ack=282306986 len=54
No stream found for packet, starting a new one...
[1689]    [.A....] 192.168.1.136 -> 64.12.165.98    seq=282306986 ack=424631923 len=54
ack num: 424631923 close enough to 424631922 to add.
Ended up with 53 stream(s).
Stream 1 contains 8 packet(s)
Stream 2 contains 22 packet(s)
Stream 3 contains 8 packet(s)
Stream 4 contains 34 packet(s)
... lots more output, one for all 53 streams ...

Not bad, for a start. I think the next goal is probably ordering the streams. Luckily, I can do this in a second pass (or when the stream data is accessed, which avoids the computation unless the data is actually needed).
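The “sort only when accessed” idea could look something like the sketch below. Everything here is hypothetical: the `LazyStream` class, its method names, and the choice of `[ack, seq]` as the sort key are assumptions, since the ordering itself hasn’t been written yet.

```ruby
# Hypothetical stream that defers ordering until somebody asks for it.
class LazyStream
  def initialize
    @packets = []
    @ordered = nil
  end

  # Adding a packet is cheap; it just invalidates any cached ordering.
  def add(pkt)
    @packets << pkt
    @ordered = nil
  end

  # Sort on first access and cache the result until the next add.
  # Assumption: packets are hashes and [ack, seq] is the sort key.
  def ordered
    @ordered ||= @packets.sort_by { |pkt| [pkt[:ack], pkt[:seq]] }
  end
end

lazy = LazyStream.new
lazy.add({ seq: 4679659515, ack: 30782358 })
lazy.add({ seq: 4679659510, ack: 30782358 })
lazy.add({ seq: 4679659509, ack: 0 })

first_call  = lazy.ordered   # sorts here
second_call = lazy.ordered   # returns the cached array
puts first_call.map { |pkt| pkt[:seq] }.inspect
```

Streams that are never read never pay for the sort, which matters when a capture splits into dozens of streams and only a few are of interest.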

All my POC code can be downloaded from the RSB (Ruby StreamBuilder) project page, which will receive regular updates as I continue development.

Questions? Comments? Flames? Leave me a comment and let me know what you think ;)

You can check all the other parts of the series:
