:wq - blog » ruby — Tu fui, ego eris

Annoyance, colored diffs in perforce (Mon, 01 Feb 2010)

Quick one-off here, let’s begin.

Git has nice colored diffs out of the box; Perforce does not. A simple script can remedy this.

Here’s the script:
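(Or rather, since the embedded listing didn’t survive the trip into this feed, here is a minimal sketch of the idea: read diff output on stdin and wrap it in ANSI colors. The real p4c.rb in my one-offs directory may differ.)

#!/usr/bin/env ruby
# Sketch of a diff colorizer: reads diff output from stdin and wraps
# lines in ANSI escapes. Illustrative; the real p4c.rb may differ.
def paint(line, color_code)
  "\e[#{color_code}m#{line.chomp}\e[0m\n"
end

STDIN.each_line do |line|
  case line
  when /^(>|\+)/      then print paint(line, 32) # additions: green
  when /^(<|-)/       then print paint(line, 31) # removals: red
  when /^====/, /^@@/ then print paint(line, 36) # file/hunk headers: cyan
  else print line
  end
end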

Simply drop it somewhere in your $PATH and use it like so:

p4 diff | p4c.rb

You can find it in my one-offs directory (it’s called p4c.rb). We recently switched from Subversion to Perforce (definitely not my choice, I pushed for git), and so far it’s awful. I am not a fan. This makes it a little better.

Process pooling with a Rinda Tuplespace (Tue, 25 Aug 2009)

I’m going to talk about how I do process pooling in Forkify in this post, so let’s get started.

Feel free to look up the code or fork the project from the Github Forkify project.

The case for pool forking

So, why even do pooling instead of serial forking? Let’s look at an example:

[1, 2, 6, 1, 2, 3].forkify(3) { |n| sleep(n); }

What happens in the background when this is run: forkify will fork 3 processes at a time, each one sleeping for its element’s value, so on the first pass 3 processes will be created, sleeping for 1, 2 and 6 seconds respectively. Forkify will then do a Process.waitpid(<pid>) on all 3 of the processes.

It will end up waiting a tiny bit over 6 seconds (because the longest process will take 6 seconds to finish).

Next, it will create 3 more processes, which will sleep for 1, 2 and 3 seconds respectively.

This will take a bit over 3 seconds.

So, the total time will end up being a little over 9 (6 + 3) seconds (the extra is overhead). Seen here:

% time ruby -I lib -rforkify -e '[1, 2, 6, 1, 2, 3].forkify(3) { |n| sleep(n); }'
0.08s user 0.10s system 1% cpu 9.131 total
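In outline, the serial strategy is just batch-and-wait. A sketch (illustrative, not forkify’s actual implementation):

items = [1, 2, 6, 1, 2, 3]
# Fork a batch of 3, wait for the whole batch, then start the next one;
# the slowest member gates each batch.
items.each_slice(3) do |batch|
  pids = batch.map { |n| fork { sleep(n) } }
  pids.each { |pid| Process.waitpid(pid) }
end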

How it works with a pool

Let’s take the same example and see how it works for pool forking:

[1, 2, 6, 1, 2, 3].forkify(:procs => 3, :method => :pool) { |n| sleep(n); }

First, 3 processes will be created to perform duties; they will be given 1, 2 and 6 seconds to sleep for. The parent will immediately wait for them to finish. The first process will sleep for 1 second, check for work, grab another ‘1’ from the TupleSpace and start working on it immediately. The same goes for the second process (which will finish after 2 seconds and grab more work from the TupleSpace).

This ends up taking less time because you don’t block waiting for 1 process that takes a tremendous amount of time while the others have been finished for a while:

% time ruby -I lib -rforkify -e '[1, 2, 6, 1, 2, 3].forkify(:procs => 3, :method => :pool) { |n| sleep(n); }'
0.12s user 0.11s system 2% cpu 8.380 total

Not a lot of difference, huh? A lot of the difference has to do with the overhead of creating a TupleSpace and sharing it between processes. Here’s a better example (pool first, then serial):

% time ruby -I lib -rforkify -e '[1, 1, 6, 1, 5, 1].forkify(:procs => 3, :method => :pool) { |n| sleep(n); }'
0.12s user 0.10s system 3% cpu 7.157 total

% time ruby -I lib -rforkify -e '[1, 1, 6, 1, 5, 1].forkify(3) { |n| sleep(n); }'
0.08s user 0.10s system 1% cpu 11.112 total

Imagine if the array had 100 items in it, but only a few took a really long time to process; as differences like that increase, pool forking makes more and more sense.

How it works in the code

The Parent’s job:

(This is extremely stripped down to illustrate what’s happening)
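The original snippet didn’t survive the trip into this feed, so here is a rough reconstruction based on the description below. The tuple layout, port number and the run_child helper (sketched under “The Child’s job”) are illustrative inventions, not forkify’s actual code:

require 'drb/drb'
require 'rinda/tuplespace'

# Rough reconstruction of the parent's job; tuple layout, port and
# run_child (see the child sketch below) are illustrative.
DRB_URI = 'druby://localhost:53421' # hypothetical port; forkify hardcodes its own
items   = [1, 2, 6, 1, 2, 3]
procs   = 3

ts = Rinda::TupleSpace.new
# Write every work tuple *before* exporting the space over DRb, so no
# child can start taking tuples mid-write and disturb the order.
items.each_with_index { |item, i| ts.write([:work, i, item]) }
DRb.start_service(DRB_URI, ts)

pids = Array.new(procs) { fork { run_child(DRB_URI) { |n| sleep(n) } } }
pids.each { |pid| Process.waitpid(pid) }

# Take results by index so they come back in the original order,
# regardless of which child finished when.
results = items.each_index.map { |i| ts.take([:result, i, nil]).last }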

The key here is to make sure that all the work items are in the TupleSpace before forking any children, so they don’t pull items while we’re in the middle of writing and mess up the order. This is why I write all the tuples before starting the DRb service with the TupleSpace object.

After starting the children, I only need to wait on 3 pids, then I retrieve the results from the same TupleSpace in the correct order (regardless of the order they were added in).

The Child’s job:

(Again stripped down to essentials)
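Again a reconstruction, paired with the parent sketch above rather than taken from forkify itself:

require 'drb/drb'
require 'rinda/rinda'

# Rough reconstruction of the child's job, matching run_child in the
# parent sketch above; names and tuple layout are illustrative.
def run_child(uri)
  DRb.start_service
  ts = Rinda::TupleSpaceProxy.new(DRbObject.new_with_uri(uri))
  loop do
    # A take with a 0-second timeout raises Rinda::RequestExpiredError
    # once the work tuples run out, which is the signal to exit.
    _, index, item = ts.take([:work, nil, nil], 0)
    # No mutex needed: the TupleSpace itself serializes take/write.
    ts.write([:result, index, yield(item)])
  end
rescue Rinda::RequestExpiredError
  exit!(0)
end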

The interesting parts of this are the TupleSpaceProxy and being able to write objects to a TupleSpace without any kind of mutex. Rinda’s TupleSpaces take care of performing operations in a thread- and process-safe way, so I don’t have to worry about synchronizing the writing/taking of items.

The child pulls items out of the TupleSpace, calls the block on them and writes the result tuples back into the same TupleSpace. Since it’s shared over DRb, the parent will be able to see all of the result tuples when it grabs the results.

Are there problems with this code? Yes. Hardcoding a port is a bad idea, and I’ve run into some issues with deadlock if 2 pooled forkifys are run immediately after each other (even a 'puts "foo"' in between them fixes the problem, which drives me crazy when debugging). I also imagine that while forkify itself works with processes, it is (ironically) not actually thread- or process-safe (because of the hardcoded DRb port, amongst other things).

So, does anyone have any ideas for how I can do a producer/consumer pool-forking library? Please let me know :) Sadly, there is a criminally small amount of Ruby Rinda documentation available online right now (most of it is for RingServers instead of TupleSpaces).

Check out the full forkify_pool method and let me know if you have any suggestions. Pool forking requires Ruby 1.9 right now because some of the Rinda stuff doesn’t work in 1.8.

How to run FastRI on Ruby 1.9.1 (Tue, 07 Jul 2009)

Alrighty, FastRI is a sweet program written by Mauricio Fernandez (of eigenclass.org, which is currently down for some reason) for doing RI (Ruby documentation) lookups very quickly. It supports all sorts of neat things, like running a DRb server for lookups :)

Anyway, if you found this, you probably don’t want to listen to me blabber; you want to know how to get it running on Ruby 1.9.x. So I humbly present a patch!

Download the patch here: version 2 (older), or version 3 (current).

First, download the FastRI tarball (version 0.3.1 is the latest at the time of this writing) from rubyforge.

Here’s how to install the patch:

% tar zxf fastri-0.3.1.tar.gz
% cd fastri-0.3.1
% patch -p1 < ../fastri-0.3.1-ruby1.9.1.v2.patch
patching file bin/fastri-server
patching file lib/fastri/ri_index.rb
patching file lib/fastri/ri_service.rb
patching file lib/fastri/util.rb
patching file test/test_functional_ri_service.rb
% sudo ruby setup.rb

And you’re done! Now you’ll need to make sure to do a ‘fastri-server -b’ to build the cache, and from there you can use FastRI the same as in Ruby 1.8, see:

[hinmanm@Xanadu:~]% qri String.each_codepoint
-------------------------------------------------- String#each_codepoint
str.each_codepoint {|integer| block }    => str
From
------------------------------------------------------------------------
Passes the Integer ordinal of each character in str, also known as
a codepoint when applied to Unicode strings to the given block.
"hello\u0639".each_codepoint {|c| print c, ' ' }
produces:
104 101 108 108 111 1593

Notes/Caveats/Disclaimers & Warnings:

This patch is a WORK IN PROGRESS: it may hang, crash, steal your wallet and/or delete files. I hacked it up in a day of looking at FastRI because I really like using ‘qri’ for everything and was consistently annoyed by getting docs intended for 1.8. I assume no responsibility for these things.

Also, note that not all tests pass with this patch; I’m still working on it.

Feel free to leave any feedback for me :)

UPDATE:
New version of the patch that fixes doing a lookup on a base class instead of a method. So… version 3 right now. (link above)

UPDATE2:
I created a github project for fastri to build a gem, so you can install the gem (without patching!) using this command:

sudo gem install dakrone-fastri -s http://gems.github.com
Forkify 0.0.1 Ruby gem released, threadify with processes (Wed, 24 Jun 2009)

I’m happy to announce the release of another tiny Ruby gem, Forkify.

Forkify is basically Ara Howard’s Threadify (minus a few features), but instead of using threads, it uses fork to create multiple processes to perform the operations on an Enumerable type. It was born out of the desire to have the forkoff gem work on Ruby 1.9.1 (which it currently doesn’t).

You can get Forkify using RubyGems:

sudo gem install forkify

Using it is super-simple also:

require 'forkify'
require 'pp' # optional, just for this example
r = [1, 2, 3].forkify { |n| n*2 }
pp r

=> [2, 4, 6]

Another example demonstrates the time savings from using forks. Sleeping for 1 second 5 times takes only slightly longer than 1 second in total (instead of over 5 seconds if you hadn’t forkified it):


[3:hinmanm@Xanadu:~]% time ruby -r forkify -e "5.times.forkify { |n| sleep(1) }"
ruby -r forkify -e "5.times.forkify { |n| sleep(1) }"  0.02s user 0.03s system 5% cpu 1.030 total

You can specify the maximum number of processes to spawn with an argument to forkify; the default is 5:

[1, 2, 3, 4, 5, 6].forkify(3) { |n| n**2 } # This will spawn 3 processes at a time to process the 6 items


Forkify is available as a gem via rubygems, or you can check out the code here: http://github.com/dakrone/forkify/. It’s still beta-quality, so please let me know if you run into any issues with it.

P.S. 100th post, woo!

Nesting traps in Ruby with a quick monkeypatch (Tue, 09 Jun 2009)

If you end up messing with traps a lot in Ruby, you’ll find that they don’t unbind when leaving function context, so you’re either stuck ‘ensuring’ that the trap is replaced by the old trap, or just dealing with it.

Well, here’s a quick monkeypatch for adding a method that will do all the ensuring/retrapping for you when you leave a function.

Here’s the code (there’s a colorized version on github):
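(The embedded listing didn’t make it into this feed; below is a minimal sketch of the idea, reconstructed from the calling syntax mentioned afterwards. The real code on github may well differ.)

# Minimal sketch reconstructed from the calling syntax below; the real
# code on github may differ.
class Object
  # Install +handler+ as the trap for +signal+, call the method named
  # +meth+ with any extra arguments, and always restore the old trap.
  def trapr_wrap(signal, handler, meth, *args)
    old_handler = trap(signal, handler)
    send(meth, *args)
  ensure
    trap(signal, old_handler)
  end
end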

While the syntax for calling it (t.trapr_wrap "SIGINT", Proc.new { ... }, :foo) isn’t exactly clean, if you’re calling a lot of functions and only want a trap to be active inside them, it beats having to undo your trap by hand every time.

Update: I changed the code for better readability and because I was doing it a silly way. Now it takes actual arguments instead of just ‘*args’.

Installing HyperEstraier Ruby bindings with Ruby 1.9.1 (RC1) (Fri, 09 Jan 2009)

Lately I’ve been doing a lot of coding around an awesome database called Hyper Estraier, which allows me to create large inverted-index databases to search over very quickly.

I have also been playing around with the new Ruby RC release for version 1.9.1. In an effort to get some current code running, I found that hyperestraier’s native ruby bindings don’t compile correctly under 1.9.1. You’ll see this error when attempting to compile:

~/hyperestraier-1.4.13/rubynative$ make
( cd src && if ! [ -f Makefile ] ; then /usr/local/bin/ruby extconf.rb ; fi )
checking for estraier.h... yes
checking for main() in -lestraier... yes
creating Makefile
( cd src && make )
make[1]: Entering directory `/home/hinmanm/hyperestraier-1.4.13/rubynative/src'
gcc -I. -I/usr/local/include/ruby-1.9.1/i686-linux -I/usr/local/include/ruby-1.9.1/ruby/backward -I/usr/local/include/ruby-1.9.1 -I. -DHAVE_ESTRAIER_H  -D_FILE_OFFSET_BITS=64  -fPIC -I. -I.. -I../.. -I/usr/include/estraier  -I/usr/include/qdbm -Wall  -O2 -g -Wall -Wno-parentheses -O3 -fomit-frame-pointer -fforce-addr  -o estraier.o -c estraier.c
estraier.c: In function ‘doc_make_snippet’:
estraier.c:354: error: ‘struct RArray’ has no member named ‘len’
estraier.c: In function ‘db_search_meta’:
estraier.c:767: error: ‘struct RArray’ has no member named ‘len’
estraier.c: In function ‘objtocblist’:
estraier.c:1192: error: ‘struct RArray’ has no member named ‘len’
estraier.c:1195: error: ‘struct RString’ has no member named ‘ptr’
estraier.c:1195: error: ‘struct RString’ has no member named ‘len’
estraier.c: In function ‘objtocbmap’:
estraier.c:1221: error: ‘struct RArray’ has no member named ‘len’
estraier.c:1227: error: ‘struct RString’ has no member named ‘ptr’
estraier.c:1227: error: ‘struct RString’ has no member named ‘len’
estraier.c:1228: error: ‘struct RString’ has no member named ‘ptr’
estraier.c:1228: error: ‘struct RString’ has no member named ‘len’
make[1]: *** [estraier.o] Error 1
make[1]: Leaving directory `/home/hinmanm/hyperestraier-1.4.13/rubynative/src'
make: *** [all] Error 2

This error comes from the not-so-great practice of reaching into RString and RArray internals with RSTRING(foo)->len and RARRAY(bar)->ptr, which no longer works in Ruby 1.9.1. The correct way to fix this is to make these changes:

RSTRING(foo)->len and RSTRING(foo)->ptr
becomes:
RSTRING_LEN(foo) and RSTRING_PTR(foo)

RARRAY(bar)->len and RARRAY(bar)->ptr
becomes:
RARRAY_LEN(bar) and RARRAY_PTR(bar)

For HyperEstraier, this is a pretty quick fix. There’s a link to the patch at the bottom of this post; after downloading, simply do:

[4:hinmanm@dagger:~]% cd hyperestraier-1.4.13/rubynative/src
[4:hinmanm@dagger:~/hyperestraier-1.4.13/rubynative/src]% patch < ~/hyperestraier-ruby191.patch
patching file estraier.c
[4:hinmanm@dagger:~/hyperestraier-1.4.13/rubynative/src]% cd ../
[4:hinmanm@dagger:~/hyperestraier-1.4.13/rubynative]% ./configure && make && sudo make install

And Hyper Estraier should be installed and ready to use in ruby 1.9.1 RC1! Enjoy!

Download the patch: [hyperestraier-ruby191.patch]

NSM-Console version 0.7 release (Sun, 27 Apr 2008)

First off, I apologize for the lack of posts here lately. I’ve been trying to come up with something good to post, because I’m just not a fan of rehashing things other blogs post, or commenting on news stories. Hopefully I’ll be able to contribute more soon :)

Now down to the real post: NSM-Console 0.7 has been released. There are a lot of cool features in this release, but first, go download NSM-Console!

As always, you can check out the TODO and CHANGELOG from svn.

Now, let’s cover some of the newest features in this release:

Encode/Decode enhancements
The encode and decode methods have had a few enhancements. Most notably, you can now specify a file to encode or decode instead of just a string, so you could do:

nsm> encode -f base64 testfile.txt
Encoding ascii --> base64...
Output ([]'s added to show beginning and end):
[TlNNLUNvbnNvbGUgaXMgYXdlc29tZSwgeW91IHNob3VsZCB1c2UgaXQgOikK]

Also, you can now specify a variety of hex encodings, because I noticed hex gets delineated in a variety of ways: \x, spaces, or no delineation at all. I’ve also added default hex and binary methods, so you don’t have to specify endianness; they default to little-endian.
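To make that flexibility concrete, here’s a tiny illustrative decoder (not NSM-Console’s actual code) that accepts all three delineations:

# Normalize "\x41\x42", "41 42" and "4142" to the same bytes before
# unpacking; illustrative only, not NSM-Console's implementation.
def decode_hex(str)
  [str.gsub(/\\x|\s/, '')].pack('H*')
end

decode_hex('\x4e\x53\x4d')  # => "NSM"
decode_hex('4e 53 4d')      # => "NSM"
decode_hex('4e534d')        # => "NSM"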

IP->ASN mapping
As per Scholar’s suggestion, there is now both a module and a command for translating an IP into its ASN. You can either use the module to get a listing for each IP in the pcap, or use the command below to get the ASN for just one address:

nsm> ip2asn 203.223.154.86
Bulk mode; whois.cymru.com [2008-04-27 17:53:32 +0000]
17992   | 203.223.154.86   | AIMS-MY-DIA-AS AIMS Data Centre

Thanks to Team Cymru for their ASN servers :)

‘Print’ command supports flags
The print command now supports printing TCP flags, and still uses Scholar’s pcapparser library.

New command: ‘iplist’
Generate a list of the IPs in a pcap file, sorted by the number of occurrences in the file; see below:

nsm> iplist
=== IP list for data.pcap ===
192.168.1.123   1507
64.233.179.109  260
192.168.1.136   141
204.245.162.17  126
216.178.38.133  102
208.67.217.230  92
209.225.0.103   88
.. etc etc

Pipes now supported
One feature geek00l has been bugging me about is getting piping to work in the nsm shell. I’m happy to announce that pipes finally work. You might run into a few bugs (broken pipes with less), but for the most part they work, and output can now be piped into files and programs:

nsm> p -x 1-* | less
(display all the packets and hex output, piped into less)
nsm> iplist > iplist.txt
(output the list of ips into iplist.txt)
nsm> ip2asn 203.223.154.86 >> iplist.txt
(append the ip2asn output to iplist.txt)

Etc, etc, you get the idea. The ‘<’ pipe hasn’t been implemented yet; perhaps I’ll add it if it’s needed in the future.

New modules: bro-ids-connection and yahsnarf
Geek00l committed his bro-ids-connection module for generating only connection information from a pcap. A yahsnarf module was also committed, to enable extracting Yahoo IM conversations from a pcap file. Thanks geek00l!

Automatic updating of NSM-Console
Users desiring to be on the bleeding edge of NSM-Console development (is there anyone who actually desires this? :P) can now use the “update” command from within NSM-Console to automatically update to the latest subversion commit. You can also use the -v flag for verbose output; see below:

nsm> update -v
Updating NSM-Console from svn...
Fetching newest revision from svn...
etc, etc

Still a few kinks to work out, but it should work pretty well.

Bugfixes
I fixed some bugs related to gzip’d pcap files as well as some bugs in the encode and decode methods. I also introduced some bugs (hurray!) with pipes, but it’s still usable.

Like I always say, check out the full TODO and CHANGELOG for complete details, and send me any feedback you have :)

Yahsnarf – Sniff Yahoo IM conversations (Thu, 03 Apr 2008)

Remember way back, when I released Aimsnarf? Well, it turns out that people were interested in one for Yahoo IM, so I’m happy to present Yahsnarf, the Yahoo messenger sniffing script.

You can download the script on the yahsnarf project page.

Yahsnarf requires Ruby, ruby-pcap and bit-struct. (Thanks Matasano for introducing me to bit-struct; it made this script take about a quarter of the time to write.)
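As an aside, bit-struct makes declaring packet headers almost declarative. Here’s a sketch of what a YMSG header might look like with it; the field layout follows the publicly documented Yahoo protocol, not necessarily yahsnarf’s actual source:

require 'bit-struct'

# Sketch of a YMSG packet header using bit-struct; field layout follows
# the publicly documented Yahoo protocol, not necessarily yahsnarf's code.
class YMSGHeader < BitStruct
  char     :magic,   32, "protocol magic, always 'YMSG'"
  unsigned :version, 16, "protocol version"
  unsigned :vendor,  16, "vendor id"
  unsigned :length,  16, "payload length in bytes"
  unsigned :service, 16, "service number (message type)"
  unsigned :status,  32, "status"
  unsigned :session, 32, "session id"
end

# Parsing is just construction from raw bytes:
# hdr = YMSGHeader.new(raw_packet_data)
# puts hdr.length if hdr.magic == "YMSG"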

I’m also currently working on an NSM-Console module for Yahsnarf.

This script is a little different from Aimsnarf, mostly because Aimsnarf was the first program I ever wrote in Ruby, so it tended to be a little rusty, without the best design practices. For one, Yahsnarf is way smaller than Aimsnarf (around 70 lines versus 150), and Yahsnarf follows an object-oriented design. Enough of that; here’s what you can expect to see:

shell> sudo ./yahsnarf.rb -i en1
Use '-h' to display usage
Capture/Decoding...
buddy1 --> buddy2: This is a test of yahsnarf
buddy2 --> buddy1: A test this is of yahsnarf; it's awesome!
buddy1 --> buddy2: thanks for the help :)

You can also use ./yahsnarf.rb -r <pcapfile> to read and extract from a network capture file.

Pretty simple, eh? Replace buddy1 and buddy2 with the screen names of the conversationalists. There are a few issues I’m still working out, like usernames not always showing up (they do for the most part). Also, this obviously does not work on encrypted messages (OTR or otherwise), so if you value your privacy, use encryption.

Remember, don’t ever say anything over IM that you wouldn’t mind the world knowing, you never know who could be listening in :)

In conclusion, I’d also like to thank Yahoo, for making their protocol so much less of a pain to decode than AOL’s.

Rebuilding TCP streams with Ruby part 2: fuzzysort (Wed, 19 Mar 2008)

This is part 2 of a series on rebuilding TCP streams using Ruby; for more background, see the previous post (part 1: fuzzymatch).

In my previous post, I talked about using fuzzy sequence/acknowledgement numbers to split a network capture file into streams. Fuzzymatch was pretty successful for cutting streams out, but the streams themselves were not ordered. This version of the Ruby StreamBuilder library orders the streams using increasing seq/ack numbers, or as I like to call it, “fuzzysort”. To do this, fuzzysort first splits each stream into a “source” stream and a “destination” stream. After splitting, the streams are ordered in ascending acknowledgement order; where there are duplicate acks, ascending sequence numbers break the tie. The streams are then printed in order (since this is just a proof of concept).

You can download the code for fuzzymatch-sort here.

It’s interesting to note that because the ordered streams are implemented as a hash keyed on the seq/ack numbers, the list does not handle duplicate packets. I added some logic so that a large data packet is not replaced by a simple ack packet with the same numbers, so the streams should still have the correct data after being ordered.
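A tiny sketch of that ordering idea (illustrative, not the actual StreamBuilder code; packets are assumed to respond to seq, ack and len):

# Illustrative sketch of the fuzzysort ordering, not the actual
# StreamBuilder code.
Packet = Struct.new(:seq, :ack, :len)

def fuzzysort(packets)
  index = {}
  packets.each do |pkt|
    key = [pkt.ack, pkt.seq]
    # Duplicates collapse onto one hash key; keep the larger packet so
    # a bare ACK never replaces a data packet with the same numbers.
    index[key] = pkt if index[key].nil? || index[key].len < pkt.len
  end
  # Sorting the [ack, seq] keys gives ascending ack order with seq as
  # the tie-breaker.
  index.sort.map { |_key, pkt| pkt }
end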

Here’s an example of running fuzzymatch-sort.rb on a randomized single-stream pcap file:

shell> ./fuzzymatch-sort.rb ../pcaps/pLargeRand.pcap
[1]    [....S.] 128.222.228.89 -> 128.222.228.77    seq=638858703 ack=0 len=78
Starting a new stream...
[2]    [.A..S.] 128.222.228.77 -> 128.222.228.89    seq=2849933258 ack=638858704 len=74
ack num: 638858704 close enough to 638858703 to add. Had to check 1 streams and 2 seq/ack nums
[3]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638862706 ack=2849933260 len=66
ack num: 2849933260 close enough to 2849933258 to add. Had to check 1 streams and 2 seq/ack nums
[4]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=638861176 ack=2849933259 len=666
ack num: 2849933259 close enough to 2849933260 to add. Had to check 1 streams and 1 seq/ack nums
[5]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=66
ack num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 1 seq/ack nums
[6]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861776 len=66
seq num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 1 seq/ack nums
[7]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638859728 ack=2849933259 len=1514
ack num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 2 seq/ack nums
[8]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66
seq num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 3 seq/ack nums
[9]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=1090
ack num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 3 seq/ack nums
[10]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861176 len=66
seq num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 3 seq/ack nums
[11]    [.A...F] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66
seq num: 2849933259 close enough to 2849933259 to add. Had to check 1 streams and 3 seq/ack nums
Ended up with 1 stream(s).

==> Stream 1 contains 11 packet(s)
--> Sorting stream 1...
[1]    [....S.] 128.222.228.89 -> 128.222.228.77    seq=638858703 ack=0 len=78
[2]    [.A..S.] 128.222.228.77 -> 128.222.228.89    seq=2849933258 ack=638858704 len=74
[3]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638862706 ack=2849933260 len=66
[4]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=638861176 ack=2849933259 len=666
[5]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=66
[6]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861776 len=66
[7]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=638859728 ack=2849933259 len=1514
[8]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66
[9]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=1090
[10]   [.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861176 len=66
[11]   [.A...F] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66

==== Unsorted Streams ====

Source
-------
[....S.] 128.222.228.89 -> 128.222.228.77    seq=638858703 ack=0 len=78
[.A....] 128.222.228.89 -> 128.222.228.77    seq=638862706 ack=2849933260 len=66
[.AP...] 128.222.228.89 -> 128.222.228.77    seq=638861176 ack=2849933259 len=666
[.A....] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=66
[.A....] 128.222.228.89 -> 128.222.228.77    seq=638859728 ack=2849933259 len=1514
[.AP...] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=1090
Dest
-------
[.A..S.] 128.222.228.77 -> 128.222.228.89    seq=2849933258 ack=638858704 len=74
[.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861776 len=66
[.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66
[.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861176 len=66
[.A...F] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66

==== Sorted Streams ====

Source
-------
[....S.] 128.222.228.89 -> 128.222.228.77    seq=638858703 ack=0 len=78
[.AP...] 128.222.228.89 -> 128.222.228.77    seq=638858704 ack=2849933259 len=1090
[.A....] 128.222.228.89 -> 128.222.228.77    seq=638859728 ack=2849933259 len=1514
[.AP...] 128.222.228.89 -> 128.222.228.77    seq=638861176 ack=2849933259 len=666
[.A....] 128.222.228.89 -> 128.222.228.77    seq=638862706 ack=2849933260 len=66
Dest
-------
[.A..S.] 128.222.228.77 -> 128.222.228.89    seq=2849933258 ack=638858704 len=74
[.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861176 len=66
[.A....] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638861776 len=66
[.A...F] 128.222.228.77 -> 128.222.228.89    seq=2849933259 ack=638862706 len=66

Another addition in this version is that the fuzzymatch algorithm used to generate streams has been optimized (mostly because I was doing it stupidly last time), so seq/ack searches don’t take nearly as long for large pcap files. You can download an example of fuzzysort output from a pcap file with 5 streams here.

While this is a step up from the unsorted fuzzymatch of the last post, this method still has its downfalls; not being able to store duplicate seq/ack packets could definitely cause problems. The ability to generate a sorted stream without having to split it into source and destination streams would also be extremely useful (so you could see a full transmission stream from both sides).

The next evolution of the library would be to actually keep a state table and follow TCP streams using the seq/ack numbers (which I attempted for this version, but it was extremely complex, so I scrapped it and did fuzzysort). Hopefully I’ll be able to implement it without any problems. I might take the easy route and try using TCP TSVal to order streams.

Comments? Criticism? Leave a comment and let me know :)

Rebuilding TCP streams with Ruby part 1: fuzzymatch (Wed, 12 Mar 2008)

I have undertaken the (not so small) task of attempting to use Ruby to rebuild TCP data streams. I was originally planning on using ruby-libnids, but after running into considerable trouble with dynamic library linking on OSX, I decided it’d be a good experiment to write my own.

This is not a small feat. In fact, I probably won’t ever get it working perfectly (or if I do, it certainly won’t be soon). In a series of posts, I’ll be exploring some of the development decisions, design choices and pitfalls that I run into, sort of a development journal. Why would a tool like this ever be useful? Well, if you want to do analysis on packet payloads, you have to make sure you have a contiguous data segment to work on; otherwise part of the message is lost. I do, however, have a few things going for me:

  • I don’t have to do live reassembly. I can do 2-pass reassembly, because I’m only going to be analyzing pcap files. Perhaps later I’ll add the ability to do live analysis, but for now that’s adding complexity to a problem that’s already complex enough.
  • I will be building prototypes of different methods, each with its pros and cons. Instead of having to work towards a final release, I have the flexibility to change designs with every iteration.
  • No matter what, I win. Nothing but learning can come from this project, so it still has benefits even if I never arrive at a final product.

For the first installment, I want to talk about fuzzymatching using sequence and acknowledgement numbers. I have released my proof-of-concept code here, but I’ll be going over it in more detail in this post.

Okay, let’s start by dumping the simplest pcap datastream I could possibly generate (sending the word “test” using netcat):

./sdump.rb pcaps/pSmall.pcap
[1]    [....S.] 128.222.228.89 -> 128.222.228.77    seq=4679659509 ack=0 len=78
[2]    [.A..S.] 128.222.228.77 -> 128.222.228.89    seq=30782357 ack=4679659510 len=74
[3]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=66
[4]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=71
[5]    [.A...F] 128.222.228.89 -> 128.222.228.77    seq=4679659515 ack=30782358 len=66
[6]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
[7]    [.A...F] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
[8]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659516 ack=30782359 len=66

Being the most simple example, you can see that I only have 1 stream to deal with, and that the seq/ack numbers are nice enough to be where we want them to. For the first prototype, I have created a list of Stream objects (containing sequence and ack numbers), and when I get a new packet, I compare its seq/ack numbers to the numbers of streams already in the list, if it’s within a threshold (5 is my value), then it probably belongs to that stream and I add the packet to the stream.

One of the really nice things about processing packets this way is that I don’t have to worry about packet order on the first pass: if a packet is close enough, it’s added; if it isn’t, a new stream is created. Fuzzymatcher correctly identifies this pcap file as 1 stream:

./fuzzymatch.rb ../pcaps/pSmall.pcap
[1]    [....S.] 128.222.228.89 -> 128.222.228.77    seq=4679659509 ack=0 len=78
No stream found for packet, starting a new one...
[2]    [.A..S.] 128.222.228.77 -> 128.222.228.89    seq=30782357 ack=4679659510 len=74
ack num: 4679659510 close enough to 4679659509 to add.
[3]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=66
seq num: 4679659510 close enough to 4679659509 to add.
[4]    [.AP...] 128.222.228.89 -> 128.222.228.77    seq=4679659510 ack=30782358 len=71
seq num: 4679659510 close enough to 4679659509 to add.
[5]    [.A...F] 128.222.228.89 -> 128.222.228.77    seq=4679659515 ack=30782358 len=66
ack num: 30782358 close enough to 30782357 to add.
[6]    [.A....] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
seq num: 30782358 close enough to 30782357 to add.
[7]    [.A...F] 128.222.228.77 -> 128.222.228.89    seq=30782358 ack=4679659516 len=66
seq num: 30782358 close enough to 30782357 to add.
[8]    [.A....] 128.222.228.89 -> 128.222.228.77    seq=4679659516 ack=30782359 len=66
ack num: 30782359 close enough to 30782357 to add.
Ended up with 1 stream(s).
Stream 1 contains 8 packet(s)

Now, I haven’t added any code to actually order the packets (yet), but this is a good start. Before I continue, how does fuzzymatching handle pcaps with a large amount of data?

./fuzzymatch.rb ../pcaps/data.pcap
... tons of output ...
[1684]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806823238 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1685]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806825958 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1686]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806826815 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1687]    [.A....] 192.168.1.123 -> 64.12.28.76    seq=6079164989 ack=806826815 len=54
seq num: 6079164989 close enough to 6079164989 to add.
[1688]    [.A....] 64.12.165.98 -> 192.168.1.136    seq=424631922 ack=282306986 len=54
No stream found for packet, starting a new one...
[1689]    [.A....] 192.168.1.136 -> 64.12.165.98    seq=282306986 ack=424631923 len=54
ack num: 424631923 close enough to 424631922 to add.
Ended up with 53 stream(s).
Stream 1 contains 8 packet(s)
Stream 2 contains 22 packet(s)
Stream 3 contains 8 packet(s)
Stream 4 contains 34 packet(s)
... lots more output, one for all 53 streams ...

Not bad, for a start. I think the next goal is probably ordering the streams; luckily, I can do this in a second pass (or when the stream data is accessed, cutting down on computation time unless the data is actually needed).

All my POC code can be downloaded from the RSB (Ruby StreamBuilder) project page, which will receive regular updates as I continue development.

Questions? Comments? Flames? Leave me a comment and let me know what you think ;)

You can check out all the other parts of this series on the blog.
