Locality of reference in information security

December 19, 2007

I’ve been kicking this idea around in my head for the last couple of days, trying to decide what to write…

Return with me, for a moment, to the computer hardware class you took in college (if you took one; don’t worry if you didn’t). Do you remember discussing program/memory flow? How about locality of reference for RAM access? Well, I wanted to discuss a few ideas about locality of reference as it applies to network security.

First off, let’s define the different kinds of locality of reference: temporal locality, spatial locality, and sequential locality. Here are the quick-and-dirty definitions:

Temporal locality states that when you access something, you are likely to access the same thing again in the near future.

Spatial locality states that when you access a particular piece of data, you are likely to access data near that piece in the future.

Sequential locality states that when you access a piece of data, you are likely to access nearby data in either an ascending or descending fashion.

So, what does this have to do with information security (if anything)? Well, I believe you can apply this kind of analysis to network traffic when looking at a whole network. For a simple example, take two machines, each one serving DNS records to clients on a network. In a typical case, a client would query one of the two machines (probably the first one, using the second as a backup if the first didn’t respond), retrieve its name record and be done with the connection, right? How about in the case of something scanning the network for DNS servers as potential targets? You could expect to see both DNS servers queried within a short amount of time by the same host. In this case, the person doing automated scanning exhibited symptoms of locality (in this case spatial, although it could have been sequential depending on the IP assignments) while scanning for vulnerabilities.

How does this help us? Well, as we improve our security monitoring, we may be able to gain an additional edge over scanning tools by classifying network traffic according to the locality of its flow. An nmap scan of an entire subnet (even just one port), for example, would display sequential locality that would most likely not show up during legitimate use. An nmap scan of one host across all ports (let’s pretend nmap doesn’t randomize port scanning order) shows sequential locality as far as ports are concerned, which is also an example of determining whether the scanning was automated or human.

You would expect to see each of the different kinds of locality in a different environment. In the case of temporal locality, if you had, say, a DHCP server, you would expect to see a small amount of temporal locality between hosts (for legitimate uses), since each host would only send out a DHCP request if it either lost its current address (needed to renew) or was just joining the network for the first time. Seeing one host exhibit a great deal of DHCP temporal locality (say, requesting a DHCP lease 50 times in 1 minute) should trigger an alarm.
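
As a rough illustration of that DHCP example, here is a minimal Ruby sketch of a temporal-locality alarm. The event list, the 60-second window, and the threshold of 10 requests are all made-up values for illustration, not part of any real sensor:

#!/usr/bin/env ruby
# Hypothetical input: [timestamp in seconds, requesting host] pairs, as might
# be parsed from a DHCP server log. The window and threshold are arbitrary.
WINDOW    = 60
THRESHOLD = 10

def temporal_alarms(events)
  alarms  = []
  by_host = Hash.new { |h, k| h[k] = [] }
  events.sort_by { |time, _| time }.each do |time, host|
    times = by_host[host]
    times << time
    # keep only the timestamps that fall inside the sliding window
    times.reject! { |t| t < time - WINDOW }
    alarms << [host, time, times.size] if times.size >= THRESHOLD
  end
  alarms
end

# Example: one host requests a DHCP lease 50 times in under a minute
events = (1..50).map { |i| [i, "10.0.0.5"] } << [300, "10.0.0.9"]
host, time, count = temporal_alarms(events).first
puts "ALARM: #{host} made #{count} requests within #{WINDOW}s (t=#{time})" if host

The interesting part in practice would be picking that threshold per service, since the "normal" amount of temporal locality is different for DNS, DHCP, HTTP and so on.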

Another thing this can help us determine is whether a real live person is generating the traffic or an automated tool. Example:

A live person is auditing a network. They decide to start with DNS servers: they gather a list of DNS servers, then manually gather information about each server by attempting to log in, grabbing banners, etc.

An automated tool scans a subnet, noting any DNS servers it finds. It then (sequentially) attempts to gather information from each server in exactly the same way. It makes no distinction between different kinds of DNS servers; the network traffic sent is the same for each server.

A sensor capturing this traffic looks at the traffic sent by the live person, sees the lack of sequential scan packets against the DNS servers and the direct approach (“I know what information I’m gathering and how to gather it directly without resorting to a scan”), and notes the lack of sequential and temporal locality.

The same sensor looks at the traffic sent by the automated tool, sees a network-flow example of sequential locality (scanning a subnet, incrementing IPs by 1) and temporal locality (since many automated tools query the same server multiple times). It also compares the traffic sent to each of the different DNS servers (was the exact same query sent each time?) and, from that, determines that the amount of locality exceeds the threshold to classify it as an automated scan/attack.
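
To make that concrete, here is a minimal, hypothetical Ruby sketch of the kind of check such a sensor might run. The flow records, the "three or more consecutive destination addresses" rule and the identical-payload test are all assumptions made up for illustration, not the output of any real tool:

#!/usr/bin/env ruby
require 'ipaddr'

# Hypothetical flow records: [source, destination, payload summary].
# A real sensor would build these from captured traffic; these are made up.
FLOWS = [
  ["192.168.1.50", "10.0.0.1", "version.bind query"],
  ["192.168.1.50", "10.0.0.2", "version.bind query"],
  ["192.168.1.50", "10.0.0.3", "version.bind query"],
  ["192.168.1.77", "10.0.0.2", "manual banner grab"],
]

# True if the destinations contain a run of 3 or more consecutive addresses
# (ascending sequential locality); the threshold is arbitrary.
def sequential?(dests)
  addrs = dests.map { |d| IPAddr.new(d).to_i }.sort
  run = best = 1
  addrs.each_cons(2) { |a, b| run = (b == a + 1) ? run + 1 : 1; best = [best, run].max }
  best >= 3
end

FLOWS.group_by { |src, _, _| src }.each do |src, flows|
  dests    = flows.map { |_, dst, _| dst }.uniq
  payloads = flows.map { |_, _, pay| pay }.uniq
  if sequential?(dests) && payloads.size == 1
    puts "#{src}: sequential destinations, identical traffic -> likely automated"
  else
    puts "#{src}: no strong locality pattern -> likely human/legitimate"
  end
end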

Unfortunately, I don’t have any kind of tool to do this sort of analysis right now, and I don’t know of any tool that specifically looks at data with locality of reference in mind. Think about your network: what servers do you have where temporal locality would be expected? What kind of locality would you NOT expect to see against the same machine? Website crawling? Port traversal? What locality differences would you expect to see between human and automated usage of your network? How about applying it at the filesystem level, expecting to see sequential file access for a group of files?

I certainly don’t claim to be an expert in locality of reference (or NSM, for that matter), but I was curious if anyone has come across anything else like this. If you have, leave me a comment with a link to a paper or article about it; I’m very interested in reading more. I would appreciate it :)

A completely useless but fun-to-write program for checking webpage existence

December 14, 2007

Code:

#!/usr/bin/env ruby
# Pass the base URL as the first argument, e.g. http://example.com

# In-place Fisher-Yates shuffle of an array
def fisher_yates_shuffle(a)
  (a.size - 1).downto(1) { |i|
    j = rand(i + 1)
    a[i], a[j] = a[j], a[i] if i != j
  }
end

# Read every word from the system dictionary, then shuffle the order
lines = File.readlines('/usr/share/dict/words')
fisher_yates_shuffle(lines)

lines.each { |word|
  word = word.chomp
  puts "trying #{word}..."
  # Try the common page extensions for each word
  system("wget -q #{ARGV[0]}/#{word}.html")
  system("wget -q #{ARGV[0]}/#{word}.htm")
  system("wget -q #{ARGV[0]}/#{word}.php")
  sleep(1)
}

(The “sleep(1)” is so you don’t kill the server with traffic; remove it if you like.)

Should be pretty self-explanatory: go through all the words in /usr/share/dict/words and attempt to fetch webpages. At my current dictionary size, it would take 65 hours to complete (at 1 second per word) :D

A smart person would replace “/usr/share/dict/words” in the script with a better list of website page names, if they actually wanted to use this :)

You know, I’ve always wondered if servers had a “rurigenous.html” or a “mastochondroma.php” webpage on their site…

Network traffic IP Location aggregator (iploc)

December 14, 2007

Have you ever been looking through your pcap files (or live captures) and wondered where all the traffic was coming from (or going to)? I have! Well, I’ve written a small (< 150 lines) script to aggregate all of the packet source addresses into a neatly separated CSV (comma-separated values) file. Each line includes:

<ip address>,<country>,<city and state>,<latitude>,<longitude>,<packet count>

First off, get the script here.

It requires ruby-pcap and wget (like most of the Ruby scripts I write :P). For each unique IP address, the script queries the hostip.info database and stores the info in a temp file. If the IP already exists in the script’s list, it won’t query for the information again (to save time and bandwidth).
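
For the curious, here is a rough sketch of what that per-IP lookup and caching could look like in Ruby. The get_csv.php endpoint and the field order of its response are from memory and may no longer be accurate, so treat them (and the lookup helper itself) as assumptions rather than the actual iploc code:

#!/usr/bin/env ruby
require 'net/http'
require 'uri'

# Simple in-memory cache so each unique IP is only looked up once
GEO_CACHE = {}

# NOTE: the get_csv.php endpoint and its column order are assumptions based on
# memory of the old hostip.info web API, not copied from the iploc source.
def lookup(ip)
  GEO_CACHE[ip] ||= begin
    url = URI.parse("http://api.hostip.info/get_csv.php?ip=#{ip}&position=true")
    country, city, lat, lon = Net::HTTP.get(url).chomp.split(',')
    { :country => country, :city => city, :lat => lat, :lon => lon }
  rescue StandardError
    { :country => "Unknown", :city => "Unknown", :lat => "", :lon => "" }
  end
end

puts lookup("198.51.100.7").inspect   # example address; repeat calls hit the cache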

iploc can either run in live capture mode, or read from a pcap file. Here’s how you would run it live:

./iploc -i eth0 >> output.csv

iploc will run until it receives a CTRL-C, at which point it will aggregate the data and dump it to STDOUT. Alternatively, you can read from a pcap file like this:

./iploc -r ~/data.pcap >> output.csv

Here’s an example of what the CSV file looks like after being opened in Numbers (or Excel):

[Screenshot: IPloc CSV output]

Since it’s a CSV file, it should be easy for any other program to parse. I’m hoping to write another program that takes the latitude/longitude and plots the points on a map (for a more visual representation).
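
As a quick example of how simple that parsing is, here are a few lines of Ruby that pull the coordinates and packet counts back out of the CSV. It assumes the column order listed above, a file named output.csv, and that the city field contains no embedded commas:

#!/usr/bin/env ruby
# Read iploc output and collect [latitude, longitude, packet count] triples,
# which is all a plotting tool would really need.
points = File.readlines("output.csv").map do |line|
  _ip, _country, _city, lat, lon, count = line.chomp.split(',')
  [lat.to_f, lon.to_f, count.to_i]
end

points.each { |lat, lon, count| puts "#{lat},#{lon} (#{count} packets)" }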

IPloc also supports BPF filters; just use them like you would with tcpdump. For instance, if you only cared about HTTP data:

./iploc -i eth0 port 80 > output.csv

Questions? Leave me a comment below! Now go forth, and visualize your honeypot data! :)

P.S. Thanks to the hostip guys for having a really nice web api to get this data quickly.

Blog layout/pages update

December 13, 2007

Just a small update: I finally got around to creating static pages for the important things I’ve posted on my blog. You can view them in the right-hand column of the main page. I’ve created pages for the following projects/topics:

Hopefully this makes it easier to link to a particular project. Take a look and let me know if you find anything missing! I’m also hoping to add a link soon for the packages I’ve created!

Also, I’m contemplating future posts: are there any requests for posts on a particular topic? More ZFS posts? More security tool posts? More how-to posts? Leave me a comment and let me know!

DNS poisoning FUD

December 12, 2007

This is in response to one of today’s articles on Ars Technica, titled “DNS poisoning used to redirect unwitting surfers”. I highly respect Ars and read their articles regularly; however, in this case, I believe the article may be causing more FUD, which is not especially helpful.

In the article they discuss DNS servers that can potentially serve bad information in response to requests. What the article *sounds like* it describes is an attack on legitimate DNS servers in order to get them to serve bad data (which would be far more serious). In actuality, the attack uses malware to change a user’s DNS settings to point to an evil DNS server, which in turn serves evil entries to the machine when the user tries to access a site like chase.com for banking.

Essentially, it’s a very advanced form of phishing that uses malware to change the victim’s settings. This is NOT the DNS poisoning attack the article vaguely describes, which would be if hackers were able to get trusted DNS servers to send false data. It’s sad that a trusted source like Ars published the article under such a misleading title. More readership, I suppose? (Or an honest mistake; personally I don’t think Ars would do it intentionally.)

On another note, would you click on a gmail webclip that looked like this??

[Image: Gmail webclip]

I’m guessing that the site isn’t malicious, just in a different language, and was thus displayed as “???” instead of whatever the original language was. Still, I’m curious why Gmail thought I would be able to read it, since 99.9% of all my email is in English. I’m also curious what a lay user would think of a webclip like that.

I apologize for the lack of consistent posting lately; I’ve been hard at work on nsm-console for inclusion in the upcoming Hex 1.0.2 release. More posts to come!

Aegrescit medendo (Cell phones)

December 4, 2007

My wife and I don’t have cell phones.
At all.

You’d think that, given my profession and people our age these days, we would both have cell phones with some kind of super minute plan. You know what the worst part of it is? We used to have cell phones, until we cancelled them due to the expense. How do you have access to that kind of technology and then give it up?

Everyone has a cell phone these days: little 9-year-old kids, 95-year-old grandparents. I admit they are an extremely useful piece of technology, and they allow you to do things that were not possible only a few years ago. Look at the latest phones and you’ll see internet access from cell towers, messaging, email, video, music, etc., all without the hassle of actually having a laptop or desktop around. I think mobile computing is definitely something coming in the future.

Still, I think there are some very important lessons the *lack* of cell phones can teach us:

1. Be on time
Before cell phones, could you call and say you were running late? No. You either showed up, or you were late. This is also a double-edged sword, because there are valid reasons to run late and a cell phone definitely allows you to give warning for that sort of thing. However, not being able to call ahead and say you will be late automatically commits you to arriving at the agreed-on time, with no “5 minutes late because I was just lazy”.

2. You are not Facebook or Myspace
You are not the central hub of communication; people will not text message you everything they’re doing all day long. Stop gossiping. (May or may not apply to you.)

3. Plan things in advance
No more can you call ahead and drop by with only a few moments’ notice; now you must plan when to arrive somewhere. Of course it is nice to be able to call someone when you’re in the neighborhood, but I still think that not having this ability, even for just a short while, can teach you how to better plan for future events.

4. I am sorry work, but once I leave you, you can’t call me
That’s right, leave work at work. Yes, I know there are many workplaces that require you to be on-call, available all hours and all that stuff. Sometimes it is nice to not have work call you while you are out shopping. Having only a home phone requires someone to (gasp!) leave a message if you aren’t there.

5. Focus on driving
Of course, this doesn’t apply to everyone; it only applies to the 2-3 people daily who cut me off, run red lights, don’t signal and generally drive like they’ve just taken 3-4 times the recommended dosage of night-time cough syrup.

Lots of other things come to mind also, but these are the ones that stick out the most to me. On the other hand, having a cell phone definitely has its advantages (and yes, there are many of them). Driving long distances is one of them (what if you run out of gas? Definitely nice to have a cell phone then).

I’m not trying to make the point that cell phones are evil or anything; I’m just trying to share what I think you can learn by going without them for a short while. Think about how much you rely on a cell phone and how you would handle not being able to use it for a while, and then you’ll understand the point I am trying to make.

And yes, we probably will get them again in the future. Eventually.

What do you think about cell phones? Necessary evil? Should parents be giving their pre-teen children cell phones? What age is a good age for a cell phone?

Packages to get svn working on Hex 1.0.*

December 3, 2007

When trying to run svn on Hex 1.0.*, you get the following error:

/libexec/ld-elf.so.1: Shared object "libaprutil-1.so.2" not found, required by "svn"

As geek00l pointed out, this can be fixed by issuing the following command:

cd /usr/ports/devel/apr-svn/ && make install clean

This assumes you have a ports tree downloaded onto Hex, but what if you don’t have access to the ports tree? (I can’t download the ports tree from my work.) Well, I’ve created the packages you need to install in order to get svn working properly.

First, download db42-4.2.52_5.tbz and apr-gdbm-db42-1.2.8_2.tbz to the same directory somewhere on the Hex machine. Then issue the following command:

sudo pkg_add -v ./apr-gdbm-db42-1.2.8_2.tbz

This will automatically install the db42 package as a dependency. After installing the apr package, svn should work without any problems.

You can see more packages on the Hex project page.

Compiling tcpdstat on Mac OSX (quick fix)

November 29, 2007

Quick fix for compiling tcpdstat on Mac OSX (Leopard, although it probably works for Tiger too).

If you get this error:

cc -I. -I../libpcap-0.7.1 -DLINUX -D__FAVOR_BSD -D_LARGEFILE_SOURCE=1 -D_FILE_OFFSET_BITS=64 -L../libpcap-0.7.1 -c stat.c
cc -I. -I../libpcap-0.7.1 -DLINUX -D__FAVOR_BSD -D_LARGEFILE_SOURCE=1 -D_FILE_OFFSET_BITS=64 -L../libpcap-0.7.1 -c net_read.c
net_read.c:74:1: warning: "__FAVOR_BSD" redefined
<command line>:1:1: warning: this is the location of the previous definition
net_read.c:149: error: static declaration of ‘packet_length’ follows non-static declaration
tcpdstat.h:415: error: previous declaration of ‘packet_length’ was here
make: *** [net_read.o] Error 1

Edit the net_read.c file and change line 149 from this:

static int packet_length; /* length of current packet */

to this:

int packet_length; /* length of current packet */

Simple, eh? The header (tcpdstat.h) already declares packet_length without static, so the static re-declaration in net_read.c conflicts with it; removing the static keyword resolves the error. Just type “make” again and tcpdstat should compile just fine.

NSM Console projected module list

November 28, 2007

Here’s a list of all the planned and completed (struck-out) modules for nsm-console. (If a module is struck out, it’s because I’ve finished making a module for it; it isn’t necessarily in the tarball for download.)

  • aimsnarf
  • ngrep (gif/jpg/pdf/exe/pe/ne/elf/3pg/torrent)
  • tcpxtract
  • tcpflow
  • chaosreader
  • bro-IDS
  • snort
  • tcpdstat
  • capinfos
  • tshark
  • argus
  • ragator
  • racount
  • rahosts
  • hash (md5 & sha256)
  • ra
  • honeysnap
  • p0f
  • pads
  • fl0p
  • iploc
  • foremost – thanks shadowbq!
  • flowgrep
  • tcptrace
  • tcpick
  • flowtime
  • flowtag
  • harimau
  • clamscan

Think of any other useful modules? Leave me a comment and let me know!

P.S. I’m also brainstorming for some pcap/real-time network visualization tools, stay tuned!

NSM Console – A framework for running things

November 27, 2007

Well, I’ve been hard at work for the last couple of days on a (hopefully) useful tool for aiding in NSM file analysis (it works on pcap files; live analysis doesn’t work).

Behold! I present NSM-Console! (read more about it here, watch a screencast here)

Download the framework here.
Keep in mind this framework only includes 3 modules (mostly used just for testing)

NSM-Console is a small (< 1500 lines) framework for running NSM modules. Essentially, it’s a framework for running things (but we don’t call it that because it sounds like it wasn’t any work :P). Here’s the breakdown: Continue Reading »

 