I’ve been kicking this idea around in my head for the last couple of days, trying to decide what to write…
Return with me, for a moment, to the computer hardware class you took in college (if you took one; don’t worry if you didn’t). Do you remember discussing program/memory flow? How about locality of reference for RAM access? Well, I wanted to discuss a few ideas about locality of reference as it applies to network security.
First off, let’s define the different kinds of locality of reference: temporal locality, spatial locality, and sequential locality. Here are the quick-and-dirty definitions (with a toy sketch after them):
Temporal locality states that when you access something, it is likely you will access the same thing again soon.
Spatial locality states that when you access a particular piece of data, you are likely to access data near that piece in the future.
Sequential locality states that when you access a piece of data, you are likely to access nearby data in either an ascending or descending fashion.
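To make those definitions concrete, here’s a toy Python sketch that counts how strongly a stream of accesses shows each kind of locality. Everything in it is made up for illustration: the function names, the radius, and the access stream are my own, not anything standard.

```python
# Toy illustration of the three localities over a stream of accesses.
# All names and thresholds here are hypothetical.

from collections import Counter

def temporal_hits(accesses):
    """Count repeat accesses to the same item (temporal locality)."""
    counts = Counter(accesses)
    return sum(c - 1 for c in counts.values() if c > 1)

def spatial_hits(accesses, radius=4):
    """Count accesses landing within `radius` of an earlier, different access."""
    hits, seen = 0, set()
    for a in accesses:
        if any(abs(a - s) <= radius for s in seen if s != a):
            hits += 1
        seen.add(a)
    return hits

def sequential_hits(accesses):
    """Count consecutive accesses that step by exactly +1 or -1."""
    return sum(1 for prev, cur in zip(accesses, accesses[1:])
               if abs(cur - prev) == 1)

stream = [10, 11, 12, 13, 40, 10, 41, 40]
print(temporal_hits(stream), spatial_hits(stream), sequential_hits(stream))
```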
So, what does this have to do with information security (if anything)? Well, I believe you can apply this kind of data-collection method to network traffic when looking at a whole network. For a simple example, take 2 machines, each one serving DNS records to clients on a network. In a typical case, a client would query one of the two machines (probably the first one, using the 2nd as a backup if the first didn’t respond), retrieve its name record, and be done with the connection, right? How about in the case of something scanning the network for DNS servers as potential targets? You could expect to see both DNS servers queried within a short amount of time, by the same host. In this case, the person doing the automated scanning exhibited symptoms of locality (in this case spatial, although it could have been sequential depending on the IP assignments) when scanning for vulnerabilities.
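A minimal sketch of catching that symptom might look like the following, assuming we see flow records as (timestamp, source, destination) tuples. The server IPs, the 5-second window, and the flow format are all invented for the example:

```python
# Hypothetical sketch: flag a source that touches both DNS servers
# within a short window -- the "spatial locality" symptom above.

WINDOW = 5.0                      # seconds; threshold is a guess
DNS_SERVERS = {"10.0.0.53", "10.0.1.53"}

def suspicious_sources(flows):
    """flows: iterable of (timestamp, src_ip, dst_ip) tuples."""
    last_seen = {}                # (src, dst) -> last timestamp seen
    flagged = set()
    for ts, src, dst in flows:
        if dst not in DNS_SERVERS:
            continue
        last_seen[(src, dst)] = ts
        others = DNS_SERVERS - {dst}
        if any(ts - last_seen.get((src, o), float("-inf")) <= WINDOW
               for o in others):
            flagged.add(src)
    return flagged

flows = [(0.0, "10.0.2.7", "10.0.0.53"),   # normal single lookup
         (1.2, "10.0.2.9", "10.0.0.53"),
         (1.9, "10.0.2.9", "10.0.1.53")]   # same host hits both servers
print(suspicious_sources(flows))           # {'10.0.2.9'}
```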
How does this help us? Well, as our security monitoring matures, we may be able to gain an additional edge over scanning tools by classifying network traffic according to the locality of its flow. An nmap scan of an entire subnet (even just 1 port), for example, would display sequential locality that would most likely not show up during legitimate use. An nmap scan of 1 host, all ports (let’s pretend nmap doesn’t randomize port scanning order), shows sequential locality as far as ports are concerned (which is also one way of determining whether it was automated or human scanning).
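Here’s a rough sketch of flagging that subnet-sweep pattern. The flow format, the run-length threshold, and the helper name are all my own assumptions, not how any real tool does it:

```python
# Rough sketch of spotting the sequential locality an nmap-style sweep
# leaves behind: one source walking destination IPs in order.

import ipaddress

def looks_like_sweep(dst_ips, min_run=5):
    """True if the destinations contain a run of min_run+ consecutive IPs."""
    nums = [int(ipaddress.ip_address(ip)) for ip in dst_ips]
    run = 1
    for prev, cur in zip(nums, nums[1:]):
        run = run + 1 if cur - prev == 1 else 1
        if run >= min_run:
            return True
    return False

sweep = [f"192.168.1.{i}" for i in range(1, 20)]          # sequential sweep
normal = ["192.168.1.10", "192.168.1.53", "192.168.1.10"]
print(looks_like_sweep(sweep), looks_like_sweep(normal))  # True False
```

The same run-length idea would work on destination ports for the 1-host, all-ports case; it’s just a different integer sequence.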
You would expect to see each of the different kinds of locality in a different environment. In the case of temporal locality, if you had, say, a DHCP server, you would expect to see a small amount of temporal locality between hosts (for legitimate uses), since each host would only send out a DHCP request if it either lost its current address (needed to renew) or was just joining the network for the first time. Seeing one host exhibit a great deal of DHCP temporal locality (say…requesting a DHCP lease 50 times in 1 minute) should trigger an alarm.
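That alarm is straightforward to sketch: assuming we see DHCP requests as (timestamp, client MAC) events, a sliding-window counter like this would do it. The class name and event format are hypothetical; the 50-per-minute threshold comes from the example above:

```python
# Sketch of the DHCP alarm: count requests per client MAC in a
# sliding one-minute window and alarm past a threshold.

from collections import defaultdict, deque

WINDOW = 60.0    # seconds
THRESHOLD = 50   # requests per window before we alarm

class DhcpRateMonitor:
    def __init__(self):
        self.times = defaultdict(deque)   # mac -> timestamps in window

    def observe(self, ts, mac):
        """Record one DHCP request; return True if this host alarms."""
        q = self.times[mac]
        q.append(ts)
        while q and ts - q[0] > WINDOW:   # drop events outside the window
            q.popleft()
        return len(q) > THRESHOLD

mon = DhcpRateMonitor()
for i in range(60):                       # 60 requests in ~6 seconds
    if mon.observe(i * 0.1, "aa:bb:cc:dd:ee:ff"):
        print("alarm after request", i + 1)
        break
```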
Another thing this can help us determine is whether it is a real, live person generating this traffic or an automated tool. Example:
A live person auditing a network gathers data: they decide to start with DNS servers, gather a list of them, then manually collect information about each server by attempting to log in, grabbing banners, etc.
An automated tool scans a subnet, noting any DNS servers it finds; it then sequentially attempts to gather information from each server in the exact same way. It makes no distinction between different kinds of DNS servers; the network traffic sent is identical for each server.
A sensor capturing this traffic looks at the traffic sent by the live person, sees the lack of sequential scan packets against the DNS servers and the direct approach (“I know what information I’m gathering and how to gather it directly without resorting to a scan”), and notes the lack of sequential and temporal locality.
The same sensor looks at the traffic sent by the automated tool and sees the textbook network-flow example of sequential locality (scanning a subnet, incrementing IPs by 1) and temporal locality (since many automated tools query the same server multiple times). It also compares the traffic sent to each of the different DNS servers (was the exact same query sent each time?) and from that determines that the amount of locality exceeds the threshold to classify it as an automated scan/attack.
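Here’s one way that last comparison step might look, using plain string similarity as a stand-in for real payload comparison. The scoring method, the sample payloads, and the 0.9 cutoff are pure guesses on my part:

```python
# Very rough take on the sensor's last step: if the payloads a source
# sent to every DNS server are (near-)identical, treat that as
# evidence of automation.

from difflib import SequenceMatcher

def automation_score(payloads):
    """Mean pairwise similarity (0..1) of the payloads one source sent."""
    if len(payloads) < 2:
        return 0.0
    pairs = [(a, b) for i, a in enumerate(payloads)
             for b in payloads[i + 1:]]
    return sum(SequenceMatcher(None, a, b).ratio()
               for a, b in pairs) / len(pairs)

tool = ["version.bind TXT CH"] * 4          # same probe to every server
human = ["login attempt admin", "banner grab 53/tcp", "zone xfer test"]
print(automation_score(tool) > 0.9)         # True  -> classify automated
print(automation_score(human) > 0.9)        # False -> likely human
```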
Unfortunately, I don’t have any kind of tool to do this analysis right now, and I don’t know of any tool that specifically looks at data with locality of reference in mind. Think about your network: what servers do you have where temporal locality would be expected? What kind of locality would you NOT expect to see against the same machine? Website crawling? Port traversal? What locality differences would you expect to see between human and automated usage of your network? How about applying it at a filesystem level, expecting to see sequential file access for a group of files?
I certainly don’t claim to be an expert in locality of reference (or NSM, for that matter), but I’m curious whether anyone has come across anything else like this. If you have, leave me a comment with a link to a paper or article about it; I’m very interested in reading more. I’d appreciate it.