Validation and Port Signatures

In this section we discuss our validation and testing of the TCP-based work mentioned previously. We do this in the context of a novel reporting technique that has proven useful, called a "port signature" report. The port signature report is essentially a view of our TCP SYN tuple: it includes the work metric previously discussed, as well as a small sampled set of destination ports (1..10) for any IP source in question, giving us a limited Layer 4 view of the applications being used by the host. Before we present the port signature and related efforts aimed at validation, consider the following statement from [TRW/Paxson]: "Consequently our argument is nearly circular: we show that there are properties we can plausibly use to distinguish likely scanners from non-scanners in the remainder hosts, and we then incorporate those as part of a (clearly imperfect) ground truth against which we test an algorithm we develop that detects the same distinguishing properties". Ultimately, in a very narrow sense, our work weight system catches *anomalies*, as for example in the limited scope of a system sending SYNs and not getting any packets back (barring resets). This is not how TCP should work. However, if a given system sends 100 SYNs and nothing else is seen, one cannot always explain whether a programmer produced buggy code, the protocol is bad, or the sender had evil intent. Looking at a sample set of ports does help us here, especially when we know from other evidence (when possible) that certain target ports with certain frequency counts do not appear as the natural targets of so many SYN packets in such a short time. Still, there are new phenomena, including a very small percentage of "scanners" (and here we are speaking only of work weights > 80%), that are in some way not explicable, and knowing the destination ports used may not be sufficient.
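The anomaly described above can be sketched as a simple score. This is a hypothetical formulation in the same spirit as our work weight, not the exact formula, and the counter names are illustrative only:

```python
# Hypothetical sketch of a SYN-based anomaly score in the spirit of the
# work weight: the share of a host's traffic that represents unanswered
# connection attempts. Counter names are illustrative, not the actual
# tuple fields used by the tool.

def work_weight(syns_sent, fins_seen, resets_seen, total_pkts):
    """Return a 0..100 score; high values mean many SYNs with no payoff."""
    if total_pkts == 0:
        return 0
    w = 100 * (syns_sent + resets_seen - fins_seen) / total_pkts
    return max(0, min(100, round(w)))

# A host that sent 100 SYNs and whose 100 observed packets include no
# FINs at all scores 100 -- the pure anomaly case discussed above.
print(work_weight(syns_sent=100, fins_seen=0, resets_seen=0, total_pkts=100))
```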
Below we briefly introduce the port signature report, with some examples. In the process we discuss both some explicit validation testing (which is a work in progress) and some observations we have made based on six months of experience with this tool.

port signatures:
ip src:  flags  work:  SA/S:  port signature:
1        (WOM)  100:   0:     [445,100]
2        (WOM)  100:   0:     [24910,100]
3        (WOR)  100:   0:     [5554,65][9898,34]
3.1      (WOR)  100:   0:     [5554,65][9898,34]
4        ()     6:     100:   [1054,2] ...
5        ()     2:     10:    [1124,14]...[6881,36][6882,5]...
6        ()     22:    0:     [1433,99][3536,0]
7        (WOR)  100:   0:     [139,33][1025,22][2745,21][6129,23]

The port signature report given here is simplified from the current version for reasons of space, and consists of a small set of illustrative examples taken from one real PSU report from fall 2004. The portreport is derived from the front-end SYN tuple and represents the subset of hosts produced by the worm metric; thus we can say the hosts in question have, for the most part, produced N more SYNs than FINs. Each entry begins with the IP source in question, with statistics for each individual IP source given per line. In addition to three metrics (flags, work, and SA/S), the primary mechanism here is the port signature on the far right of each line. The port signature includes 1..10 two-tuple port samples, each consisting of a destination port and a packet frequency count for that port. The number of buckets for destination ports is currently set to 10 (we use an ellipsis above for cases where the entire port sample space is filled). For example, the third entry shows that packets were sent to TCP ports 5554 and 9898 by IP source 3; the former received 65% of the packets, and the latter received 34%. The port signature report is sorted from top to bottom by its logical key, the IP source address. Here we replace real IP addresses with logical numbers as substitutes.
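The construction of a port signature from a host's sampled destination ports can be sketched as follows. This is an illustrative reconstruction, not the actual front-end code; the `buckets` parameter mirrors the 10-bucket limit described above:

```python
from collections import Counter

def port_signature(dst_ports, buckets=10):
    """Build a port signature: up to `buckets` (port, percent) samples,
    sorted from low port to high port as in the report above."""
    counts = Counter(dst_ports)
    top = counts.most_common(buckets)              # keep the busiest ports
    total = len(dst_ports)
    sig = sorted((port, 100 * n // total) for port, n in top)
    return "".join(f"[{port},{pct}]" for port, pct in sig)

# A dabber-like source: about two thirds of packets to 5554, a third to 9898.
pkts = [5554] * 65 + [9898] * 34 + [445] * 1
print(port_signature(pkts))   # -> [445,1][5554,65][9898,34]
```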
(IP source 1 will be referred to as example 1, etc.). A sorted IP source space is useful because one can see possible "nearby" groupings of distributed IP attacks, and of course, one can easily view one's own IP source address space for outbound attacks. For example, we have observed agobot-based attacks in which all the IP source addresses in a /24 space appear to be attacking the same remote destination ports. In our report above, there are two attacks that appear similar based on their ports coming from the same network (3 and 3.1). The port signature is also sorted from low port to high port, and this helps us see similar attacks using the same set of ports.

The flags metric shows us whether or not the worm candidate is receiving 2-way data. Flags here include:
1. W - the work weight is >= 90%.
2. 0 - few FINs, if any, are returned.
3. R - large numbers of resets are being returned.
4. M - few non-reset data packets are being returned.

The work metric is shown next. We have done a statistical analysis of the distribution of work metrics during both "normal" times and during large distributed attack periods (as shown in our worm graph, see []). During normal periods (most times), the work weights tend to cluster around low values, say 0..20%, and high values, 80..100%. This corresponds to our empirical hunch that work weights of 80% or higher usually indicate a worm, and more rarely a misbehaving application. During large attacks, the metric clusters predominantly in the high zone and tends to 100%. PSU source IPs are mostly clients, of which a few are infected dormitory hosts, and others tend to be running P2P clients like bittorrent. External hosts divide up into mostly TCP-based worms, the occasional scanner (a la nmap), and a smaller set of puzzling but likely benign phenomena that we call the "noisy web server". The noisy web server (example 4) and P2P apps (example 5) tend to have low work weights.
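Derivation of the flag string can be sketched as below. Apart from the 90% cutoff for W, which is stated above, the thresholds here are our own illustrative guesses, not the tool's actual cutoffs:

```python
def flags(work, fins, resets, data_pkts, syns):
    """Derive the W/0/R/M flag string described above. All thresholds
    except W's 90% cutoff are illustrative guesses."""
    f = ""
    if work >= 90:
        f += "W"                      # high work weight
    if fins < 0.05 * syns:
        f += "0"                      # few FINs, if any, returned
    if resets > 0.5 * syns:
        f += "R"                      # large numbers of resets returned
    if data_pkts < 0.05 * syns:
        f += "M"                      # few non-reset data packets returned
    return "(" + f + ")"

# A worm-like source: work 100, 100 SYNs, and nothing useful coming back.
print(flags(work=100, fins=0, resets=0, data_pkts=0, syns=100))  # -> (W0M)
```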
Worms (examples 1, 2, 3, and 3.1) tend to have high work weights. We could choose to show only IP sources with high work weights because of the high rate of "worminess". Out of thousands of instances of "worms", we have seen fewer than 10 cases that were not worms. These cases are true anomalies in that something is wrong, but they are not necessarily worms. Three example anomalies spotted (and explained) so far include:
1. one case of a popular meeting application that perhaps over-enthusiastically tries to reconnect to its server when the server is taken down,
2. well-known (as opposed to infected) campus email servers attempting to forward email error messages to spammers (which, given fake return IP addresses, will never work), and
3. certain P2P clients (often Gnutella-based) that have a very low success rate for peer connections.

Examples 1-3.1 and 7 show work metrics at 100% for various "real" worms. 3 and 3.1 are examples of the dabber worm [dabber]. Example 7 is an old phenomenon seen many times, and is some form of phatbot/agobot attack. Taken together, these two examples illustrate a very interesting forensic possibility: the display of the ports may allow you to identify the worm. On the other hand, example 2 is a new phenomenon as of late November 2004 which we have not seen before; based on experience and the work metric it is highly dubious, but we have not yet identified it. In summer 2004, we performed a number of Microsoft file share tests and looked at hosts screened from the Internet using various Microsoft file share and SQL services. We dumped their associated SYN tuples during short and long sample periods to see if ports used by Microsoft (with TCP), including 135, 139, 445, and 1433, were likely to ever show up in the worm metric sample. The answer was no, which conformed to the intuition of various local security experts.
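The forensic possibility mentioned above can be sketched as a simple lookup from sampled port sets to known worm signatures. This is hypothetical; the two table entries come from the examples in the text, and a real table would be far larger:

```python
# Hypothetical forensic lookup: match a sampled destination-port set
# against port sets associated with known worms. The entries below are
# taken from the report examples discussed in the text.
KNOWN_SIGNATURES = {
    frozenset({5554, 9898}): "dabber",                       # examples 3, 3.1
    frozenset({139, 1025, 2745, 6129}): "phatbot/agobot",    # example 7
}

def identify(ports):
    """Return a worm name if the sampled ports match a known signature."""
    return KNOWN_SIGNATURES.get(frozenset(ports), "unknown")

print(identify([5554, 9898]))    # -> dabber
print(identify([24910]))         # -> unknown (example 2 in the text)
```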
This per-application testing is a very good idea, and we intend to continue it in the future with other applications (nmap, nessus, and various P2P applications). For now we can state that instances of ports 445, 135, and 139 with high packet rates are worms. At this point we are satisfied with our general understanding of the work metric in the high range. However, there is work yet to be done in understanding why IP sources may appear in the lower range. For example, we have the noisy web server phenomenon mentioned before, shown in example 4. We do not yet understand why PSU contact with certain web servers produces large numbers of SYNs that exceed the FIN count. The work weight tends to be low; thus there is two-way data exchange. The SA/S metric is useful here. It compares the total number of SYN+ACK packets sent against the total number of SYNs sent by an IP source. Thus it gives us a rough idea as to whether a system has client tendencies (also true of worms), is a server (SA/S equals 100%, as with the noisy web server), or is somewhere in between, which is true for hosts running P2P clients (example 5). Example 6 is interesting simply because it too has a low work weight, and yet we know from the previously mentioned application testing that any mention of port 1433 in the work report (with large numbers of packets) is an attack. The work weight is low here because this is an unsuccessful password-guessing attack on SQL servers; thus there is (nefarious) work being done. Here the use of ports is invaluable. It is also important to remember that one does not need a high work weight to have an attack. Example 5 is also interesting, as it is quite common to see P2P applications appear in the portreport output. This is presumably because P2P applications in general will have some set of connection peers, with a subset of those peers possibly unavailable simply because some of the peer IP addresses are old.
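A sketch of the SA/S computation, under one plausible reading of the definition above (we cap the ratio at 100%; the actual counters and edge-case handling in the tool may differ):

```python
def sa_over_s(synacks_sent, syns_sent):
    """SA/S: SYN+ACKs sent vs. SYNs sent, as a rough client/server cue.
    Illustrative only; capped at 100 and with an assumed rule for hosts
    that send no plain SYNs at all."""
    if syns_sent == 0:
        return 100 if synacks_sent else 0
    return min(100, round(100 * synacks_sent / syns_sent))

# Pure client (or worm): sends SYNs but never SYN+ACKs -> 0.
print(sa_over_s(0, 100))     # -> 0
# Noisy web server: answers with SYN+ACKs as often as it SYNs -> 100.
print(sa_over_s(100, 100))   # -> 100
# P2P host (example 5): somewhere in between.
print(sa_over_s(10, 100))    # -> 10
```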
Of course there is no guarantee that a given P2P application is bound to a given port. Still, we can guess that this example is using bittorrent because of ports 6881 and 6882. The SA/S metric is interesting here in that it suggests the host in this example has some server tendencies, although it tends to the client side. In general, we intend to do more research on the lower work weights. For example, we hope to improve our ability to identify P2P applications.
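The kind of hedged port-based P2P guessing described here can be sketched as follows. The hint ports are common defaults only (an assumption on our part), and, as noted above, P2P applications need not be bound to them:

```python
# Hypothetical heuristic: well-known default ports that merely *hint* at a
# P2P application. A match is a guess, never an identification, since P2P
# applications are not guaranteed to use these ports.
P2P_HINT_PORTS = {6881: "bittorrent", 6882: "bittorrent", 6346: "gnutella"}

def p2p_guess(sampled_ports):
    """Return P2P application names hinted at by a port signature sample."""
    hints = {P2P_HINT_PORTS[p] for p in sampled_ports if p in P2P_HINT_PORTS}
    return sorted(hints) or ["no P2P hint"]

# Example 5's sampled ports include 6881 and 6882.
print(p2p_guess([1124, 6881, 6882]))   # -> ['bittorrent']
```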