== Boiler Plate ==

You can use this stuff as long as you use it for interesting things and
not evil things. If you use it somewhere cool, it'd be nice if you gave
us some credit or something.

Caleb Phillips
Suresh Singh

This work was partly funded by NSF 0435328.

== Collection Methodology ==

These traces were collected using a VeriWave WT20 Appliance, which was
kindly loaned to us by the folks at VeriWave (http://www.veriwave.com).
The WT20 hardware consists of two 802.11 reference radios, real-time
Linux, and two processors. The WT20 provides nanosecond-resolution
timestamps, and it logs both the time when it began seeing a frame and
the time when the frame finished arriving.

We use the VeriWave WT20 in a somewhat novel way: it listens with two
radios simultaneously on the same channel, recording frames to a
per-radio 256 MB ring buffer. The WT20's firmware discards any frame
received with a signal strength below -75 dBm, but the rest (Data and
Management, but not Control) are logged without any scrubbing. A tclsh
script, running on a laptop connected to the WT20 via Ethernet, grabs
the contents of this ring buffer from each radio in turn, every 10
seconds. This data is dumped as a VWR file, a proprietary VeriWave file
format, and then converted to a libpcap file on the fly. At the end of a
4 hour capture we have 1440 files, which are stitched together using a
program we developed for this purpose (after finding that existing tools
like mergecap and tcpslice either contained bugs or didn't work with
802.11 traces).

We face two challenges in data collection:

The first is placement of the VeriWave sniffer. Because it has a lower
effective receiver sensitivity than most access points today (-75 dBm
versus -90 dBm), we must limit potentially heavy packet loss through
careful antenna choice and placement.
The second problem is practical: we had to obtain permission from the
three merchants, and we further needed to ensure that our equipment was
as unobtrusive as possible so as not to affect the "normal" behavior of
the users.

== Anonymization Methodology ==

We used the anonymization tool developed by David Kotz et al. for
sanitizing the CRAWDAD/Dartmouth traces. It is based on the
prefix-preserving anonymization scheme presented in:

  Xu, J., Fan, J., Ammar, M., and Moon, S. 2002. "On the Design and
  Performance of Prefix-Preserving IP Traffic Trace Anonymization",
  Proc. of 10th IEEE International Conference on Network Protocols
  (ICNP 2002), Paris, France, November 2002.
  http://www.cc.gatech.edu/~jx/reprints/ICNP02A.pdf

More recent publications (V. Paxson 2006) have shown that attacks are
still possible with this level of anonymization. We have chosen to
anonymize the traces as much as possible without losing the most
interesting features. Our expectation is that the remaining information
that could be extracted with such an attack is uninteresting enough to
bore most attackers. Moreover, all traces were collected on unencrypted
networks in public locations and with the permission of the network
operators; users of such networks should have low expectations for the
privacy of their traffic to begin with.
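
The prefix-preserving property of the scheme cited above means that two
addresses sharing their first k bits still share exactly their first k
bits after anonymization. The following is a minimal Python sketch of
that idea, not the code of the tool used on these traces. It assumes
IPv4 addresses, and HMAC-SHA256 stands in for the keyed pseudorandom
function of the original scheme. The function names and the key are
hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization sketch (illustrative only).

    Output bit i is input bit i XOR a keyed pseudorandom bit that depends
    only on the i bits preceding it, so shared prefixes stay shared.
    HMAC-SHA256 is an assumption made for this sketch.
    """
    a = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = a >> (32 - i)  # the i most significant bits (0 when i == 0)
        # Include i so that prefixes of different lengths never collide.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (a >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits that two IPv4 addresses have in common."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()
```

With any fixed key, common_prefix_len() gives the same value for a pair
of original addresses and for their anonymized counterparts, which is
why subnet structure remains visible in the sanitized traces.
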
We:

  * Anonymized the IPs, in a prefix-preserving way
  * Anonymized the MACs, keeping the OUI identifiers intact
  * Stripped everything after the TCP/UDP header

This corresponds to the following command with the tool we used:

  ~/bin/pcap_sanitize -k ~/anon_key -u -r merged.cap -o merged_anon.cap

== Description of Traces ==

We collected data at six locations. The first three below are on campus
and the last three are off campus:

1) Name: psu-cs
   Where: PSU CS Department, near faculty offices, in the networking closet
   Duration: 1 Hour (1500 - 1600), Monday
   Description: The capture antennas were placed at the same level as,
   and immediately in front of, the access-point antennas. The closest
   clients are at least one wall away. We used this site for prototyping
   our capture methodologies.

2) Name: library
   Where: PSU Library, 3rd Floor
   Duration: 4 Hours (1400 - 1800), Monday
   Description: Each library floor is covered by at least three access
   points. We positioned our capture antenna on a table, about 4 feet
   away from the ceiling-mounted access-point antenna and with roughly
   the same vantage.

3) Name: cafeteria
   Where: PSU Cafeteria
   Duration: 4 Hours (0930 - 1330), Monday
   Description: For this capture we placed our capture antenna directly
   under a sector antenna which serves the cafeteria. The room is mostly
   free of impediments, providing line of sight to nearly all users.

4) Name: pioneer-sq
   Where: Office overlooking "Pioneer Square" from the second floor
   Duration: 4 Hours (1130 - 1530), Monday
   Description: This location serves Pioneer Square, a large common
   outdoor area in downtown Portland, and the surrounding coffee shops
   and businesses. We set up the VeriWave WT20's antenna to the side of
   the access-point antenna, in a neighboring room. One wall and about
   5 feet separated the capture antenna from the access-point antenna.

5) Name: urban-grind
   Where: Urban Grind Coffee
   Duration: 2 Hours (1300 - 1500), Thursday
   Description: The Urban Grind is a popular coffee shop in Portland for
   laptop users, and gets as much laptop traffic as any other coffee
   shop in Portland, or more. This space, like the cafeteria, has very
   few impediments: both the access point and the capture antenna have
   line of sight to nearly every client device. The capture antenna was
   placed approximately 10 feet from the ceiling-mounted access point.

6) Name: powells
   Where: Worldcup Coffee at Powell's Books
   Duration: 4 Hours (1030 - 1430), Monday
   Description: The coffee shop at Powell's sees a typical, slow but
   steady stream of laptop users. Aside from a couple of bookcases, it
   is a mostly open space. We positioned our capture antenna on a
   bookshelf approximately 8 feet above the ground to have good line of
   sight to the access point and the laptop-using patrons.
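
For readers who want to reproduce the stitching step described under
Collection Methodology, a timestamp-ordered merge of many small capture
files can be sketched as below. This is not the program we used. It is
a minimal Python illustration that assumes classic little-endian libpcap
files with microsecond timestamps, that each input file is already in
time order, and that all inputs share one link type. The function names
are hypothetical.

```python
import heapq
import struct

PCAP_MAGIC_USEC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def read_header(stream):
    """Read and validate the 24-byte pcap global header; return it raw."""
    hdr = stream.read(24)
    if len(hdr) < 24 or struct.unpack("<I", hdr[:4])[0] != PCAP_MAGIC_USEC:
        raise ValueError("not a little-endian microsecond pcap file")
    return hdr


def read_records(stream):
    """Yield ((ts_sec, ts_usec), raw_record) for each packet record."""
    while True:
        rec_hdr = stream.read(16)
        if len(rec_hdr) < 16:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack("<IIII", rec_hdr)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final record; drop it
        yield (ts_sec, ts_usec), rec_hdr + data


def merge_pcaps(inputs, output):
    """Merge time-ordered pcap streams into one time-ordered stream."""
    headers = [read_header(s) for s in inputs]
    if not headers:
        raise ValueError("no input files")
    # The link type is the last 4 bytes of the global header.
    if len({h[20:24] for h in headers}) != 1:
        raise ValueError("inputs use different link types")
    output.write(headers[0])
    streams = [read_records(s) for s in inputs]
    # heapq.merge assumes each input is already sorted by its key.
    for _ts, record in heapq.merge(*streams, key=lambda r: r[0]):
        output.write(record)
```

The inputs and the output are binary file objects, for example files
opened with open(path, "rb") and open(path, "wb"). Merging all 1440
files of a 4 hour capture in one call may exceed the limit on open file
descriptors, so merging in batches is a reasonable variation.
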