PSU CS199 — Introduction to Computer Science

Homework 5: Histograms

Due Friday 14 August 2009

Language Level: Advanced Student
Teachpacks: image.ss, loops.ss

This homework builds on everything that you have done so far. Like Homework 4, it involves processing real-world data. However, the task required of you, building a histogram, is one that is most easily solved using vectors and the vector-set! operation.

Preliminaries

Work alone. As this is the final assignment for the course, I want an assignment that is your own work, for grading purposes.

The textbook does not talk very much about vectors. However, Intermezzo 29 does explain the general idea of a vector, and why we need vectors in addition to lists. Read this section of the book. Section 40.4 discusses vector-set! and mutating vectors.
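
As a quick reminder of the vector operations involved, here is a minimal sketch. The name counts is made up for illustration and is not part of the provided file.

```scheme
;; A vector of three counters, all starting at zero.
(define counts (make-vector 3 0))

;; Read a slot with vector-ref; indices start at 0.
(vector-ref counts 1)          ; the value currently in slot 1

;; vector-set! overwrites one slot in place.
(vector-set! counts 1 (+ 1 (vector-ref counts 1)))

(vector-ref counts 1)          ; slot 1 has now been incremented
(vector-length counts)         ; the length is unchanged
```

Unlike a list, the vector is changed rather than rebuilt, which is what makes it convenient for keeping running tallies.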

Download this file, and save it to your Scheme directory under a new file name (one containing your name). We have discovered (the hard way) that on Windows Vista, it is dangerous to save a file to your downloads directory and then run DrScheme on it. The file contains some data about the Old Faithful geyser in Yellowstone National Park, on which you will run your functions, and parts of some data definitions and templates to get you started.

Your Assignment

Part I: Datasets and Histograms

You will need two data definitions in order to do this homework: datum, and dataset. A datum will consist of three values; a dataset will consist of a non-empty list of datums.

A histogram lets us look at a large set of data and get a quick overview of its shape. If you haven't met histograms before, or you are unsure of the details, read this web page.

Your task is to take the data on Old Faithful and create a histogram of the interruption times. (The interruption time of a geyser is, apparently, the time that elapses between successive eruptions.) The data also contains two other measurements, which are irrelevant for this exercise.

What is a Histogram?

A histogram groups a series of measurements into "buckets" of a fixed size, and tells us how many measurements fall into each bucket. For example, if the data are 31 42 45 47 51 64 69 70, and the buckets are 30–39, 40–49, 50–59, 60–69, 70–79, then the histogram tells us that there is 1 measurement in bucket 0 (31), 3 in bucket 1 (42, 45 and 47), 1 in bucket 2 (51), 2 in bucket 3 (64 and 69), and 1 in bucket 4 (70).
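
The worked example above can be reproduced with a short sketch. It is hard-wired to that example only: five buckets of width 10 starting at 30. The names bucket-index and count-into-buckets are hypothetical and are not the functions you are asked to write.

```scheme
;; bucket-index: number -> number
;; Index of the width-10 bucket, starting at 30, that holds x.
;; Assumes 30 <= x < 80, as in the example.
(define (bucket-index x)
  (quotient (- x 30) 10))

;; count-into-buckets: list-of-number -> vector
;; Tallies each measurement into one of the five example buckets.
(define (count-into-buckets measurements)
  (let ((h (make-vector 5 0)))
    (begin
      (for-each (lambda (x)
                  (vector-set! h
                               (bucket-index x)
                               (+ 1 (vector-ref h (bucket-index x)))))
                measurements)
      h)))

(count-into-buckets (list 31 42 45 47 51 64 69 70))
;; produces (vector 1 3 1 2 1), matching the counts listed above
```

Each measurement costs one vector-ref and one vector-set!, which is why a vector suits this task better than a list of counts.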

Write a function

    dataset->interrupton-histogram: list-of-datum number -> vector

that generates a vector of length n representing the histogram, where n is the second argument to the function. If the resulting vector is h, then

    (vector-ref h i)

should be the value of the ith bucket. Your function should determine the minimum and maximum values of the interruption field of the dataset and use them to set the limits.
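
One possible way to turn a measurement into a bucket index is sketched below. It assumes n equal-width buckets spanning the minimum to the maximum, with the maximum itself counted in the last bucket; the assignment does not prescribe these choices, and the name value->bucket and its parameters lo and hi are hypothetical.

```scheme
;; value->bucket: number number number number -> number
;; Index (0 to n-1) of the bucket holding x when the range lo..hi is
;; divided into n equal-width buckets.
;; Assumes lo < hi and lo <= x <= hi.
(define (value->bucket x lo hi n)
  (min (- n 1)   ; without this, x = hi would land in bucket n
       (inexact->exact (floor (/ (* n (- x lo)) (- hi lo))))))

(value->bucket 31 31 70 5)   ; 0, the minimum goes in the first bucket
(value->bucket 51 31 70 5)   ; 2
(value->bucket 70 31 70 5)   ; 4, the maximum goes in the last bucket
```

Because the limits here come from the data (31 and 70) rather than from round numbers, the buckets are 7.8 wide and do not line up with the 30–39, 40–49 buckets of the earlier example.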

Part II: Graphics

The second part of this homework is a visualization of the histogram. You will write a function

    histogram->image: dataset -> image

that, given a dataset, will produce a plot of the histogram.

Your visual representation of the histogram must include

You may break this problem into smaller pieces as you see fit. You will probably re-use some of the pieces of your solution to Homework 4, such as axes.

Part III: Try it out

Pull all the pieces together. Run your histogram-generating function several times on the data, with different values for n, the number of buckets. When I run your DrScheme file, I should see several images.

If you didn't manage to do all the parts, hand in what you did complete. For example, Parts I and II are independent, and can be tackled in either order. If you never managed to get dataset->interrupton-histogram working correctly, fake it! Dummy up a function that returns an appropriate vector, ignoring its input, and carry on with Part II.

There are some examples of histogram graphics in the web page mentioned above.

Hand in your work.

Put your name as a comment at the top of the definitions window. Save your file from DrScheme, and attach it to an email message. Submit your email to CS199Homework.

Acknowledgements

The data were obtained from http://www.stat.duke.edu/courses/Fall04/sta113/data/oldfaith.dat