Machine Learning

CSE 410/510 TOP: Machine Learning
Spring Quarter 2007


Time: Tuesdays and Thursdays, 12:00-1:50pm
Location: Neuberger Hall (NH), Room 222.

Instructor: Melanie Mitchell, FAB 120-24, (503) 725-2412, mm-AT-cs.pdx.edu.
Office hours: Tuesdays and Thursdays, 2:00-3:00pm, or by appointment.

Course Website:
http://www.cs.pdx.edu/~mm/MachineLearningSpring2007/index.html

Prerequisites: Undergraduate-level courses in calculus, linear algebra, and probability and statistics. Facility in at least one high-level programming language.

Course objectives:

  1. Introduce students to several prominent areas of machine learning, including feature extraction, decision trees, neural networks, genetic algorithms, Bayesian learning, clustering, ensemble learning, support vector machines, and reinforcement learning, and illustrate what types of problems the different methods are suited for.
  2. Give students hands-on experience with these methods and tools for implementing and using them on real-world problems.
  3. Give students experience with performing simulations and doing statistical data analysis of the results.
  4. Provide students with experience in reading research papers and giving presentations.

Textbook: Ethem Alpaydin, Introduction to Machine Learning, MIT Press, 2004.

Reserve Readings: TBA

Assignments: There will be several short computer-based homework assignments, each corresponding to a topic covered in the course. All assignments are due at the beginning of class on the date specified. Late assignments will be accepted only with prior approval.

Presentations: Each student will be assigned one technical paper to read on a machine learning topic, and will give an in-class presentation (of approximately 10-15 minutes) on this paper.

Exams: There will be an in-class midterm exam and an in-class final exam.

Grading: Homework: 50%; Presentation: 10%; Midterm exam: 20%; Final exam: 20%.

Academic integrity: Students will be responsible for following the PSU Student Conduct Code, and in particular, the policy concerning academic honesty.

Collaboration policy: Students may discuss the general concepts and principles behind an assignment with other students. In fact, you are encouraged to do this whenever possible, because it is often a valuable way to reinforce ideas, and to learn new perspectives. However, in doing assignments, each student is expected to develop, write up, and hand in an individual solution and, in doing so, develop a sufficient understanding of the problem and solution so as to be able to explain it adequately to the instructor. Under no circumstances should a student copy or consult the solution of another student, or copy a solution from any other source, including the Internet.

Cheating will result in a grade of zero on the assignment or exam on which the student cheats and the initiation of disciplinary action at the university level.

Students with disabilities: If you are a student with a disability in need of academic accommodations, you should register with Disability Services for Students and notify the instructor immediately to arrange for support services.

Syllabus (subject to change):

Date

Topics

Homework and Reading

Tuesday April 3

Class overview

Intro. to machine learning

Feature extraction

Decision trees I

Reading: Textbook, Chapter 1; Chapter 2, sections 2.1, 2.5-2.8

Homework 1 (Feature Extraction) assigned. Due Tuesday April 10.

Here is the web site for downloading spam and ham data

Short papers assigned (Decision Trees).

Thursday April 5


Decision Trees II

Bayesian Learning I


Reading: Textbook, Chapter 9, Sections 9.1-9.5.

Tuesday April 10

Decision Trees III


Student (or instructor) presentations (Decision Trees)

Chris Jorgensen: D. Wilking and T. Rofer, Realtime object recognition using decision tree learning. Robocup 2004, 556-563.
(Chris's presentation slides)

MM: C. Ratanamahatana and D. Gunopulos, Feature selection for the naive Bayesian classifier using decision trees. Applied Artificial Intelligence, 17, 415-487, 2003.

MM: B. Liu et al., Clustering via decision tree construction. Foundations and Advances in Data Mining (Studies in Fuzziness and Soft Computing, vol. 180), ed. by W. Chu and T. Lin, Springer, 2005.

Homework 1 (Feature Extraction) due.

Homework 2 (Decision Trees) assigned, due Tuesday, April 17.

Here is the web site where you can download C4.5.

Short papers assigned (Bayesian Learning).

Thursday April 12

Bayesian Learning II

Reading: Textbook, 3.1-3.2, 3.7

Tuesday, April 17

Assessing and Comparing Classification Algorithms I

Student presentations (Bayesian Learning)

Montana Low: T. Pedersen, A simple approach to building ensembles of Naive Bayesian classifiers for word sense disambiguation
(Montana's presentation slides)



Darin Morrison: R. Kohavi, Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid
(Darin's presentation slides)



Homework 2 (Decision Trees) due.

Homework 3 (Bayesian Learning) assigned, due Tuesday, April 24

Reading: Textbook, Chapter 14, Sections 14.1-14.4

Thursday, April 19

Assessing and Comparing Classification Algorithms II

Reading: Textbook, Chapter 14, Sections 14.5-14.9

Tuesday, April 24



Linear Discrimination and Support Vector Machines I

L. I. Smith, A tutorial on Principal Components Analysis

Reading: Textbook, Chapter 10, Sections 10.1-10.3, 10.9

Homework 3 (Bayesian Learning) due.

Short papers assigned (Support Vector Machines).

Thursday, April 26

Linear Discrimination and Support Vector Machines II

Model complexity and VC dimension

Slides from today's lecture on model complexity and VC dimension.

Homework 4, part 1 and part 2 assigned, due Thursday, May 3.

Tuesday, May 1

Guest lecture (Bart Massey)

...

Thursday, May 3

Student presentations (Support Vector Machines)

Gregor Richards: G. Schohn and D. Cohen, Less is More: Active Learning with Support Vector Machines. Proceedings of the 17th International Conference on Machine Learning, 2000.
(Gregor's presentation slides)

Review for Midterm

Perceptrons

Reading: Textbook, Sections 11.1-11.4

Homework 4 (Assessing and Comparing Classification Algorithms) due.

Tuesday, May 8

Midterm

Homework 5 (Linear Discrimination and Support Vector Machines) assigned, due Tuesday, May 15

Thursday, May 10

Neural Networks

Reading: Textbook, Chapter 11

Short papers assigned (Neural Networks).

Tuesday, May 15

Student presentation (Neural Networks)

Tyson Mahuna: M. N. Dailey, G. W. Cottrell, C. Padgett, and R. Adolphs. EMPATH: A neural network that categorizes facial expressions, Journal of Cognitive Neuroscience, 14(8):1158-1173, 2002.
(Tyson's presentation slides)

Dimensionality Reduction

Combining Multiple Learners I

Reading: Chapter 11, continued.

Short papers assigned (Combining multiple learners)

Thursday, May 17

Student presentation (Neural Networks)

Nish Aravamudan: I. S. Oh and C. Y. Suen, A class-modular feedforward neural network for handwriting recognition, Pattern Recognition, 35(1), 2002, 229-244.
(Nish's presentation slides)

Combining Multiple Learners II

Genetic algorithms I

Reading: Textbook, Chapter 15; Genetic algorithms handout.

Homework 5 (Linear Discrimination and Support Vector Machines) due.

Homework 6 (Neural Networks and Dimensionality Reduction) assigned, due Tuesday, May 29.

Here is the link for the data to use in this homework assignment.

Tuesday, May 22

Student presentations (Combining multiple learners)

Wren Ng Thornton: P. Viola, M. J. Jones, and D. Snow. Detecting pedestrians using patterns of motion and appearance. International Journal of Computer Vision, 63(2), 153-161, 2005.

Scott Wespi: D. Opitz and R. Maclin, Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11, 169-198, 1999. (Scott's presentation slides)

Genetic algorithms II

Short papers assigned (Genetic Algorithms).

Thursday, May 24

Genetic Algorithms III (Guest lecture: Martin Cenek)

...

Tuesday, May 29

Reinforcement Learning I

Reading: Textbook, Chapter 16, Sections 16.1-16.4

Homework 6 (Neural Networks and Dimensionality Reduction) due.

Homework 7 (Genetic Algorithms) assigned, due Thursday, June 7.

Here is the gzipped tarball for the Simple GA in C.

Here is the spambase data.

Thursday, May 31

Student presentations (Genetic Algorithms)

Tim Hamilton: C. H. Ooi and P. Tan. Genetic algorithms applied to multi-class prediction for the analysis of gene expression data. Bioinformatics, 19(1), 2003, 37-44. (Tim's presentation slides)

Alex Ruban: J. Busch et al., Automatic generation of control programs for walking robots using genetic programming

Reinforcement Learning II

Reading: Textbook, Chapter 16, Sections 16.5-16.6

Tuesday, June 5

Analogy-Making

Homework 8 (Review for final) assigned, due Tuesday June 12. (Distributed in class. If you didn't get it, e-mail the instructor.)

Thursday, June 7

Clustering and collaborative filtering

Catch-up and Review

Reading: Textbook 7.1, 7.3

Homework 8 (Final review) due.

Tuesday, June 12

No class

Homework 7 (Genetic Algorithms) due.

Thursday June 14

Final exam, 10:15am-12:05pm

...