Machine Learning

CSE 445/545: Machine Learning
Winter Quarter 2009


Time: Mondays and Wednesdays, 12:00-1:50pm
Location: Fourth Avenue Building (SH), Room 40-06.

Instructor: Melanie Mitchell, FAB 120-24, (503) 725-2412, e-mail
Office hours: Mondays and Wednesdays, 2:00-3:00pm, or by appointment.

TA: Karan Sharma, e-mail
Office hours: Thursdays, 2-4pm, CS Dept. Fishbowl.

Course Website: http://www.cs.pdx.edu/~mm/MachineLearningWinter2009/index.html

Prerequisites: Undergraduate-level courses in calculus, linear algebra, and probability and statistics. Facility in at least one high-level programming language.

Course description: This course provides a broad introduction to techniques for building computer systems that learn from experience. It provides both conceptual grounding and practical experience with several learning systems. The course provides grounding for advanced study in statistical learning methods, and for work with adaptive technologies used in speech and image processing, robotic planning and control, diagnostic systems, complex system modeling, and iterative optimization. Students will gain practical experience implementing and evaluating systems applied to pattern recognition, prediction, and optimization problems.

Exams: There will be a take-home midterm exam and a take-home final exam. Both will be open-book.

Grading: Homework 50%, Presentation 10%, Midterm 20%, Final 20%.

Syllabus (subject to change):

Date

Topics

Homework and Reading

Monday Jan. 5

Class overview

Review of Probability Theory

Naive Bayesian Classification

Here are the slides.

Optional reading:
P. Sebastiani, A Tutorial on Probability Theory

Wednesday Jan. 7

Linear classification I

Linear discrimination using principal components analysis

Here are the slides.

Required reading for this topic:

C. Bishop, Pattern Recognition and Machine Learning, Chapter 4: Linear Models for Classification, Sections 4.1.1, 4.1.4, 4.1.7 (on electronic reserve at the library).

L. I. Smith, A Tutorial on Principal Components Analysis

Optional reading: M. Turk and A. Pentland, Eigenfaces for recognition

Monday Jan. 12

Linear classification II

Evaluating classifiers

Here are the slides.

Required reading for this topic:

T. M. Mitchell, Machine Learning, Chapter 5: Evaluating Hypotheses (on electronic reserve at the library).

Homework 1 assigned. Here is the gzipped tarball with the data.

Additional notes on Homework 1.

MATLAB tutorial

Wednesday Jan. 14

Vapnik-Chervonenkis (VC) dimension and model selection

Kernel methods / Support Vector Machines I

Here are the slides.

Here are Suzi's kernel slides.

Required reading for this topic: A. Ben-Hur et al., Support vector machines and kernels for computational biology

Monday Jan. 19

No class (Martin Luther King Jr. Day)

...

Wednesday Jan. 21

Kernel methods / Support Vector Machines II

Here are the slides.

Will Landecker will present C. Cusano et al., Image annotation using SVM (optional reading)

Holly Grimes will present Pang et al., Thumbs up? Sentiment classification using machine learning techniques (optional reading)

Required reading: T. Fawcett, An introduction to ROC analysis, Sections 1-4 and 7

Monday Jan. 26

Kernel methods / Support Vector Machines III
ROC analysis
Ensemble learning I

Here are the slides.

Samuel Moffatt will present An adaptive network intrusion detection method based on PCA and support vector machines (optional reading; available for free if you're on the PSU network)

...

Wednesday Jan. 28

Ensemble learning II

Here are the slides.


Jason Hall will present T. Dietterich, Ensemble methods in machine learning (optional reading)

Required reading: R. Schapire, The Boosting Approach to Machine Learning: An Overview

Monday Feb. 2

Ensemble learning III

Here are the slides.


Robert Bermond will present K. Tieu and P. Viola, Boosting Image Retrieval (optional reading; available for free if you're on the PSU network)
Here are Rob's slides.

Required reading: R. E. Schapire et al., Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods (Section 2 is optional)

Optional reading: L. Reyzin and R. Schapire, How Boosting the Margin Can Also Boost Classifier Complexity

Homework 2 assigned; due Monday Feb. 16.

Here is the gzipped tarball with the C4.5 code.

Here is part II of Homework 2.

Wednesday Feb. 4

Unsupervised learning I

Here are the slides.

Daesung Park will present G. Hamerly and C. Elkan, Learning the K in K-means

Required reading: Chapter 12 ("Cluster Analysis") in I. Kononenko and M. Kukar, Machine Learning and Data Mining (on electronic reserve at the library).

Monday Feb. 9

Unsupervised learning II

Here are the slides.

Student presentations:

Nish Aravamudan: M. Ramoni et al., Bayesian Clustering by Dynamics

Jonathan Beare: A. Likas et al., The Global K-Means Clustering Algorithm

...

Wednesday Feb. 11

Unsupervised learning III

Here are the slides (1) and (2).

Student presentations:

Tim Pepper: Y. Zhao and G. Karypis, Evaluation of Hierarchical Clustering Algorithms for Document Datasets

Adam Naser: S. Ertekin et al., Learning on the Border: Active Learning in Imbalanced Data Classification

...

Monday Feb. 16

Bayesian networks I

Here are the slides.

Student presentations:

Brice Arnould: J. Breese et al., Empirical Analysis of Predictive Algorithms for Collaborative Filtering

Amer Harb: Y. Wang and J. Vassileva, Bayesian network trust model in peer-to-peer networks


Required reading for this topic:

S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Chapter 14 (Probabilistic Reasoning), Sections 14.1-14.6 (on electronic reserve at the library).

Take-home midterm assigned; due Monday Feb. 23.

Wednesday Feb. 18

Bayesian networks II

Here are the slides.

Student presentations:

Max Goodman: G. Provan and M. Singh, Learning Bayesian Networks Using Feature Selection

Max Quinn: N. Friedman, Learning belief networks in the presence of missing values and hidden variables

...

Monday Feb. 23

Class cancelled. Midterm due Wednesday Feb. 25.

...

Wednesday Feb. 25

Temporal learning I (Hidden Markov Models, dynamic Bayesian networks)

Here are the slides.

Student presentations:
John Gebbie: D. Zhou et al., Probabilistic Models for Discovering E-Communities

Damon Tyman: E. Horvitz et al., The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users

...

Monday Mar. 2

Temporal learning II (Hidden Markov Models, dynamic Bayesian networks)

Guest lecture on HMMs and music: Will Landecker

Student presentations:
Ian Billington: U.-V. Marti and H. Bunke, Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system

Randy Myers: H. Lee and A. Ng, Spam deobfuscation using a Hidden Markov Model

Homework 3 (part 1) assigned; due Wednesday March 11.

Wednesday Mar. 4

Temporal learning II (Hidden Markov Models, dynamic Bayesian networks)

Here are the slides.

Dimensionality reduction I

Student presentations:

John Koerner: K. Seymore et al., Learning Hidden Markov Model Structure for Information Extraction

Dona Hertel: W. Gansterer et al., Spam filtering based on latent semantic indexing

Jennifer Williams: E. Gabrilovich and S. Markovitch, Computing semantic relatedness using Wikipedia-based explicit semantic analysis

Homework 3 (part 2) due Wednesday March 11.

Required reading:

F. Chiaromonte, Notes on Multidimensional Scaling

S. Deerwester et al., Indexing by Latent Semantic Analysis

Monday Mar. 9

Dimensionality reduction II

Here are the slides.

Student presentations:
Dan Coates

Dan's slides.

Jeff Weston: M. S. Bartlett et al., Independent Component Analysis for Face Recognition

Jeff's slides.

...

Wednesday Mar. 11

Catch-up on remaining topics

Student presentations:
Alireza Goudarzi

Take-home final exam assigned; due Wednesday March 18 by 5pm.

Monday Mar. 16

No class (finals week).

...

Wednesday Mar. 18

No class (finals week).

...