CS 510 Special Topics (4 credits) Data Mining

Data Mining
CS 510 (DM)
Winter, 2004


Starts January 2004, Wednesday evenings, 1840-2120

room 150



Updates

For the latest on this subject, see http://www.cs.pdx.edu/~timm/dm/news.html.



Course Coordinator

Tim Menzies; tim@menzies.us



Course Goals

The founder of Lotus, Mitchell Kapor, once said that ``getting information off the Internet is like drinking from a fire hydrant''. His warning should be taken seriously. Unless we can process the mountain of information that surrounds us, we must either ignore it or be buried by it. This subject introduces automatic data mining methods that find the ``pearls in the dust''; i.e. the stuff that really matters. Students in this class will gain an understanding of a range of data mining methods; learn how to contrast different learning methods; and understand the assessment methodologies for data miners.




Textbook

Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (1999) by Ian H. Witten, Eibe Frank; Morgan Kaufmann; 1st edition; ISBN: 1558605525; http://www.cs.waikato.ac.nz/~ml/weka/book.html



Major Topics Covered in the Course



Laboratory Projects

Students are given projects in which they apply various data mining algorithms and compare the resulting learners.



Oral and Written Communications

Ability to write technical reports.



Social and Ethical Issues

None.



Theoretical Content

About 30% of class time is spent on a review of data mining theory.

About 30% of class time is spent on introduction to shell scripting.

About 30% of class time is spent on assessment methods for data miners.

Which leaves 10% for fun!



Problem Analysis

All students learn methods that are used to assess different implementations of data miners. Specific attention is paid to assessing the accuracy, runtimes, and stability of the learnt theories.



Assessment

Theory: 50% (exam in week 11)

Project: 50%

Students must do well in both theory and project to pass the subject. To ensure this, the final mark will be twice the harmonic mean of the theory (T) and the project (P) marks. For example, suppose you scored a theory mark of 50 and a project mark of 10. The mean of these two numbers is 30 but the harmonic mean is 16.7 (i.e. much lower).

Recall that the harmonic mean of two numbers P,T is 2*T*P/(T+P). Here's a plot comparing twice the standard mean with twice the harmonic mean for a theory mark of 50 and a project mark ranging from 1 to 65.

Note that when the marks are the same, the harmonic mean is the same as the standard mean. However, a lower project mark pulls the harmonic mean well below the standard mean.
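The arithmetic above can be checked with a short script. This is only a sketch: the function names hmean and final_mark are made up for illustration, and it assumes that Python's built-in round() is an acceptable stand-in for the rounding used in the grade table (ties may round differently).

```python
def hmean(t, p):
    """Harmonic mean of two marks: 2*T*P/(T+P)."""
    if t + p == 0:
        return 0.0  # assumption: treat two zero marks as a zero mean
    return 2 * t * p / (t + p)


def final_mark(t, p):
    """Final score as described above: twice the harmonic mean, rounded.

    Assumption: Python's round() is used here; its tie-breaking rule
    may not match the course's own convention.
    """
    return round(2 * hmean(t, p))


# Compare twice the standard mean (T + P) with twice the harmonic mean
# for a theory mark of 50 and a few project marks.
theory = 50
for project in (10, 25, 50):
    print(project, theory + project, final_mark(theory, project))
```

With a theory mark of 50 and a project mark of 10, twice the standard mean is 60 while final_mark gives 33; when both marks are 50, both measures give 100.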

A student's letter grade will correspond to their total score according to the following letter grade table.

 Letter
 Grade   Score = round(2*hmean(P,T))
 ------  ---------------------------
 A+    = 96..100
 A     = 93..95
 A-    = 90..92
 B+    = 86..89
 B     = 83..85
 B-    = 80..82
 C+    = 76..79
 C     = 73..75
 C-    = 70..72
 D+    = 66..69
 D     = 63..65
 D-    = 60..62
 E+    = 55..59
 E     = 50..54
 E-    = 45..49
 F     <  45
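The table can also be read as a simple lookup from score to grade. The following is a minimal sketch, assuming the score has already been rounded to a whole number; the names GRADE_FLOORS and letter_grade are hypothetical and are not part of the course software.

```python
# Lowest score for each grade, copied from the table above, highest first.
GRADE_FLOORS = [
    (96, "A+"), (93, "A"), (90, "A-"),
    (86, "B+"), (83, "B"), (80, "B-"),
    (76, "C+"), (73, "C"), (70, "C-"),
    (66, "D+"), (63, "D"), (60, "D-"),
    (55, "E+"), (50, "E"), (45, "E-"),
]


def letter_grade(score):
    """Return the letter grade for a rounded score between 0 and 100."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"  # anything below 45


for score in (100, 92, 60, 45, 33):
    print(score, letter_grade(score))
```

For the earlier example of a theory mark of 50 and a project mark of 10, the score of 33 falls in the F band.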



Lecture notes

Each week, please print and bring to class a new set of lecture notes:

one |two |three |four |five |six |seven



Credits

Author

Tim Menzies, tim@menzies.us, http://menzies.us

Software

This page generated by Site: see http://www.cs.pdx.edu/~timm/dm/site.html

Acknowledgements

This site is built using PerlPod.

Style sheet switching method taken from Eddie Traversa's excellent and simple-to-apply tutorial: http://dhtmlnirvana.com/content/styleswitch/styleswitch1.html.

Search engine powered by ATOMZ http://www.atomz.com/search/. Note, the indexes to this site are only updated weekly (heh, it's a free service; what more ja want?).

Icons on this site come from http://www.sql-news.de/rubriken/olap.asp and http://www.ifnet.it/webif/centrodi/eng/toolbar.htm.

The Java machine learners used at this site come from the extensive data mining libraries found in the University of Waikato's Environment for Knowledge Analysis (the WEKA) http://www.cs.waikato.ac.nz/ml/weka/.



Legal

Copyright

Copyright (C) Tim Menzies 2004

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; see http://www.gnu.org/copyleft/gpl.html. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Disclaimer

The content from or through this web page is provided 'as is' and the author makes no warranties or representations regarding the accuracy or completeness of the information. Your use of this web page and information is at your own risk. You assume full responsibility and risk of loss resulting from the use of this web page or information. If your use of materials from this page results in the need for servicing, repair or correction of equipment, you assume any costs thereof. Follow all external links at your own risk and liability.
