Last updated at 05:30 PM on 19 Jan 2010
This page provides supplementary information to support lectures in ME 488 during Fall 2009. The notes are presented in reverse chronological order, i.e. the most recent lecture is listed first.
Here is a short list of books, in addition to the textbook we are using for the course, that I consult when preparing notes or attempting to reduce my own confusion about some aspect of DOE.
If you could only afford one of these books (and they are expensive), I would consider finding a used copy of Montgomery. The current edition sells for $130 at Amazon (Nov. 2009). The second book should be Box, Hunter and Hunter. Both of those books should be available at a library.
Lecture on 23 November 2009
Reading: pp. 513 -- 523
Homework: TBA
Download the MINITAB Instructions as MS Word or PDF
Download the Laundry2.txt and Laundry3.txt data files.
Lecture on 9 November 2009
Reading: pp. 395 -- 398 (review), pp. 513 -- 523
Class notes on Type I and Type II errors are more extensive than pp. 395 -- 398 in the textbook. The following image shows two possible outcomes for a hypothesis test on a population mean when the standard deviation is known. [The image is from Paul Mathews, Design of Experiments with MINITAB, 2005, ASQ Press, Milwaukee, p. 86]
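The way beta arises can also be checked numerically. The following is a minimal MATLAB sketch, not taken from the textbook or from Mathews, for a two-sided z-test on a population mean with known standard deviation. The values of alpha, sigma, delta and n are assumed purely for illustration, and the variable names are my own. Only base MATLAB functions (erfc and erfcinv) are used, so no toolbox is required.

```matlab
% Type II error for a two-sided z-test on a mean with sigma known.
% All numerical values below are assumed for illustration only.
alpha = 0.05;    % probability of a Type I error
sigma = 1.0;     % known population standard deviation
delta = 0.5;     % true shift of the mean away from the H0 value
n     = 43;      % sample size

Phi    = @(z) 0.5*erfc(-z/sqrt(2));      % standard normal CDF
PhiInv = @(p) -sqrt(2)*erfcinv(2*p);     % standard normal quantile

zcrit = PhiInv(1 - alpha/2);             % two-sided critical value
shift = delta*sqrt(n)/sigma;             % shift in units of the standard error

% Probability that the test statistic lands in the acceptance region
% even though the mean has shifted by delta
typeII    = Phi(zcrit - shift) - Phi(-zcrit - shift);
testPower = 1 - typeII;

fprintf('beta = %6.4f   power = %6.4f\n', typeII, testPower);
```

Increasing n or delta moves the shifted distribution farther from the acceptance region, which reduces beta and raises the power.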
The textbook does not describe procedures for selecting the sample size to obtain a desired power.
The procedure for choosing a sample size to obtain a desired power is specific to the hypothesis test. In MINITAB, the following procedure is generic.
Use the Stat menu:
Stat --> Power and Sample size --> [desired hypothesis test]
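For the simplest case, a two-sided z-test on a mean with known standard deviation, the sample size can be estimated by hand from n = ((z_{alpha/2} + z_{beta}) sigma / delta)^2. The MATLAB sketch below applies that textbook approximation, which neglects the small contribution of the far tail. It is not a description of how MINITAB performs the calculation, and the numerical inputs are assumed for illustration.

```matlab
% Sample size for a desired power, two-sided z-test, sigma known.
% Input values are assumed for illustration only.
alpha       = 0.05;   % probability of a Type I error
targetPower = 0.90;   % desired power, 1 - beta
sigma       = 1.0;    % known population standard deviation
delta       = 0.5;    % smallest shift in the mean worth detecting

PhiInv = @(p) -sqrt(2)*erfcinv(2*p);   % standard normal quantile

zAlpha = PhiInv(1 - alpha/2);          % z_{alpha/2}
zBeta  = PhiInv(targetPower);          % z_{beta}

% Round up, because n must be an integer that meets or exceeds the target
n = ceil(((zAlpha + zBeta)*sigma/delta)^2);

fprintf('Required sample size: n = %d\n', n);
```

Other hypothesis tests (t-tests, ANOVA, tests on proportions) need their own formulas or an iterative calculation, which is why the MINITAB menu asks for the test first.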
Today's lecture notes were created from several references. The example of how beta arises in a test of population means was taken from Paul Mathews, Design of Experiments with MINITAB, 2005, ASQ Press, Milwaukee, WI [Link to Google Books], [Link to Amazon]
The following list provides links to free, on-line references.
See pp. 513 -- 523 in the textbook.
Also from Paul Mathews, Design of Experiments with MINITAB (Section 6.7, p. 215): Two-way ANOVA can be performed in MINITAB via three different menus
Lecture on 19 October 2009
Reading: pp. 367 -- 374 (review), pp. 398 -- 412
After class I created a short set of slides that lists the main calculation formulas in a one-way ANOVA.
During class my attempts to demonstrate the randomization of data and (later) a "manual" ANOVA calculation were thwarted by mysterious behavior of Excel. Several important commands from the ribbon bar were simply not responding to mouse clicks.
with a completed ANOVA calculation
The randomizeDemo code listed below shows how you can randomize the order of tests with MATLAB. Note that the randomize function works with any vector of test values, so it can be reused in other projects. You can download the randomizeDemo.m function.
The one-way ANOVA of the cotton thread data is computed with the following MATLAB codes and CSV data file:
ANOVAdemo.m
fp.m (StatBox)
fq.m (StatBox)
betaq.m (StatBox)
Montgomery_Cotton_Data_set.csv
The fp, fq, and betaq functions are from the StatBox toolbox.
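The sums of squares behind a one-way ANOVA can also be computed in a few lines of base MATLAB. The sketch below is not ANOVAdemo.m. It is a minimal illustration in which the data values are typed in by hand and are assumed to match Montgomery_Cotton_Data_set.csv, so check them against the file before relying on the result. The p-value is obtained from the built-in betainc function instead of the StatBox routines.

```matlab
% One-way ANOVA by direct calculation of the sums of squares.
% Rows are treatments (15, 20, 25, 30, 35 percent cotton),
% columns are replicates. Values assumed to match the CSV file.
y = [ 7  7 15 11  9;
     12 17 12 18 18;
     14 18 18 19 19;
     19 25 22 19 23;
      7 10 11 15 11];

[a, n] = size(y);          % a treatments, n replicates per treatment
N  = a*n;                  % total number of observations
CF = sum(y(:))^2 / N;      % correction factor, (grand total)^2 / N

SStotal = sum(y(:).^2) - CF;            % total sum of squares
SStreat = sum(sum(y,2).^2)/n - CF;      % between-treatment sum of squares
SSerror = SStotal - SStreat;            % within-treatment sum of squares

dfTreat = a - 1;
dfError = N - a;
F = (SStreat/dfTreat) / (SSerror/dfError);

% Upper-tail probability of the F distribution via the regularized
% incomplete beta function
p = betainc(dfError/(dfError + dfTreat*F), dfError/2, dfTreat/2);

fprintf('SS_treat = %8.2f  SS_error = %8.2f  F = %6.2f  p = %8.2e\n', ...
        SStreat, SSerror, F, p);
```

Comparing these intermediate values with the output of ANOVAdemo.m or MINITAB is a quick check that the data were entered correctly.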
randomizeDemo.m
function randomizeDemo
% randomizeDemo  Example of randomizing the order of a single-factor test.
%                Uses the example problem data from D.C. Montgomery,
%                Design and Analysis of Experiments, 5th ed., 2001,
%                Wiley, New York. See Chapter 3, pp. 60-62
%
% No inputs to the main program

% -- Set up the trial data
nreps = 5;                %  number of repetitions per treatment
reps = ones(1,nreps);     %  Temporary vector to create list of treatments

% -- CWP is the Cotton weight percents, SID is a list of labels (IDs)
CWP = [ 15*reps, 20*reps, 25*reps, 30*reps, 35*reps];
SID = 1:length(CWP);

% -- Randomize the test order
[CWPrand,SIDrand] = randomize(CWP,SID);
fprintf('\n Test Sample Cotton Weight\nSequence ID Percent\n');
for i=1:length(CWPrand)
  fprintf('%4d %6d %6d\n',i,SIDrand(i),CWPrand(i));
end

% ===============================================
function [xrand,idrand] = randomize(x,id)
% randomize  Randomize a vector of test conditions x with corresponding IDs
%
% Synopsis:  xrand = randomize(x)
%            [xrand,idrand] = randomize(x)
%            [xrand,idrand] = randomize(x,id)
%
% Input:     x  = vector of values to be put in random order
%            id = optional vector of IDs for the x data
%
% Output:    xrand  = values of the x vector in random order
%            idrand = optional vector of ID values corresponding to the
%                     elements in xrand. If no id vector is supplied, but
%                     idrand is expected as a return value, generate the
%                     IDs as sequential integers

% -- Generate a random vector, sort it, and save the sort order.
irand = randperm(length(x));    %  randomized list of integers
[junk,isort] = sort(irand);     %  isort is the sort order for the integers
xrand = x(isort);

% -- Sort IDs if user either supplies IDs or asks for IDs
if nargin>1
  idrand = id(isort);      %  If IDs were supplied, sort them too
elseif nargout>1           %  No IDs were supplied, generate some & then sort
  id = 1:length(x);        %  Generate sequential IDs
  idrand = id(isort);      %  and sort them in the same order as xrand
end
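A more compact variant is sketched below. It is offered as an alternative way to see the idea, not as a replacement for the randomize function. Because randperm already returns a random permutation of the integers 1 through n, that permutation can be used directly as an index into the vector of treatments, and the permutation itself serves as the list of sample IDs. The variable names order and CWPrand are chosen here for illustration.

```matlab
% Randomize a run order by indexing directly with randperm.
nreps = 5;                          % repetitions per treatment
reps  = ones(1,nreps);
CWP   = [15*reps, 20*reps, 25*reps, 30*reps, 35*reps];

order   = randperm(length(CWP));    % random permutation of 1..25 (sample IDs)
CWPrand = CWP(order);               % treatments in randomized run order

fprintf('\nSequence  Sample ID  Cotton Weight Percent\n');
for i = 1:length(CWPrand)
  fprintf('%6d  %9d  %12d\n', i, order(i), CWPrand(i));
end
```

Each run of the script produces a different order. The sorted contents of CWPrand are always the same as CWP, which is a quick check that no treatment was lost or duplicated.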
Lecture on 12 October 2009
Reading: pp. 367 -- 374 (review), pp. 398 -- 412
We didn't get to the F-test, but you can download my notes.
Lecture on 5 October 2009
Reading: Chapter 9
Read Chapter 8, paying attention to section 8.3. The computational formula is in Equation 8.2 on page 373.
Key points:
On pages 400-401, Levine et al. list the steps in hypothesis testing. A more compact list from Ayub and McCuen was discussed in class. The Ayub and McCuen list is presented here (without all of the documentation).
The solution to the Comprehensive Problem on Problem set #1 was presented in class.
Lecture on 28 September 2009
I didn't hand it out, but I displayed the ME 488 calendar for Fall 2009.
At the end of class I began a MINITAB demonstration of analyzing my commuting time data. The two data files are commuteTimesHomeToPSU.csv and commuteTimesPSUToHome.csv.
It is important that you remember your basic statistics. Accordingly, you should review Chapters 1, 3, and 5 from the textbook.
While doing the homework, a student asked about outliers.
An outlier is a value in a sample that is significantly different from other values in that sample. Treatment of outliers can be problematic. A normal distribution assigns a nonzero probability density to all values from minus infinity to plus infinity. Values in the tails of the distribution are not likely to be observed, but they can occur.
What should you do when you have a sample (i.e. a finite set of values drawn from a population) that contains one or more values that don't seem to belong? If you include the outlier(s), the sample statistics will have a larger dispersion (larger variance, larger inter-quartile range, etc.) than if the outliers are excluded. However, should you eliminate the outlier just to make your data look good?
The following discussion only applies to the display of data with a box-and-whisker plot. We will need to separately address the treatment of outliers in the analysis of data.
In a box-and-whisker plot, outliers are defined in terms of the inter-quartile range, IQR = Q_{3} - Q_{1}, where Q_{1} is the first quartile of the sample and Q_{3} is the third quartile. The outliers are identified by symbols (circles or asterisks) that lie beyond the range of the whiskers. This is just a convention and only affects the display of the data.
In a box-and-whisker plot, an outlier is defined as follows
Greater than Q_{3} + 1.5×IQR
or
Less than Q_{1} - 1.5×IQR
If outliers are identified by either of these criteria, then x_{max} and/or x_{min} are recomputed. The rules for identifying outliers (in the box-and-whisker plot) are not applied recursively, i.e. once the outliers have been identified and x_{max} and/or x_{min} are recomputed, the test for outliers is not applied again.
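The outlier rule above can be sketched in a few lines of MATLAB. The sample values are hypothetical. The quartiles are computed by linear interpolation at position p(n+1) in the sorted data, which is one common convention and is assumed here. Other software may use a different quartile rule and therefore flag slightly different points. The sketch also assumes that the recomputed x_{max} and x_{min} are the most extreme values remaining after the outliers are set aside.

```matlab
% Identify box-and-whisker outliers with the 1.5*IQR rule.
x  = [2 3 4 5 6 7 8 9 30];       % hypothetical sample
xs = sort(x);
n  = numel(xs);

% Quartile position p*(n+1), clamped to the range of valid indices
% (an assumed convention, see the note above)
qpos = @(p) min(max(p*(n+1), 1), n);
Q1 = interp1(1:n, xs, qpos(0.25));
Q3 = interp1(1:n, xs, qpos(0.75));
iqRange = Q3 - Q1;

upperFence = Q3 + 1.5*iqRange;
lowerFence = Q1 - 1.5*iqRange;

% Apply the test once; it is not repeated on the remaining data
isOut    = (xs > upperFence) | (xs < lowerFence);
outliers = xs(isOut);

% Whisker ends: recomputed x_max and x_min without the outliers
xmaxWhisker = max(xs(~isOut));
xminWhisker = min(xs(~isOut));

fprintf('Q1 = %g  Q3 = %g  IQR = %g\n', Q1, Q3, iqRange);
fprintf('Whiskers from %g to %g\n', xminWhisker, xmaxWhisker);
fprintf('Outliers: %s\n', mat2str(outliers));
```

For this sample only the value 30 lies beyond the upper limit, so the upper whisker ends at 9 and the point at 30 would be drawn as a separate symbol.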
The following diagram is a summary of the box-and-whisker symbols for data with and without outliers. This diagram differs in its appearance from a MINITAB plot with outliers, but the basic layout is the same.