Table of Contents
Exploratory Data Analysis: Objectives
Probability and statistics background
Why do exploratory data analysis?
A general model for data, estimates, predictions
Considerations on data, observations and predictions
Error characteristics
A measure of accuracy - MSE
In terms of the error
Typical questions in EDA
Data collection procedures
Some general characteristics exhibited by environmental data
What dictates the characteristics of the data?
Characteristics of interest
Characteristics of interest (cont.)
Analysis tools
A short preview of the tools
Summarizing univariate data
Aside: Ranked data and quantiles (or percentiles)
Ranked data and quantiles (continued)
Classical measures of location
Resistant measures of location
Measures of variability (spread)
Measures of symmetry
Measures of association between variables
Pearson’s correlation coefficient
Alternative measures of association
Test for significance of ρ
Alternative measure of association
Association in time - serial correlation or autocorrelation
Transformation of data
Types of transformations
Some techniques for visualizing the characteristics of data
Time series plots of data
Annual streamflow data
Monthly precipitation data
Daily streamflow data
Daily precipitation data
Analyzing seasonal behavior
Visualizing seasonal behavior
Representations of seasonal data
Plots of seasonal descriptive statistics
Example: Plot of seasonal statistics
Box plots
Typical box plot features
Example box plot
Seasonal box plots
Computing Fourier series
Graphical representation of a univariate distribution
Box plots
Example: Box plot
Histograms
Histograms (cont.)
Determination of the number of intervals (bins)
Determination of the relative frequency
Example: Histogram
Example: Probability plot
Graphical representation of relationship between variables
More on the nature of the association
Functional forms of association
Nonlinear functional relationships
Example: Scatterplot
Associations among several variables
Example: SPLOM