Machine Learning
PSU CS441/541
Lecture 8
November 13, 2000
- Plan For Today
- Machine learning
- HW discussion
- Project discussion
- Machine Learning
- AI goal: replace human programming with
``self-programming''
- The example: infants
- language skills
- motor skills
- other behaviors
- Usual dichotomy:
- algorithmic/heuristic ``tricks''
- simulate human behavior (infant brain)
- ML and Systems
- Data flow in ML systems
- Data complexity
- Boolean
- discrete
- continuous
- ML system evaluation
- statistical significance
- negative and positive examples
- Type I (false-positive) and Type II (false-negative)
errors (see the sketch below)
- overtraining and overfitting
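- Aside: a minimal Python sketch (my illustration, not from the
lecture) of counting Type I and Type II errors when evaluating a
learned boolean classifier on labeled test examples; the function
names and data layout are assumed for the example

    def evaluate(classifier, examples):
        """Count Type I / Type II errors of a boolean classifier.

        examples: list of (input, label) pairs, labels boolean.
        classifier: function mapping an input to a boolean prediction.
        """
        false_pos = false_neg = correct = 0
        for x, label in examples:
            prediction = classifier(x)
            if prediction and not label:
                false_pos += 1          # Type I: false positive
            elif label and not prediction:
                false_neg += 1          # Type II: false negative
            else:
                correct += 1
        n = len(examples)
        return {"accuracy": correct / n,
                "type_I_rate": false_pos / n,
                "type_II_rate": false_neg / n}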
- Modes Of Learning
- ``Discovery'' learning and ``generalization'' learning
(a la Ginsberg)
- AM
- sets/LISP
- prime numbers
- Goldbach's Conjecture
- maximally factorable numbers
- Eurisko and games
- TD-Gammon (more later)
- relation between discovery learning and
generalization learning
- Deductive learning: concluding things
from principles
- theorem proving
- knowledge compilation
- Inductive learning
- Evaluating Learning: PAC
- Probably Approximately Correct: Valiant 1984
- error(x) = [p(x) and not ~p(x)] or
[not p(x) and ~p(x)],
where ~p is the learned approximation of the target concept p
- ~p is approximately correct iff pr(error) <= e
- Given (sum x in U | pr(x)) = 1, we have
- ~p is approximately correct iff
(sum x in U s.t. error(x) | pr(x)) <= e
- L is probably approximately correct
iff pr(L learns an ~p with pr(error) > e) <= d
- Training set size
- hypothesis space bias: will pick too-simple
hypothesis given insufficient data (ouch: Occam's Razor!)
- Given H possible concepts in language, plus
desired d and e, can show that
ln(H/d)/e training examples suffice (see the sketch below)
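- Aside: a small Python sketch (my illustration) of the sample-size
bound just above; the function name is an assumption, the formula
ln(H/d)/e is from the slide

    import math

    def pac_sample_size(num_hypotheses, epsilon, delta):
        """Training examples sufficient for a consistent learner over a
        finite hypothesis space to be probably approximately correct:
        with probability at least 1 - delta, the learned hypothesis
        has error at most epsilon."""
        return math.ceil(math.log(num_hypotheses / delta) / epsilon)

    # e.g. conjunctions over 10 boolean features: 3^10 hypotheses
    print(pac_sample_size(3 ** 10, epsilon=0.1, delta=0.05))   # 140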
- Basic ML Techniques
- Deductive ``KR-based'': Explanation-Based Learning
- book e.g.: given that we can derive
holds(loc(Sun-City), result(drive(Phoenix,
Sun-City), result(fly(Phoenix), s')))
in some database
- might remember
near(x,y) and airport(x) implies
holds(loc(y), result(drive(x,y),
result(fly(x), s')))
- basic idea: cache the most general derivable
version of the result in case a similar query comes up later
(see the sketch below)
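- Aside: a toy Python sketch (my illustration, and only part of the
story) of the caching step: variable-ize a derived ground result so
it can answer similar queries later; real EBL also regresses the
proof to collect the preconditions (near(x,y), airport(x) above)
rather than generalizing every constant blindly

    def generalize(term, mapping=None):
        """Replace each constant (string leaf) in a derived ground term
        with a fresh variable.  Terms are strings (constants) or
        (functor, arg, ...) tuples."""
        if mapping is None:
            mapping = {}
        if isinstance(term, str):
            if term not in mapping:
                mapping[term] = "?x%d" % len(mapping)   # fresh variable
            return mapping[term]
        functor, *args = term
        return (functor,) + tuple(generalize(a, mapping) for a in args)

    # ground result derived once, cached in generalized form
    derived = ("holds", ("loc", "Sun-City"),
               ("result", ("drive", "Phoenix", "Sun-City"),
                ("result", ("fly", "Phoenix"), "s'")))
    print(generalize(derived))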
- Version Spaces
- want to exactly capture binary distinction
with conjunctive prop formula
- note hierarchy of concepts
- keep lub and glb of training set in hierarchy
(see the sketch below)
- convergence implies unique concept
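- Aside: a minimal Python sketch (my illustration) of the specific
boundary only: keep the lub (most specific conjunctive hypothesis
covering the positives) and check it against the negatives; full
candidate elimination maintains the general boundary symmetrically

    def generalize(hyp, example):
        """Minimally generalize a conjunctive hypothesis (tuple of
        attribute values, '?' = don't care) to cover a positive example."""
        if hyp is None:                       # no positives seen yet
            return tuple(example)
        return tuple(h if h == e else '?' for h, e in zip(hyp, example))

    def matches(hyp, example):
        return hyp is not None and all(h == '?' or h == e
                                       for h, e in zip(hyp, example))

    def specific_boundary(training):
        """training: list of (attribute-tuple, is_positive) pairs.
        A fuller version would re-check earlier negatives after each
        generalization step."""
        s = None                              # lub of positives so far
        for example, positive in training:
            if positive:
                s = generalize(s, example)
            elif matches(s, example):
                raise ValueError("no conjunctive hypothesis fits the data")
        return s

    data = [(('red', 'round', 'small'), True),
            (('red', 'square', 'small'), True),
            (('blue', 'round', 'small'), False)]
    print(specific_boundary(data))            # ('red', '?', 'small')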
- Decision Trees and ID3
- want to probably capture distinction with
binary decision tree
- consider taxonomy
- restrict to boolean: positive and negative instances
- select characteristic that best differentiates
positive and negative
- for each sub-category, go again
- will eventually mostly correctly characterize all
training data (remember PAC)
- actual deal: select the next feature f for the tree
to minimize the expected remaining entropy
(see the sketch below)
G(f) = (%Vf+) * H(Uf+) + (%Vf-) * H(Uf-)
H(S) = -(%S+) log (%S+) - (%S-) log (%S-)
where Uf+ (Uf-) is the subset of the sample at this node
with feature f true (false), %Vf+ (%Vf-) is the fraction of
the sample falling in that subset, and %S+ (%S-) is the
fraction of positive (negative) instances within subset S
- can continue until overfitting occurs!
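- Aside: a small Python sketch (my illustration, boolean features and
labels as above) of the selection step: compute G(f) for each
remaining feature and split on the minimizer

    import math

    def entropy(labels):
        """Entropy of a boolean-labeled sample (0 if empty or pure)."""
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def expected_entropy(examples, f):
        """G(f): entropy of the f-true and f-false subsets, weighted by
        subset size.  examples: list of (feature-dict, bool-label)."""
        g = 0.0
        for value in (True, False):
            subset = [label for feats, label in examples
                      if feats[f] == value]
            if subset:
                g += (len(subset) / len(examples)) * entropy(subset)
        return g

    def best_feature(examples, features):
        """ID3 step: pick the feature minimizing expected entropy."""
        return min(features, key=lambda f: expected_entropy(examples, f))

Recursing on each subset (with the chosen feature removed, and
stopping early to avoid overfitting) builds the whole tree.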
- Neural Nets
- neuron threshold: fire iff (sum i | h[i] * x[i]) > 0.5,
with weights h[i]
- may replace > (expensive, non-differentiable) with a cheaper
and differentiable nonlinear function, e.g. a sigmoid
- need at least two layers (hidden units) to represent
non-linearly-separable functions such as XOR
- how to compute weights?
- training v. reinforcement?
- in any case, error backpropagation
via gradient descent on squared error
- consider threshold case
- assume E = (t - o)^2 where
E is error, t is target
response, o is output response
- change weights downhill (treating o as the weighted sum):
h'[i] - h[i] = -a * dE/dh[i] = 2a * (t - o) * x[i]
(the ``delta rule''; see the sketch below)
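- Aside: a minimal Python sketch (my illustration) of the single-unit
case above, with a linear output o = sum i h[i]*x[i] trained by the
delta rule; a multi-layer net applies the same idea layer by layer
via backpropagation

    def train_unit(samples, num_inputs, a=0.05, epochs=100):
        """Gradient descent on E = (t - o)^2 for one linear unit.
        samples: list of (x, t) pairs, x a list of num_inputs numbers."""
        h = [0.0] * num_inputs                       # weights
        for _ in range(epochs):
            for x, t in samples:
                o = sum(h[i] * x[i] for i in range(num_inputs))
                for i in range(num_inputs):
                    h[i] += 2 * a * (t - o) * x[i]   # -a * dE/dh[i]
        return h

    # learn AND, with x[0] = 1 acting as a bias input
    samples = [([1, 0, 0], 0), ([1, 0, 1], 0),
               ([1, 1, 0], 0), ([1, 1, 1], 1)]
    weights = train_unit(samples, 3)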
- Genetic Algorithms
- another biological model: evolution
- basic approach (see the sketch after this list)
- select feature set
- construct random ``genomes'' corresponding
to features being considered positive, negative
- repeatedly
- select highest-scoring genomes on training data
- create new genomes by mixing genes from sets
of selected genomes
- randomly
- ``crossover'' on fixed sequences
- ???
- randomly mutate some of the genes of new genomes
- stop when sufficiently good fit reached (PAC)
- advantages
- simple to do, understand
- requires little understanding of problem
- disadvantages
- evolution is slow!
- details of scheme can still matter
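- Aside: a toy Python sketch (my illustration; one of many possible
schemes) of the loop above over boolean genomes, with single-point
crossover and per-gene mutation

    import random

    def genetic_search(fitness, genome_len, pop_size=50, generations=200,
                       keep=10, mutation_rate=0.01):
        """fitness: maps a genome (list of 0/1) to a score, e.g. the
        training-set accuracy of the concept the genome encodes."""
        population = [[random.randint(0, 1) for _ in range(genome_len)]
                      for _ in range(pop_size)]
        for _ in range(generations):
            # select the highest-scoring genomes on the training data
            population.sort(key=fitness, reverse=True)
            parents = population[:keep]
            # create new genomes by crossover of pairs of parents
            children = []
            while len(children) < pop_size - keep:
                mom, dad = random.sample(parents, 2)
                cut = random.randrange(1, genome_len)
                child = mom[:cut] + dad[cut:]
                # randomly mutate some genes of the new genome
                children.append([g ^ 1 if random.random() < mutation_rate
                                 else g for g in child])
            population = parents + children
        return max(population, key=fitness)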
- ML Example: Neural Net Backgammon
- What is backgammon?
- Why is BG hard?
- Gerry Tesauro 199x: TD(lambda) NN reinforcement learning
can learn to play good BG from rules alone!
- Add shallow search for tactics, tune: best BG player ever!
- Humans now learn BG strategy from TD(lambda) players,
closing the circle (see the sketch below)
- Go?
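- Aside: a heavily simplified Python sketch (my illustration) of the
TD(lambda) update with eligibility traces, here for a linear value
estimate; TD-Gammon used a neural network evaluator and learned by
self-play

    def td_lambda_episode(states, outcome, w, alpha=0.1, lam=0.7):
        """One self-play game worth of TD(lambda) updates for a linear
        value estimate v(s) = sum i w[i] * features(s)[i].
        states: feature vectors of the positions visited, in order.
        outcome: final reward, e.g. 1 for a win and 0 for a loss."""
        n = len(w)
        e = [0.0] * n                              # eligibility traces
        for t in range(len(states)):
            x = states[t]
            v = sum(w[i] * x[i] for i in range(n))
            if t + 1 < len(states):
                x_next = states[t + 1]
                v_next = sum(w[i] * x_next[i] for i in range(n))
                delta = v_next - v                 # TD error (no discounting)
            else:
                delta = outcome - v                # terminal position
            for i in range(n):
                e[i] = lam * e[i] + x[i]           # decay and accumulate
                w[i] += alpha * delta * e[i]
        return w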
- CS Ethics: Neural Net Actuarials?
- Can build a neural net that
- given training data about insurance payoffs
- predicts expected cost of policy to insurance co.
- Ethically, must use only socially sanctioned data, e.g. not race!
- But
- net can infer race from e.g. eye color, average
salary, traffic stops by location
- can race be inferred from the given set of inputs?
- net is nondeclarative: how does it use the
inputs?
- Net may inadvertently ``discriminate'' based on true
costs
- Worse, a malicious person may adjust input choice and training
data (even weights?) to discriminate
- Conclusions?