Machine Learning
PSU CS441/541
Lecture 8
November 13, 2000
- Plan For Today
- Machine learning
- HW discussion
- Project discussion
- Machine Learning
- AI goal: replace human programming with
``self-programming''
- The example: infants
- language skills
- motor skills
- other behaviors
- Usual dichotomy:
- algorithmic/heuristic ``tricks''
- simulate human behavior (infant brain)
- ML and Systems
- Data flow in ML systems
- Data complexity
- Boolean
- discrete
- continuous
- ML system evaluation
- statistical significance
- negative and positive examples
- Type I (false-positive) and Type II (false-negative)
errors (see the sketch below)
- overtraining and overfitting
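- Aside: a minimal Python sketch (my illustration, not from the
lecture) of counting Type I and Type II errors when evaluating a
learned boolean classifier on labeled test examples; the function
names and data layout are assumed for the example

    def evaluate(classifier, examples):
        """Count Type I / Type II errors of a boolean classifier.

        examples: list of (input, label) pairs, labels boolean.
        classifier: function mapping an input to a boolean prediction.
        """
        false_pos = false_neg = correct = 0
        for x, label in examples:
            prediction = classifier(x)
            if prediction and not label:
                false_pos += 1          # Type I: false positive
            elif label and not prediction:
                false_neg += 1          # Type II: false negative
            else:
                correct += 1
        n = len(examples)
        return {"accuracy": correct / n,
                "type_I_rate": false_pos / n,
                "type_II_rate": false_neg / n}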
- Modes Of Learning
- ``Discovery'' learning and ``generalization'' learning
(a la Ginsberg)
- AM
- sets/LISP
- prime numbers
- Goldbach's Conjecture
- maximally factorable numbers
- Eurisko and games
- TD-Gammon (more later)
- relation between discovery learning and
generalization learning
- Deductive learning: concluding things
from principles
- theorem proving
- knowledge compilation
- Inductive learning
- Evaluating Learning: PAC
- Probably Approximately Correct: Valiant 1984
- error(x) = [p(x) and not ~p(x)] or
[not p(x) and ~p(x)],
where ~p is the learned approximation of the target concept p
- ~p is approximately correct iff pr(error) <= e
- Given (sum x in U | pr(x)) = 1, we have
- ~p is approximately correct iff
(sum x in U s.t. error(x) | pr(x)) <= e
- L is probably approximately correct
iff pr(L learns an ~p with pr(error) > e) <= d
- Training set size
- hypothesis space bias: will pick too-simple
hypothesis given insufficient data (ouch: Occam's Razor!)
- Given H possible concepts in language, plus
desired d and e, can show that
ln(H/d)/e training examples suffice (see the sketch below)
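- Aside: a small Python sketch (my illustration) of the sample-size
bound just above; the function name is an assumption, the formula
ln(H/d)/e is from the slide

    import math

    def pac_sample_size(num_hypotheses, epsilon, delta):
        """Training examples sufficient for a consistent learner over a
        finite hypothesis space to be probably approximately correct:
        with probability at least 1 - delta, the learned hypothesis
        has error at most epsilon."""
        return math.ceil(math.log(num_hypotheses / delta) / epsilon)

    # e.g. conjunctions over 10 boolean features: 3^10 hypotheses
    print(pac_sample_size(3 ** 10, epsilon=0.1, delta=0.05))   # 140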
- Basic ML Techniques
- Deductive ``KR-based'': Explanation-Based Learning
- book e.g.: given that we can derive
holds(loc(Sun-City), result(drive(Phoenix,
Sun-City), result(fly(Phoenix), s')))
in some database
- might remember
near(x,y) and airport(x) implies
holds(loc(y), result(drive(x,y),
result(fly(x), s')))
- basic idea: cache the most general derivable
version of the result in case a similar query comes up later
(see the sketch below)
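- Aside: a toy Python sketch (my illustration, and only part of the
story) of the caching step: variable-ize a derived ground result so
it can answer similar queries later; real EBL also regresses the
proof to collect the preconditions (near(x,y), airport(x) above)
rather than generalizing every constant blindly

    def generalize(term, mapping=None):
        """Replace each constant (string leaf) in a derived ground term
        with a fresh variable.  Terms are strings (constants) or
        (functor, arg, ...) tuples."""
        if mapping is None:
            mapping = {}
        if isinstance(term, str):
            if term not in mapping:
                mapping[term] = "?x%d" % len(mapping)   # fresh variable
            return mapping[term]
        functor, *args = term
        return (functor,) + tuple(generalize(a, mapping) for a in args)

    # ground result derived once, cached in generalized form
    derived = ("holds", ("loc", "Sun-City"),
               ("result", ("drive", "Phoenix", "Sun-City"),
                ("result", ("fly", "Phoenix"), "s'")))
    print(generalize(derived))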
- Version Spaces
- want to exactly capture binary distinction
with conjunctive prop formula
- note hierarchy of concepts
- keep lub and glb of training set in hierarchy
(see the sketch below)
- convergence implies unique concept
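- Aside: a minimal Python sketch (my illustration) of the specific
boundary only: keep the lub (most specific conjunctive hypothesis
covering the positives) and check it against the negatives; full
candidate elimination maintains the general boundary symmetrically

    def generalize(hyp, example):
        """Minimally generalize a conjunctive hypothesis (tuple of
        attribute values, '?' = don't care) to cover a positive example."""
        if hyp is None:                       # no positives seen yet
            return tuple(example)
        return tuple(h if h == e else '?' for h, e in zip(hyp, example))

    def matches(hyp, example):
        return hyp is not None and all(h == '?' or h == e
                                       for h, e in zip(hyp, example))

    def specific_boundary(training):
        """training: list of (attribute-tuple, is_positive) pairs.
        A fuller version would re-check earlier negatives after each
        generalization step."""
        s = None                              # lub of positives so far
        for example, positive in training:
            if positive:
                s = generalize(s, example)
            elif matches(s, example):
                raise ValueError("no conjunctive hypothesis fits the data")
        return s

    data = [(('red', 'round', 'small'), True),
            (('red', 'square', 'small'), True),
            (('blue', 'round', 'small'), False)]
    print(specific_boundary(data))            # ('red', '?', 'small')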
- Decision Trees and ID3
- want to probably capture distinction with
binary decision tree
- consider taxonomy
- restrict to boolean: positive and negative instances
- select characteristic that best differentiates
positive and negative
- for each sub-category, go again
- will eventually mostly correctly characterize all
training data (remember PAC)
- actual deal: select the next feature f for the tree
to minimize the expected remaining entropy
(see the sketch below)
G(f) = (%Vf+) * H(Uf+) + (%Vf-) * H(Uf-)
H(S) = -(%S+) log (%S+) - (%S-) log (%S-)
where Uf+ (Uf-) is the subset of the sample at this node
with feature f true (false), %Vf+ (%Vf-) is the fraction of
the sample falling in that subset, and %S+ (%S-) is the
fraction of positive (negative) instances within subset S
- can continue until overfitting occurs!
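- Aside: a small Python sketch (my illustration, boolean features and
labels as above) of the selection step: compute G(f) for each
remaining feature and split on the minimizer

    import math

    def entropy(labels):
        """Entropy of a boolean-labeled sample (0 if empty or pure)."""
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def expected_entropy(examples, f):
        """G(f): entropy of the f-true and f-false subsets, weighted by
        subset size.  examples: list of (feature-dict, bool-label)."""
        g = 0.0
        for value in (True, False):
            subset = [label for feats, label in examples
                      if feats[f] == value]
            if subset:
                g += (len(subset) / len(examples)) * entropy(subset)
        return g

    def best_feature(examples, features):
        """ID3 step: pick the feature minimizing expected entropy."""
        return min(features, key=lambda f: expected_entropy(examples, f))

Recursing on each subset (with the chosen feature removed, and
stopping early to avoid overfitting) builds the whole tree.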
- Neural Nets
- neuron threshold: fire iff (sum i | h[i] * x[i]) > 0.5,
with weights h[i]
- may replace > (expensive, non-differentiable) with a cheaper
and differentiable nonlinear function, e.g. a sigmoid
- need at least two layers (hidden units) to represent
non-linearly-separable functions such as XOR
- how to compute weights?
- training v. reinforcement?
- in any case, error backpropagation
via gradient descent on squared error
- consider threshold case
- assume E = (t - o)^2 where
E is error, t is target
response, o is output response
- change weights downhill (treating o as the weighted sum):
h'[i] - h[i] = -a * dE/dh[i] = 2a * (t - o) * x[i]
(the ``delta rule''; see the sketch below)
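- Aside: a minimal Python sketch (my illustration) of the single-unit
case above, with a linear output o = sum i h[i]*x[i] trained by the
delta rule; a multi-layer net applies the same idea layer by layer
via backpropagation

    def train_unit(samples, num_inputs, a=0.05, epochs=100):
        """Gradient descent on E = (t - o)^2 for one linear unit.
        samples: list of (x, t) pairs, x a list of num_inputs numbers."""
        h = [0.0] * num_inputs                       # weights
        for _ in range(epochs):
            for x, t in samples:
                o = sum(h[i] * x[i] for i in range(num_inputs))
                for i in range(num_inputs):
                    h[i] += 2 * a * (t - o) * x[i]   # -a * dE/dh[i]
        return h

    # learn AND, with x[0] = 1 acting as a bias input
    samples = [([1, 0, 0], 0), ([1, 0, 1], 0),
               ([1, 1, 0], 0), ([1, 1, 1], 1)]
    weights = train_unit(samples, 3)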
- Genetic Algorithms
- another biological model: evolution
- basic approach (see the sketch after this list)
- select feature set
- construct random ``genomes'' corresponding
to features being considered positive, negative
- repeatedly
- select highest-scoring genomes on training data
- create new genomes by mixing genes from sets
of selected genomes
- randomly
- ``crossover'' on fixed sequences
- ???
- randomly mutate some of the genes of new genomes
- stop when sufficiently good fit reached (PAC)
- advantages
- simple to do, understand
- requires little understanding of problem
- disadvantages
- evolution is slow!
- details of scheme can still matter
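- Aside: a toy Python sketch (my illustration; one of many possible
schemes) of the loop above over boolean genomes, with single-point
crossover and per-gene mutation

    import random

    def genetic_search(fitness, genome_len, pop_size=50, generations=200,
                       keep=10, mutation_rate=0.01):
        """fitness: maps a genome (list of 0/1) to a score, e.g. the
        training-set accuracy of the concept the genome encodes."""
        population = [[random.randint(0, 1) for _ in range(genome_len)]
                      for _ in range(pop_size)]
        for _ in range(generations):
            # select the highest-scoring genomes on the training data
            population.sort(key=fitness, reverse=True)
            parents = population[:keep]
            # create new genomes by crossover of pairs of parents
            children = []
            while len(children) < pop_size - keep:
                mom, dad = random.sample(parents, 2)
                cut = random.randrange(1, genome_len)
                child = mom[:cut] + dad[cut:]
                # randomly mutate some genes of the new genome
                children.append([g ^ 1 if random.random() < mutation_rate
                                 else g for g in child])
            population = parents + children
        return max(population, key=fitness)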
- ML Example: Neural Net Backgammon
- What is backgammon?
- Why is BG hard?
- Gerry Tesauro 199x: TD(lambda) NN reinforcement learning
can learn to play good BG from rules alone!
- Add shallow search for tactics, tune: best BG player ever!
- Humans now learn BG strategy from TD(lambda) players,
closing the circle (see the sketch below)
- Go?
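- Aside: a heavily simplified Python sketch (my illustration) of the
TD(lambda) update with eligibility traces, here for a linear value
estimate; TD-Gammon used a neural network evaluator and learned by
self-play

    def td_lambda_episode(states, outcome, w, alpha=0.1, lam=0.7):
        """One self-play game worth of TD(lambda) updates for a linear
        value estimate v(s) = sum i w[i] * features(s)[i].
        states: feature vectors of the positions visited, in order.
        outcome: final reward, e.g. 1 for a win and 0 for a loss."""
        n = len(w)
        e = [0.0] * n                              # eligibility traces
        for t in range(len(states)):
            x = states[t]
            v = sum(w[i] * x[i] for i in range(n))
            if t + 1 < len(states):
                x_next = states[t + 1]
                v_next = sum(w[i] * x_next[i] for i in range(n))
                delta = v_next - v                 # TD error (no discounting)
            else:
                delta = outcome - v                # terminal position
            for i in range(n):
                e[i] = lam * e[i] + x[i]           # decay and accumulate
                w[i] += alpha * delta * e[i]
        return w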
- CS Ethics: Neural Net Actuarials?
- Can build a neural net that
- given training data about insurance payoffs
- predicts expected cost of policy to insurance co.
- Ethically, must use only socially sanctioned data, e.g. not race!
- But
- net can infer race from e.g. eye color, average
salary, traffic stops by location
- can race be inferred from the given set of inputs?
- net is nondeclarative: how does it use the
inputs?
- Net may inadvertently ``discriminate'' based on true
costs
- Worse, a malicious person may adjust input choice and training
data (even weights?) to discriminate
- Conclusions?