Machine Learning
PSU CS410/510GAMES
Lecture 7
May 16, 2002
- More probability: perfect Yahtzee
- Rules
- One-player, maximize score
- Retrograde analysis
- Nonlinearity
- Avoiding recomputation (sketch below)
- More sophisticated objectives
- One-player, maximize winning chance
- Two-player, maximize winning chance
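- Sketch (Python, illustrative): memoized expectimax for a toy 3-dice,
single-category ("Chance") turn. The dice count, reroll limit, and
scoring function are simplifications, not real Yahtzee; the memo table
holds the same values a bottom-up retrograde pass would tabulate.

    from functools import lru_cache
    from itertools import combinations, product

    NUM_DICE, REROLLS = 3, 2

    def score(dice):
        return sum(dice)          # stand-in for a real category table

    @lru_cache(maxsize=None)
    def value(dice, rerolls_left):
        """Expected score of a sorted dice tuple under optimal holds."""
        if rerolls_left == 0:
            return score(dice)
        best = score(dice)        # option: hold everything, stop now
        for keep in range(len(dice)):              # how many dice to hold
            for held in set(combinations(dice, keep)):
                n = NUM_DICE - keep
                total = 0.0
                for roll in product(range(1, 7), repeat=n):   # expectation
                    total += value(tuple(sorted(held + roll)),
                                   rerolls_left - 1)
                best = max(best, total / 6 ** n)   # max over hold choices
        return best

    # Expected score of the whole turn: average over the initial roll.
    turn = sum(value(tuple(sorted(r)), REROLLS)
               for r in product(range(1, 7), repeat=NUM_DICE)) / 6 ** NUM_DICE
    print("expected turn score:", round(turn, 3))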
- Machine learning of games
- Goal: learn state values (vs. learn moves)
- Generic problems: opponent modeling, overtraining,
feature selection
- TD-Gammon: neural nets for evaluation
- Gerry Tesauro, IBM: learn backgammon by self-play,
starting from random play, with a neural-net evaluator
- Gradient-descent weight updates with TD(lambda)
assign the errors back to earlier positions (sketch below)
- Eventually used hand-coded features plus
shallow search to improve play
- Learning "doubling" is hard
- Michael Buro: three strategies
- Multi-ProbCut: learn reasonable AB windows
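- Sketch (Python, illustrative): the ProbCut-style cut test behind
Multi-ProbCut. A shallow search value v' predicts the deep value through
a fitted linear model v ~ a*v' + b with error sigma; prune when the
prediction clears the alpha-beta window with high confidence. The
regression numbers and the single depth pair are placeholders; the real
method fits several depth pairs per game stage.

    def probcut_try_cut(shallow_value, alpha, beta, a, b, sigma, t=1.5):
        """Return +1 to fail high, -1 to fail low, 0 to keep searching."""
        predicted = a * shallow_value + b
        if predicted - t * sigma >= beta:    # deep value almost surely >= beta
            return +1
        if predicted + t * sigma <= alpha:   # deep value almost surely <= alpha
            return -1
        return 0

    # Usage inside a search: run a cheap depth-4 search to get shallow_value,
    # then ask whether the expensive depth-8 search can be skipped.
    print(probcut_try_cut(shallow_value=30, alpha=-10, beta=20,
                          a=1.0, b=0.0, sigma=5.0))    # -> +1, prune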
- Opening Book: don't lose the same way twice
- Idea: use PVS, but with leaves leading to a
demonstrated win/loss labeled with infinite
scores; a win for one side is a loss for the
other (sketch below)
- Learning: update the book as position values are
"found". (Can always find a "best" position to
play for unless tree value is -inf.)
- Problems
- Bad opponents mean bad Ws (win labels);
bad plays mean bad Ls (loss labels)
- Expensive to build an opening book this way
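- Sketch (Python, illustrative): backing the book up with negamax, where
positions known to lead to a win/loss carry +/- infinite scores and a
lost line is marked so it is never repeated. The Node layout and the
record_loss helper are invented for illustration, not Buro's actual data
structures.

    import math

    class Node:
        def __init__(self, heuristic=0.0):
            self.children = {}          # move -> Node
            self.heuristic = heuristic  # leaf score from ordinary search
            self.proven = None          # +/- inf for side to move, if known

    def book_value(node):
        """Negamax over the book: a win for one side is a loss for the other."""
        if node.proven is not None:
            return node.proven
        if not node.children:
            return node.heuristic
        return max(-book_value(c) for c in node.children.values())

    def best_book_move(node):
        """Best backed-up move, or None if every book line is a proven loss."""
        if not node.children:
            return None
        value, move = max(((-book_value(c), m) for m, c in node.children.items()),
                          key=lambda vm: vm[0])
        return move if value > -math.inf else None

    def record_loss(line):
        """After a lost game, mark the last book position on that losing line
        as a proven loss so backed-up values steer play elsewhere next time."""
        line[-1].proven = -math.inf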
- GLEM: learning eval functions
- Eval function: g(sum_i w[i] * val(c[i])),
where each feature c[i] is an AND (conjunction) of relations r[j]
- Uses gradient descent to adjust the weights:
effectively a "one-neuron net" (sketch below)
- Features c[i] and relations r[j]
are given by the implementor
- Strategy: fit endgame weights first, then work the
weight assignments back up toward the opening
- Problems: expense, overtraining, weight assignment
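- Sketch (Python, illustrative): a GLEM-style "one-neuron net". Each
feature c[i] is an AND of atomic boolean relations r[j], the evaluation
is g(sum_i w[i] * c[i]) with g a sigmoid, and gradient descent fits the
weights to training labels (deep-search values or game outcomes). The
relations, positions, and labels below are made up.

    import math

    def make_conjunctions(relations):
        """All single relations and pairwise ANDs: the features c[i]."""
        feats = [(lambda p, i=i: relations[i](p)) for i in range(len(relations))]
        for i in range(len(relations)):
            for j in range(i + 1, len(relations)):
                feats.append(lambda p, i=i, j=j: relations[i](p) and relations[j](p))
        return feats

    def evaluate(weights, feats, pos):
        s = sum(w for w, f in zip(weights, feats) if f(pos))
        return 1.0 / (1.0 + math.exp(-s))            # g = sigmoid

    def train(weights, feats, examples, alpha=0.5, epochs=200):
        """Gradient descent on squared error between eval and a label in [0,1]."""
        for _ in range(epochs):
            for pos, label in examples:
                v = evaluate(weights, feats, pos)
                grad = (v - label) * v * (1.0 - v)   # d(error)/d(weighted sum)
                for i, f in enumerate(feats):
                    if f(pos):                       # only active features move
                        weights[i] -= alpha * grad
        return weights

    # Toy usage: positions are dicts of booleans; two made-up atomic relations.
    relations = [lambda p: p["corner"], lambda p: p["mobile"]]
    feats = make_conjunctions(relations)
    examples = [({"corner": True,  "mobile": True},  1.0),
                ({"corner": True,  "mobile": False}, 0.7),
                ({"corner": False, "mobile": False}, 0.1)]
    w = train([0.0] * len(feats), feats, examples)
    print([round(x, 2) for x in w])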
- GAs for evaluation
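- Sketch (Python, illustrative): a genetic algorithm tuning a vector of
evaluation weights by selection, crossover, and mutation. The fitness
function is a stub; in practice it would be a win rate from games played
with the candidate weights. Population size and rates are arbitrary.

    import random

    def fitness(weights):
        # Stand-in: a real system would play games with an evaluator using
        # `weights` and return its win rate against reference opponents.
        target = [1.0, -0.5, 0.25, 2.0]
        return -sum((w - t) ** 2 for w, t in zip(weights, target))

    def evolve(n_weights=4, pop_size=30, generations=200,
               mutation_rate=0.2, mutation_size=0.3):
        pop = [[random.uniform(-2, 2) for _ in range(n_weights)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[:pop_size // 2]               # truncation selection
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)
                child = [random.choice(pair) for pair in zip(a, b)]  # crossover
                for i in range(n_weights):                           # mutation
                    if random.random() < mutation_rate:
                        child[i] += random.gauss(0, mutation_size)
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)

    print([round(w, 2) for w in evolve()])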