Machine Learning
PSU CS410/510GAMES
Lecture 7
May 16, 2002
- More probability: perfect Yahtzee
- Rules
- One-player, maximize score
- Retrograde analysis
- Nonlinearity
- Avoiding recomputation (sketch below)
- More sophisticated objectives
- One-player, maximize winning chance
- Two-player, maximize winning chance
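- Sketch (Python, illustrative): memoized expectimax for a toy 3-dice,
single-category ("Chance") turn. The dice count, reroll limit, and
scoring function are simplifications, not real Yahtzee; the memo table
holds the same values a bottom-up retrograde pass would tabulate.

    from functools import lru_cache
    from itertools import combinations, product

    NUM_DICE, REROLLS = 3, 2

    def score(dice):
        return sum(dice)          # stand-in for a real category table

    @lru_cache(maxsize=None)
    def value(dice, rerolls_left):
        """Expected score of a sorted dice tuple under optimal holds."""
        if rerolls_left == 0:
            return score(dice)
        best = score(dice)        # option: hold everything, stop now
        for keep in range(len(dice)):              # how many dice to hold
            for held in set(combinations(dice, keep)):
                n = NUM_DICE - keep
                total = 0.0
                for roll in product(range(1, 7), repeat=n):   # expectation
                    total += value(tuple(sorted(held + roll)),
                                   rerolls_left - 1)
                best = max(best, total / 6 ** n)   # max over hold choices
        return best

    # Expected score of the whole turn: average over the initial roll.
    turn = sum(value(tuple(sorted(r)), REROLLS)
               for r in product(range(1, 7), repeat=NUM_DICE)) / 6 ** NUM_DICE
    print("expected turn score:", round(turn, 3))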
- Machine learning of games
- Goal: learn state values (vs. learn moves)
- Generic problems: opponent modeling, overtraining,
feature selection
- TD-Gammon: neural nets for evaluation
- Gerry Tesauro, IBM: learn backgammon by self-play,
starting from random play, with a neural-net evaluator
- Gradient-descent weight updates with TD(lambda)
assign the errors back to earlier positions (sketch below)
- Eventually used hand-coded features plus
shallow search to improve play
- Learning "doubling" is hard
- Michael Buro: three strategies
- Multi-ProbCut: learn reasonable AB windows
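- Sketch (Python, illustrative): the ProbCut-style cut test behind
Multi-ProbCut. A shallow search value v' predicts the deep value through
a fitted linear model v ~ a*v' + b with error sigma; prune when the
prediction clears the alpha-beta window with high confidence. The
regression numbers and the single depth pair are placeholders; the real
method fits several depth pairs per game stage.

    def probcut_try_cut(shallow_value, alpha, beta, a, b, sigma, t=1.5):
        """Return +1 to fail high, -1 to fail low, 0 to keep searching."""
        predicted = a * shallow_value + b
        if predicted - t * sigma >= beta:    # deep value almost surely >= beta
            return +1
        if predicted + t * sigma <= alpha:   # deep value almost surely <= alpha
            return -1
        return 0

    # Usage inside a search: run a cheap depth-4 search to get shallow_value,
    # then ask whether the expensive depth-8 search can be skipped.
    print(probcut_try_cut(shallow_value=30, alpha=-10, beta=20,
                          a=1.0, b=0.0, sigma=5.0))    # -> +1, prune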
- Opening Book: don't lose the same way twice
- Idea: use PVS, but with leaves leading to a
demonstrated win/loss labeled with infinite
scores; a win for one side is a loss for the
other (sketch below)
- Learning: update the book as position values are
"found". (Can always find a "best" position to
play for unless tree value is -inf.)
- Problems
- Bad opponents mean bad Ws (win labels);
bad plays mean bad Ls (loss labels)
- Expensive to build an opening book this way
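- Sketch (Python, illustrative): backing the book up with negamax, where
positions known to lead to a win/loss carry +/- infinite scores and a
lost line is marked so it is never repeated. The Node layout and the
record_loss helper are invented for illustration, not Buro's actual data
structures.

    import math

    class Node:
        def __init__(self, heuristic=0.0):
            self.children = {}          # move -> Node
            self.heuristic = heuristic  # leaf score from ordinary search
            self.proven = None          # +/- inf for side to move, if known

    def book_value(node):
        """Negamax over the book: a win for one side is a loss for the other."""
        if node.proven is not None:
            return node.proven
        if not node.children:
            return node.heuristic
        return max(-book_value(c) for c in node.children.values())

    def best_book_move(node):
        """Best backed-up move, or None if every book line is a proven loss."""
        if not node.children:
            return None
        value, move = max(((-book_value(c), m) for m, c in node.children.items()),
                          key=lambda vm: vm[0])
        return move if value > -math.inf else None

    def record_loss(line):
        """After a lost game, mark the last book position on that losing line
        as a proven loss so backed-up values steer play elsewhere next time."""
        line[-1].proven = -math.inf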
- GLEM: learning eval functions
- Eval function: g(sum_i w[i] * val(c[i])),
where each feature c[i] is an AND (conjunction) of relations r[j]
- Uses gradient descent to adjust the weights:
effectively a "one-neuron net" (sketch below)
- Features c[i] and relations r[j]
are given by the implementor
- Strategy: fit endgame weights first, then work the
weight assignments back up toward the opening
- Problems: expense, overtraining, weight assignment
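- Sketch (Python, illustrative): a GLEM-style "one-neuron net". Each
feature c[i] is an AND of atomic boolean relations r[j], the evaluation
is g(sum_i w[i] * c[i]) with g a sigmoid, and gradient descent fits the
weights to training labels (deep-search values or game outcomes). The
relations, positions, and labels below are made up.

    import math

    def make_conjunctions(relations):
        """All single relations and pairwise ANDs: the features c[i]."""
        feats = [(lambda p, i=i: relations[i](p)) for i in range(len(relations))]
        for i in range(len(relations)):
            for j in range(i + 1, len(relations)):
                feats.append(lambda p, i=i, j=j: relations[i](p) and relations[j](p))
        return feats

    def evaluate(weights, feats, pos):
        s = sum(w for w, f in zip(weights, feats) if f(pos))
        return 1.0 / (1.0 + math.exp(-s))            # g = sigmoid

    def train(weights, feats, examples, alpha=0.5, epochs=200):
        """Gradient descent on squared error between eval and a label in [0,1]."""
        for _ in range(epochs):
            for pos, label in examples:
                v = evaluate(weights, feats, pos)
                grad = (v - label) * v * (1.0 - v)   # d(error)/d(weighted sum)
                for i, f in enumerate(feats):
                    if f(pos):                       # only active features move
                        weights[i] -= alpha * grad
        return weights

    # Toy usage: positions are dicts of booleans; two made-up atomic relations.
    relations = [lambda p: p["corner"], lambda p: p["mobile"]]
    feats = make_conjunctions(relations)
    examples = [({"corner": True,  "mobile": True},  1.0),
                ({"corner": True,  "mobile": False}, 0.7),
                ({"corner": False, "mobile": False}, 0.1)]
    w = train([0.0] * len(feats), feats, examples)
    print([round(x, 2) for x in w])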
- GAs for evaluation
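- Sketch (Python, illustrative): a genetic algorithm tuning a vector of
evaluation weights by selection, crossover, and mutation. The fitness
function is a stub; in practice it would be a win rate from games played
with the candidate weights. Population size and rates are arbitrary.

    import random

    def fitness(weights):
        # Stand-in: a real system would play games with an evaluator using
        # `weights` and return its win rate against reference opponents.
        target = [1.0, -0.5, 0.25, 2.0]
        return -sum((w - t) ** 2 for w, t in zip(weights, target))

    def evolve(n_weights=4, pop_size=30, generations=200,
               mutation_rate=0.2, mutation_size=0.3):
        pop = [[random.uniform(-2, 2) for _ in range(n_weights)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[:pop_size // 2]               # truncation selection
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)
                child = [random.choice(pair) for pair in zip(a, b)]  # crossover
                for i in range(n_weights):                           # mutation
                    if random.random() < mutation_rate:
                        child[i] += random.gauss(0, mutation_size)
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)

    print([round(w, 2) for w in evolve()])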