how Claude learns to maximize a reward. this one: how it plans when something's trying to beat it. lecture 9 of CS221 is game playing, and the core idea is minimax: assume your opponent plays perfectly, then pick the move that's least bad no matter what they do. it's the engine behind Deep Blue,
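a rough sketch of minimax on a toy game tree, to make "least bad no matter what they do" concrete. everything here is an assumption for illustration, not the lecture's code: the nested-list tree encoding, the names `minimax` and `best_move`, and the made-up payoff numbers.

```python
from typing import Union

# hypothetical encoding: a leaf is a number (the payoff to us),
# an inner list holds the positions reachable in one move.
Tree = Union[float, list]


def minimax(node: Tree, our_turn: bool) -> float:
    """value of `node` if both sides play perfectly from here on."""
    if not isinstance(node, list):
        return node  # leaf: the game is over, payoff is known
    values = [minimax(child, not our_turn) for child in node]
    # we pick the highest value; the opponent picks the lowest for us
    return max(values) if our_turn else min(values)


def best_move(root: list) -> int:
    """index of the move whose worst-case outcome is best for us."""
    return max(range(len(root)),
               key=lambda i: minimax(root[i], our_turn=False))


# made-up numbers: we choose one of three moves, then the opponent replies
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

print(minimax(tree, our_turn=True))  # 3
print(best_move(tree))               # 0
```

the third move has the biggest possible payoff (14), but a perfect opponent would answer it with the 2. the first move guarantees at least 3, so that's the one minimax picks.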