
How Claude learns to maximize reward

CS221 lecture discusses how Claude plans against perfect opponents.

how Claude learns to maximize a reward. this one: how it plans when something's trying to beat it. lecture 9 of CS221 covers game playing with minimax: assume your opponent plays perfectly, then pick the move that's least bad no matter what they do. it's the engine behind Deep Blue,
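
a rough sketch of the minimax idea on a toy game tree, not the lecture's own code. the nested-list tree format, the function names `minimax` and `best_move`, and the payoff numbers are all made up for illustration: a leaf is the payoff to the maximizing player, and a list is a position whose children are the available moves.

```python
from typing import Union

# Hypothetical toy format: a leaf is a number (payoff to the maximizing
# player); an inner node is a list of child subtrees, one per legal move.
Tree = Union[int, float, list]


def minimax(node: Tree, maximizing: bool = True) -> float:
    """Value of `node` when both sides play perfectly."""
    if not isinstance(node, list):
        return node  # leaf: the game is over, return its payoff
    # Evaluate every move, with the other player to act next.
    values = [minimax(child, not maximizing) for child in node]
    # The maximizer takes the best value, the minimizer the worst (for us).
    return max(values) if maximizing else min(values)


def best_move(root: list) -> int:
    """Index of the root move whose worst-case outcome is best."""
    # After our move, the opponent (the minimizer) is to play.
    scores = [minimax(child, maximizing=False) for child in root]
    return max(range(len(scores)), key=scores.__getitem__)


# Assumed example: three moves for us, three replies each for the opponent.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))    # 3
print(best_move(tree))  # 0
```

the third move has the biggest single payoff (14), but a perfect opponent would answer with the 2, so its worst case is 2. the first move guarantees at least 3, which is why it wins out. this plain version visits every node of the tree; it's only meant to show the max/min alternation.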
