What is speculative decoding? Speculative decoding is an inference optimization that uses a fast, small "draft" model to quickly propose several future tokens, which are then verified in parallel by a larger, slower "target" model. This process significantly speeds up the
What is Speculative Decoding?
Speculative decoding is an inference optimization that uses a fast, small model to propose future tokens.