Optimizing multi-epoch pretraining

Scaling multi-epoch pretraining efficiently with limited data.

1/ Now that we're running out of data, how do you optimally scale multi-epoch pretraining to hundreds of epochs? Our first paper from Q! q0 trains a population of models instead of a single model that saturates fast, reaching a dramatically lower loss at *every* epoch budget. w/
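The post does not say how q0 combines its population, so the following is only a rough sketch of one way a population can beat a single model: it assumes the members' predicted probabilities are simply averaged (plain ensembling), which may not be what the paper does. The models here are hypothetical stand-ins built from random logits, and `cross_entropy` and `ensemble_probs` are illustrative names, not anything from the paper.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

def ensemble_probs(member_probs):
    """Average class probabilities over the population axis (axis 0)."""
    return np.mean(member_probs, axis=0)

rng = np.random.default_rng(0)
n_models, n_examples, n_classes = 8, 512, 10
labels = rng.integers(0, n_classes, size=n_examples)

# Hypothetical stand-in for trained models: noisy logits that favour the true class.
logits = rng.normal(size=(n_models, n_examples, n_classes))
logits[:, np.arange(n_examples), labels] += 1.0
member_probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Loss of each member on its own, and loss of the averaged prediction.
member_losses = [cross_entropy(p, labels) for p in member_probs]
ensemble_loss = cross_entropy(ensemble_probs(member_probs), labels)

print(f"mean member loss: {np.mean(member_losses):.3f}")
print(f"ensemble loss:    {ensemble_loss:.3f}")
```

Because the negative log is convex, Jensen's inequality guarantees that the averaged prediction's loss is never higher than the mean of the members' losses. How large the gap is in real pretraining depends on how diverse the members are, which this toy setup does not model.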
