Creating low-latency text-to-speech models

It's hard to make models with a sub-300ms median TTFT.

It's so hard to make a text-to-speech model with a sub-300ms median TTFT (time to first token). Big, big congrats!

Creating low-latency text-to-speech models | AI Pulse