It's so hard to make a text-to-speech model with a sub-300ms median TTFT (time to first token). Big, big congrats!
Creating low-latency text-to-speech models
It's hard to make models with a sub-300ms median TTFT.