Ep 436 article 1:52 w/ Justy & Cody

How we chose the voices of Coda | Rime

The hosts discuss an article about how the voice model Coda was developed, focusing on the selection of voices and categorization into styles like professional, formal, casual, and energetic.

Script: Llama 4 Scout
Voice: Inworld TTS 2

Transcript

Justy So I read this thing about Coda and how they picked voices for it. They ended up with four styles: professional, formal, casual, energetic.

Cody Yeah, I saw that. They collected over 8,000 pairwise listener judgments. That's a lot of data.

Justy Right? And they partnered with some company called Podonos for the audio evaluation. I guess that's how they validated the voices.
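[Editor's note] The hosts don't say how the pairwise judgments were turned into a decision, so the following is only a minimal sketch of one common approach: tallying a win rate per voice. The voice names, the sample data, and the `win_rates` helper are all hypothetical and are not taken from the article or from Podonos.

```python
from collections import Counter

def win_rates(judgments):
    """Return the fraction of comparisons each voice won.

    judgments: iterable of (voice_a, voice_b, winner) tuples, where
    winner is whichever of the two voices the listener preferred.
    This is an assumed data shape, not the article's actual format.
    """
    wins = Counter()
    appearances = Counter()
    for a, b, winner in judgments:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} is not one of {a!r}, {b!r}")
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return {voice: wins[voice] / appearances[voice] for voice in appearances}

# Hypothetical judgments: each tuple is one listener comparing two voices.
sample = [
    ("voice_a", "voice_b", "voice_a"),
    ("voice_a", "voice_c", "voice_c"),
    ("voice_b", "voice_c", "voice_c"),
    ("voice_a", "voice_b", "voice_a"),
]

rates = win_rates(sample)
for voice, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{voice}: {rate:.2f}")
```

A plain win rate ignores which opponents a voice faced, so a real evaluation with thousands of judgments would more likely fit a ranking model over the comparisons. The tally is just the simplest way to see what one pairwise judgment contributes.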

Cody The article makes a big deal about not using a linear continuum for voice styles. They say voices carry semantic weight and have meaning based on context.

Justy I think what caught my attention was the distinction between professional and formal. Professional is more about confidence and being articulate, while formal is about occasion and ritual.

Cody That makes sense. I can see how that would be important for different applications. Like, you wouldn't want a formal voice for customer support.

Justy Exactly. And they mention that the vector embeddings of voices live in a latent space. That's some ML stuff I'm not sure I fully get.

Cody Don't worry, I think it's just a fancy way of saying that voices can have multiple characteristics and you can't just reduce them to a single axis.
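[Editor's note] Cody's point about a single axis can be illustrated with a small sketch. It assumes, purely for illustration, that each voice is described by four made-up scores, one per style named in the episode. Real voice embeddings have many more dimensions and are not labeled this way. The `single_axis` function is a hypothetical "casual to formal" continuum of the kind the article reportedly argues against.

```python
import math

# Order of the illustrative style scores in each tuple (an assumption).
STYLES = ("professional", "formal", "casual", "energetic")

# Two hypothetical voices: one reads as professional, the other as formal.
voices = {
    "voice_x": (0.9, 0.2, 0.1, 0.3),
    "voice_y": (0.2, 0.9, 0.1, 0.3),
}

def single_axis(scores):
    """Collapse a voice onto one made-up casual-to-formal line."""
    professional, formal, casual, energetic = scores
    return (professional + formal) - (casual + energetic)

def distance(u, v):
    """Euclidean distance between two voices in the full style space."""
    return math.dist(u, v)

x, y = voices["voice_x"], voices["voice_y"]
print("single-axis scores:", single_axis(x), single_axis(y))
print("distance in 4-D style space:", round(distance(x, y), 3))
```

On the single axis the two voices get the same score, so they look interchangeable. In the four-dimensional space they are clearly apart. That is the sense in which the professional versus formal distinction disappears when voices are reduced to one line.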

Justy Okay, that makes sense. So, who do you think should care about this? Is it just for big enterprises?

Cody I think it's interesting for anyone working in voice tech or ML. But also, if you're building something that needs a specific tone or style, this could be relevant.

Justy Alright, I think that's a good take. And I'm curious, do you think this changes anything practical for us?

Cody For me, it's more about understanding how voice models are developed. It's not a game-changer, but it's interesting to see how others approach the problem.

Justy Cool. Well, I think that's a good place to wrap up. Thanks for digging into this with me, Cody.