tmjdev|1 year ago
While it is impressive and I like to follow the advancements in this field, it is incredibly frustrating to listen to. I can't put my finger on why exactly. It's definitely closer to human-sounding, but the uncanny valley is so deep here that I find myself thinking "I just want the point, not the fake personality that is coming with it". I can't make it through a 30s demo.
swatcoder|1 year ago
We may not know that a given speaker is a GenX Methodist from Wisconsin who grew up at skate parks in the suburbs, but we hear clusters of speech behavior that let our brains go "yeah, I'm used to things fitting together in this way sometimes."
These don't have that.
Instead, they seem to mostly smudge together behaviors that are just generally common in aggregate across the training data. The speakers all voice interrupting acknowledgements eagerly, they all use bright and enunciated podcaster tone, they all draw on similar word choice, etc -- they distinguish gender and each have a stable overall vocal tone, but no identity.
I don't doubt that this'll improve quickly though, by training specific "AI celebrity" voices narrowed to sound more coherent, natural, identifiable, and consistent. (And then, probably, leasing out those voices for $$$.)
As a tech demo for "render some vague sense of life behind this generated dialog" this is pretty good, though.
adamhartenz|1 year ago
TimTheTinker|1 year ago
lancesells|1 year ago
beoberha|1 year ago
And I say all that completely slackjawed that this is possible.
echelon|1 year ago
Imagine being stuck on a call with this.
> "Hey, so like, is there anything I can help you with today?"
> "Talk to a person."
> "Oh wow, right. (chuckle) You got it. Well, before I connect you, can you maybe tell me a little bit more about what problem you're having? For example, maybe it's something to do with..."
kelseyfrog|1 year ago
amelius|1 year ago
I bet that if you select a British accent you will get fewer of them.
hyperific|1 year ago
ukuina|1 year ago
lokimedes|1 year ago
xnx|1 year ago
onion2k|1 year ago
iNic|1 year ago
JoblessWonder|1 year ago
rob|1 year ago
Veen|1 year ago
semitones|1 year ago
MrSkelter|1 year ago
kaibee|1 year ago
Listening to this on 1.75x speed is excellent. I think the generated speaking speed is slow for the sake of audio quality, because it'd be much harder to slow down the generated audio while retaining quality than vice versa.
moralestapia|1 year ago
A lot of people are just like that IRL.
They cannot just say "the food was fine", it's usually some crap like "What on earth! These are the best cheese sticks I've had IN MY EN TI R E LIFE!".
shermantanktop|1 year ago
Cthulhu_|1 year ago
pmontra|1 year ago
The problem is that people talking over each other is not a format I long to listen to.
narag|1 year ago
Please don't think that I'm trying to suggest... anything. It's just that I'm getting used to reading this pattern in the output of LLMs. "While this and that is great...". Maybe we're mimicking them now? I catch myself using these disclaimers even in spoken language.
tmjdev|1 year ago
nl|1 year ago
In general, people find the back-and-forth between the "hosts" engaging, and it also gives them time to digest the content.
ljf|1 year ago
This is good, but certainly not yet great.
jeksicjjdjisos|1 year ago
pvarangot|1 year ago
yapyap|1 year ago
chrismorgan|1 year ago
In a similar vein, I’m glad they told me it was a funny story, because otherwise I wouldn’t have known.
vel0city|1 year ago
gwbas1c|1 year ago