top | item 44809070

(no title)

tapper | 6 months ago

Sounds slow and like something from an anime


Speech speed is always a tunable parameter and not something intrinsic to the model.

The comparison to make is expressiveness and correct intonation for long sentences vs something like espeak. It actually sounds amazing for the size. The closest thing is probably KokoroTTS at 82M params and ~300MB.

dvh | 6 months ago

I think he meant the overacting typical of English dubs.

numpad0 | 6 months ago

The only real questions are which Chinese gacha game they ripped data from and whether they used Claude Code or Gemini CLI for the Python code. I bet one can get a formant match from output this overfit to whatever data it was trained on. This isn't going to stay up for long.

unknown | 6 months ago

[deleted]