
adeptima | 4 months ago

I did research on accent and pronunciation improvement, phoneme recognition, the Kaldi ecosystem, etc. ... nothing has really changed in the public domain over the past few years. There isn't even an accurate open-source dataset: the self-claimed "manually labelled" datasets with 10k+ hours were partly produced with automation. Next issue: models operate in a different latent space, often on ~50 ms chunks, while pronunciation assessment requires much finer temporal resolution. Just try saying "B" out loud: a silent part where the lips gather energy, a loud burst, and everything that resonates after. Worst of all, there are too many ML papers from final-year students or junior PhD folks claiming success or faking improvements, etc.
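To make the timing point concrete, here is a minimal sketch (synthetic signal, illustrative frame sizes only, not from any particular model): a /b/-like sound is a silent closure, a burst of roughly 10 ms, then voiced resonance. At a 50 ms frame size the burst never gets its own frame, while 5 ms frames separate closure, burst, and voicing cleanly.

```python
import numpy as np

sr = 16000  # sample rate in Hz

# Synthetic "B": ~60 ms silent closure, ~10 ms noise burst, ~150 ms voiced tail.
closure = np.zeros(int(0.060 * sr))
burst = 0.8 * np.random.randn(int(0.010 * sr))
t = np.arange(int(0.150 * sr)) / sr
vowel = 0.5 * np.sin(2 * np.pi * 150 * t)
x = np.concatenate([closure, burst, vowel])

def frame_rms(x, frame_ms, sr=sr):
    """Non-overlapping frame-wise RMS energy; drops the ragged tail."""
    n = int(frame_ms / 1000 * sr)
    trimmed = x[: len(x) // n * n].reshape(-1, n)
    return np.sqrt((trimmed ** 2).mean(axis=1))

coarse = frame_rms(x, 50)  # typical model granularity: burst is smeared into a mixed frame
fine = frame_rms(x, 5)     # fine granularity: closure, burst, and voicing get separate frames

print(len(coarse), len(fine))  # 4 coarse frames vs. 44 fine frames over the same 220 ms
```

With the coarse frames, the burst lands in the same frame as closure silence and vowel onset, so its energy profile is averaged away; that is the mismatch between model chunking and what pronunciation scoring needs.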

The article itself is just a vector projection into 3D space ... the actual reality is much more complex.

Any comments on pronunciation assessment models are greatly appreciated

oezi | 4 months ago

You are right, and I don't think the incentives exist to solve the issues you describe, because many of the building blocks people rely on are aligned to erase subtle accent differences: neural codecs and transcription systems such as Whisper want to output clean, compressed representations of their inputs.