I did some research on accent and pronunciation improvement, phoneme recognition, the Kaldi ecosystem, etc. Nothing has really changed in the public domain over the past few years. There isn't even an accurate open-source dataset: every self-proclaimed manually labelled dataset with 10k+ hours was partly produced with automation. Next issue: modern models operate in a different latent space, often with 50 ms chunks, while pronunciation assessment requires much finer temporal accuracy. Just try saying "B" out loud: there's the silent part where energy builds up at the lips, the loud part, and everything that resonates afterwards. The worst part is that there are too many ML papers from final-year students or junior PhD folks claiming success or fake improvements, etc.
The article itself is just a vector projection in 3D space ... the actual reality is much more complex.
Any comments on pronunciation assessment models are greatly appreciated.
oezi|4 months ago
adeptima|4 months ago