top | item 38507608

agarsev | 2 years ago

Well that's what we humans do, isn't it? :)

In any case, text still seems to form part of the pipeline:

> During training, we use monolingual speech-text datasets

So there's still a way to go before machines learn language as humans do, i.e. with sound as the primary modality. But nowadays I wouldn't bet on how long any ML task for language will take to be solved.

agarsev | 2 years ago

If I understood correctly, there seem to be two keys to the proposed method:

1) They use a single, shared embedding space for the two languages, forcing the model to learn "semantics" independently (or rather, interdependently) of language.

2) They use back-translation for training. I'm not sure I got this right, but this seems to be round-trip translation, so the model can self-assess its performance by checking the Spanish -> English -> Spanish difference.
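The round-trip self-assessment in point 2) could be sketched roughly like this. Everything here is a toy assumption: the two `translate_*` functions stand in for the model's two translation directions and are just dictionary lookups, so the score is runnable.

```python
# Toy stand-ins for the model's two translation directions (assumed names).
ES_EN = {"hola": "hello", "mundo": "world"}
EN_ES = {"hello": "hola", "world": "mundo"}

def translate_es_en(tokens):
    return [ES_EN.get(t, t) for t in tokens]

def translate_en_es(tokens):
    return [EN_ES.get(t, t) for t in tokens]

def round_trip_score(sentence_es):
    """Fraction of tokens recovered after es -> en -> es.

    A real back-translation setup would use this kind of signal as a
    training objective rather than a simple token-match score.
    """
    back = translate_en_es(translate_es_en(sentence_es))
    matches = sum(a == b for a, b in zip(sentence_es, back))
    return matches / len(sentence_es)

print(round_trip_score(["hola", "mundo"]))  # perfect round trip -> 1.0
```

The point is only that the model never needs parallel data to compute this: it grades itself by how much of the original sentence survives the round trip.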

Sounds very promising and interesting! However, it seems they only tested on Spanish and English. I wonder whether the lexical similarity of the two languages is what made these results possible.

IanCal | 2 years ago

I've wondered for years how far you could get just by checking perplexity: map English -> internal rep, and language X -> internal rep, then learn a mapping between the internal reps such that English -> another language has low perplexity. That is, a sensible sentence in English should result in a sensible sentence in the other language.
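For concreteness, perplexity is just exp of the average negative log-probability a language model assigns to a sentence, so fluent output scores low and garbled output scores high. A minimal sketch, assuming a toy unigram "language model" (the probability table and the smoothing floor are made up for illustration):

```python
import math

# Toy unigram LM for the target language (assumed probabilities).
UNIGRAM_P = {"a": 0.5, "sensible": 0.2, "sentence": 0.2, "zzz": 0.0001}

def perplexity(tokens, probs=UNIGRAM_P, floor=1e-6):
    """perplexity = exp( -(1/N) * sum_i log p(w_i) )

    `floor` is an assumed smoothing value for out-of-vocabulary tokens.
    """
    logp = sum(math.log(probs.get(t, floor)) for t in tokens)
    return math.exp(-logp / len(tokens))

fluent = perplexity(["a", "sensible", "sentence"])
garbled = perplexity(["zzz", "zzz", "zzz"])
print(fluent < garbled)  # the fluent sentence has lower perplexity
```

The comment's idea would then be to tune the internal-rep mapping so that translated output minimizes this quantity under the target language's (real, not unigram) model.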