The next trick would be to create a matrix that is the closest to all of the languages and doesn't have gaps from missing words. Using English as the identity is probably a bit ethnocentric -- another reification of the three-percent problem of literary translation: http://www.rochester.edu/College/translation/threepercent/ .Maybe we can call this new matrix Mondoshawan?
sls56|8 years ago
As you suggest potentially this mean language is itself really high quality word vectors; but we haven't looked at this yet...