Slightly off topic, but I could imagine that what you are alluding to, the expectation of certain words or phrases depending on the context of the conversation, could be used to improve speech-to-text models. The speech could be parsed into multiple candidate transcriptions, which can then be ranked by a language model given the conversation context.
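To make the ranking idea concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the function names (`context_lm_score`, `rerank`), the `lm_weight` parameter, and especially the "language model", which is just a smoothed word-frequency count over the conversation so far, standing in for a real LM.

```python
import math
from collections import Counter


def context_lm_score(text, counts, vocab_size, alpha=1.0):
    """Average per-word log-probability of `text` under an add-alpha
    unigram model. This is a toy stand-in for a real language model."""
    words = text.lower().split()
    total = sum(counts.values()) + alpha * vocab_size
    log_prob = sum(math.log((counts[w] + alpha) / total) for w in words)
    # Length-normalize so short hypotheses are not favored automatically.
    return log_prob / max(len(words), 1)


def rerank(candidates, context, lm_weight=0.5):
    """Rank (text, acoustic_log_prob) pairs by acoustic score plus a
    weighted context score. Returns the texts, best first."""
    counts = Counter(context.lower().split())
    # Shared vocabulary: context words plus every word in any candidate,
    # so all candidates are scored against the same distribution.
    vocab = set(counts)
    for text, _ in candidates:
        vocab.update(text.lower().split())
    scored = [
        (acoustic + lm_weight * context_lm_score(text, counts, len(vocab)), text)
        for text, acoustic in candidates
    ]
    return [text for _, text in sorted(scored, reverse=True)]


# Hypothetical n-best list: the acoustically better option loses
# once the conversation context is taken into account.
context = "we need to recognize speech in the meeting recording"
candidates = [("wreck a nice beach", -4.0), ("recognize speech", -4.2)]
ranked = rerank(candidates, context)
```

In a real system the candidates would come from the recognizer's beam search and the context score from an actual language model; the only point here is the combination of the two scores.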
IanCal|2 years ago
fabiensnauwaert|2 years ago
- On the one hand, it performs well in so many cases… and having multilingual support built in is great!
- On the other hand, there's actually NO OPTION in Whisper to recognize just two languages: you either recognize ONE language or ANY language, which can cause issues depending on one's pronunciation and the language at hand.
Will definitely turn OFF multilingual speech recognition by default, because the huge majority of negative reactions in this thread stem from this.
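For what it's worth, a two-language restriction might be approximated outside the model. This is only a sketch under assumptions: it presumes you can get a per-language probability dict (I believe the open-source `whisper` package's `detect_language` returns one, but treat that as unverified here), and `pick_language` is a hypothetical helper, not anything Whisper ships.

```python
def pick_language(probs, allowed):
    """Pick the most probable language among `allowed` only.

    `probs` maps language codes to detection probabilities (assumed to
    come from a language-detection step); `allowed` is the short list
    of languages the user actually speaks.
    Returns (language, probability renormalized over `allowed`).
    """
    restricted = {lang: probs.get(lang, 0.0) for lang in allowed}
    total = sum(restricted.values())
    if total == 0:
        raise ValueError("none of the allowed languages has nonzero probability")
    best = max(restricted, key=restricted.get)
    return best, restricted[best] / total


# Made-up probabilities: unrestricted detection would pick "en",
# but the user only ever speaks French or Dutch.
probs = {"en": 0.40, "de": 0.35, "fr": 0.20, "nl": 0.05}
language, confidence = pick_language(probs, allowed=("fr", "nl"))
```

The chosen code would then be passed as the single fixed language for transcription. Note that this picks one language per recording or segment; it does nothing for switching languages mid-sentence.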