top | item 36229764

knutzui | 2 years ago

Slightly off topic, but I could imagine that what you are alluding to, the expectation of certain words or phrases depending on the context of the conversation, could be used to improve speech-to-text models. The speech could be parsed into multiple candidate transcriptions, which can then be ranked by a language model given the conversation context.
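To make the idea concrete, here is a minimal sketch of that reranking step. Everything in it is an assumption for illustration: the recognizer is assumed to return several candidate texts with an acoustic log-score each, and `toy_lm_score` is a hypothetical stand-in (plain word overlap with the context) for a real language model score.

```python
from typing import Callable, List, Tuple

def toy_lm_score(context: str, text: str) -> float:
    """Hypothetical stand-in for a language model: counts how many
    words of the candidate already appear in the conversation context."""
    context_words = set(context.lower().split())
    return float(sum(1 for w in text.lower().split() if w in context_words))

def rerank(
    hypotheses: List[Tuple[str, float]],
    context: str,
    lm_score: Callable[[str, str], float] = toy_lm_score,
    lm_weight: float = 1.0,
) -> List[str]:
    """Sort candidate transcriptions by acoustic score plus a weighted
    context score, best candidate first."""
    scored = [
        (acoustic + lm_weight * lm_score(context, text), text)
        for text, acoustic in hypotheses
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]

context = "we were talking about speech recognition models"
candidates = [("wreck a nice beach", -1.0), ("recognize speech", -1.2)]
best = rerank(candidates, context)[0]
```

With `lm_weight=0.0` the acoustic score alone decides; raising the weight lets the conversation context override an acoustically close but implausible candidate.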

IanCal|2 years ago

Whisper takes a prompt as well, it would be a good idea to try that out.
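For anyone who wants to try it, a rough sketch using the open-source `whisper` package, where the prompt is passed as `initial_prompt` to `transcribe`. The helper names, the `base` model choice, and the 150-word budget are assumptions made for this example; Whisper only uses a limited amount of prompt text, so the helper keeps just the most recent words.

```python
from typing import List

def build_prompt(recent_turns: List[str], max_words: int = 150) -> str:
    """Join recent conversation turns and keep only the last `max_words`
    words (an assumed budget, not an official limit)."""
    words = " ".join(recent_turns).split()
    return " ".join(words[-max_words:])

def transcribe_with_context(audio_path: str, recent_turns: List[str],
                            model_name: str = "base") -> str:
    """Transcribe one audio file, priming Whisper with earlier turns."""
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path,
                              initial_prompt=build_prompt(recent_turns))
    return result["text"]
```

The prompt only biases the decoder toward vocabulary and style it has seen; it is not a hard constraint on the output.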

fabiensnauwaert|2 years ago

It does, and Whisper is indeed what I'm currently using. I do have mixed feelings about it:

- On the one hand, it performs well in so many cases… and having multilingual support built in is great!
- On the other hand, there's actually NO OPTION in Whisper to recognize just two languages: you either recognize ONE language or ANY language, which can cause issues depending on one's pronunciation and the language at hand.

I will definitely turn OFF multilingual speech recognition by default, because the vast majority of negative reactions in this thread stem from this.
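One possible workaround for the two-language case, sketched here as an assumption rather than a built-in feature: the open-source `whisper` package exposes `detect_language`, which returns a probability per language, so you can pick the most likely language among an allowed set and then force it with the `language` option. `pick_language` and `transcribe_restricted` are hypothetical helper names, and detection here only looks at the first 30 seconds of audio.

```python
from typing import Dict, Sequence

def pick_language(probs: Dict[str, float], allowed: Sequence[str]) -> str:
    """Return the allowed language code with the highest detected
    probability; languages missing from `probs` count as 0.0."""
    if not allowed:
        raise ValueError("allowed must contain at least one language code")
    candidates = {lang: probs.get(lang, 0.0) for lang in allowed}
    return max(candidates, key=candidates.get)

def transcribe_restricted(audio_path: str, allowed: Sequence[str],
                          model_name: str = "base") -> str:
    """Detect the language, restricted to `allowed`, then transcribe."""
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(model_name)
    # Language detection works on a 30-second log-Mel spectrogram.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(
        audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    language = pick_language(probs, allowed)
    return model.transcribe(audio_path, language=language)["text"]
```

This still fixes one language per recording, so it does not help when a speaker switches languages mid-sentence.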