jafarlihi | 3 years ago

Wouldn't this be too resource-intensive?

Const-me | 3 years ago

Here's my D3D11 implementation of speech-to-text: https://github.com/Const-me/Whisper

With the medium model, it needs 1.43 GB of assets and 2 GB of VRAM, and on gaming GPUs it runs at 10x realtime speed. These performance figures might be good enough for modern video games. BTW, the model understands almost 100 spoken languages and can translate them into English.
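To put "10x realtime" in concrete terms: the transcription time is roughly the audio length divided by the realtime factor. The sketch below is plain arithmetic on the figure quoted above. The function name and the 5-second voice command are hypothetical illustrations, not part of the linked project.

```python
def transcription_time_s(audio_seconds: float, realtime_factor: float = 10.0) -> float:
    """Approximate wall-clock seconds to transcribe `audio_seconds` of speech.

    A realtime factor of 10.0 means 10 s of audio is processed in about 1 s.
    Hypothetical helper for illustration only.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Hypothetical example: a 5-second spoken command at the quoted 10x speed.
print(transcription_time_s(5.0))  # 0.5
```

Treat this as an average-throughput estimate. The original Whisper model encodes audio in fixed 30-second windows, so very short clips may not get a proportional speedup.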

sebzim4500 | 3 years ago

You wouldn't be able to run it locally, but these models are pretty cheap to run, assuming you batch everything. You wouldn't want to use it for an F2P (free-to-play) game, but for a subscription game (on the order of a few dollars a month) it would not be prohibitively expensive.
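The cost argument can be sketched as a back-of-the-envelope estimate. Divide each player's monthly speech volume by the realtime factor to get GPU hours, multiply by an hourly GPU price, and inflate by idle time. Every input here (minutes per player, GPU price, utilization) is a hypothetical parameter, not a measured figure. `utilization` is a crude stand-in for how busy the server GPU is kept, and batching requests together is one way to raise it.

```python
def monthly_gpu_cost_per_player(
    audio_minutes_per_month: float,
    realtime_factor: float,
    gpu_cost_per_hour: float,
    utilization: float = 1.0,
) -> float:
    """Hypothetical estimate of server GPU cost per player per month.

    utilization < 1.0 models idle GPU time that is still paid for.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # GPU hours actually spent transcribing this player's audio.
    gpu_hours = audio_minutes_per_month / 60.0 / realtime_factor
    # Spread the cost of idle time over the useful work.
    return gpu_hours * gpu_cost_per_hour / utilization


# Hypothetical inputs: 120 min of speech per month, 10x realtime,
# $1.00 per GPU hour, GPU kept 50% busy.
cost = monthly_gpu_cost_per_player(120, 10.0, 1.00, utilization=0.5)
print(f"${cost:.2f} per player per month")
```

With these made-up inputs the result comes out well under a dollar. The point is only that the estimate scales linearly with speech volume and inversely with realtime factor and utilization, so keeping GPUs busy through batching matters.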