TachyonicBytes | 8 months ago
I am also eyeing whisperX[2], because I want to play some more with speaker diarization.
Your use case seems to be batch transcription, so I'd suggest you just use whisperfile[1]. It should work well on an M4 mini, and it also exposes an HTTP API if you start it without arguments.
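For the batch case, a rough sketch of driving that HTTP API from Python might look like the following. Everything specific here is an assumption on my part rather than something confirmed in this thread: the `http://127.0.0.1:8080/inference` URL, the `file` and `response_format` form fields (modeled on the whisper.cpp server example), and the helper names `build_multipart`, `transcribe`, and `transcribe_folder`, which are made up for illustration. Check what your whisperfile prints at startup before relying on any of it.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# ASSUMPTION: whisper.cpp-style route on port 8080; verify against your server.
DEFAULT_URL = "http://127.0.0.1:8080/inference"


def build_multipart(fields, file_field, file_name, file_bytes, boundary=None):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, url=DEFAULT_URL):
    """POST one audio file and return the decoded JSON response."""
    audio = Path(path)
    # ASSUMPTION: the server accepts "file" and "response_format" form fields.
    body, content_type = build_multipart(
        {"response_format": "json"}, "file", audio.name, audio.read_bytes()
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def transcribe_folder(folder, pattern="*.wav", url=DEFAULT_URL):
    """Transcribe every matching file in a folder, one request at a time."""
    return {p.name: transcribe(p, url) for p in sorted(Path(folder).glob(pattern))}
```

With the server running, `transcribe_folder("recordings")` would return a dict keyed by file name. It only uses the standard library, so nothing needs installing beyond whisperfile itself.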
If you want more interactivity, I have been using Vibe[3] as an open-source replacement for SuperWhisper[4], but VoiceInk from a sibling comment seems better.
Aside: So many of the mentioned projects use whisper at the core that it would be interesting to explicitly mark the ones that don't, so we can have a real fundamental comparison.
[1] https://huggingface.co/Mozilla/whisperfile
[2] https://github.com/m-bain/whisperX
levocardia|8 months ago
TachyonicBytes|8 months ago
There are two ways to parse your first sentence. Are you saying that you used whisperX and it doesn't do well with diarization? I ask because I am curious about alternative ways of doing that.
anonymousiam|8 months ago
I've run it on my ThinkPad P14s Gen 4, which doesn't have much of a GPU (Radeon 780M). It processes audio in approximately real time.
satvikpendem|8 months ago
https://pccnect.fit.vutbr.cz/gradio-demo/
TachyonicBytes|8 months ago