Show HN: Local audio transcription and speaker ID for Apple Silicon

2 points | vadiml | 7 months ago | github.com

  I built a tool that combines MLX Whisper and pyannote for fast, local audio transcription with speaker diarization on Apple Silicon.

  Key benefits: privacy-first (fully local), hardware-accelerated, automatic speaker identification, multiple output formats (TXT/SRT/JSON).

  The main technical challenge was making MLX Whisper and pyannote work together even though they process audio differently. I solved it with a preprocessing pipeline.
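As a rough illustration of what such a preprocessing step could look like (not the tool's actual code), the sketch below converts any input file to 16 kHz mono 16-bit PCM WAV with ffmpeg so that both models can read the same file. The target format and the helper names `build_ffmpeg_cmd` and `normalize_audio` are assumptions made for this example.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that writes mono 16-bit PCM WAV (hypothetical helper)."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", src,                # input file in any format ffmpeg can decode
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample; 16 kHz is an assumed target
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        dst,
    ]


def normalize_audio(src: str, dst: str) -> Path:
    """Run ffmpeg and return the path of the normalized file (hypothetical helper)."""
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Calling `normalize_audio("interview.m4a", "interview.wav")` would then produce one file that can be handed to both the transcription and the diarization step, assuming ffmpeg is installed and on the PATH.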

  Well suited to interviews, meetings, and podcasts. It handles gated Hugging Face models with proper error handling.
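To make the SRT output mentioned above concrete, here is a minimal sketch that renders speaker-labeled segments as SRT cues. The segment layout (`start`, `end`, `speaker`, `text` keys), the `[SPEAKER]` prefix, and the function names are assumptions for illustration; the tool's real output may be formatted differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines.

    Each segment is assumed to be a dict with 'start' and 'end' (seconds),
    'speaker' (a label such as 'SPEAKER_00'), and 'text'.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"[{seg['speaker']}] {seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```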

1 comment

torstenvl | 6 months ago

Surprised this didn't get more traction, as it's really interesting.

Is there a reason it's Apple Silicon-only? I don't know the technical details of MLX, such as whether it runs (or can be made to run) on other hardware.

Also, why does the HF token need to be in an environment variable and passed on the command line?
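For context on the token question, a common pattern is to accept the token from either source instead of both, with an explicit command-line value taking precedence over the environment. The sketch below shows that pattern only; it is not how the tool behaves today. The `--hf-token` flag and the `resolve_hf_token` helper are hypothetical, while `HF_TOKEN` is the environment variable conventionally read by Hugging Face tooling.

```python
import argparse
import os


def resolve_hf_token(argv=None, environ=None) -> str:
    """Return the token from --hf-token if given, else from HF_TOKEN (hypothetical helper)."""
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser()
    parser.add_argument("--hf-token", default=None)  # hypothetical flag name
    # parse_known_args ignores any other arguments the real CLI might define.
    args, _ = parser.parse_known_args(argv)

    token = args.hf_token or environ.get("HF_TOKEN")
    if not token:
        raise SystemExit("No Hugging Face token: pass --hf-token or set HF_TOKEN")
    return token
```

With this arrangement the environment variable is enough for everyday use, and the flag only serves as a one-off override.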