top | item 37542025

Ask HN: Looking for a 24-7 Real-Time Voice Transcription Tool

11 points | 8ta4 | 2 years ago

I'm on the hunt for a voice transcription tool that can operate continuously, in real-time, 24/7, to enhance my workflow. I need something that doesn't require constant starting and stopping.

I've looked into a few options, but none of them seem to provide the non-stop, real-time transcription I'm after:

- Siri/Google Assistant: They're great for short dictations, but they don't offer continuous transcription.

- Otter.ai: This tool provides real-time transcription for meetings and interviews, but using it 24/7 could get expensive.

- Rev Voice Recorder: It lacks real-time capabilities and needs to be manually activated.

- The NSA: I won't be able to pass their security clearance because of that "classified" recording of my ex in bed... snoring like a freight train all night.

Before I decide to build or modify a tool myself, I thought I'd ask here:

Does anyone know of a tool that can provide 24/7 transcription?

I've started sketching out some initial ideas here:

https://github.com/8ta4/say

But my main goal is to find out if such a tool already exists so I don't end up reinventing the wheel.

14 comments

tikkun|2 years ago

I'm interested in the same thing and have spent quite a bit of time looking.

Rewind.ai is ok (transcription accuracy is meh)

Voice Memos.app is ok (though no native transcription, and requires stopping and starting)

Otter.ai is ok (though there's a 4 hour limit on recordings, and there's no paid plan that allows for enough recording minutes to do 24/7)

My ideal solution would be that Otter comes out with a Pro 24/7 plan with 60,000 minutes per month and no max recording length, for $60-80/mo.

I would pay for this and have paid for alternatives, though I'd prefer to use an existing company that I've used for a while and that has lots of users, due to privacy/trust, or perhaps a small startup that publishes security reports and does everything on device.

As an aside:

I use 24/7 voice transcription as a kind of "extended context window" (to use an LLM analogy). While I'm working, I talk out loud to myself about what I'm thinking through, which I find allows me to effectively increase my working memory size to be much larger than otherwise. It's quite helpful.

8ta4|2 years ago

Do you think an open-source solution that only uses Deepgram API and does not store any recordings would satisfy your privacy requirements?

How many hours per day or month do you actively use speech recognition?

60,000 minutes per month. I had to double-check my calculations. It seems you've found a 30th hour in your day.

Let me give you some context:

I saw your blog post about Deepgram. They charge $0.0059 per minute for pay-as-you-go.

- If you use it 24/7, it costs:

    - $8.496 per day

    - $254.88 per month

- If you use it 8 hours a day (with voice activity detection), it costs:

    - $2.832 per day

    - $84.96 per month
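The figures above can be reproduced with a short calculation. This is only a sketch: the per-minute rate is the one quoted in this comment (current Deepgram pricing may differ), the 30-day month is an assumption, and `transcription_cost` is a hypothetical helper name.

```python
# Rate quoted in the comment above; current pricing may differ.
RATE_PER_MINUTE = 0.0059  # USD per minute, pay-as-you-go
DAYS_PER_MONTH = 30       # assumption behind the monthly figures


def transcription_cost(hours_per_day, rate=RATE_PER_MINUTE, days=DAYS_PER_MONTH):
    """Return (daily, monthly) cost in USD for the given hours of audio per day."""
    daily = hours_per_day * 60 * rate
    return round(daily, 3), round(daily * days, 2)


for hours in (24, 8):
    daily, monthly = transcription_cost(hours)
    print(f"{hours:>2} h/day: ${daily:.3f} per day, ${monthly:.2f} per month")
```

Changing `hours_per_day` to your own estimate of actual speaking time shows how quickly the bill drops once silence is not sent for transcription.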

I know the 24/7 cost is too high for your budget ($60-80 a month). But voice activity detection can save you a lot of money.
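To illustrate what voice activity detection means here, a minimal sketch of an energy-based gate follows: only frames loud enough to plausibly contain speech would be sent to the API. The frame length, the threshold, and all function names are assumptions for illustration; production VADs use more robust models than a plain RMS threshold.

```python
import math

FRAME_MS = 30              # assumed frame length in milliseconds
ENERGY_THRESHOLD = 500.0   # assumed RMS threshold for 16-bit PCM; tune per microphone


def rms(frame):
    """Root-mean-square amplitude of a sequence of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_frames(frames, threshold=ENERGY_THRESHOLD):
    """Yield only the frames whose RMS energy reaches the threshold."""
    for frame in frames:
        if rms(frame) >= threshold:
            yield frame


def billable_minutes(frames, frame_ms=FRAME_MS, threshold=ENERGY_THRESHOLD):
    """Minutes of audio that would actually be sent for transcription."""
    kept = sum(1 for _ in speech_frames(frames, threshold))
    return kept * frame_ms / 60000
```

With a gate like this, the billed minutes track how much you actually talk rather than how long the microphone is open.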

About privacy and trust, open-sourcing the solution might give you some confidence. Deepgram is backed by YC and has many users, which might also make you feel better.

ginkoutest|2 years ago

Out of curiosity, what do you then do with the transcriptions? You said you typically talk out loud while working, but do you continue working like normal after the transcription is recorded for later use, or do you interrupt your workflow to do something specific with the transcription immediately after?

wyldfire|2 years ago

> Before I decide to build or modify a tool myself, I thought I'd ask here:

If you do decide to, start with ggml/whisper.

its-summertime|2 years ago

https://github.com/abb128/LiveCaptions comes to mind: the libraries and models for it are easily available, so you can rework it to behave how you want, and it can run 24/7 if you don't mind the CPU usage.

8ta4|2 years ago

Thank you! With the CPU usage, it seems like every season will be summertime.

solardev|2 years ago

Can you just use the speech recognition built into your OS? I think macOS, Windows, Android, and iOS all have it these days.

8ta4|2 years ago

Can Windows, Android, or iOS dictation run continuously 24/7?

Actually, I'm using the built-in macOS speech recognition to answer your que

noman-land|2 years ago

You can do this via command line with whisper.cpp.