Show HN: Open-source macOS AI copilot using vision and voice
430 points | ralfelfving | 2 years ago | github.com
It's pretty simple:
- Use a keyboard shortcut to take a screenshot of your active macOS window and start recording the microphone.
- Speak your question, then press the keyboard shortcut again to send your question + screenshot off to the OpenAI Vision API.
- The Vision response is presented in context, overlaid on the active window, and spoken back to you as audio.
- The app keeps running in the background, only taking a screenshot/listening when activated by the keyboard shortcut.
It's built with Node.js/Electron, and uses the OpenAI Whisper, Vision, and TTS APIs under the hood (BYO API key).
There's a simple demo and a longer walkthrough in the GitHub readme (https://github.com/elfvingralf/macOSpilot-ai-assistant), and I also posted a different demo on Twitter: https://twitter.com/ralfelfving/status/1732044723630805212
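For readers curious how the steps above fit together, here's a minimal sketch of the Vision request such a pipeline would send. The message shape follows OpenAI's chat format for vision models at the time; the function name and values like `max_tokens` are illustrative assumptions, not the app's actual code.

```javascript
// Sketch (assumed names, not the app's code): combine the transcribed
// question and the screenshot into one GPT-4 Vision chat request.
function buildVisionRequest(questionText, screenshotPngBuffer) {
  const imageBase64 = screenshotPngBuffer.toString("base64");
  return {
    model: "gpt-4-vision-preview",
    max_tokens: 300,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: questionText },
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${imageBase64}` },
          },
        ],
      },
    ],
  };
}

// Example with a fake one-byte "screenshot":
const payload = buildVisionRequest(
  "What does this dialog mean?",
  Buffer.from([0x89])
);
console.log(payload.messages[0].content.length); // 2 parts: text + image
```

The response's text would then be overlaid on the window and fed to the TTS API for audio playback.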
e28eta | 2 years ago
I was skimming through the video you posted, and was curious.
https://www.youtube.com/watch?v=1IdCWqTZLyA&t=32s
code link: https://github.com/elfvingralf/macOSpilot-ai-assistant/blob/...
ralfelfving | 2 years ago
I suspect OSX vs macOS has marginal impact on the outcome :)
ralfelfving | 2 years ago
I prefer speaking over typing, and I sit alone, so probably won't add a text input anytime soon. But I'll hit you up on Discord in a bit and share notes.
tomComb | 2 years ago
That would be great for people with a Mac mini who don't have a mic.
faceless3 | 2 years ago
https://github.com/samoylenkodmitry/Linux-AI-Assistant-scrip...
F1: ask the ChatGPT API about the current clipboard content
F5: same, but opens an editor before asking
num+: starts/stops recording the microphone, then passes the audio to Whisper (locally installed) and copies the result to the clipboard
I find myself rarely using them, however.
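The hotkey dispatch above can be sketched in a few lines of JS. The key names mirror the comment, but all handler methods (`askChatGPT`, `openEditor`, `toggleRecording`) are illustrative stand-ins, not the Linux script's actual code.

```javascript
// Map each hotkey to an action; the context object carries the
// clipboard and the (stubbed) integrations.
const hotkeys = {
  F1: (ctx) => ctx.askChatGPT(ctx.clipboard),                 // ask about clipboard
  F5: (ctx) => ctx.askChatGPT(ctx.openEditor(ctx.clipboard)), // edit the prompt first
  "num+": (ctx) => ctx.toggleRecording(),                     // Whisper -> clipboard
};

function dispatch(key, ctx) {
  const handler = hotkeys[key];
  return handler ? handler(ctx) : null; // unknown keys are ignored
}

// Stubbed context so the dispatch can be exercised without any APIs:
const ctx = {
  clipboard: "what does this error mean?",
  askChatGPT: (text) => `asked: ${text}`,
  openEditor: (text) => text.toUpperCase(),
  toggleRecording: () => "recording toggled",
};
console.log(dispatch("F1", ctx)); // asked: what does this error mean?
```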
Art9681 | 2 years ago
EDIT: I checked again and it seems the pricing is comparable. Good stuff.
ralfelfving | 2 years ago
Right now there's also a daily request limit on the Vision API that kicks in before things get really bad: 100+ requests, depending on what your max spend limit is.
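On top of the server-side cap mentioned above, an app could guard its own spend with a simple client-side counter. A minimal sketch (the class and its limit are illustrative; OpenAI enforces the real limit server-side):

```javascript
// Track requests per UTC day and refuse to call the API once a
// self-imposed daily budget is exhausted.
class DailyBudget {
  constructor(maxRequestsPerDay) {
    this.max = maxRequestsPerDay;
    this.day = null;
    this.count = 0;
  }
  tryConsume(now = new Date()) {
    const day = now.toISOString().slice(0, 10); // e.g. "2023-12-06"
    if (day !== this.day) {                     // new UTC day: reset
      this.day = day;
      this.count = 0;
    }
    if (this.count >= this.max) return false;   // over budget, skip the call
    this.count += 1;
    return true;
  }
}

const budget = new DailyBudget(2);
console.log(budget.tryConsume(), budget.tryConsume(), budget.tryConsume());
// true true false (same day, limit of 2)
```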
krschacht | 2 years ago
https://news.ycombinator.com/item?id=38244883
There are some pros and cons to that. I'm intrigued by your stand-alone macOS app.
thomashop | 2 years ago
I can see how much time it will save me when I'm working with software or a domain I don't know very well.
Here is the video of my interaction: https://www.youtube.com/watch?v=ikVdjom5t0E&feature=youtu.be
Weird, these negative comments. Did people actually try it?
ralfelfving | 2 years ago
I sent him your video, hopefully he'll believe me now :)
mikey_p | 2 years ago
"Here's a list of effects. Here's a list of things that make a song. Is it good? Yes. What about my drum effects? Yes here's the name of the two effects you are using on your drum channel"
None of this is really helpful and I can't get over how much it sounds like Eliza.
pelorat | 2 years ago
So... beware when you use it.
behat | 2 years ago
Thanks for sharing!
dave1010uk | 2 years ago
1. Download LLaVA from https://github.com/Mozilla-Ocho/llamafile
2. Run Whisper locally for speech to text
3. Save screenshots and send to the model, with a script like https://til.dave.engineer/openai/gpt-4-vision/
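Step 3 of the local stack above can speak the same chat format as the cloud version; a minimal sketch, assuming llamafile exposes its usual OpenAI-compatible `/v1/chat/completions` server on localhost:8080 (the model name and function are illustrative):

```javascript
// Build a Vision-style request aimed at a local LLaVA llamafile
// instead of OpenAI (assumed endpoint compatibility).
function buildLocalVisionRequest(question, screenshotBase64) {
  return {
    model: "LLaVA",
    messages: [{
      role: "user",
      content: [
        { type: "text", text: question },
        { type: "image_url",
          image_url: { url: `data:image/png;base64,${screenshotBase64}` } },
      ],
    }],
  };
}

// To actually send it (requires a running llamafile):
// fetch("http://localhost:8080/v1/chat/completions", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildLocalVisionRequest(q, img)),
// });

const body = buildLocalVisionRequest("What app is this?", "iVBORw0KGgo=");
console.log(body.messages[0].content[0].text); // What app is this?
```

No API key is needed since everything stays on the machine.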
ralfelfving | 2 years ago
The "pilot" in the name comes from Microsoft's use of "Copilot" for their AI assistant products; I tried to play on it with macOSpilot, which is maco(s)pilot. I think that naming has completely flown over everyone's heads :D
hmottestad | 2 years ago
Do you know of a simple setup that I can run locally with support for both images and text?