top | item 45574021


architectonic | 4 months ago

How much computing power would one need to get this working completely locally, running a half-decent LLM fine-tuned to sound like Santa, with all the TTS, STT, and Pipecat in between?
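The chain the question describes is a fixed STT → LLM → TTS loop. A minimal sketch of that loop is below; all three stages are stubs with made-up names, standing in for whatever local models (a Whisper-class STT, a small fine-tuned LLM, a neural TTS) and Pipecat transport a real build would use:

```python
# Sketch of one conversational turn: speech-to-text -> LLM -> text-to-speech.
# Each function is a placeholder; a real build would wrap a local model
# behind the same interface, with Pipecat streaming frames between stages.

SANTA_SYSTEM_PROMPT = "You are Santa Claus. Be jolly and brief."  # hypothetical persona prompt

def transcribe(audio: bytes) -> str:
    """Stub STT: a local Whisper-class model would go here."""
    return audio.decode("utf-8")  # pretend the audio is already text

def santa_reply(user_text: str) -> str:
    """Stub LLM: a small Santa-tuned local model would go here."""
    return f"Ho ho ho! You said: {user_text}"

def synthesize(text: str) -> bytes:
    """Stub TTS: a local neural TTS model would go here."""
    return text.encode("utf-8")

def handle_turn(audio_in: bytes) -> bytes:
    """One full turn through the pipeline."""
    return synthesize(santa_reply(transcribe(audio_in)))

print(handle_turn(b"hello santa"))
```

The point of the sketch is that the pipeline itself is trivial; all the compute lives inside the three stubbed stages, which is what the sizing question is really about.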



teaearlgraycold | 4 months ago

I started looking into this with a Pi 5. It seemed like it was not quite performant enough. But I'm not an expert with these things and maybe someone else could make it work. We definitely have the technology to pull this off in this form factor. It would just be really expensive (maybe $500) and might also get a little hot.

Sean-Der | 4 months ago

If I were building it to be 'local only', I would run the inference on a remote host in my house.

Having a microcontroller in the phone is nice because it is WAY less likely to break. I love being able to flash a simple firmware and change things without fighting it too much.

Oh! Also, I do all the 'WebRTC/AI dev' in the browser. Only when I get it working how I like do I switch over to the microcontroller stuff.

kwindla | 4 months ago

This repo is one possible starting point for tinkering with local agents on macOS. I've got versions of this for NVIDIA platforms but I tend to gravitate to using LLMs that are too big to fit on most NVIDIA consumer cards.

https://github.com/kwindla/macos-local-voice-agents

oofbey | 4 months ago

More than you can physically fit in a phone like that. Many hundreds if not thousands of watts of GPU.

margalabargala | 4 months ago

That's not true. You could run such an LLM on a lower end laptop GPU, or a phone GPU. Very low power and low space. This isn't 2023 anymore, a Santa-specific LLM would not be so intensive.
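A quick back-of-the-envelope check supports the laptop-GPU claim. Weight memory for a quantized model is roughly params × bits-per-weight / 8 bytes (ignoring KV cache and runtime overhead, which add more); the figures below are illustrative, not benchmarks of any specific model:

```python
# Rough weight-memory estimate for a quantized LLM.
# Formula: params * bits_per_weight / 8 bytes (weights only;
# KV cache and runtime overhead are extra).

def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate bytes needed to hold the quantized weights."""
    return params * bits_per_weight / 8

# A 7B-parameter model at 4-bit quantization:
gb = weight_bytes(7e9, 4) / 1e9
print(f"{gb:.1f} GB")  # -> 3.5 GB, which fits in an 8 GB laptop GPU
```

By the same arithmetic a 3B model at 4 bits needs about 1.5 GB, which is why phone-class hardware is plausible for a narrow, persona-tuned model.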

trenchpilgrim | 4 months ago

I've been running LLMs and TTS capable of this on my laptop since last year.