top | item 46442083

ImPrajyoth | 2 months ago

oh absolutely. burning a coal plant to decide if i should close discord is peak 2025 energy. strictly speaking, using the local model (Ollama) is 'free' in terms of watts since my laptop is on anyway, but yeah, if the inefficiency is the art, I'm the artist.


fragmede|1 month ago

Running ollama to compute inference uses energy that wouldn't have been used if you weren't running ollama. There's no free lunch here.

redfloatplane|1 month ago

An interesting thought experiment: a fully local, off-grid, off-network LLM device. Solar or wind or what have you. I suppose the Mac Studio route is a good option here; I think Apple makes the most energy-efficient high-memory options. Back of the napkin indicates it’s possible, just a high up-front cost. Interesting to imagine a somewhat catastrophe-resilient LLM device…
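A minimal napkin-math sketch of the sizing question raised above. Every figure here (idle draw, inference draw, usage hours, peak-sun hours) is an illustrative assumption, not a measurement:

```python
# Hedged napkin math for an off-grid LLM box.
# All numbers below are illustrative assumptions, not measurements.

IDLE_W = 30          # assumed idle draw of a Mac Studio-class machine
INFERENCE_W = 150    # assumed average draw during inference
INFERENCE_HOURS = 4  # assumed hours of active use per day
SUN_HOURS = 4        # assumed peak-sun-hours at the install site

# Daily energy budget: idle the rest of the day, inference for a few hours.
daily_wh = IDLE_W * (24 - INFERENCE_HOURS) + INFERENCE_W * INFERENCE_HOURS

# Panel wattage needed to break even, ignoring conversion/battery losses.
panel_w = daily_wh / SUN_HOURS

print(f"{daily_wh} Wh/day -> ~{panel_w:.0f} W of panel")
```

Under these assumptions a few hundred watts of panel covers it, which is why the "possible, just a high up-front cost" conclusion seems plausible.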

evilduck|1 month ago

Macs would be the most power efficient with faster memory but an AI Max 395+ based system would probably be the most cost efficient right now. A Framework Desktop with 128GB of shared RAM only pulls 400W (and could be underclocked) and is cheaper by enough that you could buy it plus 400W of solar panels and a decently large battery for less than a Mac Studio with 128GB of RAM. Unfortunately the power efficiency win is more expensive than just buying more power generation and storage ability.
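The cost comparison above can be sketched in the same napkin style. Every price here is a placeholder assumption, not a real quote:

```python
# Hedged cost comparison of the two builds discussed above.
# Every price is an illustrative placeholder, not a quoted figure.

MAC_STUDIO_128GB = 4000  # assumed USD for a 128GB Mac Studio
FRAMEWORK_128GB = 2500   # assumed USD for a 128GB Framework Desktop
PANELS_400W = 400        # assumed USD for 400W of solar panels
BATTERY = 700            # assumed USD for a decently large battery

# The claim: the less power-efficient box plus its own generation and
# storage still comes in under the more efficient box alone.
framework_total = FRAMEWORK_128GB + PANELS_400W + BATTERY

print(f"Framework build: ${framework_total} vs Mac Studio: ${MAC_STUDIO_128GB}")
```

With these placeholder prices the Framework-plus-solar build undercuts the Mac Studio, matching the point that buying more generation is cheaper than buying efficiency.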

ImPrajyoth|1 month ago

That is the endgame.

I think we are moving toward a bilayered compute model:

- The Cloud: for massive reasoning.

- The Local Edge: a small, resilient model that lives on-device and handles the OS loop, privacy, and immediate context.

BrainKernel is my attempt to prototype that Local Edge layer. It's messy right now, but I think the OS of 2030 will definitely have a local LLM baked into the kernel.

bdhcuidbebe|2 months ago

> using the local model (Ollama) is 'free' in terms of watts since my laptop is on anyway

Now that’s a cursed take on power efficiency

ImPrajyoth|2 months ago

efficiency is just a mindset. if i save 3 seconds of my own attention by burning 300 watts of gpu, the math works out in my favor!