Cost-wise it does not seem very effective. 0.5 tokens/sec (the optimized one) is 1800 tokens an hour, which costs about 200-300 watts for an active 3090 + system. Generating 1800 tokens on OpenRouter at $0.4 per million tokens for Llama 3.1 (3.3 costs less) is about $0.00072. That money buys you about 1-2 watt-hours of electricity (in the Netherlands). Great achievement for privacy inference nonetheless.
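The arithmetic above can be checked with a quick script. The throughput, power draw, and API price come from the comment; the Dutch electricity price (~EUR 0.40/kWh) is an assumed figure for illustration:

```python
# Rough cost comparison: local 3090 inference vs. a hosted API.
# Throughput, wattage, and API price are from the comment above;
# the electricity price is an assumption (~EUR 0.40/kWh, Netherlands).

TOKENS_PER_SEC = 0.5          # optimized local throughput
SYSTEM_WATTS = 250            # active 3090 + system, midpoint of 200-300 W
API_PRICE_PER_MTOK = 0.40     # USD per million tokens (Llama 3.1 on OpenRouter)
KWH_PRICE = 0.40              # assumed electricity price, EUR per kWh

tokens_per_hour = TOKENS_PER_SEC * 3600              # 1800 tokens
api_cost = tokens_per_hour * API_PRICE_PER_MTOK / 1e6
local_kwh = SYSTEM_WATTS / 1000                      # energy used in that hour
local_cost = local_kwh * KWH_PRICE

print(f"{tokens_per_hour:.0f} tokens/hour")
print(f"API cost for the same tokens:  ${api_cost:.5f}")
print(f"Local electricity for an hour: EUR {local_cost:.3f}")
print(f"Local is roughly {local_cost / api_cost:.0f}x more expensive per token")
```

Even granting currency differences, the local run comes out around two orders of magnitude more expensive per token, which is the commenter's point: the win is privacy, not cost.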
Aerroon|8 days ago
It probably won't matter much here though.
thatwasunusual|8 days ago
Why is this so damn important? Isn't it more important to end up with the best result?
I (in Norway) use a homelab with Ollama to generate a report every morning. It's slow, but it runs between 5-6 am, energy prices are at a low, and it doesn't matter if it takes 5 or 50 minutes.
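A setup like that can be a single script on a cron schedule. This is a minimal sketch, assuming an Ollama server on its default port (http://localhost:11434) and a model tag such as "llama3.1" already pulled; the prompt and script name are placeholders, not the commenter's actual setup:

```python
# Sketch of a nightly report job against a local Ollama server.
# Assumptions: Ollama at its default http://localhost:11434 and a model
# tag like "llama3.1" already pulled; the prompt is a placeholder.
# Schedule for the cheap-energy window with cron, e.g.:
#   0 5 * * *  /usr/bin/python3 nightly_report.py
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def run_report(model: str = "llama3.1") -> str:
    body = json.dumps(
        build_request(model, "Summarize yesterday's logs into a short report.")
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    # A generous timeout: slow is fine when nobody is waiting on the answer.
    with urllib.request.urlopen(req, timeout=3600) as resp:
        return json.load(resp)["response"]
```

Whether the job takes 5 or 50 minutes only matters if someone is waiting; batched overnight, the slowness costs nothing.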
xienze|7 days ago
You’re wondering why someone would prefer to get the same or better result in less time for less money?