top | item 44256430

(no title)

enqk | 8 months ago

If you only have two points you would always assume linear. But what if it’s quadratic, like this article claims?

https://www.timdavis.com/blog/scale-or-surrender-when-watts-...

discuss

Good point!

The good news would be that GPT-4o average energy usage per query would be lower than 20 Wh.

The bad news is that there's a quadratic increase in energy usage with the increase in a model's maximum context window. GPT 3.5 -> GPT 4 was an increase from thousands of tokens to hundreds of thousands of tokens.