top | item 47206270

(no title)

JKCalhoun | 19 hours ago

I've never understood the obsession with tokens/s. I'm fine with asking a question and then going on to another task (which might be making coffee).

Even with a cloud-based LLM where the response is pretty snappy, I still find that I wander off and return when I am ready to digest the entire response.

ibeckermayer|11 hours ago

Your workflow is unusual. Oftentimes there is a vigorous back and forth, or a desired output like code generation, where a low tokens/s rate drastically affects UX and user productivity.

But the real kicker here is the 90-second TTFT (time to first token). That means you ask a question and don't see anything for a full minute and a half.

nitinreddy88|18 hours ago

You are fine with it, but maybe the rest of the world is not. Anyway, to compare performance and run benchmarks we need metrics, and this is one of the most basic metrics to measure.