drag0s | 8 months ago

One example where non-thinking models matter is latency-sensitive workflows, for example voice AI.

jjani | 8 months ago

Correct, though pretty much anything end-user-facing is latency-sensitive; voice is a tiny percentage of that. No one likes waiting, and the involvement of an LLM doesn't change this from a user's PoV.

eru | 8 months ago

I wonder if you can hide the latency, especially for voice?

What I have in mind is to start the voice response with a non-thinking model: say a sentence or two, produced in a fraction of a second. That will take the voice model a few seconds to read out, and in that time you use a thinking model to start working on the next part of the response?

In a sense, it's very similar to how everyone knows to stall in an interview by starting with 'this is a very good question...' and using that time to think some more.
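
A rough sketch of that overlap, using asyncio and hypothetical fast_reply / thoughtful_reply / speak helpers as stand-ins for a non-thinking model call, a thinking model call and a TTS engine (the timings are made up):

    import asyncio

    # Hypothetical stand-ins; swap in real model/TTS calls.
    async def fast_reply(question: str) -> str:
        await asyncio.sleep(0.3)                   # sub-second opener
        return "That's a good question. Let me think it through."

    async def thoughtful_reply(question: str) -> str:
        await asyncio.sleep(5.0)                   # thinking-model latency
        return "Here is the detailed answer..."

    async def speak(text: str) -> None:
        print(text)
        await asyncio.sleep(len(text) * 0.06)      # crude speech duration

    async def answer(question: str) -> None:
        # Start the thinking model in the background right away.
        detailed = asyncio.create_task(thoughtful_reply(question))

        # Meanwhile get a quick opener from the non-thinking model and
        # read it out; speaking it buys the thinking model a few seconds.
        await speak(await fast_reply(question))

        # By the time the opener has been spoken, the detailed answer
        # should be ready (or nearly), so the remaining pause is small.
        await speak(await detailed)

    asyncio.run(answer("Why is the sky blue?"))

The same idea extends to streaming: the opener can be drip-fed to the TTS engine as it's generated, with the thinking model's tokens queued up behind it.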