top | item 38207543

srdjanr | 2 years ago

Probably because it's too expensive. The prompt can be processed quickly (prefill handles all input tokens in parallel), but output tokens are generated one at a time, so they're much slower (and that means more expensive).

From my local tests on a 13B model, output tokens are 20-30x more expensive than input tokens. So OpenAI's pricing structure is based on the expectation that an average request has far more input than output tokens. It didn't matter too much if a small percentage of requests used all 4k tokens for output, but with 128k it's a different story.
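A back-of-envelope sketch of the point above, using hypothetical unit prices (1 unit per input token, 25 units per output token, the midpoint of the 20-30x range; all numbers here are illustrative, not OpenAI's actual costs):

```python
INPUT_COST = 1    # assumed cost per input token, arbitrary units
OUTPUT_COST = 25  # assumed cost per output token (25x input)

def request_cost(input_tokens: int, output_tokens: int) -> int:
    """Total compute cost of one request in arbitrary units."""
    return input_tokens * INPUT_COST + output_tokens * OUTPUT_COST

# A typical chat turn: large prompt, short reply.
typical = request_cost(input_tokens=2000, output_tokens=200)

# Worst case with a 4k context: nearly the whole budget spent on output.
worst_4k = request_cost(input_tokens=100, output_tokens=4000 - 100)

# Same worst case with a 128k context.
worst_128k = request_cost(input_tokens=100, output_tokens=128_000 - 100)

print(typical)      # 7000
print(worst_4k)     # 97600
print(worst_128k)   # 3197600
```

The worst-case 128k request costs roughly 450x the typical turn here, which is why a flat per-token price set for the typical case breaks down once long outputs become common.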
