top | item 39985139

xrendan | 1 year ago

This generally resonates with what we've found. Some colour based on our experiences.

It's worth spending a lot of time thinking about what a successful LLM call actually looks like for your particular use case. That doesn't have to be a strict validation set: `% prompts answered correctly` works for some of the simpler prompts, but as they grow and handle more complex use cases that metric breaks down. In an ideal world
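For the simple end of that spectrum, a `% prompts answered correctly` check is a few lines. This is a minimal sketch under assumptions: `call_llm` is a hypothetical stand-in for a real model client, and exact string match is the scoring rule (which is exactly what breaks down on open-ended tasks).

```python
# Minimal "% prompts answered correctly" eval harness.
# call_llm is a hypothetical stand-in for your actual model client.
def call_llm(prompt: str) -> str:
    canned = {"What is 2+2?": "4", "Capital of France?": "Paris"}
    return canned.get(prompt, "")

def pass_rate(cases: list[tuple[str, str]]) -> float:
    """Fraction of prompts whose answer exactly matches the expected string."""
    correct = sum(
        1 for prompt, expected in cases
        if call_llm(prompt).strip() == expected
    )
    return correct / len(cases)

cases = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]
print(pass_rate(cases))  # exact-match scoring; too blunt for complex use cases
```

Exact match is the part to replace first as prompts get more complex (rubric graders, LLM-as-judge, etc.).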

> chain-of-thought has a speed/cost vs. accuracy trade-off

A big one.

Observability is super important and we've come to the same conclusion of building that internally.
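The internal version doesn't have to start big. A sketch of the core idea, assuming nothing beyond a callable model client (`fake_model` here is a placeholder): wrap every call so the prompt, response, and latency are recorded somewhere you can query later.

```python
import time

def logged_call(call, prompt: str, log: list) -> str:
    """Wrap any LLM call so every request/response pair is recorded."""
    start = time.perf_counter()
    response = call(prompt)
    log.append({
        "prompt": prompt,
        "response": response,
        "latency_s": round(time.perf_counter() - start, 4),
    })
    return response

log = []
fake_model = lambda p: p.upper()  # stand-in for a real client
logged_call(fake_model, "hello", log)
print(log[0]["response"])  # HELLO
```

In practice the list becomes a database table or event stream, but the shape of the record (prompt, response, latency, plus whatever metadata you need) is the valuable part.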

> Fine-tune your model

Do this for cost and speed reasons rather than to improve accuracy. There are decent providers (like OpenPipe; we're a relatively happy customer, not affiliated) who will handle the hard work for you.
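One nice consequence of having observability in place: the logged prompt/response pairs from your large model become the fine-tuning dataset for the smaller one. A sketch of turning records into JSONL, using the common OpenAI-style chat-message schema (check your provider's docs for the exact fields they expect):

```python
import json

# Sketch: logged prompt/response pairs -> JSONL fine-tuning file.
# The messages schema is the common OpenAI-style chat format;
# verify the exact shape against your provider's documentation.
records = [
    {"prompt": "Summarise: ...", "response": "A short summary."},
]

lines = [
    json.dumps({"messages": [
        {"role": "user", "content": r["prompt"]},
        {"role": "assistant", "content": r["response"]},
    ]})
    for r in records
]
print(len(lines))  # one JSONL line per training example
# with open("train.jsonl", "w") as f:
#     f.write("\n".join(lines))
```

Filtering the records first (e.g. only calls your eval marked as correct) is usually where most of the effort goes.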
