top | item 41870383

(no title)

tweezy | 1 year ago

I've tried a few things that seem to work. The first works pretty much perfectly, but adds quite a bit of latency to the final response. The second isn't perfect, but it's like 95% there

1 - the first option is to break this in to three prompts. The first prompt is either write a brief version, an outline of the full response, or even the full response. The second prompt is a validator, so you pass the output of the first to a prompt that says "does this follow the instructions. Return True | False." If True, send it to a third that says "Now rewrite this to answer the user's question." If False, send it back to the first with instructions to improve the response. This whole process can mean it takes 30 seconds or longer before the streaming of the final answer starts.

There are plenty of variations on the above process, so obviously feel free to experiment.

2 - The second option is to have instructions in your main prompt that says "Start each response with an internal dialogue wrapped in <thinking> </thinking> tags. Inside those tags first describe all of the rules you need to follow, then plan out exactly how you will respond to the user while following those rules."

Then on your frontend have the UI watch for those tags and hide everything between them from the user. This method isn't perfect, but it works extremely well in my experience. And if you're using a model like gpt-4o or claude 3.5 sonnet, it makes it really hard to make a mistake. This is the approach we're currently going with.

discuss

No comments yet.