GPT Best Practices

[+] maxdoop|2 years ago|reply

I find it interesting that most of these tactics can be summarized into: "write clearly and provide ample information."

I have a side business for ACT/SAT prep. I teach English, and often have to remind students about sentence structure and word flow. For example, I can't say "My mom, my grandma, and I went to her favorite store" -- in that example, there is no clear way to know who "her" is.

Similarly, I see many people claim GPT-n is "dumb", yet when I see their prompt, I realize it was a bad prompt. There are clear logical inconsistencies, there is inadequate information, and there is confusing word usage.

I've been astounded by GPT-4 and have nearly 5x-ed my productivity with it (for coding and for parsing documents). But I think my experience is a result of my habitual "standardized testing" writing style, while others' poor experience is a result of their more layman writing style.

[+] diydsp|2 years ago|reply

In what pre-ChatGPT world did "write clearly and provide ample information" include such familiar and long-known tips as:

* Ask the model to adopt a persona

* Use delimiters to clearly indicate distinct parts of the input

* Specify the desired length of the output

> others' poor experience is a result of their more layman writing style.

I guess we'll have to pass the hat around for those wretched souls. In the meantime, someone needs to tell the English teacher that "layman" is not an adjective.

> sentence structure and word flow

In my experience ChatGPT doesn't care about those. It's able to infer through quite a large amount of sloppiness. The much larger gains come from guiding it into a model of the world, as opposed to direct it to respond to lean perspectives like, "What do I eat to be better?"

[+] hammock|2 years ago|reply

>I have a side business for ACT/SAT prep. I teach English, and often have to remind students about sentence structure and word flow. For example, I can't say "My mom, my grandma, and I went to her favorite store" -- in that example, there is no clear way to know who "her" is.

The Lord’s work. I deal with this issue at work all day long.

I am lucky I had a high school English teacher who DRILLED into me the slogan “no pronouns without a clear, one-word antecedent.”

That slogan is probably a top 2 mantra for me that has paid dividends in my communication skills. The other one would be “break long sentences into short ones.”

[+] nickfromseattle|2 years ago|reply

> Similarly, I see many people claim GPT-n is "dumb", yet when I see their prompt, I realize it was a bad prompt.

I see this comment a lot, but I drew a different conclusion. People with high reading / writing ability assign less value to GPT's writing ability, whereas people with lower reading / writing ability assign more value to GPT's ability.

Edit:

> yet when I see their prompt, I realize it was a bad prompt.

So, I asked ChatGPT to re-write my reply.

----

Prompt: "Re-write this text, but more intelligently."

Output: I frequently encounter this observation, yet I have arrived at a divergent inference. Individuals possessing exceptional proficiency in reading and writing tend to ascribe a diminished significance to the writing aptitude of GPT, whereas individuals with more limited abilities in reading and writing tend to attach greater value to GPT's prowess.

----

Prompt: "Re-write this text, but make it sound better."

Output: I frequently encounter this recurring sentiment, yet I have formulated an alternative perspective. Individuals endowed with advanced reading and writing prowess tend to bestow a lesser degree of importance upon GPT's writing ability, whereas individuals possessing more modest proficiency in reading and writing are inclined to accord greater significance to GPT's remarkable capabilities.

-----

Prompt: "Re-write this text, but try and use the word 'ability' less."

Output: I frequently come across this comment, but my conclusion differs. Individuals who excel in reading and writing tend to place a diminished emphasis on GPT's writing prowess, while those with more limited skills in reading and writing tend to attach greater importance to GPT's capabilities.

----

Conclusion: I like my original reply better than GPT's re-write.

[+] glenngillen|2 years ago|reply

It's interesting you say this. I spent the weekend playing with ChatGPT to try and get it to build a Swift app for iOS and macOS (I have zero previous experience with Swift). Thankfully I had a compiler to back me up and tell me if things actually worked. I found the whole experience a little jarring. ChatGPT was pretty good at generating some code, but it felt a lot like a job interview where I'm working hard to coach a candidate into the right answer. Or, now that you mention it, some previous experiences I've had trying to work with outsourced providers where we're trying to overcome a language barrier.

The problems are often that I got exactly what I asked for. Not a thing more, no context that I thought would be assumed (e.g., don't remove the functionality I asked you to implement in the previous step), just a very literal interpretation of the asks.

I definitely found myself quickly adapting to try and be clearer and potentially over expressive in my prompts.

[+] hospitalJail|2 years ago|reply

>I see many people claim GPT-n is "dumb"

Depends.

Can't do math or logic. I have a question I ask ChatGPT to see if it can do logic yet, it still cannot. (Can't mention this question here or it will get fixed.)

Its great for brain storming or low risk problems. I don't think the accuracy problem will ever be fixed.

I probably 5x my productivity as well, but that doesnt mean its able to do logic.

[+] mensetmanusman|2 years ago|reply

I think we will find that certain personality and thinking types will be the most successful with this technology.

It will be interesting if only the highly educated are able to best leverage this, because that would be unfortunate and would accelerate inequality.

I also really hope this can be used to improve learning to bridge this gap, and this summer I will have my high school intern use this technology frequently with the hope that it accelerates his improvement.

[+] fnordpiglet|2 years ago|reply

I think something interesting is that this unlocks huge potential for English majors and puts engineering / math / comp sci at a structural disadvantage. Hmmm

[+] raincole|2 years ago|reply

> I find it interesting that most of these tactics can be summarized into: "write clearly and provide ample information."

I'm so used to ChatGPT now that for any non-trivial programming task that I'm not 100% confident with, I check how ChatGPT approaches it first. Not just one or two sentence, but a detailed summary with code example. Usually a single prompt of mine costs >300 tokens. Sometimes >1000 tokens.

I honestly don't know whether ChatGPT made me dumber or smarter. On one hand I'm relying on it a lot. On the other hand, I'm forced to actually think over my task in hand, not just telling myself that I've thought it through.

[+] weinzierl|2 years ago|reply

*"write clearly and provide ample information."

... but if you provide too much information your counterpart might lose interest and forget what you said first.

[+] ojosilva|2 years ago|reply

> I've been astounded by GPT-4 and have nearly 5x-ed my productivity with it (for coding and for parsing documents).

I wonder if StackOverflow, Quora, GitHub Issues and similar websites have noticed a decrease in new questions/issues raised? LLMs have a kind of network effect that it makes people ask less questions in public sites and more privately to their own LLM sessions, and maybe even give up if the LLM does not answer. That reduces critical mass at q&a forums, so, given enough time, less people will go there for answers. I'd say that's a great loss!

In a way, sarcastically speaking, it's one of AI's way of becoming our overlords.

[+] mcguire|2 years ago|reply

I submitted a puzzle from https://dmackinnon1.github.io/fickleSentries/, with the basic prompt, "I am going to present you with a logic puzzle. I would like you to solve the puzzle."

https://pastebin.com/a3WzgvK4

The solution GPT-3.5 (I don't have access to 4.) gave was: "In conclusion, based on the statements and the given information, the treasure in the cave must be copper."

The solution given with the puzzle is "Here is one way to think about it: If Guard 1 is telling the truth, then the treasure must be diamonds. If Guard 1 is lying, then the treasure can be copper or gold. If Guard 2 is telling the truth, then the treasure must be silver. If Guard 2 is lying, then the treasure can be diamonds or rubies. The only possible option based on the statements of both guards is diamonds."

Is there any way to improve that prompt?

[+] taf2|2 years ago|reply

Heh yeah me too for software — fun thing we have all of our after hours chats powered by gpt 4 with our documentation.

Many people try to chat with it like a dumb bot using single words… few people have figured out you get very in-depth answers - so over all I’m very pleased with our results but still funny how people react to it…

[+] BeFlatXIII|2 years ago|reply

What are some other clarifications to that sentence besides those in the forms "My mom and I went with my grandma to her favorite store" or "I went with my mom and grandma to my mom's favorite store"?

[+] _ea1k|2 years ago|reply

I've been surprised at how many times I've realized that a bad result from GPT was because I specified the problem inadequately or incorrectly. Or that I took the solution, but applied it incorrectly.

[+] nathan_compton|2 years ago|reply

I find GPT4 incredibly productive and useful and I would still emphatically call it dumb.

[+] jwr|2 years ago|reply

I've been trying to use the OpenAI API for the last two weeks or so (GPT-4 mostly). This article rubs me the wrong way. "GPT Best Practices" indeed.

Most of my calls end with a time out (on their side) after 10 minutes. I get 524 and 502 errors, sometimes 429, and sometimes a mildly amusing 404 Model not found. The only way I can get reasonable responses is to limit my requests to less than 1400 tokens, which is too little in my application.

And on top of that they actually charge me for every request. Yes, including those 524s, 502s and 429s, where I haven't seen a single byte of a response. That's fraudulent. I reported this to support twice, a week later I haven't even heard back.

Their status page happily states that everything is just fine.

From the forums it seems I'm not the only one experiencing these kinds of problems.

I'd argue "GPT Best Practices" should include having working APIs, support that responds, and not charging customers for responses that are never delivered.

[+] sagarpatil|2 years ago|reply

If I were you, I would move to Azure OpenAI. I’ve seen benchmarks where Azure OpenAI is 3x faster than just OpenAI.

Although, there’s a waitlist for GPT-4, so I’m not sure if you’ll access immediately.

[+] avindroth|2 years ago|reply

That's odd, I have been heavily using GPT-4 API (near 100 requests a day) and didn't notice any errors like that. I noticed maybe 1 or 2 errors with really chat history.

[+] hammock|2 years ago|reply

All of these best practices are great for managers dealing with their staff as well:

1. Write clear instructions

2. Provide reference text

3. Split complex tasks into simpler subtasks

4. Give time to "think"

5. Use external tools

6. Test changes systematically

[+] f0e4c2f7|2 years ago|reply

This is a good observation. I find that working with LLMs feels closer to the skills of managing a team than to coding itself. Intuitions about how to divide work and understanding strengths and limitations seem to go far.

[+] fnordpiglet|2 years ago|reply

Revision for generality:

All of these best practices are great for humans dealing with their humans as well:

[+] tikkun|2 years ago|reply

Yes, it goes in the other direction too. A few books that I've read about delegation have been quite helpful for prompt writing.

[+] justanotheratom|2 years ago|reply

RE: Give time to "think"

"Transformers need tokens to think" - @karpathy on Chain of Thought prompting.

[+] simonw|2 years ago|reply

Great to see OpenAI upping their game when it comes to providing documentation for how to get the most out of of their models.

I shuddered a bit at "Ask the model to adopt a persona" because I thought it was going to be that "You are the world's greatest expert on X" junk you see people spreading around all the time, but it was actually good advice on how to use the system prompt - their example was:

> "When I ask for help to write something, you will reply with a document that contains at least one joke or playful comment in every paragraph."

I thought the section with suggestions on how to automate testing of prompts was particularly useful - I've been trying to figure that out myself recently. https://platform.openai.com/docs/guides/gpt-best-practices/s...

[+] mritchie712|2 years ago|reply

This example stuck out to me[0]. We've been calling this a "triage" prompt and it's quite effective when you have multiple paths a user could go down or if they could be asking for multiple things at once.

0 - https://platform.openai.com/docs/guides/gpt-best-practices/s...

[+] tomduncalf|2 years ago|reply

In his State of GPT talk (1), Karpathy does say the “you are an expert” type prompts help, because the model has learned from all sorts of content both good and bad, so it kind of narrows it down to more likely provide good answers

1: https://m.youtube.com/watch?v=s6zNXZaIiiI

[+] orbital-decay|2 years ago|reply

Two more practices that are relevant to how transformers work:

- instead of using it as a source of facts, use it to transform the text with the facts you provide, which is does much better. (if the accuracy is important for your case)

- to improve the answer, ask it to reflect on its own result and reiterate the answer. The model produces the result token by token, so it's unable to check its validity at the inference time. This way you put it back into the context and explicitly tell the model to make a second pass.

[+] tikkun|2 years ago|reply

Here's my personal template for semi-complex prompts:

System message

  [A couple sentences of instructions]

  Example 1 - Input
  ##
  [example input 1]
  ##
  Example 1 - Output
  ##
  [example output 1]
  ##

User message

  Actual 1 - Input
  ##
  [the thing you want it to process]
  ##
  Actual 1 - Output
  ##

Fill in all the [] sections. Then hit submit. This should work pretty well. I'd suggest setting the temperature to 0 if you want more predictable responses.

I wrote up additional info here: https://llm-utils.org/My+template+for+a+semi-complex+GPT-4+p...

I first played with GPT early 2021, and have been actively using it since mid 2022. This the method I've found to have the best tradeoff between complexity and effectiveness.

Note that I always try to zero shot it first, and I only use this method for things where zero shot fails, and where I need GPT to get right and that it's worth the effort of making a few shot prompt for.

[+] awinter-py|2 years ago|reply

'tell it exactly the answer you want and keep changing your prompt until it spits that back at you. if you know the right answer already, you will know when it gets it right. assume it will break. ideally use a different tool.'

[+] minimaxir|2 years ago|reply

These are good examples of how to leverage the system prompt, which is vastly underdiscussed as that is only available via the API or the Playground and not the megapopular ChatGPT webapp. Even in LangChain it requires some hacking to get working and may not be stable across generations.

I am releasing a minimal Python chat AI package interface this week which very heavily encourages use the system prompt for efficient generations that are also stable and can hand a variety of user inputs. The results have been very effective!

[+] samwillis|2 years ago|reply

Absolutely nothing about preventing or mitigating prompt injections.

Any other "best practices" for any other sort of platform, database or language, should include suggestions on how to keep your system secure and not vulnerable to abuse.

Coding for LLMs right now is a bit like coding with PHP+MySQL in the late 90s to early 00s, throw stuff at it with little thought and see what happens, hence the wave of SQL injection vulnerabilities in software of that era. The best practices haven't even really been established, particularly when it comes to security.

[+] TeMPOraL|2 years ago|reply

I'm still somewhat confident it'll eventually be formally proven that you can't make a LLM (or the successor generative models) resistant to "prompt injections" without completely destroying its general capability of understanding and reasoning about their inputs.

SQL injections, like all proper injection attacks (I'm excluding "prompt injections" here), are caused by people treating code as unstructured plaintext, and doing in plaintext-space the operations that should happen in the abstract, parsed state - one governed by the grammar of the language in question. The solution to those is to respect the abstraction / concept boundaries (or, in practice, just learn and regurgitate a few case-by-case workarounds, like "prepared statements!").

"Prompt injections" are entirely unlike that. There is no aspect of doing insertion/concatenation at the wrong abstraction level, because there are no levels here. There is no well-defined LLMML (LLM Markup Language). LLMs (and their other generative cousins, like image generation models) are the first widely used computer systems that work directly on unstructured plaintext. They are free to interpret it however they wish, and we only have so much control over it (and little insight into). There are no rules - there's only training that's trying to make them respond the way humans would. And humans, likewise, are "vulnerable" to the same kind of "prompt injections" - seeing a piece of text that forces them to recontextualize the thing they've read so far.

I think mitigations are the only way forward, and at least up to the point we cross the human-level artificial general intelligence threshold, "prompt injection" and "social engineering" will quickly become two names for the same thing.

[+] tester457|2 years ago|reply

For as long as LLMs are a blackbox prompt injection will never be fully solved. Prompt injection is an alignment problem.

[+] Demmme|2 years ago|reply

Yes becose that isn't the promise of the article and it's about them and how you use their platform.

There is no relevant promtinjection you should be aware of because you will not be affected by it ajyway

[+] Der_Einzige|2 years ago|reply

Prompt injection becomes not a problem if you write a restrictive enough template for your prompt with a a LLM template language, such as what Guidance from microsoft provides.

You can literally force it to return responses that are only one of say 100 possible responses (i.e. structure the output in such a way that it can only return a highly similar output but with a handful of keywords changing).

It's work, but it will work with enough constraints because you've filtered the models ability to generate "naughty" output.

[+] Peretus|2 years ago|reply

Not affiliated with them apart from being an early customer, but we're working with Credal.ai to solve this problem. In addition to being able to redact content automatically before it hits the LLM, they also have agreements in place with OpenAI and Anthropic for data deletion, etc. Ravin and the team have been super responsive and supportive and I'd recommend them for folks who are looking to solve this issue.

[+] smoldesu|2 years ago|reply

Are there any proven ways to mitigate prompt injections?

[+] nomadness|2 years ago|reply

[deleted]

[+] thrdbndndn|2 years ago|reply

The only thing I still use ChatGPT semi-frequently is to translate stuff, mainly from Japanese to my native language or English.

And I'm surprised how often it failed to follow the basic instruction of

    Please translate the following paragraph to X-language.
    (Paragraph in Japanese.)

And I have to say "Please translate the following paragraph to X-language" every single time -- I can't just say, "hey, please just translate paragraphs I give from now on." It won't follow it for very long before it starts to do other random stuff or tries to follow the content of the Japanese paragraphs I was trying to get translated.

Any clue how to make it better? I use 3.5 FWIW.

[+] wskish|2 years ago|reply

Interesting that we have OpenAI using the more generic "GPT" here. Previously they would refer more specifically to ChatGPT, GPT-3, or GPT-4. I am guessing this is related to their trademark application for GPT, which was initially refused by the USPTO on the grounds of "GPT" being "merely descriptive".

https://news.ycombinator.com/item?id=36155583

[+] jesterson|2 years ago|reply

Hijacking the discussion - did anyone noticed OpenAI does sille mistakes very often now?

Not talking about complex matters, just not closed quotes or incorrect closing quotes, which is very easy to see.

When confronted it apologies and then gives corrected answer, which... yea, sometimes have other silly mistakes too.

It wasnt in GPT-3 so wild guess would be they adding mistakes on purpose.

[+] AtlasBarfed|2 years ago|reply

Ahhh, one of my pet peeves in IT:

"best practices" ... what proof do they offer that these are the best? Is there a detailed deconstruction of why these are the best, and what practices they are superior to? A multifaceted analysis that may show other practices to be better under some circumstances?

Oh, you mean "recommended practices".

[+] sagarpatil|2 years ago|reply

I would highly recommend watching - State of GPT video by Karpathy at Microsoft Build to get an idea of how LLM’s work. There are some really good prompting ideas. It changed the way I use GPT-4.

Video: https://youtu.be/bZQun8Y4L2A

[+] RcouF1uZ4gsC|2 years ago|reply

This is sounding more like programming and less like an assistant.

[+] MuffinFlavored|2 years ago|reply

> Use external tools

I have yet to find a good way for example to feed ChatGPT GPT-4 (or GPT-3.5 for that matter) "here is a semi-large list of like... songs. help me classify which genre they are closest to" because of the token limit/timeout in the chat.

I'm sure an API integration is possible, but that opens yourself up to potentially "huge" costs compared to a guaranteed free implementation (or the fixed $20/mo)

Anybody able to feed it rows/cells from Google Sheets easily?

[+] braindead_in|2 years ago|reply

> The idea of inner monologue is to instruct the model to put parts of the output that are meant to be hidden from the user into a structured format that makes parsing them easy. Then before presenting the output to the user, the output is parsed and only part of the output is made visible.

This is quite remarkable. Can we now say that inner monologue is also an emergent property? Can this lead to some form of self-awareness or introspection?

[+] foxbyte|2 years ago|reply

Just came across this valuable piece on GPT best practices, and it reminded me of an interesting point I read elsewhere. It's crucial to shape the input prompts effectively as the AI's response heavily depends on the input provided, mirroring a 'garbage in, garbage out' principle for AI interactions.

[+] jerpint|2 years ago|reply

One « best practice » completely ignored by this document is how to ensure non-stochastic results (eg temperature=0), and better yet how to be « sure » which version of chatGPT you’re using (currently no way of knowing). I wish they would give more transparent versioning

[+] jofinjoseph|2 years ago|reply

Often the right prompt and hacks are way beyong the trivial tricks given in this article. Or for that matter, the course OpenAI created along with Deeplearning.ai.I wish there is a great compilation somewhere

189 comments