I can’t find the link to the paper right now, but after reading about how LLMs perform better with task breakdowns, I vastly improved my integrations by having ChatGPT generate prompts that decompose a general task into a series of subtasks, based on a sample input and output. I haven’t needed to build a self-refining system (one or two rounds of task decomposition and refinement produced the expected result for all inputs), but I would assume that's fairly trivial, and that AIs can do it better than humans.
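The loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the commenter's actual code: `llm` stands in for whatever chat-completion call you use, and the prompt wording and helper names (`decompose_task`, `run_plan`) are my own.

```python
def decompose_task(llm, task, example_input, example_output):
    """Ask the model to break a general task into numbered subtasks,
    grounded in one sample input/output pair."""
    prompt = (
        f"Task: {task}\n"
        f"Sample input: {example_input}\n"
        f"Expected output: {example_output}\n"
        "Break this task into a numbered list of small, concrete subtasks, "
        "one per line, that together produce the expected output."
    )
    plan = llm(prompt)
    # Keep only the lines that look like numbered steps.
    return [line.strip() for line in plan.splitlines()
            if line.strip()[:1].isdigit()]

def run_plan(llm, steps, task_input):
    """Feed the input through each subtask in order (one refinement round)."""
    state = task_input
    for step in steps:
        state = llm(f"{step}\nInput: {state}\nOutput:")
    return state
```

A second decomposition round, as described above, would just mean calling `decompose_task` again on any subtask whose output misses the expected result.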
This is also an area where I expect OpenAI will continue to demolish the competition. The ability to recursively generate and process large prompts is truly nuts. I tried swapping in some of the “high-performing” LLaMA models and they all choked on anything more than a paragraph.
Capable-enough LLMs are human-level at lots of things. Reinforcement learning from AI feedback (RLAIF) is a thing (the Anthropic Claude models use it). Strictly speaking, it's not necessary to have humans in the loop for a lot of these tasks.
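The core of the AI-feedback idea is that preference labels come from a model rather than a human annotator. A minimal sketch, assuming a `judge` callable that stands in for a real model call; the function names and the one-line "principle" are illustrative, not Anthropic's actual setup:

```python
def ai_preference_label(judge, prompt, response_a, response_b):
    """Ask a judge model which of two responses better follows a principle.
    Returns a (chosen, rejected) pair for a preference dataset."""
    verdict = judge(
        "Principle: choose the more helpful, harmless answer.\n"
        f"Prompt: {prompt}\n(A) {response_a}\n(B) {response_b}\n"
        "Answer with exactly A or B."
    )
    if verdict.strip().upper().startswith("A"):
        return response_a, response_b
    return response_b, response_a

def build_preference_dataset(judge, samples):
    """samples: iterable of (prompt, resp_a, resp_b) triples.
    No human in the loop: every label comes from the judge model."""
    return [(p,) + ai_preference_label(judge, p, a, b) for p, a, b in samples]
```

The resulting (prompt, chosen, rejected) triples are what a reward model would then be trained on.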
Some are hesitant to admit we've created human-level general intelligence, but saying otherwise doesn't really hold up to scrutiny.
I see people saying things like this, but I have yet to see anyone show data for a non-trivial workflow with human-level accuracy over a wide range of inputs, without a human in the loop.
Headlines like this hint at AI improving itself, in this case by prompting itself. But as we see in reinforcement learning, algorithms that act and improve themselves are not new. The interesting thing will be whether or not they eventually "collapse".
For example, if an RL algorithm is performing well on an Atari game, you can stop the training and just let the agent run for years and the performance will remain about the same. However, if you allow the agent to continue training, it's not clear whether it will (1) continue improving, (2) stay about the same, or (3) collapse, perform much worse, and never recover. I'm not an RL expert, but I've spent a lot of time experimenting and implementing the algorithms myself, and I've seen all 3 of these scenarios play out. I'm never quite sure what's going to happen so long as I allow the training to continue.
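The freeze-vs-keep-training distinction can be made concrete with a toy example. This is a deliberately tiny sketch of my own (epsilon-greedy value learning on a two-armed bandit), not one of the Atari setups above: once you stop calling the update, the greedy policy is fixed forever; keep calling it and the values keep moving.

```python
import random

def train_bandit(q, probs, steps, lr=0.1, eps=0.1, rng=None):
    """Epsilon-greedy value learning on a Bernoulli bandit.
    Mutates and returns the Q-value table q."""
    rng = rng or random.Random(0)
    for _ in range(steps):
        if rng.random() < eps:                      # explore
            arm = rng.randrange(len(q))
        else:                                       # exploit
            arm = max(range(len(q)), key=q.__getitem__)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        q[arm] += lr * (reward - q[arm])            # incremental update
    return q

def greedy_policy(q):
    """The deployed behavior: always pick the highest-valued arm."""
    return max(range(len(q)), key=q.__getitem__)

# "Stop the training": freeze q here and greedy_policy(q) never changes,
# no matter how long the agent runs. Calling train_bandit again is the
# risky case described above: the values, and possibly the policy, drift.
q = train_bandit([0.0, 0.0], probs=[0.2, 0.8], steps=2000)
frozen_action = greedy_policy(q)
```

With a fixed seed the trained agent reliably settles on the better arm; the point is only that evaluation of a frozen table is deterministic, while continued updates are not guaranteed to stay put.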
GPT-4 will remain GPT-4 forever, and that's amazing, but just because GPT-4 is stable and amazing while it's not in training mode doesn't mean it will remain stable if we allow it to bootstrap: prompt itself, prepare its own training data, and so on.
But in seriousness - language models may be scaling in sophistication exponentially with time, but software engineering problems scale in complexity (on average) exponentially with lines of code. The base of this exponential function isn't large, but it's more than 1.
In the end there's a need for someone who understands what they're doing.
Personally, I use ChatGPT to discover libraries that solve my problems and the ~70% success ratio that I'm seeing with this is enough for me for now.
bugglebeetle | 2 years ago
axiom92 | 2 years ago
bravogamma | 2 years ago
famouswaffles | 2 years ago
lukasb | 2 years ago
macrolocal | 2 years ago
Buttons840 | 2 years ago
m3kw9 | 2 years ago
jahewson | 2 years ago
skybrian | 2 years ago
kaesar14 | 2 years ago
drooby | 2 years ago
The future I see is that everyone is about to become a CEO with a personal assistant that can run a business.
So I'm going to start building something of my own starting now.
Tade0 | 2 years ago
rapind | 2 years ago
riku_iki | 2 years ago
low_tech_punk | 2 years ago
kaesar14isazero | 2 years ago
[deleted]
gandalfgeek | 2 years ago
https://youtu.be/oXtZAUPGHn8
opengears | 2 years ago
[deleted]