Seriously, this is the part I don't understand about people parroting "prompt engineering". Isn't it really just throwing random things at a non-deterministic black box and hoping for the best?
I find it's more like that silly experiment where you have to make a sandwich exactly as a kid (or adult) writes the instructions. You _think_ you have a good set of instructions and then you get peanut butter on the outside. So, you revisit the instructions to be clearer about what you want done. That's how I see prompt engineering. In that case, you are simply learning how the model tends to follow instructions and crafting a prompt around that. Not so much random, more purposeful.
If someone says they're fine tuning a model (which is changing which layers are activated for a given input) it's generally well tolerated.
If someone says they're tuning a prompt (which is changing which layers are activated for a given input) it's met with extreme skepticism.
At the end of the day ML is probabilistic. You're always throwing random things at a black box and hoping for the best. There are strategies and patterns that work consistently enough (like ReACT) that they carry across many tasks, and there are some that you'll find for your specific task.
And just like any piece of software you define your scope well, test for things within that scope, and monitor for poor outputs.
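The ReACT pattern mentioned above can be sketched as a simple loop: the model alternates Thought/Action steps, a tool runs each action, and the observation is fed back in. Everything below is a toy stand-in, not a real LLM API; the stub model, the `lookup` tool, and the scripted trace are all hypothetical.

```python
# Minimal sketch of a ReAct (Reason + Act) loop with a stubbed model.
# In practice stub_model would be a call to an actual LLM.

def stub_model(prompt: str) -> str:
    """Stand-in for an LLM: returns a scripted Thought/Action trace."""
    if "Observation: 42" in prompt:
        return "Thought: I have what I need.\nFinal Answer: 42"
    return "Thought: I should look this up.\nAction: lookup[meaning of life]"

TOOLS = {"lookup": lambda q: "42"}  # toy tool registry

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = stub_model(prompt)
        prompt += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        if "Action:" in step:
            # Parse "Action: tool[arg]", run the tool, append the observation
            action = step.split("Action:")[1].strip()
            tool, arg = action.split("[", 1)
            prompt += f"Observation: {TOOLS[tool](arg.rstrip(']'))}\n"
    return "no answer"

print(react_loop("What is the meaning of life?"))  # prints 42
```

The point of the pattern is exactly what the parent says: it's a structure that works consistently enough to carry across tasks, because the model has seen interleaved reasoning-and-action traces before.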
> If someone says they're fine tuning a model (which is changing which layers are activated for a given input) it's generally well tolerated.
> If someone says they're tuning a prompt (which is changing which layers are activated for a given input) it's met with extreme skepticism.
There are good reasons for that, though. The first is the model owner tuning so that given inputs yield better outputs (in theory, for other users too). The second relies on the user to diagnose and fix the error. Making the user the "fix" is a problem if the output is supposed to be useful to people who don't know the answers themselves, or if the model is being touted as "intelligence" with a natural language interface, which is where the scepticism comes in...
I mean, a bugfix, a recommendation not to use the 3rd menu option or a "fork this" button are all valid routes to change the runtime behaviour of a program!
(and yes, I get that the "tuning" might simply be creating the illusion that the model approaches wider usability, and that "fine tuning" might actually have worse side effects. So it's certainly reasonable to argue that when a company defines its models' scope as "advanced reasoning capabilities" the "tuning" might also deserve scepticism, and conversely if it defines its scope more narrowly as something like "code complete" there might be a bit more onus on the user to provide structured, valid inputs)
No, it's not. While GPT-4 (like some, but not all, other LLMs) is somewhat nondeterministic (even at zero temperature), that doesn't mean there aren't things that have predictable effects on the distribution of behavior that can be discovered and leveraged.
There’s even a term of art for making a plan up front and then hitting it with a low-skew latent space match: “Chain of Thought”. Yeah, it’s seen numbered lists before.
And if at first you don’t succeed, anneal the temperature and re-roll until you’ve got something that looks authentic.
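The "anneal the temperature and re-roll" idea can be sketched as a retry loop: sample, check the output, and lower the temperature toward greedy decoding on each failure. The sampler stub and the validity check below are hypothetical stand-ins for a real model call and a real output validator.

```python
import random

def stub_sample(prompt: str, temperature: float, rng: random.Random) -> str:
    """Stand-in for an LLM sampler: higher temperature means noisier output."""
    if rng.random() < temperature:
        return "garbled nonsense"
    return "a clean, well-formed answer"

def looks_valid(text: str) -> bool:
    # Placeholder for whatever check defines "looks authentic" for your task
    return "nonsense" not in text

def anneal_and_reroll(prompt: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    temperature = 1.0
    while True:
        out = stub_sample(prompt, temperature, rng)
        if looks_valid(out) or temperature <= 0.0:
            return out
        temperature = max(0.0, temperature - 0.2)  # anneal toward greedy decoding
```

The loop always terminates: at temperature 0 the sampler is deterministic, so you either get a valid output earlier or fall back to the greedy one.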
In short, if you cannot use this thing without validating every step, it's worthless for logic. You might as well solve the problem yourself.