Seriously, this is the part I don't understand about people parroting "prompt engineering". Isn't it really just throwing random things at a non-deterministic black box and hoping for the best?
I find it's more like that silly experiment where you have to make a sandwich exactly as a kid (or adult) writes the instructions. You _think_ you have a good set of instructions and then you get peanut butter on the outside. So, you revisit the instructions to be clearer about what you want done. That's how I see prompt engineering. In that case, you are simply learning how the model tends to follow instructions and crafting a prompt around that. Not so much random, more purposeful.
If someone says they're fine tuning a model (which is changing which layers are activated for a given input) it's generally well tolerated.
If someone says they're tuning a prompt (which is changing which layers are activated for a given input) it's met with extreme skepticism.
At the end of the day ML is probabilistic. You're always throwing random things at a black box and hoping for the best. There are strategies and patterns that work consistently enough (like ReACT) that they carry across many tasks, and there are some that you'll find for your specific task.
And just like any piece of software you define your scope well, test for things within that scope, and monitor for poor outputs.
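The ReACT pattern mentioned above can be sketched as a simple loop: the model alternates Thought/Action steps, a tool runs each action, and the observation is fed back in. Everything below is a toy stand-in, not a real LLM API; the stub model, the `lookup` tool, and the scripted trace are all hypothetical.

```python
# Minimal sketch of a ReAct (Reason + Act) loop with a stubbed model.
# In practice stub_model would be a call to an actual LLM.

def stub_model(prompt: str) -> str:
    """Stand-in for an LLM: returns a scripted Thought/Action trace."""
    if "Observation: 42" in prompt:
        return "Thought: I have what I need.\nFinal Answer: 42"
    return "Thought: I should look this up.\nAction: lookup[meaning of life]"

TOOLS = {"lookup": lambda q: "42"}  # toy tool registry

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = stub_model(prompt)
        prompt += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        if "Action:" in step:
            # Parse "Action: tool[arg]", run the tool, append the observation
            action = step.split("Action:")[1].strip()
            tool, arg = action.split("[", 1)
            prompt += f"Observation: {TOOLS[tool](arg.rstrip(']'))}\n"
    return "no answer"

print(react_loop("What is the meaning of life?"))  # prints 42
```

The point of the pattern is exactly what the parent says: it's a structure that works consistently enough to carry across tasks, because the model has seen interleaved reasoning-and-action traces before.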
> If someone says they're fine tuning a model (which is changing which layers are activated for a given input) it's generally well tolerated.
> If someone says they're tuning a prompt (which is changing which layers are activated for a given input) it's met with extreme skepticism.
There are good reasons for that, though. The first is the model owner tuning so that given inputs yield better outputs (in theory, for other users too). The second relies on the user to diagnose and fix the error. Making the user the "fix" is a problem if the output is supposed to be useful to people who don't know the answers themselves, or if the model is being touted as "intelligence" with a natural language interface, which is where the scepticism comes in...
I mean, a bugfix, a recommendation not to use the 3rd menu option or a "fork this" button are all valid routes to change the runtime behaviour of a program!
(and yes, I get that the "tuning" might simply be creating the illusion that the model approaches wider usability, and that "fine tuning" might actually have worse side effects. So it's certainly reasonable to argue that when a company defines its models' scope as "advanced reasoning capabilities" the "tuning" might also deserve scepticism, and conversely if it defines its scope more narrowly as something like "code complete" there might be a bit more onus on the user to provide structured, valid inputs)
No, it's not. While GPT-4 (like some, but not all, other LLMs) is somewhat nondeterministic (even at zero temperature), that doesn't mean there aren't things that have predictable effects on the distribution of behavior that can be discovered and leveraged.
There’s even a term of art for making a plan up front and then hitting it with a low-skew latent space match: “Chain of Thought”. Yeah, it’s seen numbered lists before.
And if at first you don’t succeed, anneal the temperature and re-roll until you’ve got something that looks authentic.
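The "anneal the temperature and re-roll" idea can be sketched as a retry loop: sample, check the output, and lower the temperature toward greedy decoding on each failure. The sampler stub and the validity check below are hypothetical stand-ins for a real model call and a real output validator.

```python
import random

def stub_sample(prompt: str, temperature: float, rng: random.Random) -> str:
    """Stand-in for an LLM sampler: higher temperature means noisier output."""
    if rng.random() < temperature:
        return "garbled nonsense"
    return "a clean, well-formed answer"

def looks_valid(text: str) -> bool:
    # Placeholder for whatever check defines "looks authentic" for your task
    return "nonsense" not in text

def anneal_and_reroll(prompt: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    temperature = 1.0
    while True:
        out = stub_sample(prompt, temperature, rng)
        if looks_valid(out) or temperature <= 0.0:
            return out
        temperature = max(0.0, temperature - 0.2)  # anneal toward greedy decoding
```

The loop always terminates: at temperature 0 the sampler is deterministic, so you either get a valid output earlier or fall back to the greedy one.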
In short, if you cannot use this thing without validating every step, it's worthless for logic. You might as well solve the problem yourself.