item 44462031

FiniteIntegral | 8 months ago

Apple released a paper showing the diminishing returns of "deep learning," specifically when it comes to math. For example, it has a hard time solving the Tower of Hanoi problem past 6-7 discs, and that's without even restricting it to optimal solutions. The agents they tested would hallucinate steps and couldn't follow simple instructions.

On top of that -- rebranding "prompt engineering" as "context engineering" and pretending it's anything different is ignorant at best and destructively dumb at worst.

senko|8 months ago

That's one reading of that paper.

The other is that they intentionally forced LLMs to do things we know they're bad at (following algorithms step by step, tasks that require more context than is available, etc.) without allowing them to solve the problem the way they're optimized to: by writing code that implements the algorithm.
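For instance, the canonical recursive solution any of these models can write is only a few lines (a minimal Python sketch; the function name and signature are mine, not from the paper):

```python
def hanoi(n, source, target, spare, moves=None):
    """Solve Tower of Hanoi recursively, returning the optimal move list."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # park n-1 discs on the spare peg
    moves.append((source, target))              # move the largest disc
    hanoi(n - 1, spare, target, source, moves)  # bring the n-1 discs back on top
    return moves

# 7 discs: the optimal solution has 2^7 - 1 = 127 moves
print(len(hanoi(7, "A", "C", "B")))
```

The point being: executing those 127 moves by hand, token by token, is exactly the failure mode the paper measured, while generating the algorithm is the task the models are actually good at.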

A cynical read is that the paper is the only AI achievement Apple has managed in the past few years.

(There is another: they managed not to lose MLX people to Meta)

koakuma-chan|8 months ago

> On top of that -- rebranding "prompt engineering" as "context engineering" and pretending it's anything different is ignorant at best and destructively dumb at worst.

It is different. There are usually two main parts to the prompt:

1. The context.

2. The instructions.

The context part has to be optimized to be as small as possible, while still including all the necessary information. It can also be compressed via, e.g., LLMLingua.

On the other hand, the instructions part must be optimized to be as detailed as possible, because otherwise the LLM will fill the gaps with possibly undesirable assumptions.

So "context engineering" refers to engineering the context part of the prompt, while "prompt engineering" could refer to either engineering of the whole prompt, or engineering of the instructions part of the prompt.
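As a hypothetical illustration of that split (the function and section names are mine, not any particular library's API), assembling the two parts might look like:

```python
def build_prompt(context_docs, instructions):
    """Assemble a prompt from a trimmed context block plus detailed instructions.

    Illustrative sketch only: the trimming strategy and layout are
    assumptions, not a specific framework's conventions.
    """
    # Context: keep it as small as possible while retaining what's needed
    # (drop empty or whitespace-only documents before joining).
    context = "\n".join(doc.strip() for doc in context_docs if doc.strip())
    # Instructions: spell everything out so the model doesn't fill
    # the gaps with undesirable assumptions.
    return (
        "## Context\n" + context + "\n\n"
        "## Instructions\n" + instructions
    )

prompt = build_prompt(
    ["Order #123 shipped on 2024-01-02.", "  "],
    "Answer only from the context above. If the answer is not there, say so.",
)
```

Under this framing, "context engineering" works on the first argument and "prompt engineering" may mean working on either, or on the whole assembled string.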

0x445442|8 months ago

I'm getting on in years, so I'm becoming progressively more ignorant on technical matters. But with respect to something like software development, what you've described sounds a lot like creating a detailed design, or even pseudocode. Now, I've never found typing to be the bottleneck in software development, even before modern IDEs, so I'm struggling to see where all the lift is meant to come from with this tech.

OJFord|8 months ago

Let's just call all aspects of LLM usage 'x-engineering' to professionalise it, even while we're barely starting to figure it out.

antonvs|8 months ago

It’s fitting, since the industry is largely driven by hype engineering.

skeeter2020|8 months ago

We used to call both of these "being good with the Google". Equating it to engineering is both hilarious and insulting.

triyambakam|8 months ago

It is a stretch but not semantically wrong. Strictly, engineering is the practical application of science; we could say that studying how a model behaves under different usage is a science, and so applying that study is engineering.

hnlmorg|8 months ago

Context engineering isn’t a rebranding. It’s a widening of scope.

Just as all squares are rectangles but not all rectangles are squares: prompt engineering is context engineering, but context engineering also includes other optimisations that are not prompt engineering.

That all said, I don’t disagree with your overall point regarding the state of AI these days. The industry is so full of smoke and mirrors that it’s really hard to separate the actual novel uses of “AI” from the bullshit.

bsenftner|8 months ago

Context engineering is the continual struggle of software engineers to explain themselves, in an industry composed of weak communicators that interrupt to argue before statements are complete, do not listen because they want to speak, and speak over one another. "How to use LLMs" is going to be argued forever simply because those arguing are simultaneously not listening.

vidarh|8 months ago

The paper in question is atrocious.

If you assume any meaningful per-step error rate (and you will get one, especially if temperature isn't zero), failures become near-certain over that many moves, and at larger disc sizes you'd start to hit context limits too.

Ask a human to repeatedly execute the Tower of Hanoi algorithm for a similar number of steps and see how many do so flawlessly.

They didn't measure "the diminishing returns of 'deep learning'"; they measured the limitations of asking a model to act as a dumb interpreter, repeatedly, with a parameter set that'd ensure errors over time.
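The compounding is easy to quantify: with independent per-move accuracy p, the chance of executing all 2^n - 1 moves flawlessly is p^(2^n - 1). A quick sketch (the 99.9% per-move accuracy figure is my illustrative assumption, not a number from the paper):

```python
# Probability of a flawless Tower of Hanoi run, assuming each move
# succeeds independently with probability p (0.999 is an assumption).
p = 0.999
for n in (7, 10, 12):
    moves = 2**n - 1          # optimal move count for n discs
    print(n, moves, p**moves)  # success probability for a perfect run
```

Even at 99.9% per-move accuracy, the flawless-run probability drops from roughly 0.88 at 7 discs to under 2% at 12, without any reasoning failure at all.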

That a paper this poor got released at all is shocking.

sitkack|8 months ago

At this point all of Apple's AI take-down papers have serious flaws. This one has been beaten to death. Finding citations is left to the reader.