0xf8 | 1 month ago
But what do you mean by “LLM prompting is on average much less analyzable”? Isn’t structured prompting (what it should optimally look like) the most objective and well-defined part of the whole workflow? It’s the lowest-entropy part of the situation; we know pretty well what a good LLM prompt is and what will be ineffective, and even LLMs “know” that. Do you mean “context engineering” is hard to optimize around? The two are often thought of interchangeably, I think, but regardless, context engineering has in fact become the “hard problem” (user-facing) in effectively leveraging LLMs for dev work. Ever since reasoning-class models were introduced, I think it became more about context engineering in practice than prompting. Nowadays, even resuming a session efficiently often requires a non-trivial approach, one we’ve already started designing patterns and building tools around (like CLI coding workflows adding /compact as a user directive, etc.).
I’m not a software engineer by trade, so I can’t pretend to know what that fully entails at the tail ends of enterprise scale and complexity. But I’ve spent a decent amount of time programming, and as far as LLMs go, I think there’s probably a point somewhere down the road where we get so methodical about context engineering, tooling, and memory management (all of the vast, still somewhat nebulous scaffolding around LLM workflows that has a big impact on productive use of them) that we engineer that aspect to an extent that will much more consistently yield better results across more applied contexts than the current “clean code”/“trivial app” dichotomy. But I think the depth of additional effort, knowledge, and skill required of the human user to do this optimal context engineering (once we even fully understand how) to get the best out of LLMs quickly converges to what it already means to be a competent software engineer: the meta layers around just “writing code” that are required to build robust systems and maintain them. The amount of work required to coerce non-deterministic models into effectively internalizing that, or at minimum not fvcking it up, means the juice might not be worth the squeeze when that’s essentially what a good developer’s job is already. If that’s true, then there will likely remain a ceiling on the productivity you can expect from LLM-assisted development for a long time (I conjecture).
friendzis | 1 month ago
The so called "context" is part of the prompt.
> we may eventually engineer that aspect to an extent that will be able to much more consistently yield better results across more applied contexts than the “clean code”/“trivial app” dichotomy.
> the amount of work required to coerce non-deterministic models into effectively internalizing that,
That's essentially the point here. You write a prompt (or context, or memory, or whatever people want to call it to make themselves feel better), get code out, test the code, and get test failures. Now what? Unless the problem is an obvious lack of information in the prompt (i.e. something was not defined), there is no methodical way to patch the prompt that consistently fixes the error.
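To make that loop concrete, here's a minimal sketch, not anything from this thread: `call_llm` is a hypothetical stand-in for any model call, and the "buggy generation" is invented for illustration. The point is that when the test fails, there's no principled diff to apply to the prompt; the only move is to rewrite it and run the whole thing again.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical placeholder for a real model call.
    # Imagine the model returns a subtly buggy generation:
    return "def add(a, b):\n    return a - b"

def passes_tests(code: str) -> bool:
    # Run the generated code and check it against the spec.
    ns: dict = {}
    exec(code, ns)
    return ns["add"](2, 3) == 5  # the spec: add must actually add

prompt = "Write a Python function add(a, b) that returns their sum."
code = call_llm(prompt)
if not passes_tests(code):
    # There is no methodical patch to apply here. We can only
    # rewrite the prompt wholesale and re-invoke the model.
    prompt = "Write add(a, b). It must return a + b, not a - b."
    code = call_llm(prompt)
```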
You can take program code, apply certain analytical rules to it, and exhaustively define all the operations, states, and side effects the program will have. That might be an extremely hard exercise to do in full, but in the end this is what it means to be analyzable. You can also take a reduced set of rules and heuristics, quickly build a general structure of the operations, and analyze deficiencies. If you are given a prompt, regardless of how well structured it is, you cannot, by definition, tell in general what the eventual output will look like without invoking the full ruleset (i.e. running the prompt through an LLM). Therefore the average fix of a prompt is effectively a full rewrite, which does not admit the shortcut I just described.
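As a small illustration of what "analyzable" means here (my example, not the commenter's): Python's standard `ast` module can enumerate every method call a piece of code makes without ever running it. The source text alone fully determines the answer; no "full ruleset" invocation is needed.

```python
import ast

# A toy function whose side effects we want to enumerate statically.
source = """
def save(user):
    db.write(user)        # side effect: database write
    log.info("saved")     # side effect: logging
    return user.id
"""

tree = ast.parse(source)
# Walk the syntax tree and collect every attribute-style call (obj.method()).
calls = [
    node.func.attr
    for node in ast.walk(tree)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
]
print(sorted(calls))  # every operation is recoverable from the text alone
```

No analogous tool can exist for a prompt: you cannot enumerate what the generated program will do without actually sampling the model.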