
The new skill in AI is not prompting, it's context engineering

915 points | robotswantdata | 8 months ago | philschmid.de | reply

518 comments

[+] simonw|8 months ago|reply
I wrote a bit about this the other day: https://simonwillison.net/2025/Jun/27/context-engineering/

Drew Breunig has been doing some fantastic writing on this subject - coincidentally at the same time as the "context engineering" buzzword appeared but actually unrelated to that meme.

How Long Contexts Fail - https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-ho... - talks about the various ways in which longer contexts can start causing problems (also known as "context rot")

How to Fix Your Context - https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.... - gives names to a bunch of techniques for working around these problems including Tool Loadout, Context Quarantine, Context Pruning, Context Summarization, and Context Offloading.
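
Two of those techniques lend themselves to a short sketch. Here is a minimal, purely illustrative reading of Context Pruning and Context Summarization, assuming a hypothetical llm() completion helper and a crude word-count token estimate (the names are Breunig's; the implementation details here are not from his posts):

    # Illustrative sketch of Context Pruning and Context Summarization.
    # `llm(prompt)` is a hypothetical single-shot completion function;
    # swap in any provider SDK. Token counts are approximated by words.

    def token_estimate(text: str) -> int:
        return len(text.split())  # crude stand-in for a real tokenizer

    def prune_context(messages: list[dict], budget: int) -> list[dict]:
        """Context Pruning: drop the oldest non-system turns until we fit."""
        pruned = list(messages)
        while sum(token_estimate(m["content"]) for m in pruned) > budget:
            for i, m in enumerate(pruned):
                if m["role"] != "system":
                    del pruned[i]
                    break
            else:
                break  # only system messages left; nothing to prune
        return pruned

    def summarize_context(messages: list[dict], llm, budget: int) -> list[dict]:
        """Context Summarization: compress older turns instead of dropping them."""
        if sum(token_estimate(m["content"]) for m in messages) <= budget:
            return messages
        head, tail = messages[:-4], messages[-4:]  # keep recent turns verbatim
        summary = llm(
            "Summarize this conversation, keeping facts, decisions, and open questions:\n"
            + "\n".join(f'{m["role"]}: {m["content"]}' for m in head)
        )
        return [{"role": "system", "content": "Summary of earlier turns: " + summary}] + tail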

[+] the_mitsuhiko|8 months ago|reply
Drew Breunig's posts are a must read on this. This is not only important for writing your own agents, it is also critical when using agentic coding right now. These limitations/behaviors will be with us for a while.
[+] Daub|8 months ago|reply
For visual art, I feel that the existing approaches in context engineering are very much lacking. An AI understands well enough such simple things as content (bird, dog, owl, etc.) and color (blue, green, etc.), and has a fair understanding of foreground/background. However, the really important stuff is not addressed.

For example: in form, things like negative shape and overlap. In color contrast, things like ratio contrast and dynamic range contrast. Or how manipulating neighboring regional contrast produces tone wrap. I could go on.

One reason for this state of affairs is that artists and designers lack a consistent terminology to describe what they are doing (though this does not stop them from operating at a high level). Indeed, my colleagues and I had to invent many of the terms I have used here ourselves. I would love to work with an AI guru to address this developing problem.

[+] daxfohl|8 months ago|reply
I'm surprised there isn't already an ecosystem of libraries that just do this. When building agents you either have to roll your own or copy an algorithm out of some article.

I'd expect this to be a lot more plug and play, and as swappable as LLMs themselves by EOY, along with a bunch of tooling to help with observability, A/B testing, cost and latency analysis (since changing context kills the LLM cache), etc.

[+] risyachka|8 months ago|reply
“A month-long skill,” after which it won’t be a thing anymore, like so many others.
[+] storus|8 months ago|reply
Those issues are considered artifacts of the current crop of LLMs in academic circles; there is already research allowing LLMs to use millions of different tools at the same time, along with stable long contexts, likely reducing the number of agents to one for most use cases outside of interfacing with different providers.

Anyone basing their future agentic systems on current LLMs would likely face LangChain's fate: built for GPT-3, made obsolete by GPT-3.5.

[+] dosnem|8 months ago|reply
Providing context makes sense to me, but do you have any examples of providing context and then getting the AI to produce something complex? I am quite a proponent of AI, but even I find myself failing to produce significant results on complex problems, even when I have clone + memory bank, etc. It ends up being a time sink of trying to get the AI to do something, only for me to eventually take over and do it myself.
[+] arbitrary_name|8 months ago|reply
From the first link: "Read large enough context to ensure you get what you need."

How does this actually work, and how can one better define this to further improve the prompt?

This statement feels like the "draw the rest of the fucking owl" meme referred to elsewhere in the thread.

[+] JoeOfTexas|8 months ago|reply
So who will develop the first Logic Core that automates the context engineer?
[+] TZubiri|8 months ago|reply
Rediscovering encapsulation
[+] benreesman|8 months ago|reply
The new skill is programming, same as the old skill. To the extent these things are comprehensible, you understand them by writing programs: programs that train them, programs that run inference, programs that analyze their behavior. You get the most out of LLMs by knowing how they work in detail.

I had one view of what these things were and how they work, and a bunch of outcomes attached to that. And then I spent a bunch of time training language models in various ways and doing other related upstream and downstream work, and I had a different set of beliefs and outcomes attached to it. The second set of outcomes is much preferable.

I know people really want there to be some different answer, but it remains the case that mastering a programming tool involves implementing one yourself, to one degree or another. I've only done medium-sophistication ML programming, and my understanding is therefore kinda medium, but as with compilers, even writing a medium one is the difference between getting good results from a high-complexity one and guessing.

Go train an LLM! How do you think Karpathy figured it out? The answer is on his blog!

[+] JohnMakin|8 months ago|reply
> Building powerful and reliable AI Agents is becoming less about finding a magic prompt or model updates.

Ok, I can buy this

> It is about the engineering of context and providing the right information and tools, in the right format, at the right time.

when the "right" format and "right" time are essentially, and maybe even necessarily, undefined, then aren't you still reaching for a "magic" solution?

If the definition of "right" information is "information which results in a sufficiently accurate answer from a language model", then I fail to see how you are doing anything fundamentally different from prompt engineering. Since these are non-deterministic machines, I fail to see any reliable heuristic that is fundamentally distinguishable from "trying and seeing" with prompts.

[+] v3ss0n|8 months ago|reply
At this point, due to the non-deterministic nature and hallucination, context engineering is pretty much magic. But here are our findings.

1 - LLMs tend to pick up and understand context that comes in the top 7-12 lines. Mostly the first 1k tokens are best understood by LLMs (tested on Claude and several open-source models), so the most important context, like parsing rules, needs to be placed there.

2 - Keep the context short. Whatever context limit they claim is not true. They may have a long context window of 1M tokens, but on average only about the first 10k tokens have good accuracy and recall; the rest is just bunk, so ignore it. Write the prompt, then try compressing/summarizing it without losing key information, either manually or with an LLM.

3 - If you build agent-to-agent orchestration, don't build agents with long context and multiple tools; break them down into several agents with different sets of tools, and then put a planning agent on top which solely does handover.

4 - If all else fails, write the agent handover logic in code, as it always should be (see the sketch below).

That is the result of building 5+ agent-to-agent orchestration projects in different industries using autogen + Claude.
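
A rough sketch of points 3 and 4: small single-purpose agents plus a planning agent that only does handover, with the routing fallback kept in plain code. Here complete(system, user) stands in for whatever LLM call you use, and the agent names are purely illustrative:

    # Sketch: several narrow agents, one planner that solely does handover.
    # `complete(system=..., user=...)` is a hypothetical LLM call.

    AGENTS = {
        "parser":  "You extract structured fields from raw documents.",
        "checker": "You validate extracted fields against business rules.",
        "writer":  "You draft the final customer-facing response.",
    }

    def planner(task: str, complete) -> str:
        """Planning agent: picks a handover target, does no real work itself."""
        choice = complete(
            system="Pick exactly one agent for this task. "
                   "Answer with one word: parser, checker, or writer.",
            user=task,
        ).strip().lower()
        return choice if choice in AGENTS else "parser"  # fallback lives in code

    def run(task: str, complete) -> str:
        agent = planner(task, complete)
        # Each agent gets a short, dedicated context instead of one long shared one.
        return complete(system=AGENTS[agent], user=task)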

[+] mentalgear|8 months ago|reply
It's magical thinking all the way down. Whether they now call it "prompt" or "context" engineering, it's the same tinkering to find something that "sticks" in non-deterministic space.
[+] edwardbernays|8 months ago|reply
The state of the art theoretical frameworks typically separates these into two distinct exploratory and discovery phases. The first phase, which is exploratory, is best conceptualized as utilizing an atmospheric dispersion device. An easily identifiable marker material, usually a variety of feces, is metaphorically introduced at high velocity. The discovery phase is then conceptualized as analyzing the dispersal patterns of the exploratory phase. These two phases are best summarized, respectively, as "Fuck Around" followed by "Find Out."
[+] Aeolun|8 months ago|reply
There is only so much you can do with prompts. To go from the 70% accuracy you can achieve with that to the 95% accuracy I see in Claude Code, the context is absolutely the most important, and it’s visible how much effort goes into making sure Claude retrieves exactly the right context, often at the expense of speed.
[+] dinvlad|8 months ago|reply
> When the "right" format and "right" time are essentially, and maybe even necessarily, undefined, aren't you still reaching for a "magic" solution?

Exactly the problem with all "knowing how to use AI correctly" advice out there rn. Shamans with drums, at the end of the day :-)

[+] thomastjeffery|8 months ago|reply
Models are Biases.

There is no objective truth. Everything is arbitrary.

There is no such thing as "accurate" or "precise". Instead, we get to work with "consistent" and "exhaustive". Instead of "calculated", we get "decided". Instead of "defined" we get "inferred".

Really, the whole narrative about "AI" needs to be rewritten from scratch. The current canonical narrative is so backwards that it's nearly impossible to have a productive conversation about it.

[+] andy99|8 months ago|reply
It's called over-fitting, that's basically what prompt engineering is.
[+] niemandhier|8 months ago|reply
LLM agents remind me of the great Nolan movie “Memento”.

The agents cannot change their internal state hence they change the encompassing system.

They do this by injecting information into it in such a way that the reaction that is triggered in them compensates for their immutability.

For this reason I call my agents “Sammy Jankis”.

[+] baxtr|8 months ago|reply
>Conclusion

Building powerful and reliable AI Agents is becoming less about finding a magic prompt or model updates. It is about the engineering of context and providing the right information and tools, in the right format, at the right time. It’s a cross-functional challenge that involves understanding your business use case, defining your outputs, and structuring all the necessary information so that an LLM can “accomplish the task.”

That’s actually also true for humans: the more context (aka the right info at the right time) you provide, the better for solving tasks.

[+] zaptheimpaler|8 months ago|reply
I feel like this is incredibly obvious to anyone who's ever used an LLM or has any concept of how they work. It was equally obvious before this that the "skill" of prompt-engineering was a bunch of hacks that would quickly cease to matter. Basically they have the raw intelligence, you now have to give them the ability to get input and the ability to take actions as output and there's a lot of plumbing to make that happen.
[+] crystal_revenge|8 months ago|reply
Definitely mirrors my experience. One heuristic I've often used when providing context to a model is: "is this enough information for a human to solve this task?" Building some text2SQL products in the past, it was very interesting to see how often, when the model failed, a real data analyst would reply something like "oh yeah, that's an older table we don't use any more; the correct table is...". This means the model was likely making a mistake that a real human analyst would have made without the proper context.

One thing that is missing from this list is: evaluations!

I'm shocked how often I still see large AI projects being run without any regard for evals. Evals are more important for AI projects than test suites are for traditional engineering ones. You don't even need a big eval set, just one that covers your problem surface reasonably well. However, without one you're basically just "guessing" rather than iterating on your problem, and you're not even guessing in a way where each guess is an improvement on the last.

edit: To clarify, I ask myself this question. It's frequently the case that we expect LLMs to solve problems without the necessary information for a human to solve them.
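
To make that concrete, an eval harness can start as a handful of graded cases and a score you track across every prompt or context change. A minimal sketch, where generate_sql() is the hypothetical system under test and the cases are invented for illustration:

    # Minimal eval-loop sketch for a text2SQL system. A few graded cases
    # beat no evals at all; grow the set as new failures are discovered.

    EVAL_CASES = [
        {"question": "How many orders shipped last week?",
         "must_mention": ["orders", "shipped_at"]},
        {"question": "Top 5 customers by revenue",
         "must_mention": ["customers", "order by", "limit 5"]},
    ]

    def run_evals(generate_sql) -> float:
        passed = 0
        for case in EVAL_CASES:
            sql = generate_sql(case["question"])
            if all(token in sql.lower() for token in case["must_mention"]):
                passed += 1
            else:
                print(f"FAIL: {case['question']!r} -> {sql!r}")
        score = passed / len(EVAL_CASES)
        print(f"{passed}/{len(EVAL_CASES)} passed ({score:.0%})")
        return score  # track this number across every context change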

[+] zacharyvoase|8 months ago|reply
I love how we have such a poor model of how LLMs work (or, more aptly, don't work) that we are developing an entire alchemical practice around them. Definitely seems healthy for the industry and the species.
[+] simonw|8 months ago|reply
The stuff that's showing up under the "context engineering" banner feels a whole lot less alchemical to me than the older prompt engineering tricks.

Alchemical is "you are the world's top expert on marketing, and if you get it right I'll tip you $100, and if you get it wrong a kitten will die".

The techniques in https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.... seem a whole lot more rational to me than that.

[+] munificent|8 months ago|reply
All of these blog posts to me read like nerds speedrunning "how to be a tech lead for a non-disastrous internship".

Yes, if you have an over-eager but inexperienced entity that wants nothing more than to please you by writing as much code as possible, then as the entity's lead, you have to architect a good space where they have all the information they need but can't get easily distracted by nonessential stuff.

[+] tptacek|8 months ago|reply
Just to keep some clarity here, this is mostly about writing agents. In agent design, LLM calls are just primitives, a little like how a block cipher transform is just a primitive and not a cryptosystem. Agent designers (like cryptography engineers) carefully manage the inputs and outputs to their primitives, which are then composed and filtered.
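A sketch of that framing: the LLM call is wrapped so its inputs are explicitly assembled and its outputs are parsed and validated before they compose with anything downstream. raw_llm_call() is a stand-in for any provider call, and the JSON key check is just one possible output filter:

    # Treat the LLM call as a managed primitive: shape the input,
    # validate the output, and only then let it compose with other steps.
    import json

    def llm_primitive(raw_llm_call, required_keys: set):
        def call(task: str, context_snippets: list[str]) -> dict:
            # Input side: the designer controls exactly what enters the window.
            prompt = "\n\n".join(context_snippets) + f"\n\nTask: {task}\nReply in JSON."
            raw = raw_llm_call(prompt)
            # Output side: parse and check before anything downstream sees it.
            try:
                data = json.loads(raw)
            except json.JSONDecodeError:
                raise ValueError(f"unparseable model output: {raw[:80]!r}")
            if not required_keys.issubset(data):
                raise ValueError(f"missing keys: {required_keys - set(data)}")
            return data
        return call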
[+] dinvlad|8 months ago|reply
I feel like ppl just keep inventing concepts for the same old things, which come down to dancing with the drums around the fire and screaming shamanic incantations :-)
[+] viccis|8 months ago|reply
When I first used these kinds of methods, I described it along those lines to my friend. I told him I felt like I was summoning a demon and that I had to be careful to do the right incantations with the right words and hope that it followed my commands. I was being a little disparaging with the comment because the engineer in me that wants reliability, repeatability, and rock solid testability struggles with something that's so much less under my control.

God bless the people who give large scale demos of apps built on this stuff. It brings me back to the days of doing vulnerability research and exploitation demos, in which no matter how much you harden your exploits, it's easy for something to go wrong and wind up sputtering and sweating in front of an audience.

[+] ozim|8 months ago|reply
Finding a magic prompt was never “prompt engineering”; it was always “context engineering”. Lots of “AI wannabe gurus” sold it as such, but they never knew any better.

RAG wasn’t invented this year.

Proper tooling that wraps esoteric knowledge like embeddings, vector DBs, or graph DBs is becoming more mainstream. Big players are improving their tooling, so more stuff is available.

[+] mountainriver|8 months ago|reply
You can give most of the modern LLMs pretty darn good context and they will still fail. Our company has been deep down this path for over 2 years. The context crowd seems oddly in denial about this.
[+] arkmm|8 months ago|reply
What are some examples where you've provided the LLM enough context that it ought to figure out the problem but it's still failing?
[+] ethan_smith|8 months ago|reply
We've experienced the same - even with perfectly engineered context, our LLMs still hallucinate and make logical errors that no amount of context refinement seems to fix.
[+] tupac_speedrap|8 months ago|reply
I mean, at some point it is probably easier to do the work without AI; at least then you would actually learn something useful, instead of spending hours crafting context just to get something useful out of an AI.
[+] ModernMech|8 months ago|reply
"Wow, AI will replace programming languages by allowing us to code in natural language!"

"Actually, you need to engineer the prompt to be very precise about what you want to AI to do."

"Actually, you also need to add in a bunch of "context" so it can disambiguate your intent."

"Actually English isn't a good way to express intent and requirements, so we have introduced protocols to structure your prompt, and various keywords to bring attention to specific phrases."

"Actually, these meta languages could use some more features and syntax so that we can better express intent and requirements without ambiguity."

"Actually... wait we just reinvented the idea of a programming language."

[+] throwawayoldie|8 months ago|reply
Only without all that pesky determinism and reproducibility.

(Whoever's about to say "well ackshually temperature of zero", don't.)

[+] nimish|8 months ago|reply
A half baked programming language that isn't deterministic or reproducible or guaranteed to do what you want. Worst of all worlds unless your input and output domains are tolerant to that, which most aren't. But if they are, then it's great
[+] georgeburdell|8 months ago|reply
We've known everything up through Step 4 for a while. See: the legal system.
[+] mindok|8 months ago|reply
“Actually - curly braces help save space in the context while making meaning clearer”
[+] 8organicbits|8 months ago|reply
One thought experiment I was musing on recently was the minimal context required to define a task (to an LLM, human, or otherwise). In software, there's a whole discipline of human centered design that aims to uncover the nuance of a task. I've worked with some great designers, and they are incredibly valuable to software development. They develop journey maps, user stories, collect requirements, and produce a wealth of design docs. I don't think you can successfully build large projects without that context.

I've seen lots of AI demos that prompt "build me a TODO app", pretend that is sufficient context, and then claim that the output matches their needs. Without proper context, you can't tell if the output is correct.

[+] CharlieDigital|8 months ago|reply
I was at a startup that started using OpenAI APIs pretty early (almost 2 years ago now?).

"Back in the day", we had to be very sparing with context to get great results so we really focused on how to build great context. Indexing and retrieval were pretty much our core focus.

Now, even with the larger windows, I find this still to be true.

The moat for most companies is actually their data, data indexing, and data retrieval[0]. Companies that 1) have the data and 2) know how to use that data are going to win.

My analogy is this:

    > The LLM is just an oven; a fantastical oven.  But for it to produce a good product still depends on picking good ingredients, in the right ratio, and preparing them with care.  You hit the bake button, then you still need to finish it off with presentation and decoration.
[0] https://chrlschn.dev/blog/2024/11/on-bakers-ovens-and-ai-sta...
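
To make the ingredient-picking concrete, here is a minimal sketch of embedding-based retrieval assembling a context from indexed chunks. embed() is a hypothetical embedding function and the scoring is deliberately naive; a real retrieval stack would use a proper index:

    # Sketch: pick a few relevant chunks into the window rather than
    # dumping everything. `embed(text)` is a hypothetical function
    # returning a vector; any embedding API fits here.
    import math

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb or 1.0)  # guard against zero vectors

    def retrieve(query: str, chunks: list[str], embed, k: int = 4) -> list[str]:
        qv = embed(query)
        ranked = sorted(chunks, key=lambda c: cosine(embed(c), qv), reverse=True)
        return ranked[:k]

    def build_context(query: str, chunks: list[str], embed) -> str:
        picked = retrieve(query, chunks, embed)
        return ("Use only these sources:\n" + "\n---\n".join(picked)
                + "\n\nQuestion: " + query)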
[+] 0points|8 months ago|reply
Only more mental exercises to avoid reading the writing on the wall:

LLMs DO NOT REASON!

THEY ARE TOKEN PREDICTION MACHINES

Thank you for your attention in this matter!

[+] kachapopopow|8 months ago|reply
I'll quote myself since it seems oddly familiar:

---

Forget AI "code", every single request will be processed BY AI! People aren't thinking far enough, why bother with programming at all when an AI can just do it?

It's very narrow to think that we will even need these 'programmed' applications in the future. Who needs operating systems and all that when all of it can just be AI.

In the future we don't even need hardware specifications since we can just train the AI to figure it out! Just plug inputs and outputs from a central motherboard to a memory slot.

Actually forget all that, it'll just be a magic box that takes any kind of input and spits out an output that you want!

[+] jumploops|8 months ago|reply
To anyone who has worked with LLMs extensively, this is obvious.

Single prompts can only get you so far (surprisingly far actually, but then they fall over quickly).

This is actually the reason I built my own chat client (~2 years ago), because I wanted to “fork” and “prune” the context easily; using the hosted interfaces was too opaque.

In the age of (working) tool-use, this starts to resemble agents calling sub-agents, partially to better abstract, but mostly to avoid context pollution.
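
For illustration, a minimal sketch of what "fork" and "prune" can mean on a message list, in the spirit of that client; the message format and the keep predicate are assumptions, not how any particular product does it:

    # Fork: branch the transcript so experiments don't pollute the original.
    # Prune: drop turns that no longer earn their tokens (dead ends, noise).

    def fork(history: list[dict]) -> list[dict]:
        return [dict(m) for m in history]  # copy each message

    def prune(history: list[dict], keep) -> list[dict]:
        return [m for m in history if m["role"] == "system" or keep(m)]

    # Usage: branch before a risky experiment, prune the noise afterwards:
    #   branch = fork(history)
    #   branch = prune(branch, keep=lambda m: "traceback" not in m["content"].lower())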

[+] mrhillsman|8 months ago|reply
Hi everyone,

After working on something related for some months now, I would like to put it out there, given the considerable attention being paid to "context engineering". I am proposing the *Context Window Architecture (CWA)*: a conceptual reference architecture to bring engineering discipline to LLM prompt construction. I would love for others to participate and provide feedback. A reference implementation where CWA is used in a real-world, pragmatic scenario would be great for teasing out more about context engineering and whether CWA is useful. Additionally, I am no expert by far, so feedback and collaboration would be awesome.

Blog post: https://mrhillsman.com/posts/context-engineering-realized-co...

Proposal via Google Doc: https://docs.google.com/document/d/1qR9qa00eW8ud0x7yoP2XicH3...

[+] slavapestov|8 months ago|reply
I feel like if the first link in your post is a tweet from a tech CEO the rest is unlikely to be insightful.
[+] coderatlarge|8 months ago|reply
I don’t disagree with your main point, but is Karpathy a tech CEO right now?