item 44951287

Attention Is the New Big-O: A Systems Design Approach to Prompt Engineering

93 points | alexc05 | 6 months ago | alexchesser.medium.com

45 comments


trjordan|6 months ago

A few of these points are right, but a lot of it is so far removed from the reality of LLMs that it isn't even wrong.

Yes: structure beats vibes. Primacy/recency bias is real. Treating prompts as engineering artifacts is empirically helpful.

But:

- “Reads all tokens at once.” Not quite. Decoder LLMs do a parallel prefill over the prompt, then sequential token-by-token decode with causal masks. That nuance is why primacy/recency and KV-cache behavior matter, and why instruction position can swing results.
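A toy sketch of that causal mask in NumPy (illustrative only, not any real model's implementation) shows why prefill can be parallel while each position still only sees earlier tokens:

```python
import numpy as np

def causal_mask(n):
    # Lower-triangular mask: token i may attend only to tokens 0..i.
    return np.tril(np.ones((n, n), dtype=bool))

# Prefill: all 5 prompt tokens are processed in one parallel pass,
# but position 2 still only "sees" positions 0..2.
mask = causal_mask(5)
print(mask[2])  # [ True  True  True False False]

# Decode: each newly generated token attends to the whole cached prefix,
# which is why the KV cache makes per-step generation cheap.
prefix_len, new_pos = 5, 5
visible = np.arange(prefix_len + 1) <= new_pos
print(visible.sum())  # 6
```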

- Embeddings & “labels.” Embeddings are learned via self-supervised next-token prediction, not from a labeled thesaurus. “Feline ≈ cat” emerges statistically, not by annotation.
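As a toy illustration of "emerges statistically": even raw co-occurrence counts over a handful of sentences (nothing like real training, and no labels anywhere) already pull "feline" toward "cat":

```python
from collections import Counter
from math import sqrt

# Toy corpus: "cat" and "feline" appear in similar contexts; no annotations.
corpus = [
    "the cat sat on the mat",
    "the feline sat on the mat",
    "the cat chased the mouse",
    "the feline chased the mouse",
    "the dog barked at the mailman",
]

def context_vector(word, window=2):
    # Count neighboring words within `window` positions: pure co-occurrence stats.
    counts = Counter()
    for sent in corpus:
        toks = sent.split()
        for i, t in enumerate(toks):
            if t == word:
                for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                    if j != i:
                        counts[toks[j]] += 1
    return counts

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

cat, feline, dog = context_vector("cat"), context_vector("feline"), context_vector("dog")
# "feline" ends up closer to "cat" than "dog" does, with no labeling involved.
print(cosine(cat, feline) > cosine(cat, dog))  # True
```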

- "Structure >> content". Content is what actually matters. “Well-scaffolded wrong spec” will give you confidently wrong output.

- Personas are mostly style. Yes, users like output written in their style better, but a persona will also hide information that a "senior software engineer" might not be expected to know.

I don't really get the Big-O analogy, either. Model vendors are constantly changing and shifting how models direct attention, which is exactly the opposite of the durably true nature of algorithmic complexity. Memorizing how current models like their system prompts to be written is hardly the same thing.

jimkri|6 months ago

I agree. The statement you called out, "Reads all tokens at once," shows the misunderstanding, and the article's own header formatting shows the lack of attention that went into it.

The headers go from numbered sections to just one section with sub-numbers: Section 3 has 3.1-3.4, and then the next section doesn't follow suit.

I've noticed this when doing large-scale LaTeX documentation builds: if you are not explicit about the formatting, the well-scaffolded build falls apart once the token count gets too high and no proper batch process is in place, or the prompt is vague. "Use clear headings and bullet points" is not precise. Depending on your document type, you need to state all requirements to design with attention.

eric-burel|6 months ago

I don't buy the arguments made here. You can't call it attention-optimized without opening the LLM's brain and assessing what happens in the attention layers. Did the quoted paper do any of that? I know Anthropic is advanced in this area, but I haven't seen that many results elsewhere yet. I mean, the fact that optimizing for attention makes better prompts is a solid hypothesis, but I don't read a proof here.

ozgung|6 months ago

I think the author is confusing the attention mechanism with attention in the general sense. He refers to a phenomenon called “attention drift”, but neither Google nor LLM searches return any relevant references to that term. In fact, ChatGPT only cited this same blog post.

jdefr89|6 months ago

Yea, they sound like plausible enough arguments.. I often test my vague inputs against structured ones, and for many tasks it didn't seem like hallucinations happened significantly less. Honestly, the more structured you have to be, the less "general" your model probably is, and in theory you want your model to be as general as possible. Seems like here you're simply helping it over-fit.. At least that's what my intuition tells me, but I have yet to really check that out either.

gashmol|6 months ago

Aside - It's funny to me how many developers still don't like to call their craft engineering and how fast LLM users jumped on the opportunity.

rockostrich|6 months ago

Not sure if this is a part of it, but there are places where "engineer" is a protected title that requires a degree or license. I work with a bunch of folks in Quebec and they have to use the title "software developer" unless they are a member of the Order of Engineers. I find this to be pretty silly considering someone can have a degree in Mechanical Engineering and use the title "Software Engineer" but someone with a degree in Computer Science can't.

nutjob2|6 months ago

The less rigorous and more vague a theory is, the easier it is to use it to make unfalsifiable claims. That's the essence of the current discussion around AGI. No one knows what it is or can describe it concretely, so it can do anything and everything and everyone's going to lose their jobs.

It's funny because of the irony of "prompt engineering" being as close to cargo culting as things get. No one knows what the model is or how it's structured at a higher (non-implementation) level; people just try different things until something works, and try what they've seen other people do.

This article is at least interesting in that it takes a stab at explaining prompt efficacy with some sort of concrete basis, even if it lacks rigor.

It's actually a really important question about LLMs: how are they to be used to get the best results? All the work seems to be on the back end, but the front end is exceedingly important. Right now it's some version of Deep Thought spitting out the answer '42'.

NitpickLawyer|6 months ago

TBF I always read prompt engineering as the 2nd definition, not the first.

> 1. the branch of science and technology concerned with the design, building, and use of engines, machines, and structures.
>
> 2. the action of working artfully to bring something about.

So you're trying to learn what / how to prompt, in order to "bring something about" (the results you're after).

whoamii|6 months ago

What’s the problem?

-A Prompt Architect

nateroling|6 months ago

Can you write a prompt to optimize prompts?

Seems like an LLM should be able to judge a prompt, and collaboratively work with the user to improve it if necessary.

alexc05|6 months ago

100% yes! There've been some other writers who've been doing parallel work around that in the last couple weeks.

https://www.dbreunig.com/2025/06/10/let-the-model-write-the-... is an example.

You can see the hands on results in this hugging face branch I was messing around in:

here is where I tell the LLM to generate prompts for me based on research so far

https://github.com/AlexChesser/transformers/blob/personal/vi...

here are the prompts it produced:

https://github.com/AlexChesser/transformers/tree/personal/vi...

and here is the result of those prompts:

https://github.com/AlexChesser/transformers/tree/personal/vi.... (also look at the diagram folders etc..)

chopete3|6 months ago

I use Grok to write the prompts. It's excellent. I think human-created prompts are insufficient in almost all cases.

Write your prompt in some shape and ask Grok:

Please rewrite this prompt for higher accuracy

-- Your prompt

user3939382|6 months ago

The LLM is basically a runtime that needs optimized input because the output is compute-bottlenecked. Input quality scales with domain knowledge, specificity, and therefore human time invested. You can absolutely navigate an LLM's attention piecemeal around a spec until you build an optimized input.

CuriouslyC|6 months ago

This is pretty much DSPy.

slt2021|6 months ago

yes, just prepend your request to llm with "Please give me a well-structured LLM prompt that will solve this problem..."
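A minimal sketch of that trick as a plain wrapper function (the template wording is just illustrative, not a known-optimal prompt):

```python
def make_rewrite_request(task_prompt: str) -> str:
    """Wrap a rough prompt in a meta-prompt asking the model to improve it.

    The exact wording here is a hypothetical example, not a tested template.
    """
    return (
        "Please give me a well-structured LLM prompt that will solve this problem. "
        "Keep the original intent, and add explicit constraints and an output format.\n\n"
        "--- Original prompt ---\n"
        f"{task_prompt}"
    )

# Send this to the model first, then use its answer as your real prompt.
improved_request = make_rewrite_request("summarize this repo and list risky files")
print(improved_request)
```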

Xorakios|6 months ago

FWIW, after my programming skills became hopelessly outdated, with an economics degree and too old to start over, I generally promoted my skillset as a translator between business and tech teams.

A lot of what I received as input was more like the first type of instruction, what I sent to the actual development team was closer to the second.

lubujackson|6 months ago

To add to these good points - for bigger changes, don't just have LLMs restructure your prompt, but break it down into a TODO list, a summary and/or build a scaffolded result then continue from that. LLMs thrive in structure, and the more architectural you can make both your inputs and outputs, the more consistent your results will be.

For example, pass an LLM a JSON structure with keys but no values and it tends to do a much better job populating the values than trying to fully generate complex data from a prompt alone. Then you can take that populated JSON to do something even more complex in a second prompt.
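A minimal sketch of that scaffold idea (the keys and validation logic are hypothetical examples, not any particular API):

```python
import json

# Hypothetical schema: keys fixed by us, values left empty for the model to fill.
scaffold = {
    "title": "",
    "summary": "",
    "risks": [],          # model fills a list of strings
    "confidence": None,   # model fills a number 0-1
}

prompt = (
    "Populate every value in this JSON. Return only valid JSON "
    "with exactly these keys:\n" + json.dumps(scaffold, indent=2)
)

# Validating the reply against the scaffold keeps the second, more
# complex prompt from building on malformed data.
def validate(reply: str) -> dict:
    data = json.loads(reply)
    missing = set(scaffold) - set(data)
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

filled = validate('{"title": "t", "summary": "s", "risks": [], "confidence": 0.8}')
print(filled["title"])  # t
```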

boredtofears|6 months ago

Does Claude Code do this by default? It seems for most prompts that I give it, it ends up breaking things into TODO lists and reformulating the prompt. This seems to work well for most tasks.

cobbzilla|6 months ago

This article has some solid general advice on prompt-writing for anyone, even though the examples are technical.

I found the “Big-O” analogy a bit strained & distracting, but still a good read.

alexc05|6 months ago

I'll admit that it's a little shoehorned in at the end :)

... but you know how editors are, writing the headline for clicks against the wishes of the journalist writing the article. You'll always see journos saying stuff like "don't blame me, that's my editor, I don't write the headlines."

I did toy with the idea of going with something like: `Prompt Engineering is a wrapper around Attention.`

But my editor overruled me *FOR THE CLICKS!!!*

Full disclosure: I'm also the editor

mavilia|6 months ago

This was a great refresher on things I’ve seen writings of but never thought deeply about. A lot of it already “made sense” yet in my day to day I’m still doing the bad versions of the prompts.

In a more continual system like Claude Code, do you have a preference for one big prompt, or for doing just one task and then starting something new?

CuriouslyC|6 months ago

Having a lot of context in the main agent is an antipattern. You want the main agent to call subagents to keep high level reasoning context clean. If you're not using subagents you should start a new chat any time you're doing something fairly different from what you were doing before.

esafak|6 months ago

Could you share an example with results so we can see what difference it made?

alexc05|6 months ago

I can't republish anything that happens in a production/proprietary environment.

One of the things that I think is pretty great about being able to share these particular prompts is that you can run this on one of your own repos to see how it turns out.

ACTUALLY!! Hold on. A couple weekends ago I spent some time doing some underlying research with huggingface/transformers and I have it on a branch.

https://github.com/AlexChesser/transformers/tree/personal/vi...

You can look at the results of an architectural research prompt.

Unfortunately I don't have a "good mode" side by side with a "bad mode" at the moment. I can work on that in the future.

The underlying research linked has the experimental design version of this with each piece evaluated in isolation.

cobbzilla|6 months ago

oh my please read TFA it has exactly the answers you seek.

energy123|6 months ago

I can vouch for this prompting best practice. It leads to better results and better instruction following, whatever the cause.

ath3nd|6 months ago

I hate to be a purist here, but structuring your sentences into coherent chunks is basic communication, and giving it fancy names like "prompt engineering" is doing a disservice to the term "engineering", which is already a concept stretched pretty thin.

LLMs are random and unpredictable, the opposite of what real engineering is. We'd better start using terms like "invocations", "incantations", "spells", or "rain dances"/"rituals" to describe how to effectively "talk" to LLMs, because a science it most definitely isn't.

And yeah, taking the five seconds extra to do the bare minimum in structuring your communication will yield better results in literally any effort. Don't see why this concept deserves an article.

PS: I am also extremely triggered by the idea of comparing Big-O, a scientific term and exact concept with well-understood and predictable outcomes, with "prompt engineering", which is basically "my random thoughts and anecdotal biases about how to communicate better with one of the many similar-but-different fancy autocompletes with randomness built in".

jwilber|6 months ago

Nowadays, basically no architecture with an API is using standard attention anymore. There are all kinds of attention alternatives (e.g. Hyena) and tricks (e.g. Sliding Window, etc.) that make this analogy, as presented, flat out incorrect.

In addition, for the technical aspect to make sense, a more effective article would present its points alongside evals. For example, if you're trying to make a point about where to put important context in the prompt, show a classic needle-in-the-haystack eval, or a Jacobian matrix, alongside the results. Otherwise it's largely more prompt fluff.
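A sketch of what such a needle-in-the-haystack harness could look like (names and scoring are illustrative, and the actual model call is left out):

```python
def build_haystack(needle: str, filler: list, position: float) -> str:
    """Place `needle` at a relative position (0.0=start, 1.0=end) among filler sentences."""
    idx = int(position * len(filler))
    return " ".join(filler[:idx] + [needle] + filler[idx:])

def score(model_answer: str, expected: str) -> bool:
    # Crude pass/fail: did the answer contain the planted fact?
    return expected.lower() in model_answer.lower()

filler = [f"Fact {i} is unremarkable." for i in range(100)]
needle = "The vault code is 7291."
early = build_haystack(needle, filler, 0.05)
late = build_haystack(needle, filler, 0.95)
# Send both haystacks (plus "What is the vault code?") to the model and
# compare score() across positions to see whether placement moves accuracy.
print(score("I believe the vault code is 7291.", "7291"))  # True
```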

slt2021|6 months ago

Also, LLM providers sometimes route your request to a cheaper model. I wonder if there is a way to structure a prompt that will route to a larger (more expensive, and arguably better) model for better results.

AlecSchueler|6 months ago

> There are all kinds of attention alternatives (e.g. Hyena) and tricks (e.g. Sliding Window, etc.) that make this analogy, as presented, flat out incorrect.

Not to doubt you, but could you explain why?