> The more information you have in the file that's not universally applicable to the tasks you have it working on, the more likely it is that Claude will ignore your instructions in the file
Claude.md files can get pretty long, and many times Claude Code just stops following a lot of the directions specified in the file
A friend of mine tells Claude to always address him as “Mr Tinkleberry”. He says he can tell Claude is no longer paying attention to the instructions in CLAUDE.md when it stops calling him “Mr Tinkleberry” consistently.
What I’m surprised about is that OP didn’t mention having multiple CLAUDE.md files in each directory, specifically describing the current context / files in there. Eg if you have some database layer and want to document some critical things about that, put it in “src/persistence/CLAUDE.md” instead of the main one.
Claude pulls in those files automatically whenever it tries to read a file in that directory.
I find that to be a very effective technique to leverage CLAUDE.md files and be able to put a lot of content in them, but still keep them focused and avoid context bloat.
That's smart, but I worry that it only works partially; you'll be filling up the context window with conversation turns where the LLM consistently addresses its user as "Mr. Tinkleberry", thus reinforcing that specific behavior encoded by CLAUDE.md. I'm not convinced that this way of addressing the user implies that it's still paying attention to the rest of the file.
For whatever reason, I can't get into Claude's approach. I like how Cursor handles this, with a directory of files (even subdirectories allowed) where you can define when it should use specific documents.
We are all "context engineering" now, but Claude expects one big file to handle everything? Seems like a dead-end approach.
I wonder if there are any benefits, side-effects or downsides of everyone using the same fake name for Claude to call them.
If a lot of people put "always call me Mr. Tinkleberry" in the file, will it start calling people Mr. Tinkleberry even after it loses the context, simply because so many people seem to want to be called Mr. Tinkleberry?
I've found that Codex is much better at instruction-following like that, almost to a fault (for example, when I tell it to "always use TDD", it will try to use TDD even when just fixing already-valid-just-needing-expectation-updates tests!)
You could make a hook in Claude to re-inject claude.md. For example, make it say "Mr Tinkleberry" in every response, and failing to do so re-injects the instructions.
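A sketch of what such a hook script could look like, assuming Claude Code's Stop-hook convention of passing JSON (including a `transcript_path`) on stdin and treating exit code 2 as "block, with stderr fed back to the model". Check the current hooks docs before relying on those details; the marker phrase is of course whatever your CLAUDE.md demands:

```python
#!/usr/bin/env python3
"""Stop-hook sketch: if the last reply lost the canary phrase, re-inject CLAUDE.md."""
import json
import sys
from pathlib import Path

MARKER = "Mr Tinkleberry"  # canary phrase your CLAUDE.md asks for

def last_reply_has_marker(transcript: str, marker: str) -> bool:
    """Scan a JSONL transcript for the most recent assistant message."""
    last = ""
    for line in transcript.splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("type") == "assistant":
            last = json.dumps(entry.get("message", ""))
    return marker in last

if __name__ == "__main__":
    raw = sys.stdin.read()
    if raw.strip():  # hook payload arrives on stdin as JSON
        payload = json.loads(raw)
        text = Path(payload["transcript_path"]).read_text()
        if not last_reply_has_marker(text, MARKER):
            rules = Path("CLAUDE.md").read_text()
            # Exit code 2 blocks the stop; stderr goes back to the model.
            print("Instructions were dropped. Re-read them:\n" + rules, file=sys.stderr)
            sys.exit(2)
```

Of course, per the sibling comments, the canary proves attention to the canary line, not to the rest of the file.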
I used to tell it to always start every message with a specific emoji. If the emoji wasn’t present, I knew the rules were ignored.
But it’s not reliable enough. It can send the emoji or address you correctly while still ignoring more important rules.
Now I find that it’s best to have a short and tight rules file that references other files where necessary. And to refresh context often. The longer the context window gets, the more likely it is to forget rules and instructions.
> A friend of mine tells Claude to always address him as “Mr Tinkleberry”. He says he can tell Claude is no longer paying attention to the instructions in CLAUDE.md when it stops calling him “Mr Tinkleberry” consistently.
this is a totally normal thing that everyone does, that no one should view as a signal of a psychotic break from reality...
is your friend in the room with us right now?
I doubt I'll ever understand the lengths AI enjoyers will go through just to avoid any amount of independent thought...
> We recommend keeping task-specific instructions in separate markdown files with self-descriptive names somewhere in your project. Then, in your CLAUDE.md file, you can include a list of these files with a brief description of each, and instruct Claude to decide which (if any) are relevant and to read them before it starts working.
I've been doing this since the early days of agentic coding though I've always personally referred to it as the Table-of-Contents approach to keep the context window relatively streamlined. Here's a snippet of my CLAUDE.md file that demonstrates this approach:
# Documentation References
- When adding CSS, refer to: docs/ADDING_CSS.md
- When adding assets, refer to: docs/ADDING_ASSETS.md
- When working with user data, refer to: docs/STORAGE_MANAGER.md
I've done this too. The nice side-benefit of this approach is that it also serves as good documentation for other humans (including your future self) when trying to wrap their heads around what was done and why. In general I find it helpful to write docs that help both humans and agents to understand the structure and purpose of my codebase.
I don't get the point. Point it at your relevant files, ask it to review and discuss the update, refine its understanding, and then tell it to go.
I have found that more context comments and info damage quality on hard problems.
I actually for a long time now have two views for my code.
1. The raw code with no empty space or comments.
2. Code with comments
I never give the second to my LLM. The more context you give, the lower its upper end of quality becomes. This is just a habit I've picked up using LLMs every day, hours a day, since GPT-3.5; it allows me to reach farther into extreme complexity.
I suppose I don't know what most people are using LLMs for, but the higher the complexity your work entails, the less noise you should inject into it. It's tempting to add massive amounts of context, but I've routinely found that fails at the higher levels of coding complexity and uniqueness. It was more apparent in earlier models; newer ones will handle tons of context, you just won't be able to get those upper ends of quality.
Compute-to-information ratio is all that matters. Compute is capped.
> I have found that more context comments and info damage quality on hard problems.
There can be diminishing returns, but every time I’ve used Claude Code for a real project I’ve found myself repeating certain things over and over again and interrupting tool usage until I put it in the Claude notes file.
You shouldn’t try to put everything in there all the time, but putting key info in there has been very high ROI for me.
Disclaimer: I’m a casual user, not a hardcore vibe coder. Claude seems much more capable when you follow the happy path of common projects, but gets constantly turned around when you try to use new frameworks and tools and such.
> 1. The raw code with no empty space or comments. 2. Code with comments
I like the sound of this but what technique do you use to maintain consistency across both views? Do you have a post-modification script which will strip comments and extraneous empty space after code has been modified?
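Not the parent poster, but for Python sources one plausible way to generate the stripped view on the fly is the stdlib `tokenize` module, so the commented file stays the single source of truth. A sketch (it leaves docstrings in place and only handles Python):

```python
"""Sketch: generate a comments-and-blank-lines-free view of a Python file."""
import io
import tokenize

def strip_view(source: str) -> str:
    """Return `source` with comments removed and blank lines dropped."""
    tokens = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type != tokenize.COMMENT
    ]
    stripped = tokenize.untokenize(tokens)
    # Lines that held only a comment are now whitespace; drop them too.
    return "\n".join(line for line in stripped.splitlines() if line.strip())
```

Running this as a pre-send step (rather than rewriting files on disk) avoids the consistency problem entirely, since the stripped view is derived, never edited.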
> I have found that more context comments and info damage quality on hard problems.
I'm skeptical this is a valid generalization of what was directly observed. [1] We would learn more if they wrote a more detailed account of their observations. [2]
I'd like to draw a parallel to another area of study possibly unfamiliar to many of us. Anthropology faced similar issues until Geertz's 1970s reform emphasized "thick description" [3] meaning detailed contextual observations instead of thin generalization.
[1]: I would not draw this generalization. I've found that adding guidelines (on the order of 10k tokens) to my CLAUDE.md has been beneficial across all my conversations. At the same time, I have not constructed anything close to study of variations of my approach. And the underlying models are a moving target. I will admit that some of my guidelines were added to address issues I saw over a year ago and may be nothing more than vestigial appendages nowadays. This is why I'm reluctant to generalize.
[2]: What kind of "hard problems"? What is meant by "more" exactly? (Going from 250 to 500 tokens? 1000 to 2000? 2500 to 5000? &c) How much overlap exists between the CLAUDE.md content items? How much ambiguity? How much contradiction?
IMO within the documentation .md files the information density should be very high. Higher than trying to shove the entire codebase into context that is for sure.
Genuinely curious — how did you isolate the effect of comments/context on model performance from all the other variables that change between sessions (prompt phrasing, model variance, etc)? In other words, how did you validate the hypothesis that "turning off the comments" (assuming you mean stripping them temporarily...) resulted in an objectively superior experience?
What did your comparison process look like? It feels intuitively accurate and validates my anecdotal impression but I'd love to hear the rigor behind your conclusions!
There is a far easier way to do this, and one that is perfectly aligned with how these tools work.
It is called documenting your code!
Just write what this file is supposed to do in a clear concise way. It acts as a prompt, it provides much needed context specific to the file and it is used only when necessary.
Another tip is to add README.md files where possible and where it helps. What is this folder for? Nobody knows! Write a README.md file. It is not rocket science.
What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.
You don't have to "prompt it just the right way".
What you have to do is to use the same old good best practices.
For the record I do think the AI community tries to unnecessarily reinvent the wheel on crap all the time.
Sure, readme.md is a great place to put content. But there are things I'd put in a readme that I'd never put in a claude.md if we want to squeeze the most out of these models.
Further, claude.md/agents.md have special quality-of-life mechanics in the coding agent harnesses, e.g. `injecting this file into the context window whenever an agent touches this directory, no matter whether the model wants to read it or not`.
> What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.
I don't think this is relevant at all - when you're working with coding agents, the more you can finesse and manage every token that goes into your model and how its presented, the better results you can get. And the public data that goes into the models is near useless if you're working in a complex codebase, compared to the results you can get if you invest time into how context is collected and presented to your agent.
So how exactly does one "write what this file is supposed to do in a clear concise way" in a way that is quickly comprehensible to AI? The gist of the article is that when your audience changes from "human" to "AI" the manner in which you write documentation changes. The article is fairly high quality, and presents excellent evidence that simply "documenting your code" won't get you as far as the guidelines it provides.
Your comment comes off as if you're dispensing common-sense advice, but I don't think it actually applies here.
Writing documentation for LLMs is strangely pleasing because you have very linear returns for every bit of effort you spend on improving its quality and the feedback loop is very tight. When writing for humans, especially internal documentation, I’ve found that these returns are quickly diminishing or even negative as it’s difficult to know if people even read it or if they didn’t understand it or if it was incomplete.
This is missing the point. If I want to instruct Claude to never write a database query that doesn't hit a preexisting index, where exactly am I supposed to document that? You can either choose:
1. A centralized location, like a README (congrats, you've just invented CLAUDE.md)
2. You add a docs folder (congrats, you've just done exactly what the author suggests under Progressive Disclosure)
Moreover, you can't just do it all in a README, for the exact reasons that the author lays out under "CLAUDE.md file length & applicability".
CLAUDE.md simply isn't about telling Claude what all the parts of your code are and how they work. You're right, that's what documenting your code is for. But even if you have READMEs everywhere, Claude has no idea where to put code when it starts a new task. If it has to read all your documentation every time it starts a new task, you're needlessly burning tokens. The whole point is to give Claude important information up front so it doesn't have to read all your docs and fill up its context window searching for the right information on every task.
Think of it this way: incredibly well documented code has everything a new engineer needs to get started on a task, yes. But this engineer has amnesia and forgets everything it's learned after every task. Do you want them to have to reonboard from scratch every time? No! You structure your docs in a way so they don't have to start from scratch every time. This is an accommodation: humans don't need this, for the most part, because we don't reonboard to the same codebase over and over. And so yes, you do need to go above and beyond the "same old good best practices".
Well, no. You run into the context limit (or attention limit, for long-context models) pretty fast. And the model understands pretty well what code does without documentation.
There's also the question of processes: how to format code, what style of catching to use, and how to run the tests. A human keeps these in the back of their head after reading them once or twice, but an LLM, whose knowledge lifespan is limited to the session, needs a constant reminder.
Probably a lot of people here disagree with this feeling. But my take is that if setting up all the AI infrastructure and onboarding to my code is going to take this amount of effort, then I might as well code the damn thing myself which is what I'm getting paid to (and enjoy doing anyway)
Whether it's setting up AI infrastructure or configuring Emacs/vim/VSCode, the important distinction to make is if the cost has to be paid continually, or if it's a one time/intermittent cost. If I had to configure my shell/git aliases every time I booted my computer, I wouldn't use them, but seeing as how they're saved in config files, they're pretty heavily customized by this point.
Don't use AI if you don't want to, but "it takes too much effort to set up" is an excuse printf debuggers use to avoid setting up a debugger. Which is a whole other debate though.
I’m sure I’m just working like a caveman, but I simply highlight the relevant code, add it to the chat, and talk to these tools as if they were my colleagues and I’m getting pretty good results.
Between roughly six and twelve months ago this was not the case (with or without .md files); I was getting mainly subpar results, so I’m assuming that the models have improved a lot.
Basically, I found that they do not make that much of a difference; the model is either good enough or not…
I know (or at least I suppose) that these markdown files could bring some marginal improvements, but at this point, I don’t really care.
I assume this is an unpopular take because I see so many people treat these files as if they were black magic or silver bullet that 100x their already 1000x productivity.
Matches my experience also. I bothered only once to set up a proper CLAUDE.md file, and now never do it. Simply referring to the context properly for surgical recommendations and edits works relatively well.
It feels a lot like bikeshedding to me, maybe I’m wrong
Writing and updating CLAUDE.md or AGENTS.md feels pointless to me. Humans are the real audience for documentation. The code changes too fast, and LLMs are stateless anyway.
What’s been working is just letting the LLM explore the relevant part of the code to acquire the context, defining the problem or feature, and asking for a couple of ways to tackle it. All in a one short prompt.
That usually gets me solid options to pick and build it out.
And always do, one session for one problem.
This is my lazy approach to getting useful help from an LLM.
I use .md to tell the model about my development workflow. Along the lines of "here's how you lint", "do this to re-generate the API", "this is how you run unit tests", "The sister repositories are cloned here and this is what they are for".
One may argue that these should go in a README.md, but these markdowns are meant to be more streamlined for context, and it's not appropriate to put a one-liner in the imperative tone to fix model behavior in a top-level file like the README.md
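For illustration, a stripped-down version of that kind of workflow file might look like this (all paths and make targets here are invented):

```markdown
# Dev workflow

- Lint: `make lint` (run before claiming a task is done)
- Regenerate the API client: `make generate-api`
- Unit tests: `make test`; single module: `make test MOD=path/to/module`
- Sister repos: `../shared-models` (wire formats), `../infra` (deploy config)
```

The imperative one-liners are the point: they read as instructions to the agent, which is exactly the tone you'd avoid in a human-facing README.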
Seeing "real" is a warning flag here that either-or thinking is in play.
Putting aside hopes and norms, we live in a world now where multiple kinds of agents (human and non-human) are contributing to codebases. They do not contribute equally; they work according to different mechanisms, with different strengths and weaknesses, with different economic and cultural costs.
Recall a lesson from Ralph Waldo Emerson: "a foolish consistency is the hobgoblin of little minds" [1]. Don't cling to the past; pay attention to the now, and do what works. Another way of seeing it: don't force a false equivalence between things that warrant different treatment.
If you find yourself thinking thoughts that do more harm than good (e.g. muddle rather than clarify), attempt to reframe them to better make sense of reality (which has texture and complexity).
Here's my reframing: "Documentation serves different purposes to different agents across different contexts. So plan and execute accordingly."
> we recommend keeping task-specific instructions in separate markdown files with self-descriptive names somewhere in your project.
Why should we do this when anthropic specifically recommends creating multiple CLAUDE.md files in various directories where the information is specific and pertinent? It seems to me that anthropic has designed claude to look for claude.md for guidance, and randomly named markdown files may or may not stand out to it as it searches the directory.
You can place CLAUDE.md files in several locations:

> - The root of your repo, or wherever you run claude from (the most common usage). Name it CLAUDE.md and check it into git so that you can share it across sessions and with your team (recommended), or name it CLAUDE.local.md and .gitignore it
> - Any parent of the directory where you run claude. This is most useful for monorepos, where you might run claude from root/foo, and have CLAUDE.md files in both root/CLAUDE.md and root/foo/CLAUDE.md. Both of these will be pulled into context automatically
> - Any child of the directory where you run claude. This is the inverse of the above, and in this case, Claude will pull in CLAUDE.md files on demand when you work with files in child directories
> - Your home folder (~/.claude/CLAUDE.md), which applies it to all your claude sessions
I have found enabling the codebase itself to be the “Claude.md” to be most effective. In other words, set up effective automated checks for linting, type checking, unit tests etc and tell Claude to always run these before completing a task. If the agent keeps doing something you don’t like, then a linting update or an additional test often is more effective than trying to tinker with the Claude.md file. Also, ensure docs on the codebase are up to date and tell Claude to read relevant parts when working on a task and of course update the docs for each new task. YMMV but this has worked for me.
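As a sketch of that "encode it as a check" idea: instead of a CLAUDE.md rule like "views must never touch the persistence layer", you can write a small test the agent has to keep green. Everything here (module names, directory layout) is invented for illustration:

```python
"""Convention-as-test sketch: views must not import the persistence layer."""
from pathlib import Path

FORBIDDEN = "import persistence"  # hypothetical DB-layer module

def offending_views(views_dir: str) -> list:
    """Return view files that reference the persistence layer directly."""
    hits = []
    for path in sorted(Path(views_dir).rglob("*.py")):
        if FORBIDDEN in path.read_text():
            hits.append(str(path))
    return hits

def test_views_do_not_touch_db():
    assert offending_views("src/views") == [], "views must go through the service layer"
```

Unlike a prose rule the agent may ignore, a failing test surfaces on every run, which is the "codebase as CLAUDE.md" effect the parent describes.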
> Also, ensure docs on the codebase are up to date and tell Claude to read relevant parts when working on a task
Yeah, if you do this every time it works fine. If you add what you tell it every time to CLAUDE.md, it also works fine, but you don’t have to tell it any more ;)
I've gotten quite a bit of utility out of my current setup[0]:
Some explicit things I found helpful: Have the agent address you as something specific! This way you know if the agent is paying attention to your detailed instructions.
Rationality, as in the stuff practiced on early Less Wrong, gives a great language for constraining the agent. Since it has read The Sequences and everything else, you can include pointers, and the more you do, the more it will be nudged into that mode of thought.
The explicit "This is what I'm doing, this is what I expect" pattern has been hugely useful for both me monitoring it/coming back to see what it did, and it itself. It makes it more likely to recover when it goes down a bad path.
The system reminder this article mentions is definitely there but I have not noticed it messing much with adherence. I wish there were some sort of power user mode to turn it off though!
Also, this is probably too long! But I have been experimenting and iterating for a while, and this is what is working best currently. Not that I've been able to hold any other part constant -- Opus 4.5 really is remarkable.
Here's an idea for LLM makers: allow for a very rigid and structured CLAUDE.md file, one that gives detailed instructions, as void of ambiguity as possible. Then go and refine said language, and maybe allow for more than one file to give it some structure. Iterate on that for a few years, and if you ever need a name for it, you might wanna call it something that describes a program... or maybe, if you are so inclined, a programming language.
Have we really reached the low point that we need tutorials on how to coerce a LLM into doing what we want instead of just....writing the god damn code?
I find the Claude.md file mostly useless. It seems to be 50/50 or LESS whether Claude even reads/uses this file.
You can easily test this by adding some mandatory instruction to the file. E.g. "Any new method you write must have less than 50 lines of code." Then use Claude for ten minutes and watch it blow through this limit again and again.
I use CC and Codex extensively and I constantly am resetting my context and manually pasting my custom instructions in again and again, because these models DO NOT remember or pay attention to Claude.md or Agents.md etc.
I have Claude itself write CLAUDE.md. Once it is informed of its context (e.g., "README.md is for users, CLAUDE.md is for you") you can say things like, "update readme and claudemd" and it will do it. I find this especially useful for prompts like, "update claudemd to make absolutely certain that you check the API docs every single time before making assumptions about its behavior" — I don't need to know what magick spell will make that happen, just that it does happen.
Do you have any proof that AI written instructions are better than human ones? I don't see why an AI would have an innate understanding on how best to prompt itself.
Yes README.md should still be written for humans and isn’t going away anytime soon.
CLAUDE.md is a convention used by claude code, and AGENTS.md is used by other coding agents. Both are intended to be supplemental to the README and are deterministically injected into the agent’s context.
It’s a configuration point for the harness, it’s not intended to replace the README.
Some of the advice in here will undoubtedly age poorly as harnesses change and models improve, but some of the generic principles will stay the same - e.g. that you shouldn’t use an LLM to do a linter & formatter’s job, or that LLMs are stateless and need to be onboarded into the codebase, and that having some deterministically-injected instructions to achieve that is useful instead of relying on the agent to non-deterministically derive all that info by reading config and package files
The post isn’t really intended to be super forward-looking as much as “here’s how to use this coding agent harness configuration point as best as we know how to right now”
That paper the article references is old at this point. No GPT 5.1, no Gemini 3, which both were game changers. I'd love to see their instruction following graphs.
> we recommend keeping task-specific instructions in separate markdown files with self-descriptive names somewhere in your project.
Should do this for human developers too. Can't count the number of times I've been thrown onto a project and had to spend a significant amount of time opening and skimming files just to answer simple questions that should be answered in high-level docs like this.
Yeah I do love how many "best practices" we are only implementing because of LLMs, even though they were massively beneficial for humans prior as well.
The advice here seems to assume a single .md file with instructions for the whole project, but the AGENTS.md methodology as supported by agents like github copilot is to break out more specific AGENTS.md files in the subdirectories in your code base. I wonder how and if the tips shared change assuming a flow with a bunch of focused AGENTS.md files throughout the code.
Interesting selection of models for the "instruction count vs. accuracy" plot. Curious when that was done and why they chose those models. How well does ChatGPT 5/5.1 (and codex/mini/nano variants), Gemini 3, Claude Haiku/Sonnet/Opus 4.5, recent grok models, Kimi 2 Thinking etc (this generation of models) do?
I think this is an overall good approach and I've gotten alright results doing something similar - I still think that this CLAUDE.md experience is too magical and that Anthropic should really focus on it.
Actually having official guidelines in their docs would be a good entrypoint, even though I guess we have this which is the closest available from anything official for now: https://www.claude.com/blog/using-claude-md-files
One interesting thing I also noticed and used recently is that Claude Code ships with a @agent-claude-code-guide. I've used it to review and update my dev workflow / CLAUDE.md file but I've got mixed feelings on the discussion with the subagent.
"You can investigate this yourself by putting a logging proxy between the claude code CLI and the Anthropic API using ANTHROPIC_BASE_URL" I'd be eager to read a tutorial about that I never know which tool to favour for doing that when you're not a system or network expert.
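Not a full tutorial, but the core of such a proxy fits in stdlib Python. This is a sketch under the assumption that the CLI routes all API traffic through `ANTHROPIC_BASE_URL`; note it buffers responses, so streamed output will stall, and an off-the-shelf tool like mitmproxy is the more robust option:

```python
"""Minimal logging-proxy sketch for the Anthropic API.

Start it with RUN_PROXY=1, then run the CLI with
ANTHROPIC_BASE_URL=http://localhost:8111 so requests pass through here.
"""
import json
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://api.anthropic.com"

def summarize(path: str, body: bytes) -> str:
    """One-line log summary of an outgoing request."""
    try:
        data = json.loads(body)
        return f"{path} model={data.get('model')} messages={len(data.get('messages', []))}"
    except (ValueError, AttributeError, TypeError):
        return f"{path} ({len(body)} raw bytes)"

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        print(summarize(self.path, body))  # the "logging" part
        req = urllib.request.Request(UPSTREAM + self.path, data=body, method="POST")
        for name in ("content-type", "x-api-key", "anthropic-version", "authorization"):
            if self.headers.get(name):
                req.add_header(name, self.headers[name])
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()  # buffered: breaks streaming responses
            self.send_response(resp.status)
            self.send_header("Content-Type", resp.headers.get("Content-Type", "application/json"))
            self.end_headers()
            self.wfile.write(payload)

if __name__ == "__main__" and os.environ.get("RUN_PROXY"):
    HTTPServer(("localhost", 8111), Proxy).serve_forever()
```

Logging the full `body` instead of `summarize(...)` shows you exactly what system prompt and CLAUDE.md content the harness injects.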
That's a good write-up. Very useful to know. I'm sort of on the outside of all this; I've only dabbled, and now use Copilot quite a lot with Claude. What's being said here reminds me a lot of CPU registers: given the limited space in CPU registers, the amount of information processing we manage is astounding, and we need higher layers of systems and operating systems to manage it all. So it feels like a lot of what's being said here will inevitably end up being an automated system, or a compiler, or effectively an operating system. Even something basic like a paging system would make a lot of difference.
I'm not sure if Claude Code has integrated it in its system prompts or not since it's moving at breakneck speed, but one instruction I like putting on all of my projects is to "Prompt for technical decisions from user when choices are unsure". This would almost always trigger the prompting feature that Claude Code has for me when it's got some uncertainty about the instructions I gave it, giving me options or alternatives on how to approach the problem when planning or executing.
This way, it's got more of a chance in generating something that I wanted, rather than running off on it's own.
>Frontier thinking LLMs can follow ~ 150-200 instructions with reasonable consistency.
Doesn't that mean that Claude Code's system prompt exhausts that budget before you even get to CLAUDE.md and the user prompt?
Edit: They say Claude Code's system prompt has 50. I might have misjudged then. It seemed pretty verbose to me!
The part about smaller models attending to fewer instructions is interesting too, since most of what was added doesn't seem necessary for the big models. I thought they added them so Haiku could handle the job as well, despite a relative lack of common sense.
I already forgot about CLAUDE.md; I generate and update it by AI. I prefer to keep design, tasks, and docs folders instead. It is always better to ask it to read some spec docs and the real code first before doing anything.
I've been a customer since Sonnet 3.5. It is getting to the point where Opus 4.5 usually does better than whatever your instructions say in CLAUDE.md, just by reading your code and having a general sense of what your preferences are.
I used to instruct about coding style (prefer functions, avoid classes, use structs for complex params and returns, avoid member functions unless needed by shared state, avoid superfluous comments, avoid silly utf8 glyphs, AoS vs SoA, dry, etc)
I removed all my instructions and it basically never violates those points.
It seems overall a good set of guidelines. I appreciate some of the observations being backed up by data.
What I find most interesting is how a hierarchical / recursive context construct begins to emerge. The authors' note of "root" claude.md as well as the opening comments on LLMs being stateless ring to me like a bell. I think soon we will start seeing stateful LLMs, via clever manipulation of scope and context. Something akin to memory, as we humans perceive it.
I've recently started using a similar approach for my own projects. Providing a high-level architecture overview in a single markdown file really helps the LLM understand the 'why' behind the code, not just the 'how'.
Does anyone have a specific structure or template for Claude.md that works best for frontend-heavy projects (like React/Vite)? I find that's where the context window often gets cluttered.
Here is my take on writing a good CLAUDE.md.
I had very good results with my 3-file approach. It has also been inspired by the great blog posts that Human Layer publishes from time to time:
https://github.com/marcuspuchalla/claude-project-management
<system-reminder>
IMPORTANT: this context may or may not be relevant to your tasks.
You should not respond to this context unless it is highly relevant to your task.
</system-reminder>
Perhaps a small proxy between Claude code and the API to enforce following CLAUDE.md may improve things… I may try this
I copy/pasted it into my codebase to see if it’s any good and now Claude is refusing to do any work? I asked Copilot to investigate why Claude is not working but it too is not working. Do you know what happened?
I find writing a good CLAUDE.md is done by running /init, and having the LLM write it. If you need more controls on how it should work, I would highly recommend you implement it in an unavoidable way via hooks and not in a handwritten note to your LLM.
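For instance, rather than writing "always run the linter after edits" in CLAUDE.md, a PostToolUse hook in `.claude/settings.json` makes it unconditional. This is a sketch of the hook config shape as I understand it; verify the event and matcher names against the current docs, and `npm run lint` is a placeholder command:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm run lint" }
        ]
      }
    ]
  }
}
```

The hook fires deterministically on every matching tool call, which is exactly the "unavoidable" property a handwritten note can't give you.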
I think this could work really well for infrastructure/ops style work where the LLM will not be able to grasp the full context of say the network from just a few files that you have open.
But as others are saying this is just basic documentation that should be done anyway.
What's the actual completion rate for Advent of Code? I'd bet the majority of participants drop off before day 25, even among those aiming to complete it.
Is this intentional? Is AoC designed as an elite challenge, or is the journey more important than finishing?
Ha, I just tell Claude to write it. My results have been generally fine, but I only use Claude on a simple codebase that is well documented already. Maybe I will hand-edit it to see if I can see any improvements.
Has anyone had success getting Claude to write its own CLAUDE.md file? It should be able to deduce rules by looking at the code, documentation, and PR comments.
The main failure state I find is that Claude wants to write an incredibly verbose Claude.md, but if I instruct it "one sentence per topic, be concise" it usually does a good job.
That said, a lot of what it can deduce by looking at the code is exactly what you shouldn't include, since it will re-deduce that stuff just by interacting with the code base anyway. Claude doesn't seem good at leaving that out.
An example of both overly-verbose and unnecessary:
### 1. Identify the Working Directory
When a user asks you to work on something:
1. *Check which project* they're referring to
2. *Change to that directory* explicitly if needed
3. *Stay in that directory* for file operations
```bash
# Example: Working on ProjectAlpha
cd /home/user/code/ProjectAlpha
```
(The one sentence version is "Each project has a subfolder; use pwd to make sure you're in the right directory", and the ideal version is probably just letting it occasionally spend 60 seconds confused, until it remembers pwd exists)
If you have any substantial codebase, it will write a massive file unless you explicitly tell it not to. It will also try to make updates, including garbage like historical or transitional changes, project status, etc...
I was expecting the traditional AI-written slop about AI, but this is actually really good. In particular, the "As instruction count increases, instruction-following quality decreases uniformly" section and associated graph is truly fantastic! To my mind, the ability to follow long lists of rules is one of the most obvious ways that virtually all AI models fail today. That's why I think that graph is so useful -- I've never seen someone go and systematically measure it before!
> Regardless of which model you're using, you may notice that Claude frequently ignores your CLAUDE.md file's contents.
I've been very satisfied with creating a short AGENTS.md file with the project basics, and then also including references to where to find more information / context, like a /context folder that has markdown files such as app-description.md.
I would recommend using it, yeah. You have limited context and it will be compacted/summarized occasionally. The compaction/summary will lose some information and it is easy for it to forget certain instructions you gave it. Afaik claude.md will be loaded into the context on every compaction which allows you to use it for instructions that should always be included in the context.
Honestly I’d rather Google get their Gemini tool in better shape. I know for a fact it doesn’t ignore instructions like Claude Code does, but it is horrible at editing files.
Funny how this is exactly the documentation you'd need to make it easy for a human to work with the codebase. Perhaps this'll be the greatest thing about LLMs -- they force people to write developer guides for their code. Of course, people are going to ask an LLM to write the CLAUDE.md and then it'll just be more slop...
It's not exactly the doc you'd need for a human. There could be overlap, but each side may also have unique requirements that aren't necessarily suitable for the other. E.g. a doc for a human may have considerably more information than you'd want to give to the agent, or, you may want to define agent behavior for workflows that don't apply to a human.
"Here's how to use the slop machine better" is such a ridiculous pretense for a blog or article. You simply write a sentence and it approximates it. That is hardly worth any literature being written, as it is so self-evident.
This is an excellent point - LLMs are autoregressive next-token predictors, and output token quality is a function of input token quality
It's always funny, I think the opposite. I use a massive CLAUDE.md file, but it's targeted towards very specific details of what to do, and what not to do.
nico|3 months ago
> The more information you have in the file that's not universally applicable to the tasks you have it working on, the more likely it is that Claude will ignore your instructions in the file
Claude.md files can get pretty long, and many times Claude Code just stops following a lot of the directions specified in the file
A friend of mine tells Claude to always address him as “Mr Tinkleberry”; he says he can tell Claude is not paying attention to the instructions in Claude.md when Claude stops calling him “Mr Tinkleberry” consistently
stingraycharles|3 months ago
What I’m surprised about is that OP didn’t mention having multiple CLAUDE.md files in each directory, specifically describing the current context / files in there. Eg if you have some database layer and want to document some critical things about that, put it in “src/persistence/CLAUDE.md” instead of the main one.
Claude pulls in those files automatically whenever it tries to read a file in that directory.
I find that to be a very effective technique to leverage CLAUDE.md files and be able to put a lot of content in them, but still keep them focused and avoid context bloat.
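A sketch of the layout being described, using the database-layer example above (paths are illustrative):

```
repo/
├── CLAUDE.md                    # project-wide basics only
└── src/
    └── persistence/
        └── CLAUDE.md            # database-layer specifics, pulled in on demand
```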
globular-toast|3 months ago
jmathai|3 months ago
Helps me quickly whip it back in line.
isoprophlex|3 months ago
sesm|3 months ago
homeonthemtn|3 months ago
I've used that a couple times, e.g. "Conclude your communications with "Purple fish" at the end"
Claude definitely picks and chooses when purple fish will show up
lubujackson|3 months ago
We are all "context engineering" now but Claude expects one big file to handle everything? Seems like a dead-end approach.
bryanrasmussen|3 months ago
If a lot of people always put "call me Mr. Tinkleberry" in the file, will it start calling people Mr. Tinkleberry even when it loses the context, because so many people seem to want to be called Mr. Tinkleberry?
pmarreck|3 months ago
aqme28|3 months ago
dkersten|3 months ago
But it’s not reliable enough. It can send the emoji or address you correctly while still ignoring more important rules.
Now I find that it’s best to have a short and tight rules file that references other files where necessary. And to refresh context often. The longer the context window gets, the more likely it is to forget rules and instructions.
chickensong|3 months ago
unknown|3 months ago
[deleted]
grayhatter|3 months ago
this is a totally normal thing that everyone does, that no one should view as a signal of a psychotic break from reality...
is your friend in the room with us right now?
I doubt I'll ever understand the lengths AI enjoyers will go through just to avoid any amount of independent thought...
vunderba|3 months ago
> We recommend keeping task-specific instructions in separate markdown files with self-descriptive names somewhere in your project. Then, in your CLAUDE.md file, you can include a list of these files with a brief description of each, and instruct Claude to decide which (if any) are relevant and to read them before it starts working.
I've been doing this since the early days of agentic coding though I've always personally referred to it as the Table-of-Contents approach to keep the context window relatively streamlined. Here's a snippet of my CLAUDE.md file that demonstrates this approach:
Full CLAUDE.md file for reference: https://gist.github.com/scpedicini/179626cfb022452bb39eff10b...
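A minimal, hypothetical illustration of the table-of-contents pattern (file names invented here, not taken from the gist):

```markdown
## Task-specific docs (read only what's relevant)

- `docs/testing.md` - how to run and write tests
- `docs/migrations.md` - database migration workflow
- `docs/release.md` - versioning and release checklist

Before starting a task, decide which (if any) of these apply and read them first.
```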
sothatsit|3 months ago
dimitri-vs|3 months ago
Zarathruster|3 months ago
johnsmith1840|3 months ago
I have found that more context comments and info damage quality on hard problems.
I actually for a long time now have two views for my code:
1. The raw code, with no empty space or comments.
2. The code with comments.
I never give the second to my LLM. The more context you give, the lower its upper end of quality becomes. This is just a habit I've picked up using LLMs every day, hours a day, since GPT-3.5; it allows me to reach farther into extreme complexity.
I suppose I don't know what most people are using LLMs for, but the higher complexity your work entails, the less noise you should inject into it. It's tempting to add massive amounts of context, but I've routinely found that fails at the higher levels of coding complexity and uniqueness. It was more apparent in earlier models; newer ones will handle tons of context, you just won't be able to get those upper ends of quality.
The compute-to-information ratio is all that matters. Compute is capped.
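As a rough sketch of the "two views" idea for Python code (this assumes nothing about the commenter's actual tooling): a parse/unparse round trip drops `#` comments and blank lines, while docstrings survive because they are real AST nodes.

```python
import ast

def llm_view(source: str) -> str:
    """Return the comment-free, whitespace-normalized view fed to the LLM."""
    return ast.unparse(ast.parse(source))

code = '''
def add(a, b):
    # helper used by the billing module
    return a + b
'''
print(llm_view(code))  # the '# helper ...' comment and the blank line are gone
```

Generating the stripped view on demand from the commented source avoids having to keep two copies in sync by hand.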
Aurornis|3 months ago
There can be diminishing returns, but every time I’ve used Claude Code for a real project I’ve found myself repeating certain things over and over again and interrupting tool usage until I put it in the Claude notes file.
You shouldn’t try to put everything in there all the time, but putting key info in there has been very high ROI for me.
Disclaimer: I’m a casual user, not a hardcore vibe coder. Claude seems much more capable when you follow the happy path of common projects, but gets constantly turned around when you try to use new frameworks and tools and such.
Mtinie|3 months ago
I like the sound of this but what technique do you use to maintain consistency across both views? Do you have a post-modification script which will strip comments and extraneous empty space after code has been modified?
ra|3 months ago
The more data you load into context, the more you dilute attention.
xpe|3 months ago
I'm skeptical this is a valid generalization over what was directly observed. [1] We would learn more if they wrote a more detailed account of their observations. [2]
I'd like to draw a parallel to another area of study possibly unfamiliar to many of us. Anthropology faced similar issues until Geertz's 1970s reform emphasized "thick description" [3] meaning detailed contextual observations instead of thin generalization.
[1]: I would not draw this generalization. I've found that adding guidelines (on the order of 10k tokens) to my CLAUDE.md has been beneficial across all my conversations. At the same time, I have not constructed anything close to study of variations of my approach. And the underlying models are a moving target. I will admit that some of my guidelines were added to address issues I saw over a year ago and may be nothing more than vestigial appendages nowadays. This is why I'm reluctant to generalize.
[2]: What kind of "hard problems"? What is meant by "more" exactly? (Going from 250 to 500 tokens? 1000 to 2000? 2500 to 5000? &c) How much overlap exists between the CLAUDE.md content items? How much ambiguity? How much contradiction?
[3]: https://en.wikipedia.org/wiki/Thick_description
nightski|3 months ago
senshan|3 months ago
How do you practically achieve this? Honest question. Thanks
schrodinger|3 months ago
What did your comparison process look like? It feels intuitively accurate and validates my anecdotal impression but I'd love to hear the rigor behind your conclusions!
stpedgwdgfhgdd|3 months ago
See it as you would for a human: the comments are there to speed up understanding of the code.
saturatedfat|3 months ago
_pdp_|3 months ago
It is called documenting your code!
Just write what this file is supposed to do in a clear concise way. It acts as a prompt, it provides much needed context specific to the file and it is used only when necessary.
Another tip is to add README.md files where possible and where it helps. What is this folder for? Nobody knows! Write a README.md file. It is not rocket science.
What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.
You don't have to "prompt it just the right way".
What you have to do is to use the same old good best practices.
dhorthy|3 months ago
sure, readme.md is a great place to put content. But there's things I'd put in a readme that I'd never put in a claude.md if we want to squeeze the most out of these models.
Further, claude/agents.md have special quality-of-life mechanics with the coding agent harnesses like e.g. `injecting this file into the context window whenever an agent touches this directory, no matter whether the model wants to read it or not`
> What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.
I don't think this is relevant at all - when you're working with coding agents, the more you can finesse and manage every token that goes into your model and how its presented, the better results you can get. And the public data that goes into the models is near useless if you're working in a complex codebase, compared to the results you can get if you invest time into how context is collected and presented to your agent.
johnfn|3 months ago
Your comment comes off as if you're dispensing common-sense advice, but I don't think it actually applies here.
datacynic|3 months ago
bastawhiz|3 months ago
1. A centralized location, like a README (congrats, you've just invented CLAUDE.md)
2. You add a docs folder (congrats, you've just done exactly what the author suggests under Progressive Disclosure)
Moreover, you can't just do it all in a README, for the exact reasons that the author lays out under "CLAUDE.md file length & applicability".
CLAUDE.md simply isn't about telling Claude what all the parts of your code are and how they work. You're right, that's what documenting your code is for. But even if you have READMEs everywhere, Claude has no idea where to put code when it starts a new task. If it has to read all your documentation every time it starts a new task, you're needlessly burning tokens. The whole point is to give Claude important information up front so it doesn't have to read all your docs and fill up its context window searching for the right information on every task.
Think of it this way: incredibly well documented code has everything a new engineer needs to get started on a task, yes. But this engineer has amnesia and forgets everything it's learned after every task. Do you want them to have to reonboard from scratch every time? No! You structure your docs in a way so they don't have to start from scratch every time. This is an accommodation: humans don't need this, for the most part, because we don't reonboard to the same codebase over and over. And so yes, you do need to go above and beyond the "same old good best practices".
avereveard|3 months ago
There's also a question of processes: how to format code, what style of catching to use, and how to run the tests. Humans keep these in the back of their heads after reading them once or twice, but an LLM, whose knowledge lifespan is session-limited, needs a constant reminder.
0xblacklight|3 months ago
This means that instead of behaving like a file the LLM reads, it effectively lets you customize the model’s prompt
I also didn’t write that you have to “prompt it just the right way”, I think you’re missing the point entirely
gonzalohm|3 months ago
fragmede|3 months ago
Don't use AI if you don't want to, but "it takes too much effort to set up" is an excuse printf debuggers use to avoid setting up a debugger. Which is a whole other debate though.
vanviegen|3 months ago
Havoc|3 months ago
Universal has stuff I always want (use uv instead of pip, etc.) while the other describes the tech choices for this project
kissgyorgy|3 months ago
nvarsj|3 months ago
nichochar|3 months ago
I understand the "enjoy doing anyway" part and it resonates, but not using AI is simply less productive.
serial_dev|3 months ago
About 6 to 12 months ago this was not the case (with or without .md files); I was getting mainly subpar results, so I’m assuming that the models have improved a lot.
Basically, I found that they do not make that much of a difference; the model is either good enough or not…
I know (or at least I suppose) that these markdown files could bring some marginal improvements, but at this point, I don’t really care.
I assume this is an unpopular take because I see so many people treat these files as if they were black magic or silver bullet that 100x their already 1000x productivity.
vanviegen|3 months ago
Different use case. I assume the discussion is about having the agent implement whole features or research and fix bugs without much guidance.
rmnclmnt|3 months ago
It feels a lot like bikeshedding to me, maybe I’m wrong
wredcoll|3 months ago
jwpapi|3 months ago
aiibe|3 months ago
aqme28|3 months ago
samuelknight|3 months ago
One may argue that these should go in a README.md, but these markdowns are meant to be more streamlined for context, and it's not appropriate to put a one-liner in the imperative tone to fix model behavior in a top-level file like the README.md
arnorhs|3 months ago
dncornholio|3 months ago
xpe|3 months ago
Seeing "real" is a warning flag here that either-or thinking is in play.
Putting aside hopes and norms, we live in a world now where multiple kinds of agents (human and non-human) are contributing to codebases. They do not contribute equally; they work according to different mechanisms, with different strengths and weaknesses, with different economic and cultural costs.
Recall a lesson from Ralph Waldo Emerson: "a foolish consistency is the hobgoblin of little minds" [1]. Don't cling to the past; pay attention to the now, and do what works. Another way of seeing it: don't force a false equivalence between things that warrant different treatment.
If you find yourself thinking thoughts that do more harm than good (e.g. muddle rather than clarify), attempt to reframe them to better make sense of reality (which has texture and complexity).
Here's my reframing: "Documentation serves different purposes to different agents across different contexts. So plan and execute accordingly."
[1]: https://en.wikipedia.org/wiki/Wikipedia:Emerson_and_Wilde_on...
vaer-k|3 months ago
Why should we do this when anthropic specifically recommends creating multiple CLAUDE.md files in various directories where the information is specific and pertinent? It seems to me that anthropic has designed claude to look for claude.md for guidance, and randomly named markdown files may or may not stand out to it as it searches the directory.
You can place CLAUDE.md files in several locations:
> - The root of your repo, or wherever you run claude from (the most common usage). Name it CLAUDE.md and check it into git so that you can share it across sessions and with your team (recommended), or name it CLAUDE.local.md and .gitignore it
> - Any parent of the directory where you run claude. This is most useful for monorepos, where you might run claude from root/foo, and have CLAUDE.md files in both root/CLAUDE.md and root/foo/CLAUDE.md. Both of these will be pulled into context automatically
> - Any child of the directory where you run claude. This is the inverse of the above, and in this case, Claude will pull in CLAUDE.md files on demand when you work with files in child directories
> - Your home folder (~/.claude/CLAUDE.md), which applies it to all your claude sessions
https://www.anthropic.com/engineering/claude-code-best-pract...
andersco|3 months ago
Aeolun|3 months ago
Yeah, if you do this every time it works fine. If you add what you tell it every time to CLAUDE.md, it also works fine, but you don’t have to tell it any more ;)
Havoc|3 months ago
It’s case-sensitive btw: CLAUDE.md. Might explain your mixed results with it
szundi|3 months ago
[deleted]
ctoth|3 months ago
Some explicit things I found helpful: Have the agent address you as something specific! This way you know if the agent is paying attention to your detailed instructions.
Rationality, as in the stuff practiced on early Less Wrong, gives a great language for constraining the agent, and since it's read The Sequences and everything else you can include pointers and the more you do the more it will nudge it into that mode of thought.
The explicit "This is what I'm doing, this is what I expect" pattern has been hugely useful for both me monitoring it/coming back to see what it did, and it itself. It makes it more likely to recover when it goes down a bad path.
The system reminder this article mentions is definitely there but I have not noticed it messing much with adherence. I wish there were some sort of power user mode to turn it off though!
Also, this is probably too long! But I have been experimenting and iterating for a while, and this is what is working best currently. Not that I've been able to hold any other part constant -- Opus 4.5 really is remarkable.
[0]: https://gist.github.com/ctoth/d8e629209ff1d9748185b9830fa4e7...
rootnod3|3 months ago
Have we really reached the low point that we need tutorials on how to coerce a LLM into doing what we want instead of just....writing the god damn code?
saberience|3 months ago
You can easily test this by adding some mandatory instruction into the file. E.g. "Any new method you write must have less than 50 lines of code." Then use Claude for ten minutes and watch it blow through this limit again and again.
I use CC and Codex extensively and I constantly am resetting my context and manually pasting my custom instructions in again and again, because these models DO NOT remember or pay attention to Claude.md or Agents.md etc.
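A rule like the 50-line limit is easy to check mechanically instead of trusting the model's memory. A hedged Python sketch (the function name is mine, for illustration) that could run as a pre-commit check or hook:

```python
import ast

def long_functions(source: str, max_lines: int = 50):
    """Return (name, line_count) for every function/method over the limit."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return offenders

# Flag anything the agent writes that blows through the rule, e.g.:
# print(long_functions(open("new_module.py").read()))
```

Failing the commit deterministically is a stronger enforcement mechanism than any sentence in Claude.md or Agents.md.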
astrostl|3 months ago
dexwiz|3 months ago
chickensong|3 months ago
candiddevmike|3 months ago
Write readmes for humans, not LLMs. That's where the ball is going.
0xblacklight|3 months ago
Yes README.md should still be written for humans and isn’t going away anytime soon.
CLAUDE.md is a convention used by claude code, and AGENTS.md is used by other coding agents. Both are intended to be supplemental to the README and are deterministically injected into the agent’s context.
It’s a configuration point for the harness, it’s not intended to replace the README.
Some of the advice in here will undoubtedly age poorly as harnesses change and models improve, but some of the generic principles will stay the same - e.g. that you shouldn’t use an LLM to do a linter & formatter’s job, or that LLMs are stateless and need to be onboarded into the codebase, and having some deterministically-injected instructions to achieve that is useful instead of relying on the agent to non-deterministically derive all that info by reading config and package files
The post isn’t really intended to be super forward-looking as much as “here’s how to use this coding agent harness configuration point as best as we know how to right now”
unknown|3 months ago
[deleted]
mmaunder|3 months ago
0xblacklight|3 months ago
scelerat|3 months ago
Should do this for human developers too. Can't count the number of times I've been thrown onto a project and had to spend a significant amount of time opening and skimming files just to answer simple questions that should be answered in high-level docs like this.
abustamam|3 months ago
But in all seriousness, it's working. I write Cursor rules religiously and I point other devs to them. It's great.
minor3|3 months ago
prettyblocks|3 months ago
0xblacklight|3 months ago
I didn’t dive into that because in a lot of cases it’s not necessary and I wanted to keep the post short, but for large monorepos it’s a good idea
jasonjmcghee|3 months ago
alansaber|3 months ago
magictux|3 months ago
Actually having official guidelines in their docs would be a good entrypoint, even though I guess we have this which is the closest available from anything official for now: https://www.claude.com/blog/using-claude-md-files
One interesting thing I also noticed and used recently is that Claude Code ships with a @agent-claude-code-guide. I've used it to review and update my dev workflow / CLAUDE.md file but I've got mixed feelings on the discussion with the subagent.
eric-burel|3 months ago
0xblacklight|3 months ago
We used cloudflare’s AI gateway which is pretty simple. Set one up, get the proxy URL and set it through the env var, very plug-and-play
fishmicrowaver|3 months ago
Havoc|3 months ago
On phone else I’d post commands
asim|3 months ago
jankdc|3 months ago
This way, it's got more of a chance of generating something that I wanted, rather than running off on its own.
andai|3 months ago
Doesn't that mean that Claude Code's system prompt exhausts that budget before you even get to CLAUDE.md and the user prompt?
Edit: They say Claude Code's system prompt has 50. I might have misjudged then. It seemed pretty verbose to me!
The part about smaller models attending to fewer instructions is interesting too, since most of what was added doesn't seem necessary for the big models. I thought they added them so Haiku could handle the job as well, despite a relative lack of common sense.
DR_MING|3 months ago
nurettin|3 months ago
I used to instruct about coding style (prefer functions, avoid classes, use structs for complex params and returns, avoid member functions unless needed by shared state, avoid superfluous comments, avoid silly utf8 glyphs, AoS vs SoA, dry, etc)
I removed all my instructions and it basically never violates those points.
unknown|3 months ago
[deleted]
btbuildem|3 months ago
What I find most interesting is how a hierarchical / recursive context construct begins to emerge. The authors' note of "root" claude.md as well as the opening comments on LLMs being stateless ring to me like a bell. I think soon we will start seeing stateful LLMs, via clever manipulation of scope and context. Something akin to memory, as we humans perceive it.
ilmj8426|3 months ago
unknown|3 months ago
[deleted]
0xcb0|3 months ago
edf13|3 months ago
<system-reminder> IMPORTANT: this context may or may not be relevant to your tasks. You should not respond to this context unless it is highly relevant to your task. </system-reminder>
Perhaps a small proxy between Claude code and the API to enforce following CLAUDE.md may improve things… I may try this
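A sketch of the request-mutation half of such a proxy, assuming the Anthropic Messages API body shape (a `system` field that may be a plain string or a list of text blocks; verify against the current API reference before relying on this):

```python
import json

def reinforce(body: bytes, claude_md: str) -> bytes:
    """Re-append CLAUDE.md as the final system block of an outgoing request."""
    req = json.loads(body)
    system = req.get("system") or []
    if isinstance(system, str):          # "system" may be a plain string
        system = [{"type": "text", "text": system}]
    system.append({"type": "text", "text": "Re-read and follow:\n" + claude_md})
    req["system"] = system
    return json.dumps(req).encode()
```

The other half, actually sitting between Claude Code and the API, could be any HTTP proxy that calls this on each messages-endpoint body; whether re-injecting the instructions at the end of the system prompt actually improves adherence is the experiment.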
boredtofears|3 months ago
grishka|3 months ago
Is it a good one?
wizzledonker|3 months ago
lijok|3 months ago
philipp-gayret|3 months ago
tietjens|3 months ago
But as others are saying this is just basic documentation that should be done anyway.
VimEscapeArtist|3 months ago
Is this intentional? Is AoC designed as an elite challenge, or is the journey more important than finishing?
philipwhiuk|3 months ago
I rarely get past 18 or so. The stats for last year are here: https://adventofcode.com/2024/stats
rootusrootus|3 months ago
rcarmo|3 months ago
If you're using VSCode, that is automatically added to context (and I think in Zed that happens as well, although I can't verify right now).
Ozzie_osman|3 months ago
handoflixue|3 months ago
That said, a lot of what it can deduce by looking at the code is exactly what you shouldn't include, since it will usually deduce that stuff just by interacting with the code base. Claude doesn't seem good at that.
An example of both overly-verbose and unnecessary:
### 1. Identify the Working Directory
When a user asks you to work on something:
1. *Check which project* they're referring to
2. *Change to that directory* explicitly if needed
3. *Stay in that directory* for file operations
```bash
# Example: Working on ProjectAlpha
cd /home/user/code/ProjectAlpha
```
(The one sentence version is "Each project has a subfolder; use pwd to make sure you're in the right directory", and the ideal version is probably just letting it occasionally spend 60 seconds confused, until it remembers pwd exists)
chickensong|3 months ago
I think most people who use Claude regularly have probably come to the same conclusions as the article. A few bits of high-level info, some behavior stuff, and pointers to actual docs. Load docs as-needed, either by prompt or by skill. Work through lists and constantly update status so you can clear context and pick up where you left off. Any other approach eats too much context.
If you have a complex feature that would require ingesting too many large docs, you can ask Claude to determine exactly what it needs to build the appropriate context for that feature and save that to a context doc that you load at the beginning of each session.
unknown|3 months ago
[deleted]
toenail|3 months ago
Read your instructions from Agents.md
adastra22|3 months ago
OMG this finally makes sense.
Is there any way to turn off this behavior?
Or better yet is there a way to filter the context that is being sent?
malshe|3 months ago
sixothree|3 months ago
_august|3 months ago
m13rar|3 months ago
johnfn|3 months ago
I would love to see it extended to show Codex, which to my mind is by far the best at rule-following. (I'd also be curious to see how Gemini 3 performs.)
0xblacklight|3 months ago
kirso|3 months ago
Why not just show one?
unknown|3 months ago
[deleted]
wowamit|3 months ago
This is news to me. And at the same time it isn’t. Without knowledge of how the models actually work, most prompting is guesstimate at best. You have no control over models via prompts.
bryanhogan|3 months ago
brcmthrowaway|3 months ago
Zerot|3 months ago
huqedato|3 months ago
0xblacklight|3 months ago
uncletaco|3 months ago
foobarbecue|3 months ago
chickensong|3 months ago
Also, while it may be hip to call any LLM output slop, that really isn't the case. Look at what a poor history we have of developer documentation. LLMs may not be great at everything, but they're actually quite capable when it comes to technical documentation. Even a 1-shot attempt by LLM is often way better than many devs who either can't write very well, or just can't be bothered to.
acedTrex|3 months ago
0xblacklight|3 months ago
Consider that if the only code you get out of the autoregressive token prediction machine is slop, that this indicates more about the quality of your code than the quality of the autoregressive token prediction machine
max-privatevoid|3 months ago
rvz|3 months ago
AndyNemmity|3 months ago
I have a full system of agents, hooks, skills, and commands, and it all works for me quite well.
I believe in massive context, but targeted context. It has to be valuable, and important.
My agents are large. My skills are large. Etc etc.
jason-richar15|3 months ago
[deleted]
alan-jordan13|3 months ago
[deleted]
fpauser|3 months ago
vladsh|3 months ago
testdelacc1|3 months ago
A good Claude.md - I don’t know, presumably the article explains.