motoboi | 1 month ago
So the model won’t “understand” that you have a skill and use it. The generation of the text that triggers skill usage is trained via Reinforcement Learning on human-generated examples and usage traces.
So why doesn’t the model use skills all the time? Because it’s a new thing; there aren’t enough training samples displaying that behavior.
They also can’t enforce that via RL, because skills use human language, which is ambiguous and informal. Force the model to always use skills via an RL policy and you’ll make it dumber.
So, right now, we are generating usage traces that will be used to train future models to get a better grasp of when to use skills and when not to. Just give it time.
AGENTS.md, on the other hand, is context. Models have been trained to follow context since the dawn of the thing.
vidarh|1 month ago
The skills frontmatter ends up in context as well.
If AGENTS.md outperforms skills in a given agent, it comes down to exactly how the skills frontmatter is extracted and injected into the context, because that is the only difference between the two approaches.
EDIT: I haven't tried to check this so this is pure speculation, but I suppose there is the possibility that some agents might use a smaller model to selectively decide which skills frontmatter to include in context for a bigger model. E.g. you could imagine Claude passing the prompt + skills frontmatter to Haiku to selectively decide what to include before passing to Sonnet or Opus. In that case, depending on the approach, putting it directly in AGENTS.md might simply be a question of what information is prioritised in the output passed to the full model. (Again: this is pure speculation about a possible approach; though it is one I'd test if I were to pick up writing my own coding agent again.)
But really, the overall point is that AGENTS.md vs. skills is still entirely a question of what ends up in the "raw" context/prompt that gets passed to the full model, so this is just nuance on my original answer with respect to possible ways that raw prompt could be composed.
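A minimal sketch of that hypothetical two-stage approach (to be clear: no agent is confirmed to work this way, and all names here are invented; a crude keyword-overlap score stands in for the small model's relevance judgment):

```python
# Hypothetical sketch: a cheap relevance filter standing in for a
# "small model picks which skills frontmatter to include" step.
# All function names and the scoring heuristic are invented.

def score_relevance(prompt: str, frontmatter: str) -> int:
    """Crude stand-in for a small-model judgment: count prompt words
    that also appear in the skill's frontmatter description."""
    prompt_words = set(prompt.lower().split())
    return sum(1 for w in frontmatter.lower().split() if w in prompt_words)

def compose_context(prompt: str, skills: dict[str, str], max_skills: int = 2) -> str:
    """Rank skills frontmatter by relevance, keep the top few, and build
    the raw context that would be passed to the full model."""
    ranked = sorted(skills.items(),
                    key=lambda kv: score_relevance(prompt, kv[1]),
                    reverse=True)
    selected = [fm for _, fm in ranked[:max_skills]
                if score_relevance(prompt, fm) > 0]
    return "\n\n".join(selected + [prompt])

skills = {
    "pdf": "Use this skill to read and fill PDF forms.",
    "sql": "Use this skill to write and optimize SQL queries.",
}
context = compose_context("Please optimize this slow SQL query", skills)
```

Either way the full model only ever sees the composed string, which is the point above: the difference between the approaches lives entirely in how that string gets built.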
OJFord|1 month ago
Hence the submission's conclusion:
> Our working theory [for why this performs better] comes down to three factors.
> No decision point. With AGENTS.md, there's no moment where the agent must decide "should I look this up?" The information is already present.
> Consistent availability. Skills load asynchronously and only when invoked. AGENTS.md content is in the system prompt for every turn.
> No ordering issues. Skills create sequencing decisions (read docs first vs. explore project first). Passive context avoids this entirely.
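The three factors boil down to two different context-assembly strategies, which can be sketched roughly like this (function names are invented for illustration, not taken from any real agent):

```python
# Hypothetical sketch of the two strategies the submission contrasts.

def agents_md_context(agents_md: str, prompt: str) -> str:
    # Passive context: the guidance is unconditionally prepended on
    # every turn -- no decision point, no ordering question.
    return f"{agents_md}\n\n{prompt}"

def skills_context(prompt: str, skill_docs: str, agent_invokes_skill: bool) -> str:
    # Skill-based context: the docs arrive only if the agent decides
    # to invoke the skill, and only after that decision is made.
    if agent_invokes_skill:
        return f"{prompt}\n\n{skill_docs}"
    return prompt  # guidance silently absent this turn
```

If `agent_invokes_skill` comes out wrong even occasionally, the skill docs are simply missing from that turn, whereas the AGENTS.md path has no such failure mode.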
js8|1 month ago
How do you know? What if AGI can be implemented as a reasonably small set of logic rules, which implement what we call "epistemology" and "informal reasoning"? And this set of rules is just being run in a loop, producing better and better models of reality. It might even include RL, for all we know.
And what if LLMs already know all these rules? Then they are AGI-complete without us knowing.
To borrow from Dennett, we understand LLMs from the physical stance (they are neural networks) and the design stance (they predict the next token of language), but do we understand them from an intentional stance, i.e. what rules do they employ when they are running chain-of-thought, for example?
themoose8|1 month ago
They're very useful, but as we all know, they're far from infallible.
We're probably plateauing on improvements to the core GPT technology. For these models and APIs to improve, it's things like Skills that need to be worked on and refined, to reduce the mistakes the models make and produce better output.
So it's pretty disappointing to see that the 'Skills' feature, as implemented, as great a concept as it is, is pretty bogus compared to just front-loading the AGENTS.md file. This is not obvious, and it's valuable to know.
coldtea|1 month ago
This makes the assumption that AGI is not autocomplete on steroids, which even before LLMs was a very plausible suggested mechanism for what intelligence is.
anal_reactor|1 month ago
https://en.wikipedia.org/wiki/GNU/Linux_naming_controversy