It's so nice that skills are becoming a standard; imo they're a much bigger deal long-term than e.g. MCP.
Easy to author (at its most basic, just a markdown file), context efficient by default (only preloads yaml front-matter, can lazy load more markdown files as needed), can piggyback on top of existing tooling (for instance, instead of the GitHub MCP, you just make a skill describing how to use the `gh` cli).
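For illustration, a minimal skill along those lines could be nothing more than a single SKILL.md — the name, commands, and layout below are hypothetical, not taken from any official example:

```markdown
---
name: github-cli
description: Use the `gh` CLI for GitHub tasks (PRs, issues, CI runs) instead of a GitHub MCP server.
---

# GitHub via `gh`

- List open PRs: `gh pr list --state open`
- Check out a PR locally: `gh pr checkout <number>`
- View CI status for the current branch: `gh pr checks`

Prefer `--json` plus `--jq` filters when you need structured output.
```

Only the front-matter would be preloaded; the body is read on demand.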
Compared to purpose-tuned system prompts they don't require a purpose-specific agent, and they also compose (the agent can load multiple skills that make sense for a given task).
Part of the effectiveness of this is that AI models are heavy enough that running a sandboxed VM alongside them is likely irrelevant cost-wise, so the major chat UI providers now all give the model such a sandboxed environment - which means skills can also contain Python and/or JS scripts - again, much simpler, more straightforward, and more flexible than e.g. requiring the target to expose remote MCPs.
Finally, you can use a skill to tell your model how to properly approach using your MCP server - which previously often required either long prompting, or a purpose-specific system prompt, with the cons I've already described.
On top of everything you've described, one more advantage is that you can use the agents themselves to edit / improve / add to the skills. One easy one to do is something like "take the key points from this session and add the learnings as a skill". It works both on good sessions with new paths/functionality and on "bad" sessions where you had to hand-hold the agent. And they're pretty good at summarising and extracting tidbits. And you can always skim the files and do quick edits.
Compared to MCPs, this is a much faster and more approachable flow to add "capabilities" to your agents.
Something that’s under-emphasized and vital to understand about Skills is that, by the spec, there’s no RAG on the content of Skill code or markdown - the names and descriptions in every skill’s front-matter are included verbatim in your prompt, and that’s all that’s used to choose a skill.
So if you have subtle logic in a Skill that’s not mentioned in a description, or you use the skill body to describe use-cases not obvious from the front-matter, it may never be discovered or used.
Additionally, skill descriptions are all essentially prompt injections, whether relevant/vector-adjacent to your current task or not; if they nudge towards a certain tone, that may apply to your general experience with the LLM. And, of course, they add to your input tokens on every agentic turn. (This feature was proudly brought to you by Big Token.) So be thoughtful about what you load in what context.
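To make the "no RAG, front-matter only" point concrete, here's a rough sketch of how a harness might build that index, assuming skills live in per-directory SKILL.md files (the layout and hand-rolled parsing are assumptions, not any particular implementation):

```python
import pathlib

def build_skill_index(skills_dir: str) -> str:
    """Collect only the name/description front-matter from each SKILL.md.

    Skill bodies never reach the prompt here, which is exactly why logic
    described only in a body can go undiscovered.
    """
    entries = []
    for path in sorted(pathlib.Path(skills_dir).glob("*/SKILL.md")):
        meta = {}
        lines = path.read_text().splitlines()
        if lines and lines[0] == "---":
            for line in lines[1:]:
                if line == "---":
                    break  # end of front-matter; ignore the body entirely
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        if "name" in meta and "description" in meta:
            entries.append(f"- {meta['name']}: {meta['description']} ({path})")
    return "\n".join(entries)
```

Everything this function returns lands verbatim in the prompt on every turn, which is both the injection surface and the token cost described above.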
Some agentic systems do apply RAG to skills, there's nothing about skills that requires blind insertion into prompts.
This is really an agentic harness issue, not an LLM issue per se.
In 2026, I think we'll see agentic harnesses much more tightly integrated with their respective LLMs. You're already starting to see this, e.g. with Google's "Interactions" API and how different LLMs expect CoT to be maintained.
There's a lot of alpha in co-optimizing your agentic harness with how the LLM is RL-trained on tool use and reasoning traces.
Honestly the index seems as much a liability as a boon. Keeping the context clean and focused is one of the most important things for getting the best out of LLMs. For now I prefer just adding my md files to the context whenever I deem them relevant.
Skills are much simpler than MCPs, which are hopelessly overengineered, but even skills seem unnecessarily overengineered. You could fix the skill index taking up space in the context by just making it a tool available to the agent (but not an MCP!).
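A rough sketch of that tool-based alternative (the tool name and skill entries here are made up): the index costs no context until the model actually calls the tool.

```python
# Hypothetical setup: the skill index is hidden behind a tool call instead
# of being preloaded into the prompt on every turn.
SKILLS = {
    "postgres": "How to query the pre-prod postgres db",
    "deploy": "How to roll out a release via the deploy script",
}

def list_skills(query: str = "") -> list[str]:
    """Tool handler: return matching skill names/descriptions on demand."""
    q = query.lower()
    return [f"{name}: {desc}" for name, desc in SKILLS.items()
            if q in name.lower() or q in desc.lower()]
```

The trade-off is discoverability: the model has to suspect a relevant skill exists before it thinks to search for one.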
I already was doing something similar on a regular basis.
I have many "folders"... each with a README.md, a scripts folder, and an optional GUIDE.md.
Whenever I arrive at some code that I know can be reused easily (for example: a clerk.dev integration that spans both frontend and backend), I used to create a "folder" for it.
I think Skills could turn into something like open source libraries: standardized solutions to common problems, often written by experts.
Imagine having Skills available that implement authentication systems, multi-tenancy, etc. in your codebase without having to know all the details about how to do this securely and correctly. This would probably boost code quality a lot and prevent insecure/buggy vibe-coded products.
And then you make a global index of those skills available to models, where they can search for an appropriate skill on demand, then download and use them automatically.
A lot of the things we want continuous learning for can actually be provided by the ability to obtain skills on the fly.
In my opinion it’s to some degree an artifact of immature and/or rapidly changing technology. Basically not many know what the best approach is, all the use cases aren’t well understood, and things are changing so rapidly they’re basically just creating interfaces around everything so you can change flow in and out of LLMs any way you may desire.
Some paths are emerging popular, but in a lot of cases we’re still not sure even these are the long term paths that will remain. It doesn’t help that there’s not a good taxonomy (that I’m aware of) to define and organize the different approaches out there. “Agent” for example is a highly overloaded term that means a lot of things and even in this space, agents mean different things to different groups.
All marketing names for APIs and prompts. IMO you don't need to even try to follow, because there's nothing inherently new or innovative about any of this.
None of them matter that much. They're all just ways to bring in context. Think of them as conveniences.
Tools are useful so the AI can execute commands, but beyond that it's just ways to help you build the context for your prompt: either pulling in premade prompts that provide certain instructions or documentation, or providing more specialized tools for the model to use along with instructions on using those tools.
Recently there was a submission (https://news.ycombinator.com/item?id=45840088) breaking down how agents are basically just a loop of querying an LLM, sometimes receiving a specially-formatted "request to use a tool" (JSON in the example), and having the main program detect, interpret, and execute those requests.
What do "skills" look like, generically, in this framework?
Before the first loop iteration, the harness sends a message to the LLM along the lines of:
<Skills>
<Skill>
<Name>postgres</Name>
<Description>Directions on how to query the pre-prod postgres db</Description>
<File>skills/postgres.md</File>
</Skill>
</Skills>
The harness then may periodically resend this notification so that the LLM doesn't "forget" that skills are available. Because the notification is only name + description + file, this is cheap token-wise. The harness's ability to tell the LLM "IMPORTANT: this is a skill, so pay attention and use it when appropriate" and then periodically remind it of this is what differentiates a proper Anthropic-style skill from just sticking "If you need to do postgres stuff, read skills/postgres.md" in AGENTS.md. Just how valuable is this? Not sure. I suspect that a sufficiently smart LLM won't need the special skill infrastructure.
(Note that the skill name is not technically required; it's just a vanity/convenience thing.)
The agent can selectively load one or more of the "skills", which means it'll pull in a skill's prompt once it decides that skill should be loaded, and the skill can have accompanying scripts that the prompt also describes to the LLM.
So it's just a standard way to bring prompts/scripts to the LLM, with support from the tooling directly.
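As a sketch under those assumptions (the JSON request shape and `load_skill` tool name are invented for illustration), the lazy-loading loop might look like:

```python
import json
import pathlib

def agent_turn(llm, history: list[dict]) -> str:
    """One loop iteration: if the LLM replies with a load_skill request,
    splice the skill body into the conversation and query again."""
    while True:
        reply = llm(history)
        try:
            request = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain-text answer, no tool use
        if isinstance(request, dict) and request.get("tool") == "load_skill":
            # Pull in the full skill body only now that the model asked for it.
            body = pathlib.Path(request["file"]).read_text()
            history.append({"role": "tool", "content": body})
        else:
            return reply
```

The index stays cheap; the expensive markdown only enters the context when the model requests it.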
It would be trivial to create something like this, but there are a few major problems with running such a platform that I think make it not worthwhile for anyone (maybe some providers will try it, but it's still tough).
- you will be getting a TON of spam. Just look at all the MCP folks, and how they're spamming everywhere with their claude-vibed mcp implementation over something trivial.
- the security implications are enormous. You'd need a way to vet stuff, moderate, keep track of things and so on. This only compounds with more traffic, so it'd probably be untenable really fast.
- there's probably 0 money in this. So you'd have to put a lot of work in maintaining a platform that attracts a lot of abuse/spam/prompt kiddies, while getting nothing in return. This might make sense to do for some companies that can justify this cost, but at that point, you'd be wondering what's in it for them. And what control do they exert on moderation/curation, etc.
I think the best we'll get in this space is from "trusted" entities (i.e. recognised coders / personalities / etc), from companies themselves (having skills in repos for known frameworks might be a thing, like it is with agents.md), and maybe from the token providers themselves.
It feels like people keep attempting this idea, largely because it's easy to build, but in practice people aren't interested in using others' prompts because the cost to create a customized skill/gpt/prompt/whatever is near zero.
I created a skill to write skills (based on the Anthropic docs). I think the value is really in making the skills work for your workflows and code base
People are really misunderstanding Skills, in my opinion. It's not really about the .md file. It's about the bundling of code and instructions. Skills assume a code execution environment.
You could already pre-approve an executable and just call that from your prompt. The context savings by adding/indexing metadata and dynamically loading the rest of the content as-needed is the big win here IMHO.
I wonder if generated skills could be useful to codify the outcome of long sessions where the agent has tried a bunch of things and then finally settled on a solution based on a mixture of test failures and user feedback
Obviously they are empowering Codex and Claude etc, and many will be open source or free.
But for those who have commercial resources or tools to add to the skills choice, is there documentation for doing that smoothly, or a pathway to it?
I can see at least a couple of ways it might be done - skills requiring API keys or other authentication approaches - but this adds friction to an otherwise smooth skill integration process.
Having instead a transparent commission on usage sent to registered skill suppliers would be much cleaner but I'm not confident that would be offered fairly, and I've seen no guidance yet on plans in that regard.
I don't understand how skills are different than just instructing your model to read all the front-matters from a given folder on your filesystem and then decide if they need to read the file body.
One difference is the model might have been trained/fine-tuned to be better at "read all the front-matters from a given folder on your filesystem and then decide..." compared to a model with those instructions only in its context.
Also, does your method run scripts and code in any kind of sandbox or other containment or do you give it complete access to your system? #yolo
Are we sure that unrestricted free-form Markdown content is the best configuration format for this kind of thing? I know there is a YAML frontmatter component to this, but doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process?
I would like my agents to be inherently evaluable, and free-text instructions do not lend themselves easily to systematic evaluation.
>doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process?
The non-deterministic statistical nature of LLMs means it's inherently an "inevitably unverifiable process" to begin with, even if you pass it some type-checked, linted, skills file or prompt format.
Besides, YAML or JSON or XML or free-form text, for the LLM it's just tokens.
At best you could parse the more structured docs with external tools more easily, but that's about it, not much difference when it comes to their LLM consumption.
The modern state of the art is inherently not verifiable. Which way you give it input is really secondary to that fact. When you don't see weights or know anything else about the system, any idea of verifiability is an illusion.
The DSPy + GEPA idea for this mentioned above[1] seems like it could be a reasonable approach for systematic evaluation of skills (not agents as a whole though). I'm going to give this a bit of a play over the holiday break to sort out a really good jj-vcs skill.
Ah, yes, simple text files that describe concepts, and that may contain references to other concepts, or references to dive in deeper. We could even call these something like a link. And they form a sort of... web, maybe ?
Close enough, welcome back index.htm, can't wait to see the first ads being served in my skills
Imagine SUBPROGRAMs that implement well-specified sequences of operations in a COmmon Business-Oriented Language, which can CALL each other. We are truly sipping rocket fuel.
Given how precious the main context is, would it not make sense to have the skill index and skill runner occur in a subagent? e.g. "run this query against the dev db" - the skill-index subagent finds the db skill, runs the query, then returns the result to the main context.
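A minimal sketch of that delegation pattern, with `llm` standing in for whatever completion call the harness uses (all names hypothetical):

```python
def run_in_subagent(llm, task: str, skill_index: str) -> str:
    """Delegate skill lookup and execution to a throwaway context.

    Only the subagent sees the skill index (and whatever skill bodies it
    loads); the main conversation receives just the final answer.
    """
    sub_history = [
        {"role": "system", "content": f"Available skills:\n{skill_index}"},
        {"role": "user", "content": task},
    ]
    return llm(sub_history)
```

The main context never pays for the index or the intermediate tool chatter, at the cost of the subagent not seeing the main conversation.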
Thanks for that! You mentioned Antigravity seemed slow; I just started playing with it too (but haven't really given it a good go yet to properly evaluate). I had the model set to Gemini Flash - maybe you get a speed-up if you do that?
I’m probably missing it, but I don’t see how you can share skills across agents, other than maybe symlinking .claude/skills and .codex/skills to the same place?
What is the advantage of skills over just calling code? From where I’m standing a Claude.md with a couple of examples of a particular bash script (examples and bash also written by Claude) is enough.
One thing that I am missing from the specification is a way to inject specific variables into the skills. If I create, let's say, a postgres skill, then I can either (1) provide the password on every skill execution or (2) hardcode the password into my script. To make this really useful there needs to be some kind of secret storage that the agent can read/write. This would also allow me as a programmer to sell the skills that I create more easily to customers.
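Absent a secrets mechanism in the spec, one common workaround is a third option: keep the secret out of the skill files entirely and have the bundled script read it from the environment. A sketch (the host/user and the reliance on psql's standard PGPASSWORD variable are illustrative):

```python
import os
import subprocess

def run_query(sql: str) -> str:
    """Bundled skill script: the password never appears in the skill files.

    The agent's sandbox just needs PGPASSWORD set; psql picks it up
    from the environment natively.
    """
    if "PGPASSWORD" not in os.environ:
        raise RuntimeError("PGPASSWORD not set; configure it outside the skill")
    result = subprocess.run(
        ["psql", "-h", "localhost", "-U", "app", "-c", sql],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

This keeps the skill shareable: the buyer supplies their own credentials out of band rather than editing the script.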
Agent Skills let you extend Codex with task-specific capabilities. A skill packages instructions, resources, and optional scripts so Codex can perform a specific workflow reliably. You can share skills across teams or the community, and they build on the open Agent Skills standard.
Skills are available in both the Codex CLI and IDE extensions.
This seems great and all, but to my surprise the default $plan skill in Codex prefers writing plan files to ~/.codex/plans. Is this intentional, or an idiosyncrasy of my particular instance of Codex? Every agent tool I've seen before puts planning documentation in the repo folder itself, not in a global user directory. Why this weird decision?
The skills that matter most to me are the ones I create myself (with the skill creator skill) that are very specific and proprietary. For instance, a skill on how to write a service in my back-testing framework.
I do also like to make skills on things that are more niche tools, like marimo (a very nice jupyter replacement). The model probably does know some stuff about it, but not enough, and the agent could find enough online or in context7, but it will waste a lot of time and context figuring it out every time. So instead I will have a deep-thinking agent do all that research up front and build a skill for it, and I might customize it to be more specific to my environment, but it's mostly the condensed research of the agent so that I don't need to redo that every time.
I'm having a hard time figuring out how I could leverage skills in a medium-size web application project.
It's python, PostgreSQL, Django.
Thanks in advance.
I wonder if skills are more useful for non crud-like projects. Maybe data science and DevOps.
See e.g. https://github.com/openai/codex/blob/a6974087e5c04fc711af68f...
When needed, I used to just copy-paste all the folder content using my https://www.npmjs.com/package/merge-to-md package.
This has worked flawlessly for me until now.
Glad we are bringing such capability natively into these coding agents.
not ranked with comments but I’d expect solid quality from these and they should “just work” in Codex etc.
I have this mental map:
Frontmatter <---> Name and arguments of the function
Text part of Skill md <---> description field of the function
Code part of the Skill <---> body of the function
But the function wouldn't look as organised as the .md; also, a Skill can have multiple function definitions.
[1]: https://news.ycombinator.com/item?id=46338371
As of this week, this also applies to Hacker News.
It’s also interesting to see how instead of a plan mode like CC, Codex is implementing planning as a skill.
Otherwise, why not just keep the password in an .env file, and state “grab the password from the .env file” in your Postgres skill?
Why not the filesystem?
I would create a local file (e.g. .env) in each project using postgres, then in my postgres skill, tell the agent to check that file for credentials.
that's all there is to it.
If you want to go deeper, then Skills are dynamically unfolding prompts.
If you want a large library of skills and don't want to fill up your context window, then check out opencode-skillful.
Anthropic: https://www.anthropic.com/engineering/equipping-agents-for-t...
Copilot: https://github.blog/changelog/2025-12-18-github-copilot-now-...
Can we use notepad or something free and not proprietary?