lacker|2 months ago

I'm not sure if I have the right mental model for a "skill". It's basically a context-management tool? Like a skill is a brief description of something, and if the model decides it wants the skill based on that description, then it pulls in the rest of whatever amorphous stuff the skill has, scripts, documents, what have you. Is this the right way to think about it?


simonw|2 months ago

It's a folder with a markdown file in it plus optional additional reference files and executable scripts.

The clever part is that the markdown file has a section in it like this: https://github.com/datasette/skill/blob/a63d8a2ddac9db8225ee...

  ---
  name: datasette-plugins
  description: "Writing Datasette plugins using Python and the pluggy plugin system. Use when Claude needs to: (1) Create a new Datasette plugin, (2) Implement plugin hooks like prepare_connection, register_routes, render_cell, etc., (3) Add custom SQL functions, (4) Create custom output renderers, (5) Add authentication or permissions logic, (6) Extend Datasette's UI with menus, actions, or templates, (7) Package a plugin for distribution on PyPI"
  ---
On startup Claude Code / Codex CLI etc scan all available skills folders and extract just those descriptions into the context. Then, if you ask them to do something that's covered by a skill, they read the rest of that markdown file on demand before going ahead with the task.

spike021|2 months ago

Apologies for not reading all of your blog posts on this, but a follow-up question: are models still prone to reading these descriptions and then disregarding them, even when they should be used for a task?

The reason I ask is that a while back I had similar sections in my CLAUDE.md, and it would sometimes acknowledge them without using them, or just ignore them outright. I'm assuming that's more an issue of too much context, and that skill-level files like this will reduce that effect?

behnamoh|2 months ago

Why did this simple idea take so long to become available? I remember doing this kind of thing back in the Llama 2 days, and that model couldn't even do function calling.

kswzzl|2 months ago

> On startup Claude Code / Codex CLI etc scan all available skills folders and extract just those descriptions into the context. Then, if you ask them to do something that's covered by a skill, they read the rest of that markdown file on demand before going ahead with the task.

Maybe I still don't understand the mechanics - this happens "on startup", every time a new conversation starts? Models go through the trouble of doing ls/cat/extraction of descriptions to bring into context? If so it's happening lightning fast and I somehow don't notice.

Why not just include those descriptions within some level of system prompt?

kridsdale1|2 months ago

So it’s a header file. In English.

throwaway314155|2 months ago

Do skills get access to the current context or are they a blank slate?

leetrout|2 months ago

Have you used AWS Bedrock? I assume these get pretty affordable with prompt caching...

prescriptivist|2 months ago

Skills have a lot of uses, but one I particularly like is replacing one-off MCP server usage. You can use (or write) an MCP server for your CI system, then add instructions to your AGENTS.md to query the CI MCP for build results on the current branch. Then you need to find a way to distribute the MCP server so the rest of the team can use it, or bake it into your dev environment setup. But all you really care about is one tool in the MCP server: the build result. Or...

You can hack together a shell, Python, whatever script that fetches build results from your CI server and dumps them to stdout in a semi-structured format like markdown, then add a 10-15 line SKILL.md, and you have the same functionality -- the skill just executes the one-off script and reads the output. You package the skill with the script, usually in a directory in the project you are working on, but you can also distribute skills as plugins (bundles) that Claude Code can install from a "repository", which can just be a private git repo.

It's a little UNIX-y: small tools that pipe output to one another, useful standalone or in a chain. MCP, by contrast, is a full-blown RPC environment (which has its uses, where appropriate).
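The script half of that pattern can be very small. This is a made-up sketch: the `ci.example.com` URL, the `CI_TOKEN` env var, and the JSON shape are placeholders for whatever your CI server actually exposes:

```python
"""Fetch build results for the current branch, print them as markdown."""
import json
import os
import subprocess
import urllib.request


def current_branch() -> str:
    """Ask git for the currently checked-out branch name."""
    return subprocess.check_output(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True
    ).strip()


def fetch_builds(branch: str) -> list[dict]:
    """Hypothetical CI API call; adapt URL, auth, and shape to your CI."""
    req = urllib.request.Request(
        f"https://ci.example.com/api/builds?branch={branch}",
        headers={"Authorization": f"Bearer {os.environ['CI_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def to_markdown(branch: str, builds: list[dict]) -> str:
    """Semi-structured output the agent reads off stdout."""
    lines = [f"## Build results for `{branch}`", ""]
    for b in builds:
        lines.append(f"- **{b['job']}**: {b['status']} ({b['url']})")
    return "\n".join(lines)


# Usage (as the skill's one-off script):
#   branch = current_branch()
#   print(to_markdown(branch, fetch_builds(branch)))
```

The SKILL.md then just needs a description saying when to run the script and how to read its output.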

wiether|2 months ago

How do you manage the credentials to query your CI server in this case? Are they hardcoded in the script associated with your skill?

delaminator|2 months ago

Claude Code is not very good at “remembering” its skills.

Maybe they get compacted out of the context.

But you can call upon them manually. I often do something like “using your Image Manipulation skill, make the icons from image.png”

Or “use your web design skill to create a design for the front end”

Tbh I do like that.

I also get Claude to write its own skills: "Using what we learned from this task, write a skill document called /whatever/, using your writing-skills skill."

I have a GitHub template including my skills and commands, if you want to see them.

https://github.com/lawless-m/claude-skills

jorl17|2 months ago

I'm so excited for the future, because _clearly_ our technology has loads to improve. Even if new models don't come out, the tooling we build upon them, and the way we use them, is sure to improve.

One particular way I can imagine this is some sort of "multipass makeshift attention system" built on top of the mechanisms we have today. We could store the available skills in one place and look only at the last part of the query, asking a model: "Given this small, self-contained bit of the conversation, is any of these skills a prime candidate to be used?" or "Do you need a little more context to make that decision?". We then pass that model's final answer along as a suggestion to the actual model producing the answer. There is a delicate balance between "leading the model on" with imperfect information (because we cut the context) and actually "focusing it" on the task at hand and the skill selection. And, of course, there's the issue of time and cost.
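That routing idea can be sketched as a cheap pre-pass over just the most recent turn. Everything here is an assumption for illustration: `route_skills` stands in for a call to a small, fast model, and the word-overlap scoring is a deliberately dumb placeholder for that model's judgment:

```python
def route_skills(recent_turn: str, skill_descriptions: dict[str, str]) -> list[str]:
    """Suggest candidate skills by scoring descriptions against the
    last turn only, instead of the whole conversation.

    Placeholder scoring: word overlap. In the scheme described above,
    this would instead be a call to a small model given only the skill
    descriptions and the most recent bit of conversation.
    """
    turn_words = set(recent_turn.lower().split())
    scored = []
    for name, desc in skill_descriptions.items():
        overlap = len(turn_words & set(desc.lower().split()))
        if overlap:
            scored.append((overlap, name))
    # Best-scoring skills first; the big model receives these as hints.
    return [name for _, name in sorted(scored, reverse=True)]
```

The output would be passed along as a suggestion, not a hard constraint, preserving the balance between focusing and misleading the main model.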

I actually believe we will see several solutions make use of techniques such as this, where some model determines what the "big context" model should be focusing on as part of its larger context (in which it may get lost).

In many ways, this is similar to what modern agents already do. Cursor doesn't keep files in the context: it constantly re-reads only the parts it believes are important. But it might be useful to keep the files in the context (so we don't make an egregious mistake) while also working out which parts of the context matter most and re-feeding them to the model, or highlighting them somehow.

Sammi|2 months ago

I'm kinda confused about why we even need an extra feature for this when it's basically already built into the agentic development workflow. I just keep a folder of md files and add whichever one is relevant when it's relevant. It's kinda straightforward to do...

Just like you, I don't edit these files much on my own. Mostly I just ask the model to update an md file whenever I think we've figured out something new, so the learning sticks. I have files for test writing, backend route writing, db migration writing, frontend component writing, etc. Whenever a section gets too big to live in AGENTS.md, it gets its own file.

marwamc|2 months ago

My understanding is this: a skill is made up of a SKILL.md, which is what tells Claude how and when to use the skill. I'm a bit of a control freak, so I'll usually explicitly direct Claude to "load the wireframe-skill" and then do X.

Now SKILL.md can have references to more fine-grained behaviors or capabilities of the skill. My skills generally have a references/{workflows,tools,standards,testing-guide,routing,api-integration}.md. These references are what gets "progressively loaded" into the context.

Say I asked Claude to use the wireframe-skill to create a profileView mockup. While creating the wireframe, Claude will need to figure out which API endpoints are available/relevant for the profileView, the response types, etc. It's at this point that Claude reads the references/api-integration.md file from the wireframe skill.

After a while I found I didn't like the progressive loading, so I usually direct Claude to load all references in the skill before proceeding. This usually takes up maybe 20k to 30k tokens, but the accuracy and precision (imagined or otherwise, ha!) are worth it for my use cases.
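Loading everything up front amounts to concatenating the reference files. A trivial sketch, assuming the references/*.md layout described above:

```python
from pathlib import Path


def load_all_references(skill_dir: str) -> str:
    """Concatenate every references/*.md file in a skill, skipping the
    progressive on-demand loading in favor of loading it all up front."""
    parts = []
    for ref in sorted(Path(skill_dir, "references").glob("*.md")):
        # Label each section with its filename so the model can tell
        # which reference a given chunk came from.
        parts.append(f"<!-- {ref.name} -->\n{ref.read_text()}")
    return "\n\n".join(parts)
```

In practice you would not run this yourself; the "direct Claude to load all references" instruction achieves the same effect, at the token cost noted above.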

kxrm|2 months ago

> I'm a bit of a control freak so I'll usually explicitly direct claude to "load the wireframe-skill" and then do X.

You shouldn't do this, it's generally considered bad practice.

You should be optimizing your skill description instead. Oftentimes, if I'm working with Claude Code and it doesn't load a skill, I ask it why it missed the skill. It will guide me toward improving the skill description so that it is picked up properly next time.

Iterating on skill descriptions this way has kept skills out of context until they're needed, fairly predictably for me so far.

chrisweekly|2 months ago

My understanding is that the "description" frontmatter is essential, because Claude Code can read just the description without loading the entire file into context.

taytus|2 months ago

Easy, let me try to explain: You want to achieve X, so you ask your AI companion, "How do I do X?" Your companion thinks and tries a couple of things, and they eventually work. So you say, "You know what, next time, instead of figuring it out, just do this"... that is a skill. A recipe for how to do things.

jmalicki|2 months ago

Yes. I find these very useful for enforcing skills like debugging, committing code, making PRs, responding to PR feedback from AI review agents, etc., without constantly polluting the context window.

So when it's time to commit, make sure you run these checks, write a good commit message, etc.

Debugging is an especially useful one, since AI agents can often go off the rails and loop on rewriting code. So in a skill I can push for: "Read the log messages. Insert some more useful debug assertions to isolate the failure. Write some more unit tests that are more specific." Etc.
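A debugging skill along those lines might look something like this. The name, description, and steps here are made-up illustrations, following the frontmatter format shown further up the thread:

```markdown
---
name: debugging
description: "Systematic debugging of failing code. Use when a test fails, an error is hard to reproduce, or a previous fix attempt did not work."
---

When debugging, do NOT rewrite code speculatively. Instead:

1. Read the actual log messages and error output first.
2. Insert debug assertions to isolate where the failure occurs.
3. Write a more specific unit test that reproduces the failure.
4. Only then change the implementation, and re-run the new test.
```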

canadiantim|2 months ago

I think it's also important to think of skills in the context of tasks: when you want an agent to perform a specialized task, the skill is the context, the resources, and the scripts it needs to perform that task.

hadlock|2 months ago

I'm excited to use this with the Ghidra CLI mode to rapidly decompile physics engines from various games. Do I want my flight simulator to behave like the Cessna in Flight Simulator 3.0 in the air? Codex can already do that. Do I want the plane to handle like Yoshi from Mario Kart 64 when taxiing? It hasn't been done yet, but Claude Code is apparently pretty good at pulling apart N64 ROMs, so that seems within the realm of possibility.