top | item 46319256

(no title)

There's a pattern I keep seeing: LLMs used to replace things we already know how to do deterministically. Parsing a known HTML structure, transforming a table, running a financial simulation. It works, but it's like using a helicopter to cross the street: expensive, slow, and not guaranteed to land exactly where you intended.

The real opportunity with Agent Skills isn't just packaging prompts. It's providing a mechanism that enables a clean split: LLM as the control plane (planning, choosing tools, handling ambiguous steps) and code or sub-agents as the data/execution plane (fetching, parsing, transforming, simulating, or executing NL steps in a separate context).

This requires well-defined input/output contracts and a composition model. I opened a discussion on whether Agent Skills should support this kind of composability:

https://github.com/agentskills/agentskills/issues/11

discuss

basch|2 months ago

The same applies to context vs a database. If a reasoning model makes a decision about something, it should be put off to the side and stored as a value/variable/entry somewhere. Instead of using pages and pages of context, it makes sense for some tasks to "press" decisions that become more permanent to the conversation. You can somewhat accomplish that with notebooklm, by turning results into notes into sources, but notebooklm is insular and doesnt have the research and imaging features of gemini.

And also, in writing, writing from top to bottom has its disadvantages. It makes sense to emulate human writing process and have passes, as you flesh out, and conversely summarize writing.

Current LLMs can brute force these things through emulation/observation/mimicry but they arent as good as doing it the right way. Not only would I like to see "skills" but also "processes" where you create a well defined order that tasks are accomplished in sequence. Repeatable templates. This would essentially include variables in the templates, set for replacement.

rlupi|2 months ago

> Not only would I like to see "skills" but also "processes" where you create a well defined order that tasks are accomplished in sequence. Repeatable templates. This would essentially include variables in the templates, set for replacement.

You can do this with Gemini commands and extensions.

https://cloud.google.com/blog/topics/developers-practitioner...

gradus_ad|2 months ago

I've recently been doing some work with Autodesk. It would be great for an LLM to be as comfortable with the "vocabulary" of these applications as they are with code. Maybe part of this involves creating a language for CAD design in the first place. But the principle that we need to build out vocabularies and subsequently generate and expose "sentences" (workflows) for LLM's to train on seems like a promising direction.

Of course this requires substantial buy in from application owners - create the vocabulary - and users - agree to expose and share the sentences they generate - but the results would be worth it.

baq|2 months ago

Mildly amusing since i remember AutoCAD having a lisp interpreter ~30 years ago…?

ugh123|2 months ago

100%

Additionally, I can't even get claude or codex to reliable use the prompt and simple rules (use this command to compile) in an agents.md or whatever required markdown file is needed. Why would I assume they will reliably handle skills prompts spread about a codebase?

I've even seen tool usage deteriorate while it's thinking and self commanding through its output to say.. read code from a file. Sometimes it uses tail while other times it gets confused on the output and then writes a basic python program to parse lines and strings from the same file to effectively get what was the same output as before. How bizarre!

esafak|2 months ago

Skills are about empowering LLMs with tools, so the heavy lifting can still be deterministic. Furthermore, pipelines written in LLMs are simpler and less brittle, since handling variation is the essence of machine learning.

rk06|2 months ago

pipelines written in LLMs may be simpler. but they are definitely more brittle and even more non-deterministic.

if AI were deterministic, what difference would different AI model make?

itissid|2 months ago

Isn't atleast part of that GH issue something that this https://docs.boundaryml.com/guide/introduction/what-is-baml is also trying to solve? LLM inputs and outputs must be functions with defined functions. That was their starting point.

IIUC their most recent arc focuses on prompt optimization[0] where you can optimize — using DSPy and an optimization algo GEPA [1] — using relative weights on different things like errors, token usage, complexity.

[0] https://docs.boundaryml.com/guide/baml-advanced/prompt-optim... [1] https://github.com/gepa-ai/gepa?tab=readme-ov-file

deaux|2 months ago

Where in this post's article are you seeing this pattern?

> Parsing a known HTML structure

In most cases, HTML structures that are being parsed aren't known. If they're known, you control them, and you don't need to parse them in the first place. If they're someone else's, who knows when they'll change, or under what condition they're different.

But really, I don't see the stuff you're talking about happening in prod for non-one-off usecases. I see LLMs used in prod usecases exactly for data where you don't know exactly what its shape will be, and there's an enormous amount of such cases. If the same logic is needed every time, of course you don't have an LLM execute that logic, you have the LLM write a deterministic script.

_the_inflator|2 months ago

I agree partly.

Skills are essentially boiling down to distributed parts of a Main Prompt. If you consider a state model you can see this pattern: Task is the state and combining the task's specifics skills defines the current prompt augmentation. When the task changes, another prompt emerges.

In the end, it is the clear guidance of the Agent that is the deciding factor.

hintymad|2 months ago

> Parsing a known HTML structure, transforming a table, running a financial simulation.

Transforming an arbitrary table is still hard, especially a table on a webpage or in a document. Sometimes I even struggle to find the right library. The effort does not seem worth it for one-off need of such transformation too. LLM can be a great tool for doing the tasks.