(no title)
VladVladikoff | 1 day ago
Also and this is just my ignorance about Claws, but if we allow an agent permission to rewrite its code to implement skills, what stops it from removing whatever guardrails exist in that codebase?
VladVladikoff | 1 day ago
Also and this is just my ignorance about Claws, but if we allow an agent permission to rewrite its code to implement skills, what stops it from removing whatever guardrails exist in that codebase?
drujensen|1 day ago
I installed nanoclaw to try to out.
What is kinda crazy is that any extension like discord connection is done using a skill.
A skill is a markdown file written in English to provide a step by step guide to an ai agent on how to do something.
Basically, the extensions are written by claude code on the fly. Every install of nanoclaw is custom written code.
There is nothing preventing the AI Agent from modifying the core nanoclaw engine.
It’s ironic that the article says “Don’t trust AI agents” but then uses skills and AI to write the core extensions of nanoclaw.
jimminyx|1 day ago
I did my best to communicate this but I guess it was still missed:
NanoClaw is not software that you should run out of the box. It is designed as a sort of framework that gives a solid foundation for you to build your own custom version.
The idea is not that you toggle on a bunch of features and run it. You should customize, review, and make sure that the code does what you want.
So you should not trust the coding agents that they didn't break the security model while adding discord. But after discord is added, you review the code changes and verify that it's correct. And because even after adding discord you still only have 2-3k loc, it's actually something you can realistically do.
Additionally, the skills were originally a bit ad-hoc. Now they are full working, tested and reviewed reference implementations. Code is separate from markdown files. When adding a new integration or messaging channel, the agent uses `git merge` to merge the changes in, rather than rewriting from scratch. Adding the first channel is fully deterministic. The agent only resolves merge conflicts if there are any.
MarkSweep|1 day ago
sanex|1 day ago
bitwize|1 day ago
"Every copy of Nanoclaw is personalized." So if I use it long enough will I see the Wario apparition?
gronky_|1 day ago
You can see here that it’s only given write access to specific directories: https://github.com/qwibitai/nanoclaw/blob/8f91d3be576b830081...
fvdessen|1 day ago
float4|1 day ago
It feels like, just like SWEs do with AI, we should treat the claw as an enthusiastic junior: let it do stuff, but always review before you merge (or in this case: send).
jrecyclebin|1 day ago
coffeefirst|1 day ago
But that’s not an agent, that’s a webhook.
Even without disk access, you can email the agent and tell it to forward all the incoming forgot password links.
[Edit: if anyone wants to downvote me that's your prerogative, but want to explain why I'm wrong?]
msdz|1 day ago
Prompt injection is _probably_ solvable if something like [1] ever finds a mainstream implementation and adoption, but agents not being deterministic, as in “do not only what I’ve told you to do, but also how I meant it”, all while assuming perfect context retention, is a waaay bigger issue. If we ever were to have that, software development as a whole is solved outright, too.
[1] Google DeepMind: Defeating Prompt Injections by Design. https://arxiv.org/abs/2503.18813