niyikiza's comments

niyikiza | 3 days ago | on: Agent Safehouse – macOS-native sandboxing for local agents

The two-layer framing is right. Sandbox-exec contains local blast radius, and that's important. But if the agent already has a credential in memory, sandboxing the filesystem doesn't help. I've been working on a primitive for scoped authorization at the tool call level: what was this agent allowed to do, for which task, signed by whom. The core is open-sourced: https://github.com/tenuo-ai/tenuo

niyikiza | 16 days ago | on: NIST Seeking Public Comment on AI Agent Security (Deadline: March 9, 2026)

Good distinction, but I wonder if it's worth going further: context integrity may be fundamentally unsolvable. Agents consume untrusted input by design. Trying to guarantee the model won't be tricked seems like the wrong layer to bet on. What seems more promising is accepting that the model will be tricked and constraining what it can do when that happens. Authorization at the tool boundary, scoped to the task and delegation chain rather than the agent's identity. If a child agent gets compromised, it still can't exceed the authority that was delegated to it. Contain the blast radius instead of trying to prevent the confusion.

(Disclaimer: working on this problem at tenuo.ai)

niyikiza | 21 days ago | on: OpenClaw is dangerous

Right on. Human-in-the-loop doesn't scale at agent speed. Sandboxing constrains tool execution environments, but says nothing about which actions an agent is authorized to take. That gets even worse once agents start delegating to other agents. I've been building a capability-based authz solution: task-scoped permissions that can only narrow through delegation, cryptographically enforced, offline verification. MIT/Apache 2.0, Rust core. https://github.com/tenuo-ai/tenuo

niyikiza | 1 month ago | on: AI is killing B2B SaaS

Spot on. You could argue that most companies buying B2B SaaS could almost always build a clone internally, but they need someone to assume the SLA and the liability.

niyikiza | 1 month ago | on: Kimi K2.5 Technical Report [pdf]

The Agent Swarm section is fascinating. I'm working on authorization for multi-agent systems so this is relevant to my interests. Lots of interesting parallels to capability-based security models.

niyikiza | 1 month ago | on: The Hallucination Defense

Exactly ... and that's why I'm skeptical of "AI verifies AI" as the primary safety mechanism. The verifier for moving money should be deterministic: constraints, allowlists, spend limits, invoice/PO matching, etc. The LLM can propose actions, but the execution should be gated by a human/policy-issued scope that's mechanically enforced. That's the whole point: constrain the non-deterministic layer with a deterministic one. [0]

[0] https://tenuo.dev/constraints
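A rough sketch of what a deterministic gate for payments could look like. The `PaymentScope` fields and `gate_payment` are made up for illustration and are not Tenuo's API; the point is only that every check is a plain comparison with no model in the loop.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PaymentScope:
    # Hypothetical policy-issued scope; field names are illustrative only.
    allowed_vendors: frozenset
    spend_limit_cents: int
    open_pos: dict  # PO number -> (vendor, amount in cents)

def gate_payment(scope, vendor, amount_cents, po_number):
    """Return (allowed, reason). Pure and deterministic: no model call."""
    if vendor not in scope.allowed_vendors:
        return False, "vendor not on allowlist"
    if amount_cents > scope.spend_limit_cents:
        return False, "over spend limit"
    if scope.open_pos.get(po_number) != (vendor, amount_cents):
        return False, "no matching PO"
    return True, "ok"

scope = PaymentScope(
    allowed_vendors=frozenset({"acme"}),
    spend_limit_cents=50_000,
    open_pos={"PO-1": ("acme", 12_500)},
)
```

The LLM can propose any `(vendor, amount, PO)` it likes; the payment only goes out if `gate_payment` returns `True`.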

niyikiza | 1 month ago | on: The Hallucination Defense

A worker agent doesn't mint warrants. It receives them. Either it requests a capability and an issuer approves, or the issuer pushes a scoped warrant when assigning a task. Either way, the issuer signs and the agent can only act within those bounds.

At execution time, the "verifier" checks the warrant: valid signatures, attenuation (scope only narrows through delegation), TTL (authority is task-scoped), and that the action fits the constraints. Only then does the call proceed.
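To make the order of checks concrete, here is a minimal single-warrant sketch covering signature, TTL, and constraints (attenuation needs the delegation chain, so it is left out here). Everything in it is an assumption for illustration: the warrant layout, `sign`, `verify_call`, and especially the HMAC, which stands in for a real asymmetric signature.

```python
import hashlib
import hmac
import json
import time

def sign(key: bytes, body: dict) -> str:
    # HMAC stands in for a real asymmetric signature in this sketch.
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_call(key, warrant, tool, args, now=None):
    """Return (allowed, reason); any failed check denies the call."""
    now = time.time() if now is None else now
    body = warrant["body"]
    if not hmac.compare_digest(sign(key, body), warrant["sig"]):
        return False, "bad signature"
    if now >= body["expires_at"]:
        return False, "expired"
    limits = body["tools"].get(tool)
    if limits is None:
        return False, "tool not granted"
    for name, max_value in limits.items():
        # A missing argument is treated as over the limit, so it fails closed.
        if args.get(name, float("inf")) > max_value:
            return False, f"{name} exceeds limit"
    return True, "ok"

ISSUER_KEY = b"demo-issuer-key"  # hypothetical key, for the example only
body = {"tools": {"pay_invoice": {"amount": 100}}, "expires_at": 1_000.0}
warrant = {"body": body, "sig": sign(ISSUER_KEY, body)}
```

Only a `(True, "ok")` result lets the call proceed; every other path is a denial with a reason attached.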

This is sometimes called the P/Q model: the non-deterministic layer proposes, the deterministic layer decides. The agent can ask for anything. It only gets what's explicitly granted.

If the agent asks for the wrong thing, it fails closed. If an overly broad scope is approved, the receipt makes that approval explicit and reviewable.

niyikiza | 1 month ago | on: The Hallucination Defense

Yep ... that's exactly the direction. Think "default deny + step-up," not "grant everything up front."

You keep a coarse cap (e.g. email read/write, invoice pay) but each task runs under a narrower, time-boxed warrant derived from that cap. Narrowing happens at the policy/UX layer (human or deterministic rules), not by the LLM. The LLM can request escalation ("need send"), but it only gets it via an explicit approval / rule.
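A toy version of "default deny + step-up", leaving the crypto out entirely. The capability names, `task_warrant`, and `step_up` are hypothetical; the sketch only shows that a request by itself changes nothing and that an approved step-up still cannot exceed the coarse cap.

```python
CAP = frozenset({"email.read", "email.send", "invoice.pay"})  # coarse cap

def task_warrant(requested, ttl_seconds, now):
    """Derive a narrower, time-boxed grant; anything outside the cap is dropped."""
    return {"scope": frozenset(requested) & CAP, "expires_at": now + ttl_seconds}

def step_up(warrant, capability, approved_by):
    """Reissue the task grant with one more capability, only with an explicit approver."""
    if approved_by is None or capability not in CAP:
        return warrant  # default deny: the request alone changes nothing
    return {**warrant, "scope": warrant["scope"] | {capability}}

def allowed(warrant, capability, now):
    return now < warrant["expires_at"] and capability in warrant["scope"]
```

For example, `task_warrant({"email.read"}, 60, now=0)` allows reads for a minute, and `step_up(w, "email.send", None)` returns the grant unchanged because nobody approved it.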

Crypto isn't deciding scope. It's enforcing monotonic attenuation, binding the grant to an agent key, and producing a receipt that the scope was explicitly approved.

For a single-process agent this might be overkill. It matters more when warrants cross trust boundaries: third-party tools, sub-agents in different runtimes, external services. Offline verification means each hop can validate without calling home.

niyikiza | 1 month ago | on: The Hallucination Defense

You've got the model right. And saving prompt logs does help with reconstruction.

But warrants aren't just "more audit data." They're an authorization primitive enforced in the critical path: scope and constraints are checked mechanically before the action executes. The receipt is a byproduct.

Prompt logs tell you what the model claimed it was doing. A warrant is what the human actually authorized, bound to an agent key, verifiable without trusting the agent runtime.

This matters more in multi-agent systems. When Agent A delegates to Agent B, which calls a tool, you want to be able to link that action back to the human who started it. Warrants chain cryptographically. Each hop signs and attenuates. The authorization provenance is in the artifact itself.

niyikiza | 1 month ago | on: Ask HN: How are you handling non-probabilistic security for LLM agents?

Working on this problem: https://github.com/tenuo-ai/tenuo

Different angle than policy-as-YAML. We use cryptographic capability tokens (warrants) that travel with the request. The human signs a scoped, time-bound authorization. The tool itself validates the warrant at execution time; there is no central policy engine in the path.

On your questions:

Canonicalization: The warrant specifies allowed capabilities and constraints (e.g., path: /data/reports/*). The tool checks if the action fits the constraint. No need to normalize LLM output into a canonical representation.

Stateful intent: Warrants attenuate. Authority only shrinks through delegation. You can't escalate from "read DB" to "POST external" unless the original warrant allowed both. A sub-agent can only receive a subset of what its parent had, cryptographically enforced.

Latency: Stateless verification, ~27μs. No control plane calls. The warrant is self-contained: scope, constraints, expiry, holder binding, signature chain. Verification is local.
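For the path constraint in the canonicalization answer, a local check can be as small as a glob match. This is a sketch using Python's standard `fnmatch`, not how Tenuo implements constraints; `path_allowed` is a hypothetical helper, and collapsing `..` before matching is my own addition.

```python
import fnmatch
import posixpath

def path_allowed(pattern: str, requested: str) -> bool:
    """Check a requested path against a glob constraint such as /data/reports/*."""
    # Collapsing ".." first is this sketch's choice, so traversal can't slip past the glob.
    resolved = posixpath.normpath(requested)
    return fnmatch.fnmatchcase(resolved, pattern)
```

Note that `fnmatch`'s `*` also matches `/`, so `/data/reports/*` covers nested paths under that directory; a stricter matcher would be needed if the constraint is meant to cover one level only.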

The deeper issue with policy engines: they check rules against actions, but they can't verify derivation. When Agent B acts, did its authority actually come from Agent A? Was it attenuated correctly?

Wrote about why capabilities are the only model that survives dynamic delegation: https://niyikiza.com/posts/capability-delegation/

niyikiza | 1 month ago | on: The Hallucination Defense

Right, the non-deterministic layer can't be the one deciding scope. That's the human's job at the root.

The LLM can request a narrower scope, but attenuation is monotonic and enforced cryptographically. You can't sign a delegation that exceeds what you were granted. TTL too: the warrant can't outlive its parent.
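The monotonic rule itself is easy to state in code. In the sketch below it is a plain runtime check with hypothetical names (`attenuate`, the dict layout); in the real design the same rule is enforced by signatures rather than by trusting whoever calls the function.

```python
def attenuate(parent, scope, expires_at):
    """Derive a child grant; refuse anything wider or longer-lived than the parent."""
    scope = frozenset(scope)
    if not scope <= parent["scope"]:
        raise PermissionError(f"cannot add {sorted(scope - parent['scope'])}")
    if expires_at > parent["expires_at"]:
        raise PermissionError("child cannot outlive parent")
    return {"scope": scope, "expires_at": expires_at}

root = {"scope": frozenset({"db.read", "http.post"}), "expires_at": 3600}
child = attenuate(root, {"db.read"}, 600)
```

From `child`, asking for `http.post` again or for an expiry past 600 raises `PermissionError`, even though `root` had both.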

So yes, key management. But the pathological "Allow: *" has to originate from a human who signed it. That's the receipt you're left holding.

You're poking at the right edges though. UX for scope definition and revocation propagation are what we're working through now. We're building this at tenuo.dev if you want to dig into the spec or poke holes.

niyikiza | 1 month ago | on: The Hallucination Defense

Tokens + filters work for single-agent, single-hop calls. Gets murky when orchestrators spawn sub-agents that spawn tools. Any one of them can hallucinate or get prompt-injected. We're building around signed authorization artifacts instead. Each delegation is scoped and signed, chains are verifiable end-to-end. Deterministic layer to constrain the non-deterministic nature of LLMs.

niyikiza | 1 month ago | on: The Hallucination Defense

This is the problem we're working on.

When orchestrators spawn sub-agents that spawn tools, there's no artifact showing how authority flowed through the chain.

Warrants are a primitive for this: signed authorization that attenuates at each hop. Each delegation is signed, scope can only narrow, and the full chain is verifiable at the end. Doesn't matter how many layers deep.
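Here is a small sketch of a verifiable delegation chain, under loud assumptions: the key table, `delegate`, and `verify_chain` are invented for the example, and HMAC with shared demo keys stands in for per-hop asymmetric signatures (so this verifier needs every key, which a real offline verifier would not).

```python
import hashlib
import hmac
import json

KEYS = {"human": b"k-human", "orchestrator": b"k-orch", "worker": b"k-worker"}  # demo keys

def _encode(obj):
    return json.dumps(obj, sort_keys=True).encode()

def _digest(link):
    return hashlib.sha256(_encode(link)).hexdigest()

def delegate(issuer, holder, scope, parent=None):
    """Append one signed link; HMAC stands in for a real asymmetric signature."""
    body = {"issuer": issuer, "holder": holder, "scope": sorted(scope),
            "parent": _digest(parent) if parent else None}
    sig = hmac.new(KEYS[issuer], _encode(body), hashlib.sha256).hexdigest()
    return {**body, "sig": sig}

def verify_chain(chain, root_issuer):
    """Walk root to leaf: each link must be signed, linked, and no wider than its parent."""
    prev = None
    for link in chain:
        key = KEYS.get(link["issuer"])
        if key is None:
            return False
        body = {k: v for k, v in link.items() if k != "sig"}
        expected = hmac.new(key, _encode(body), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, link["sig"]):
            return False
        if prev is None:
            if link["issuer"] != root_issuer or link["parent"] is not None:
                return False
        else:
            if link["parent"] != _digest(prev) or link["issuer"] != prev["holder"]:
                return False
            if not set(link["scope"]) <= set(prev["scope"]):
                return False
        prev = link
    return prev is not None

root = delegate("human", "orchestrator", {"db.read", "http.post"})
hop = delegate("orchestrator", "worker", {"db.read"}, parent=root)
```

`verify_chain([root, hop], "human")` passes; a hop that adds a capability, edits its scope after signing, or is presented without its parent does not.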

niyikiza | 1 month ago | on: The Hallucination Defense

Similar space, different scope/approach. Tenuo warrants track who authorized what across delegation chains (human to agent, agent to sub-agent, sub-agent to tool) with cryptographic proof & PoP at each hop. Trace tracks provenance. Warrants track authorization flow. Both are open specs. I could see them complementing each other.

niyikiza | 1 month ago | on: The Hallucination Defense

You'd be surprised to see how often we're seeing those types of semantic attack vulnerabilities in Agent frameworks: https://niyikiza.com/posts/map-territory/

niyikiza | 1 month ago | on: The Hallucination Defense

> if you signed the document, you own its content. Versus some vendor-provided AI Agent which simply takes action on its own

Yeah, that's exactly the model I think we should adopt for AI agent tool calls as well: cryptographically signed, task-scoped "warrants" that stay traceable even across multi-agent delegation chains.

niyikiza | 1 month ago | on: The Hallucination Defense

Agree ... retention is mandatory. The article argues you should retain authorization artifacts, not just event logs. Logs show what happened. Warrants show who signed off on what.

niyikiza | 1 month ago | on: The Hallucination Defense

You're right, they should be responsible. The problem is proving it. "I asked it to summarize reports, it decided to email the competitor on its own" is hard to refute with current architectures.

And when sub-agents or third-party tools are involved, liability gets even murkier. Who's accountable when the action executed three hops away from the human? The article argues for receipts that make "I didn't authorize that" a verifiable claim.

niyikiza | 1 month ago | on: Semantic Attacks: Exploiting What Agents See

Author here.

Correction: I accidentally submitted the Substack link instead of the full technical write-up. You can read the complete post with all the attack vectors here: https://niyikiza.com/posts/semantic-attacks/

We stumbled on these vectors while building an authorization protocol for agents.

Everyone seems focused on "Prompt Injection" (the brain), but perception integrity seems under-discussed. I look at agents like pilots flying on instruments: if the DOM feeds them false data, no amount of reasoning or prompt engineering can prevent the crash.

This post breaks down the specific ways attackers can compromise those instruments without touching the prompt.

niyikiza | 1 month ago | on: Show HN: Gambit, an open-source agent harness for building reliable AI agents

Injecting context via tool outputs to hit Layer 6 is a clever way to leverage the model spec.

The gap I keep coming back to is that even at Layer 6, enforcement is probabilistic. You are still negotiating with the model's weights. "Less likely to fail" is great for reliability, but hard to sell on a security questionnaire.

Tenuo operates at the execution boundary. It checks after the model decides and before the tool runs. Even if the model gets tricked (or just hallucinates), the action fails if the cryptographic warrant doesn't allow that specific action.
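As a rough picture of "after the model decides, before the tool runs", a guard can wrap the tool function itself. This is a generic Python decorator sketch, not Tenuo's integration: `guarded`, `Denied`, and the dict-of-predicates warrant are stand-ins, and there is no cryptography here at all.

```python
import functools

class Denied(Exception):
    """Raised when a tool call is not covered by the warrant."""

def guarded(tool_name, warrant):
    """Wrap a tool so the warrant is checked before the tool body ever runs."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(**kwargs):
            check = warrant.get(tool_name)
            if check is None or not check(kwargs):
                raise Denied(f"{tool_name} not authorized for {kwargs}")
            return fn(**kwargs)
        return inner
    return wrap

# Stand-in warrant: tool name -> predicate over the call arguments.
WARRANT = {"send_email": lambda a: a.get("to", "").endswith("@example.com")}

@guarded("send_email", WARRANT)
def send_email(to, body):
    return f"sent to {to}"
```

Whatever the model was tricked into choosing, a call like `send_email(to="x@evil.test", body="hi")` raises `Denied` before any mail code executes.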

Re: Hypercore/P2P, I actually see that as the identity layer we're missing. You need a decentralized root of trust (Provenance) to verify who signed the Warrant (Authorization). Tenuo handles the latter, but it needs something like Hypercore for the former.

Would be curious to see how Gambit's Deck pattern could integrate with warrant-based authorization. Since you already have typed inputs/outputs, mapping those to signed capabilities seems like a natural fit.