top | item 46687551

Show HN: Circe – Deterministic, offline-verifiable receipts for AI agent actions

2 points| W_rey45 | 1 month ago |github.com

I’ve been working on a small primitive for agentic systems: a cryptographically signed receipt that records what an AI agent decided, what it did, and what changed — as a single canonical JSON artifact.

The problem: Agent systems today rely on logs, dashboards, or proprietary consoles for truth. Those are easy to forge, truncate, or lose. If an agent takes a high-stakes action (e.g. a firewall change, a deployment, a purchase), there’s no portable artifact you can independently verify later.

The idea: Treat agent execution like a signed transaction, not a log stream. Each run emits a receipt that can be verified offline, without trusting the issuer’s infrastructure.

How it works (minimal core):

Deterministic signing: Ed25519 signatures over a canonical JSON byte string

Canonicalization: RFC 8785-style JSON canonicalization (stable key ordering, UTF-8 encoding, no insignificant whitespace)

Tamper evidence: Any mutation of the signed payload flips the SHA-256 hash and invalidates the signature

Offline verification: A standalone verifier script; no network calls, no dependencies on the issuer

Try it locally (no network):

python verify_receipt.py hn_receipt.json python verify_receipt.py hn_receipt_tampered.json

The first passes; the second fails after a single-field mutation.

This is intentionally not a logging system, observability platform, or policy engine. It’s a small integrity / provenance primitive intended to compose with higher-level agent frameworks.

I’d appreciate feedback on:

Threat-model gaps (e.g. confused-deputy or context-hijacking risks)

Schema ergonomics for high-frequency or long-running agent pipelines

Canonicalization edge cases worth enforcing earlier

5 comments

order

W_rey45|1 month ago

Threat model / scope: This design assumes the signer’s private key is trusted at issuance time; it does not attempt to prove semantic correctness of the agent’s reasoning or inputs. The signature covers only the canonicalized signed_block; any mutation invalidates verification. Receipts are portable and verifiable offline but do not prevent a malicious issuer from signing false data (integrity primitive, not a truth oracle). Replay is detectable (e.g. via hash chaining or external indexing) but not prevented by the receipt alone. Confidentiality is out of scope; receipts are integrity-only artifacts. The goal is to make post-hoc tampering and log forgery detectable, not to replace policy enforcement or access control.

nulone|1 month ago

Solid primitive. Two questions:

1. Crash edge case: If an agent executes a side-effect and dies before signing the receipt, is that action orphaned? Any WAL-style intent/completion model?

2. Multi-step workflows: Do receipts chain natively (parent pointers/Merkle) or via external linking? (I see storage/ledgers are out of scope, but curious about the linkage design.)

The negative proof angle (proving AI didn't touch prod) is compelling for compliance.

W_rey45|1 month ago

Great questions.

1) Crash/orphan side-effect: agreed this needs an intent/commit model. The clean pattern is WAL-style: emit/sign an “INTENT” receipt before side-effect + a “COMMIT” receipt after; absence of COMMIT is itself evidence (“we can’t attest completion”). Another option is tool-level signing so the side-effecting tool returns a signed result that the agent includes.

2) Linking: yes linkage can be native without building storage. Next iteration is parent_hash (or prev_hash) inside the signed_block so receipts naturally chain; Merkle/log indexing stays external.

kxbnb|1 month ago

The "signed transaction, not a log stream" framing is exactly right. Logs are optimistic - you assume they're complete and unmodified. Receipts are pessimistic - you verify before trusting.

We've been thinking about a related problem at toran.sh: capturing what an agent actually sent to external APIs (and what came back) without trusting the agent's self-reported logs. Different angle - we focus on the API request/response level rather than the decision/action level - but the same underlying insight: the source of truth needs to be outside the agent's control.

The Ed25519 + canonical JSON approach is clean. Question: how are you handling schema evolution? If the receipt format changes, older receipts still need to verify but newer tooling might expect different fields.

W_rey45|1 month ago

Great question this is exactly the tension we’re trying to be explicit about.

CIRCE separates cryptographic verification from semantic interpretation. The signature covers a minimal, stable signed_block (canonicalized → hashed → signed). Everything else is metadata that can evolve without affecting verification.

Older receipts remain verifiable because the verifier only assumes the signed scope + canonicalization rules. Newer tooling can understand more fields, but must ignore unknown/missing fields (JWT / signed artifact style). We also include a schema identifier/hash for tooling selection, but it’s intentionally not security-critical — verification is purely about integrity.

Also: toran.sh’s angle is super aligned. Capturing actual API request/response outside the agent’s control feels like the “ground truth” complement to CIRCE’s “decision truth.” Curious: are you anchoring the API transcript via a sidecar/proxy with its own signing key, or are you doing something like a transparency log/Merkle chain for requests?