Creator here - didn't expect this to go public so soon. A few notes:
1. I built this because I like my agents to be local. Not in a container, not in a remote server, but running on my finely-tuned machine. This helps me run all agents on full-auto, in peace.
2. Yes, it's just a policy-generator for sandbox-exec. IMO, that's the best part of the project: no dependencies, no fancy tech, no virtualization. But I did put in many hours to identify the minimum required permissions for agents to keep working with auto-updates, keychain integration, pasting images, etc. There are notes about my investigations into what each agent needs at https://agent-safehouse.dev/docs/agent-investigations/ (AI-generated)
3. You don't even need the rest of the project; you can use just the Policy Builder to generate a single sandbox-exec policy you can put into your dotfiles: https://agent-safehouse.dev/policy-builder.html
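For a sense of what a generated policy looks like, here is a hand-written sketch of a minimal SBPL profile. This is illustrative only, not Safehouse's actual output, and since Apple never documented SBPL, every rule below is an approximation; the project path is a placeholder:

```scheme
;; Deny everything by default, then open up only what an agent needs:
;; read access everywhere, writes confined to one project directory,
;; process spawning, and outbound HTTPS.
(version 1)
(deny default)
(allow process-exec)
(allow process-fork)
(allow file-read*)
(allow file-write* (subpath "/Users/me/projects/myapp"))
(allow network-outbound (remote ip "*:443"))
```

You'd run an agent under it with `sandbox-exec -f agent.sb <agent-cli>`.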
OP here. Sorry if this was premature. I came across it through your earlier comment on HN, started using it (as did a colleague), and we've been impressed enough with how efficient it is that I decided it deserved a post!
I've seen sandbox policy documents for agents before, but this is the first ready-to-use app I've come across.
I've only had a couple of points of friction so far:
- Files like .gitconfig and .gitignore in the home folder aren't accessible, and can't be made accessible without granting read-only access to the entire home folder, I think?
- Process access is limited, so I can't ask Claude to run lldb or pkill or other commands that can help me debug local processes.
More fine-grained control would be really nice.
I'm wondering if this could be adapted for openclaw. Running it on a machine that's accessible reduces friction and enables a lot of use cases, but it's equally hard to control/restrict.
I've read through the agent investigation of Codex on macOS. It looks like the default sandbox is pretty limited, but that doesn't match my experience:
- I asked the agent to change my global git username, Codex asked my permission to execute `git config --global user.name "Botje"` and after I granted permission, it was able to change this global configuration.
- I asked it to list my home directory and it was able to (this time without Codex asking for permission).
It's kinda funny that I, being skeptical about coding agents and their potential dangers, was interested in giving your project a go because I don't trust AI.
Yet the first thing I find in your README is that to install your tool I need to trust some random server to serve me an .sh file that I will execute on my computer (not sure if with sudo... but still).
Come on man, give me a tarball :)
EDIT: PS: before someone gives me the typical "but you could have malware in that tarball too!!!", well, it's easier to inspect what's inside the tarball and compare it to the sources of the repo, maybe also take a look at the CI of the repo to see if the tarball is really generated automatically from the contents of the repo ;)
Not sure I understand this. Agent CLIs already use sandbox-exec, and you can configure granular permissions. You are basically saying - give the agents access to everything, and configure permissions in this second sandbox-exec wrapper on top. But why use this over editing the CLI's settings file directly (e.g. https://code.claude.com/docs/en/sandboxing#configure-sandbox...)?
I have sandbox-exec set up for Claude like you suggest, but I’m not sure every CLI supports it? Claude only added it a month or two ago. A wrapper CLI that allows any command to be sandboxed is pretty appealing (Claude config was not trivial).
The downside is that it requires access to more than it technically needs (Claude keys for example). I’m working on a version where you sandbox the agent’s Bash tool, not the agent itself. https://github.com/Kiln-AI/Kilntainers
Some agents use sandbox-exec, some agents don't, some agents that do use it have bugs configuring & executing sandbox-exec.
This may also have bugs, but having a single setup for your sandbox reduces the surface area & frankly I have significantly more trust in literally any individual dev creating a free generic open-source tool than I do in a company selling frontier models as a service & vibe-coding[0] an afterthought companion cli at 80k loc/mo.
[0] https://x.com/bcherny/status/2004887829252317325
I've had trouble with the sandbox functionality baked into agents being able to do what I want, particularly Gemini CLI. Being able to write your own .sb file is more powerful and portable.
Claude Code seemed to be able to reach outside its own sandbox sometimes, so I lost trust in it. Manually wrapping it in sandbox-exec solved the issue.
I think the idea here is to move the responsibility layer away from the agent, rather than trust the CLI will behave and have to learn specific configs for each (given OP's tool works for any agent, not just Claude), this standardizes and centralizes it.
I honestly think that sandboxing is currently THE major challenge that needs to be solved for the tech to fully realise its potential. Yes the early adopters will YOLO it and run agents natively. It won't fly at all longer term or in regulated or more conservative corporate environments, let alone production systems where critical operations or data are in play.
The challenge is that we need a much more sophisticated version of sandboxing than anybody has made before. We can start with network, filesystem, and execute permissions, but we need way more than that. For example, if you really need an agent to use a browser to test your application in a live environment, capture screenshots, and debug them, you have to give it all kinds of permissions that go beyond what can be constrained with a traditional sandboxing model. If it has to interact with resources that cost money (say, create cloud resources), then you need an agent-aware cloud cost / billing constraint.
Somehow all this needs to be pulled together into an actual cohesive approach that people can work with in a practical way.
Have you considered that it's unsolvable? Or - at least - there is an irreconcilable tension between capability and safety. And people will always choose the former if given the choice.
File-level sandboxing is table stakes at this point — the harder problem is credentials and network. An agent inside sandbox-exec still has your AWS keys, GitHub token, whatever's in the environment. I've been running a setup where a local daemon issues scoped short-lived JWTs to agent processes instead of passing raw credentials through, so a confused agent can't escalate beyond what you explicitly granted. Works well for API access. But like you said, nothing at the filesystem level stops an agent from spinning up 50 EC2 instances on your account.
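The scoped short-lived token idea can be sketched with plain shell and openssl. Everything below (the secret, the "s3:read" scope, the token format) is invented for illustration; a real setup would put mint() behind a daemon boundary so the agent only ever sees tokens, never the secret:

```shell
# The daemon holds the real secret; agents only ever receive derived tokens.
SECRET="daemon-only-secret"             # hypothetical; never exported to agents

mint() {                                # mint <scope> <ttl-seconds>
  local exp sig
  exp=$(( $(date +%s) + $2 ))
  sig=$(printf '%s.%s' "$1" "$exp" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}')
  echo "$1.$exp.$sig"
}

verify() {                              # verify <token>; prints ok/rejected
  local scope rest exp sig calc
  scope=${1%%.*}; rest=${1#*.}; exp=${rest%%.*}; sig=${rest#*.}
  calc=$(printf '%s.%s' "$scope" "$exp" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}')
  if [ "$calc" = "$sig" ] && [ "$(date +%s)" -lt "$exp" ]; then
    echo ok                             # signature intact and not yet expired
  else
    echo rejected                       # tampered scope/expiry, or expired
  fi
}

TOKEN=$(mint "s3:read" 300)             # what the agent gets instead of a key
verify "$TOKEN"                         # prints "ok"
```

A confused agent can still leak the token, but the blast radius is capped by the scope and the five-minute expiry, which is the point.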
This is just a wrapper around sandbox-exec. It's nice that there are a ton of presets that have been thought out, since 90% of wielding sandbox-exec is correctly scoping it to whatever the inner environment requires (the other 90% is figuring out how sandbox-exec works).
I like that it's just a shell script.
I do wish that there was a simple way to sandbox programs with an overlay or copy-on-write semantics (or better yet bind mounts). I don't care if, in the process of doing some work, an LLM agent modifies .bashrc -- I only care if it modifies _my_ .bashrc
Thanks, I picked Bash because I’m scared of all Go and Rust binaries out there!
Re “overlay FS” - I too wish this was possible on Macs, but the closest I got was restricting agents to be read-only outside of CWD which, after a few turns, bullies them into working in $TMP. Not the same though.
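A crude stand-in for real copy-on-write, which macOS doesn't expose to user tools, is copy-then-diff: the agent works in a disposable copy and nothing touches the real tree until you've reviewed the diff. A self-contained sketch (the .bashrc here stands in for any file you care about):

```shell
# Copy-then-diff: a poor man's overlay. The agent gets a throwaway copy;
# the real tree stays untouched until you apply the hunks you approve.
SRC=$(mktemp -d)                        # stands in for your real project
echo 'original' > "$SRC/.bashrc"        # the file you actually care about

WORK=$(mktemp -d)                       # the agent's disposable copy
cp -R "$SRC/." "$WORK/"

# ... the sandboxed agent runs with $WORK as its CWD and edits away ...
echo 'agent was here' > "$WORK/.bashrc"

diff -ruN "$SRC" "$WORK" || true        # review the damage (diff exits 1)
cat "$SRC/.bashrc"                      # still prints "original"
```

Apply the hunks you like with `git apply` or `patch`; the rest evaporates with the temp dir.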
I took a more paranoid approach to sandboxing agents. They can do whatever they want inside their container, and then I choose which of their changes to apply outside as commits:
┌─ YOLO shell ──────────────────────┬─ Outer shell ─────────────────────┐
│ │ │
│ yoloai new myproject . -a │ │
│ │ │
│ # Tell the agent what to do, │ │
│ # have it commit when done. │ │
│ │ yoloai diff myproject │
│ │ yoloai apply myproject │
│ │ # Review and accept the commits. │
│ │ │
│ # ... next task, next commit ... │ │
│ │ yoloai apply myproject │
│ │ │
│ │ # When you have a good set of │
│ │ # commits, push: │
│ │ git push │
│ │ │
│ │ # Done? Tear it down: │
│ │ yoloai destroy myproject │
└───────────────────────────────────┴───────────────────────────────────┘
Works with Docker, Seatbelt, and Tart backends (I've even had it build an iOS app inside a seatbelt container).
https://github.com/kstenerud/yoloai
I've been working on an OSS project, Amika[1], to quickly spin up local or remote sandboxes for coding workloads. We support copy-on-write semantics locally (well, "copy-and-then-write" for now... we just copy directories to a temp file-tree).
It's tailored to play nicely with Git: spin up sandboxes from the CLI, expose TCP/UDP ports of apps to check your work, and if running hosted sandboxes, share the sandbox URLs with teammates. I basically want running sandboxed agents to be as easy as `git clone ...`.
Docs are early and edges are rough. This week I'm starting to dogfood all my dev using Amika. Feedback is super appreciated!
FYI: we are also a startup, but local sandbox mgmt will stay OSS.
[1]: https://github.com/gofixpoint/amika
This is what I was going for with Treebeard[0]. It is sandbox-exec, worktrees, and a COW/overlay filesystem. The overlay filesystem is nice, in that you have access to git-ignored files in the original directory without having to worry about those files being modified in the original (due to the COW semantics). Though, truthfully, I haven’t found myself using it much since getting it all working.
[0]: https://github.com/divmain/treebeard
Sandvault [0] (whose author is around here somewhere) is another approach that combines sandbox-exec with the granddaddy of system sandboxes, the Unix user system.
Basically, give an agent its own unprivileged user account (interacting with it via sudo, SSH, and shared directories), then add sandbox-exec on top for finer-grained control of access to system resources.
[0]: https://github.com/webcoyote/sandvault
fun fact about `sandbox-exec`, the macOS util this relies on: Apple officially deprecated it in macOS Sierra back in 2016!
Its manpage has been saying it's deprecated for a decade now, yet we're continuing to find great uses for it. And the 'App Sandbox' replacement doesn't work at all for use cases like this where end users define their own sandbox rules. Hope Apple sees this usage and stops any plans to actually deprecate sandbox-exec. I recall a bunch of macOS internal services also rely on it.
Aside from named profiles, I'm not sure it wasn't born deprecated.
In particular, has the profile language ever been documented by anything other than the examples used by the OS and third parties reverse engineering it?
I’ve been playing around with https://nono.sh/ , which adds a proxy to the sandbox piece to keep credentials out of the agent’s scope. It’s a little worrisome that everyone is playing catch up on this front and many of the builtin solutions aren’t good.
Around last summer (July–August 2025), I desperately needed a sandbox like this. I had multiple disasters with Claude Code and other early AI models. The worst was when Claude Code did a hard git revert to restore a single file, which wiped out ~1000 lines of development work across multiple files.
But now, as of March 2026, at least in my experience, agents have become more reliable. With proper guardrails in claude.md and built-in safety measures, I haven't had a major incident in about 3 months.
That said, layering multiple safeguards is always recommended—your software assets are your assets. I'd still recommend using something like this. But things are changing, bit by bit.
No doubt they are getting better, but even a 0.1% chance of “rm -rf” makes it a question of “when” not “if”. And we sure spin that roulette a lot these days. Safehouse makes that 0%, which is categorically different.
Also, I don’t want it to be even theoretically possible for some file in node_modules to inject instructions to send my dotfiles to China.
Look into git reflog. If the changes were committed, it was almost certainly possible to still restore them, even if the commit is no longer in your branch.
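For anyone hit by the same kind of disaster: once work is committed, a hard reset rarely destroys it, because the reflog records every position HEAD has held (for ~90 days by default). A self-contained demo:

```shell
# Demo: "lose" a commit with a hard reset, then recover it from the reflog.
repo=$(mktemp -d) && cd "$repo" && git init -q
git -c user.name=me -c user.email=me@example.com \
    commit -q --allow-empty -m "base"
echo "a day of work" > feature.txt
git add feature.txt
git -c user.name=me -c user.email=me@example.com \
    commit -q -m "feature"

git reset --hard -q HEAD~1              # the disaster: feature.txt is gone

git reflog | head -n 3                  # every HEAD position is still there
git reset --hard -q 'HEAD@{1}'          # jump back to just before the reset
cat feature.txt                         # prints "a day of work"
```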
As I understand it, the problem nowadays doesn't seem to be so much that the agent is going to rm -rf / my host, it's more like it's going to connect to a production system that I'm authorized to on my machine or a database tool, and then it's going to run a potentially destructive command. There is a ton of value of running agents against production systems to troubleshoot things, but there are not enough guardrails to prevent destructive actions from the get-go. The solution seems to be specific to each system, and filesystem is just one aspect out of many.
As I understand it, the problem is that these apps/agents can do all of these and a lot more (if not absolutely everything, I'm sure they can get quite close to it).
Solution could be two parts:
1. The OS bringing better, easier-to-use limitations: more granular permissions, plus install-time options and defaults that are visible to the user right there, which the user can reject with choices like:
- “ask later”
- “no”
- “fuck no”
with ELI5-level GUIs (and well documented). Hell, a lot of this is already solved on mobile OSes. All without taking tools away from the user who wants to go in and open things up (with clear intention and effort, and without having to notarise some shit or pay someone).
2. Then apps[1] having to adhere to those (forced to), or never getting installed.
[1] So no treating of agents as some “other” kinds of apps. Just limit it for every app (unless user explicitly decides to open things up).
It would also be a great time to nuke despicable messes like Electron Helpers, and the habit of app devs who consider it completely fine to install a trillion other “things” when the user installed just one app, without explaining it up front (under such a regime they’d be forced to keep their apps’ tentacles simple and limited).
While we have `sandbox-exec` on macOS, we still don't have a proper Docker for macOS. Instead, the current Docker runs on macOS as a Linux VM, which is useful, but only as far as a Linux machine goes.
Having real macOS Docker would solve the problem this project solves, and 1001 other problems.
Sandboxing is going to be table stakes for any serious deployment of AI agents in regulated industries. In sectors like construction, healthcare, or finance, you cannot have an agent with unrestricted filesystem or network access making decisions that affect safety-critical documentation. The macOS sandbox approach is smart because it leverages the OS-level enforcement rather than relying on application-layer restrictions that an agent could potentially reason its way around. The real question is how you balance useful tool access with meaningful containment when the whole point of agents is autonomous action.
Sandboxing is half the story. The other half is external blast radius: if your local agent can email/DM/pay using your personal accounts, the sandbox doesn't help much. What I want is a separate, revocable identity context per agent or per task: its own inbox/phone for verification, scoped credentials with expiry, and an audit log that survives delegation to sub-agents. We ran into this building Ravi: giving an agent a phone number is easy; keeping delegation traceable to the right principal is the hard bit.
The macOS sandbox approach is clever, but there's an interesting philosophical split here: sandboxing constrains a local agent, whereas running agents in ephemeral cloud desktops removes the local risk surface entirely.
We built Cyqle (https://cyqle.in) partly around this idea — each session is a full Linux desktop that's cryptographically wiped on close (AES-256 key destroyed, data unrecoverable). Agents can do whatever they want inside, and the blast radius is zero by design. No residual state, no host OS exposure.
The tradeoff is latency and connectivity requirements. For teams already doing cloud-based dev work, it's a natural fit. For local-first workflows where you need offline capability or sub-50ms responsiveness, something like Agent Safehouse makes more sense.
Both approaches are worth having — the threat model differs depending on whether you're more worried about data exfiltration or local system compromise.
This is a very nice and clean implementation. Related to this - I've been exploring injecting landlock and seccomp profiles directly into the ELF binary, so that applications that are backed by some LLM, but want to 'do the right thing', can lock themselves out. This ships a custom process loader that reads the .sandbox section and applies the policies (not unlike bubblewrap, which uses namespaces). The loading can be pushed to a kernel module in the future.
https://github.com/hsaliak/sacre_bleu very rough around the edges, but it works.
In the past there were apps that either behaved well, or had malicious intent, but with these LLM backed apps, you are going to see apps that want to behave well, but cannot guarantee it.
We are going to see a lot of experimentation in this space until the UX settles!
If/since AI agents work continuously, it seems like running macOS in a VM (via the virtualization framework directly) is the most secure solution and requires a lot less verification than any sandboxing script. (Critical feature: no access to my keychain.)
AI agents are not at all like container deploys which come and go with sub-second speed, and need to be small enough that you can run many at a time. (If you're running local inference, that's the primary resource hog.)
I'm not too worried about multiple agents in the same VM stepping on each other. I give them different work-trees or directory trees; if they step over 1% of the time, it's not a risk to the bare-metal system.
Not sure if I'm missing something...
How do agents tend to deal with getting blocked? Messing around with sandboxes, I've quite often seen them get blocked, assume something is wrong, and go _crazy_ trying to get around the block, never stopping to ask for user input. It might be good to add to the error message: "This is deliberate, don't try to get around it."
For those using pi, I've built something similar[1] that works on macOS+Linux, using sandbox-exec/bubblewrap. The only benefit over OP is that there's some UX for temporarily/permanently bypassing blocks.
[1]: https://github.com/carderne/pi-sandbox
Claude Code and Codex quickly figure out they are inside sandbox-exec environment. Maybe because they know it internally. Other agents often realize they are being blocked, and I haven't seen them go haywire yet.
Big love for Pi - it was the first integration I added to Safehouse. I wanted something that offers strong guarantees across all agents (I test and write them nonstop), has no dependencies (e.g., the Node runtime), and is easy to customize, so I didn't use the Anthropic sandbox-runtime.
Ah, I also did my own sandbox, and at least twice the agent inside tried really hard to go around the firewall, so I ended up intercepting calls to `connect` to return a message that says "Connection refused by the sandbox, don't try to bypass".
Code here: https://github.com/gbrindisi/agentbox
I wonder why you believe that running agents locally is the best approach. For most people, having agents operate remotely is more effective because the agent can stay active without your local machine needing to remain powered on and connected to the internet 24/7.
I've been trying to get microsandbox to play nicely. But this is much closer to what I actually need.
I glanced through the site and the script, but couldn't really see any obvious gotchas.
Any you've found so far that haven't been documented yet?
Thanks for making it!
This looks like a competent wrapper around sandbox-exec. I've seen a whole lot of similar wrappers emerging over the past few months.
What I really need is help figuring out which ones are trustworthy.
I think this needs to take the form of documentation combined with clearly explained and readable automated tests.
Most sandboxes - including sandbox-exec itself - are massively under-documented.
If I'm going to trust them, I need both detailed documentation and proof that they work as advertised.
https://github.com/apple/container
https://shuru.run
You can use Coderunner to completely sandbox Claude Code too: https://github.com/instavm/coderunner