IMO, for real file systems, just give a view via cgroups/namespaces.
Implementing a database abstraction as a file system for an LLM feels like an extra layer of indirection for indirection's sake: just have the LLM write some views/queries/stored procs and give it sane access permissions.
LLMs are smart enough to use databases, email, etc. without needing a FUSE layer to do so, and permissions/views/etc. will keep them from doing or seeing stuff they shouldn't. You'll be keeping access and permissions where they belong, and not in a FUSE layer, and you won't have to maintain a weird abstraction that's annoying and hampered by licensing issues if you want to deploy it cross-platform.
Also, your simplified FUSE abstraction will not map accurately to the state of the world unless you're really comprehensive with your implementation, and at that point, you might as well be interacting directly in order to handle that state accurately.
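As a rough illustration of the "views plus sane permissions" idea, here is a minimal sketch using Python's built-in `sqlite3`. The `users` table, the `agent_users` view, and the `agent_query` helper are all made up for the example. SQLite has no `GRANT`, so `PRAGMA query_only` stands in for a read-only role; on a server database you would grant the agent's role `SELECT` on the view only, which this sketch does not enforce.

```python
import sqlite3

# Hypothetical schema: a "users" table with a column the agent should not see.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, name TEXT, email TEXT, password_hash TEXT
    );
    INSERT INTO users (name, email, password_hash) VALUES
        ('ada', 'ada@example.com', 'x1'),
        ('bob', 'bob@example.com', 'x2');
    -- The view exposes only the columns the agent is meant to read.
    CREATE VIEW agent_users AS SELECT id, name FROM users;
""")

# Assumption: query_only stands in for a read-only database role.
conn.execute("PRAGMA query_only = ON")


def agent_query(sql: str) -> list[tuple]:
    """Run a statement on the restricted connection and return all rows."""
    return conn.execute(sql).fetchall()


rows = agent_query("SELECT * FROM agent_users ORDER BY id")

try:
    agent_query("DELETE FROM users")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite refuses writes while query_only is on.
    write_blocked = True
```

The point of the sketch is only that the restriction lives in the database itself, not in a translation layer in front of it.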
Agree that too far-fetched mappings to files don’t really make sense. The email example is more illustrative than real-world inspired; I thought it might be good to show how flexible the approach is.
I think there is a gap between “real file systems” and “non file things in a database” where mapping your application's representation of things to a filesystem is useful. Basically all those platforms that let users upload files for different purposes and work with them (e.g. Google Drive, Notion, etc.). In those cases representing files to an agent via a filesystem is the more intuitive and powerful interface compared to some homegrown tools that the model never saw during training.
We have also attempted to implement exactly this but it turned out to be really bad architecture.
The file system as an abstraction is actually not that good at all beyond the basic use cases. Imagine you need to find an email. If you grep (via FUSE) you will end up opening lots of files, each of which results in a fetch to some API, and it will be slow. You can optimise this, and caching works after the first fetch, but the method is still slow. The alternative is to leverage the existing API, which will be a million times faster. Now you could also create some kind of special file via FUSE that acts like a search, but it is weird and I don't think the models will do well with something so obscure.
We went as far as implementing this idea in Rust to really test it out, and ultimately it was ditched because, well, it sucks.
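To make the cost difference concrete, here is a toy sketch that counts round trips. `FakeMailApi` and `grep_via_files` are hypothetical stand-ins, not the commenter's Rust implementation; the only thing being modelled is that a grep over a naive mount fetches every message, while a search endpoint is one request.

```python
from dataclasses import dataclass


@dataclass
class FakeMailApi:
    """Hypothetical stand-in for a remote mail API that counts round trips."""
    messages: dict[str, str]
    calls: int = 0

    def fetch(self, msg_id: str) -> str:
        self.calls += 1  # one request per message body
        return self.messages[msg_id]

    def search(self, term: str) -> list[str]:
        self.calls += 1  # one request; the server does the matching
        return sorted(m for m, body in self.messages.items() if term in body)


def grep_via_files(api: FakeMailApi, term: str) -> list[str]:
    """What grep over a naive mount amounts to: open (fetch) every file."""
    return sorted(m for m in api.messages if term in api.fetch(m))


inbox = {f"msg{i}": f"body {i}" for i in range(1000)}
inbox["msg42"] = "invoice attached"

scan_api = FakeMailApi(dict(inbox))
scan_hits = grep_via_files(scan_api, "invoice")

search_api = FakeMailApi(dict(inbox))
search_hits = search_api.search("invoice")

print(scan_api.calls, search_api.calls)
```

Caching would reduce the cost of repeated scans, but the first scan still pays one fetch per message.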
> The file system as an abstraction is actually not that good at all beyond the basic use-cases. Imagine you need to find an email.
Unrelated to FUSE and MCP[1] agents, this scenario reminded me of using nmh[0] as an email client. One of the biggest reasons why nmh[0] is appealing is to script email handling, such as being able to use awk/find/grep/sed and friends.
> If you grep (via fuse) you will end up opening lots of files which will result in fetches to some API and it will be slow.
This is a limitation of the POSIX filesystem interface. If there were a grep() system call, it could delegate searches to the filesystem, which could use full-text indices, run them on a remote server, etc.
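No such system call exists in POSIX, so the following is purely a sketch of the delegation idea in user space. The `Searchable` protocol, `PlainFs`, `IndexedFs`, and `grep` are invented names: `grep` asks the filesystem to search if it knows how, and otherwise falls back to reading every file.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Searchable(Protocol):
    """Hypothetical optional capability: the filesystem can search itself."""
    def search(self, term: str) -> list[str]: ...


class PlainFs:
    """A filesystem that can only list and read files."""
    def __init__(self, files: dict[str, str]) -> None:
        self.files = dict(files)

    def list_files(self) -> list[str]:
        return sorted(self.files)

    def read(self, path: str) -> str:
        return self.files[path]


class IndexedFs(PlainFs):
    """Same data, plus a word -> paths index built up front."""
    def __init__(self, files: dict[str, str]) -> None:
        super().__init__(files)
        self.index: dict[str, set[str]] = {}
        for path, text in self.files.items():
            for word in set(text.split()):
                self.index.setdefault(word, set()).add(path)

    def search(self, term: str) -> list[str]:
        return sorted(self.index.get(term, set()))


def grep(fs: PlainFs, term: str) -> list[str]:
    """Delegate to the filesystem when it can search; otherwise scan."""
    if isinstance(fs, Searchable):
        return fs.search(term)
    return [p for p in fs.list_files() if term in fs.read(p).split()]


files = {"a.txt": "hello world", "b.txt": "goodbye world", "c.txt": "hello again"}
plain_hits = grep(PlainFs(files), "hello")
indexed_hits = grep(IndexedFs(files), "hello")
```

Both calls return the same paths; only the indexed one avoids reading file contents at query time.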
I've been getting into FUSE a bit lately, as I stole an idea that a friend had of how to add CoW features to an existing non-CoW filesystem, so I've been hacking on and off on a FUSE driver for ext4 to do that.
To learn FUSE, however, I started just making everything into filesystems that I could mount. I wrote a FUSE driver for Cassandra, I wrote a FUSE driver for CouchDB, I wrote a FUSE driver for a thing that just wrote JSON files with Base64 encoding.
None of these performed very well, and I'm sort of embarrassed at how terrible the code is, which is why I haven't published them (and they were also just learning projects), but I did find FUSE to be extremely fun and easy to write against. I encourage everyone to play with it.
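The drivers mentioned above were never published, so this is only a guess at what a "JSON files with Base64 encoding" storage format could look like. The field names `name` and `data` and both helper functions are assumptions; the FUSE plumbing itself is left out.

```python
import base64
import json


def encode_file(name: str, data: bytes) -> str:
    """Wrap raw bytes in a JSON document; Base64 keeps binary data JSON-safe."""
    payload = base64.b64encode(data).decode("ascii")
    return json.dumps({"name": name, "data": payload})


def decode_file(blob: str) -> tuple[str, bytes]:
    """Reverse of encode_file: recover the name and the original bytes."""
    doc = json.loads(blob)
    return doc["name"], base64.b64decode(doc["data"])


blob = encode_file("logo.bin", b"\x00\xffhello")
name, data = decode_file(blob)
```

A driver built on this would call something like `decode_file` in its read handler and `encode_file` in its write handler.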
For ZeroFS [0], I went an alternate route with NFS/9P. I am surprised that it’s not more common, as this approach has various advantages [1] while being much more workable than FUSE.
Interesting! The network-first point makes a lot of sense, especially because you will most likely not access your actual datastore within the process running in the sandbox and instead just call some server that handles DB access, access control, etc.
It never left :p (I think there are still active forks of Plan9, and Plan9 itself has definitely influenced some Linux features, or so I have heard)
I agree with most people who commented. This looks like an abstraction without a clear purpose, which is not a good thing. In particular, using FUSE as a wrapper for a REST API is ineffective and redundant, since an LLM can work with the API more effectively using curl, given an API spec in any format.
Do you think most people under the age of 30 remember you can share a single computer between multiple users? When there was a single "home computer" or "PC" in the home, you learned about users and different rights. Unless you were a user back in those days or you've tinkered with any admin work, you wouldn't know this in 2026.
Or just implement something like storage-combinators [1][2].
Basically an abstraction that is filesystem-like, but doesn't require a filesystem. Though you can both export storage-combinators as filesystem and, of course, also access filesystems via storage-combinators.
I put together a spec for this where the entire LLM agent landscape adheres to the "Everything is a file" constraint. It uses the FUSE filesystem in the way described. I also created a possible limitation document to describe some areas where I thought it might be overengineered or locking in technical debt.
See my own https://github.com/matthiasgoergens/git-snap-fs which lets you expose all branches and all tags and all commits and all everything in a git repository as a static directory tree. No need to git checkout anything: everything is already checked out.
Implemented this a few years back by abstracting the upstream callbacks. You can mount pretty much any API endpoint as a filesystem with a bit of JavaScript glue. The FUSE layer is in Go: https://github.com/autovia/wfs
> My prediction is that one of the many sandbox providers will come up with a nice API on top of this that lets you do something like ... No worrying about FUSE, the sandbox, where things are executed, etc. This will be a huge differentiator and make virtual filesystems easily accessible to everyone.
I've done exactly that with Filestash [1] using its virtual filesystem plugin [2], which exposes arbitrary systems as a filesystem. It turns out the filesystem abstraction works extremely well even for systems that are not filesystems at all. There are connectors for literally every possible storage (SFTP, S3, GDrive, Dropbox, FTP, Sharepoint, GCP, Azure Cloud, IPFS....), but also things like MySQL and Postgres (where the first-level folder represents the list of databases, the second level is the tables that belong to a database, and each row is represented as a form file generated from the schema), LDAP (where tree nodes are represented as folders and leaves are form files), ....
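The database layout described above (databases, then tables, then one file per row) can be sketched as a small path resolver. This is not Filestash's code: `Target`, `resolve`, and the `17.form` file name are hypothetical and only illustrate the mapping.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class Target:
    """What a virtual path points at in the database."""
    kind: str  # "root", "database", "table", or "row"
    database: Optional[str] = None
    table: Optional[str] = None
    row: Optional[str] = None


def resolve(path: str) -> Target:
    """Map /<database>/<table>/<row file> onto a database object."""
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    if len(parts) > 3:
        raise ValueError(f"path too deep: {path}")
    kinds = ["root", "database", "table", "row"]
    # The path depth decides the kind; the segments fill the fields in order.
    return Target(kinds[len(parts)], *parts)


root = resolve("/")
table = resolve("/shop/orders")
row = resolve("/shop/orders/17.form")
```

Listing a directory then becomes "list databases", "list tables", or "list rows", depending on `kind`.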
The whole filesystem is available to agents via MCP [3] and has been published to the OpenAI marketplace since around Christmas, currently pending review.
- agents tend to need (already have) a filesystem anyway to be useful (not technically required but generally true, they’re already running somewhere with a filesystem)
- LLMs have a ton of CLI/filesystem stuff in their training data, while MCP is still pretty new (FUSE is old and boring)
- MCP tends to bloat context (not necessarily true but generally true)
The UNIX philosophy is really compelling (more so than MCP being bad). If you can turn your context into files, agents likely “just work” for your use case.
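"Turning your context into files" can be as simple as writing records into a directory tree that ordinary tools can walk. A minimal sketch, where `write_context`, `grep_tree`, and the sample paths are all made up for illustration:

```python
import tempfile
from pathlib import Path


def write_context(root: Path, items: dict[str, str]) -> None:
    """Materialise each context item as a plain text file under root."""
    for rel_path, text in items.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")


def grep_tree(root: Path, term: str) -> list[str]:
    """Roughly what an agent does with grep -rl: list files containing term."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and term in p.read_text(encoding="utf-8")
    )


context = {
    "tickets/101.md": "Login fails after password reset",
    "tickets/102.md": "Export to CSV is slow",
    "docs/faq.md": "How to reset your password",
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_context(root, context)
    hits = grep_tree(root, "password")
```

Once the files exist, the agent needs no custom tool definitions to search or read them.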
I am so sick of the ‘sandboxed’ AI-infra meme. A container is not a sandbox. A chroot is not a sandbox. A VM is also not a sandbox. A filesystem is also also not a sandbox. You can sandbox an application, you can run an application in a secure context, but this is not a secure context the author is describing, firstly, and secondly they haven’t described any techniques for sandboxing unless that part of the page didn’t load for me somehow.
Didn’t mean to say this is a sandbox, it certainly isn’t. This is just an illustration of how to bridge the gap and make things available in a file system from the source of truth of your application.
There is a ton more complexity to sandboxing, I agree!
I recently had a question about what AI sandboxes use. I think Modal uses gVisor under the hood, and I think others use Firecracker or generally favour it as well.
Firecracker ends up in the VM category, and I would place gVisor in a similar category too.
There is also https://github.com/Zouuup/landrun Run any Linux process in a secure, unprivileged sandbox using Landlock. Think firejail, but lightweight, user-friendly, and baked into the kernel.
Your mileage may vary, but I usually consider Firecracker to be the AI sandbox. Other times providers abstract over a cloud provider and spin up servers there or similar (I feel E2B does this on top of GCP).
heavyset_go|1 month ago
jakobem|1 month ago
efitz|1 month ago
_pdp_|1 month ago
AdieuToLogic|1 month ago
0 - https://www.nongnu.org/nmh/
1 - https://en.wikipedia.org/wiki/Model_Context_Protocol
skissane|1 month ago
tombert|1 month ago
eru|1 month ago
Eikon|1 month ago
[0] https://github.com/Barre/ZeroFS
[1] https://github.com/Barre/ZeroFS?tab=readme-ov-file#why-nfs-a...
jakobem|1 month ago
fleshmonad|1 month ago
olav|1 month ago
Maybe the most mainstream incarnation is its use in the Windows Subsystem for Linux (WSL).
Imustaskforhelp|1 month ago
naushniki|1 month ago
nubskr|1 month ago
neilwilson|1 month ago
Agents are users on a Unix-based computer that is capable of, and indeed was designed for, multi-user collaboration.
Why not go for the simple solution?
godzillabrennus|1 month ago
mpweiher|1 month ago
[1] https://dl.acm.org/doi/10.1145/3359591.3359729
[2] https://2019.splashcon.org/details/splash-2019-Onward-papers...
Jimmc414|1 month ago
https://github.com/jimmc414/AgentOS
rescrv|1 month ago
eru|1 month ago
diasp|1 month ago
everlier|1 month ago
It opens up absolutely bonkers capabilities.
mickael-kerjean|1 month ago
ref:
[1]: https://github.com/mickael-kerjean/filestash
[2]: https://www.filestash.app/docs/guide/virtual-filesystem.html
[3]: https://www.filestash.app/docs/guide/mcp-gateway.html https://github.com/mickael-kerjean/filestash/tree/master/ser...
disdi89|1 month ago
ohnoesjmr|1 month ago
dkdcio|1 month ago
ainiro|1 month ago
You can test it here ==> https://ainiro.io/natural-language-api
michaelmior|1 month ago
glemmaPaul|1 month ago
jacob019|1 month ago
AmazingTurtle|1 month ago
jasdfawe|1 month ago
moonlet|1 month ago
jakobem|1 month ago
tptacek|1 month ago
Imustaskforhelp|1 month ago
Firecracker ends up in the VM category, and I would place gVisor in a similar category too.
So in my opinion, VMs are sandboxes.
Of course there is also libriscv https://github.com/libriscv/libriscv which is a sandbox (The fastest RISC-V sandbox)
lagniappe|1 month ago
ape4|1 month ago