CapROS: Capability-Based Reliable Operating System

[+] mfedderly|3 months ago|reply

I had the privilege of taking two classes with Dr Shapiro while I was in undergrad. The second class revolved around a related operating system named Coyotos. One of the most memorable classes was a 3 hour session where we worked through the boot sequence step by step [1]. The single lecture helped us all appreciate the delicate dance to bring up an x86 processor, a history lesson in the various features that had been bolted onto x86 over time, and a bunch of helpful debugging tips when your options are limited (it prints "Co" "yo" and "tos" in different stages!).

This was easily one of my most memorable lectures from undergrad, and it really helped to show me that even your operating system is just more software that you can read and understand.

1. https://github.com/vsrinivas/coyotos/blob/c68719b851e253aa11...

[+] ryanjshaw|3 months ago|reply

I was a nerdy kid living in the middle of nowhere in Africa. I think we’d had dialup for about 2 years at that point, and I emailed him with some questions about how to understand the mathematical notations used in his EROS work. He was very kind and helpful in his response, even though my questions were probably very naive.

[+] kragen|3 months ago|reply

Coyotos and CapROS are two continuations of EROS.

[+] ajb|3 months ago|reply

Dr Shapiro has "open to work" on his LinkedIn right now, FWIW. Don't know what kind of work he's interested in today.

I followed his work on BitC for a while (it was his alternative to Rust).

[+] kragen|3 months ago|reply

Seems like Charlie hasn't been merging pull requests in three years: https://github.com/capros-os/capros

And the list has been idle since then: https://sourceforge.net/p/capros/mailman/capros-devel/

I wonder if something has happened to him? I hope he's okay.

[+] btilly|3 months ago|reply

The fact that we went with access control lists instead of true capabilities has long been a disappointment to me.

For people who understand OO, capabilities are the simplest model in the world. You hand out objects. You can call methods on the object. What that method call has access to depends on the permissions on the object, not your permissions. Entire classes of security mistakes (most notably the "confused deputy") become impossible.
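A minimal sketch of the confused-deputy point, as a toy model rather than any real OS interface. Every name here (`FileCap`, `AmbientCompiler`, `CapCompiler`, `billing.log`) is hypothetical, and Python does not actually enforce the encapsulation a capability kernel would.

```python
# Toy illustration only: all class and file names are hypothetical.

class FileCap:
    """A capability: holding a reference to it is the permission to use it."""
    def __init__(self, name):
        self.name = name
        self.lines = []

    def append(self, text):
        self.lines.append(text)


class AmbientCompiler:
    """ACL-style deputy: opens files by name using its OWN authority."""
    def __init__(self, filesystem, writable_by_compiler):
        self.fs = filesystem
        self.writable = writable_by_compiler

    def compile(self, output_name):
        # The check is against the compiler's rights, not the caller's.
        if output_name not in self.writable:
            raise PermissionError(output_name)
        self.fs[output_name].append("compiler output")
        self.fs["billing.log"].append("charged one compile")


class CapCompiler:
    """Capability-style deputy: it can only write to what it was handed."""
    def __init__(self, billing_cap):
        self._billing = billing_cap

    def compile(self, output_cap):
        output_cap.append("compiler output")
        self._billing.append("charged one compile")


# Confused deputy: the caller names a file that only the compiler may write.
fs = {"billing.log": FileCap("billing.log"), "user.out": FileCap("user.out")}
ambient = AmbientCompiler(fs, writable_by_compiler={"billing.log", "user.out"})
ambient.compile("billing.log")
clobbered = "compiler output" in fs["billing.log"].lines

# With capabilities the caller holds no billing.log reference to pass in,
# so the same request cannot even be expressed.
billing = FileCap("billing.log")
user_out = FileCap("user.out")
CapCompiler(billing).compile(user_out)
safe = "compiler output" not in billing.lines
```

In the first half the deputy is tricked into using its own write access on the caller's behalf; in the second half the designation and the authority travel together in one object.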

The only commercial success that was a true capability system was the AS/400. Not coincidentally, single standalone machines averaged 99.99%-99.999% uptime. And it never had a significant security compromise. (Individual systems did, of course, have problems due to weak passwords and poor configuration. But they were still remarkably resistant.)

Capability systems work so well that when people wanted to improve security on Linux, they called it capabilities. Even though it wasn't.

Unfortunately, the world went with ACLs. That's baked into the design of things like Windows and POSIX. Which means that all of the consumer software out there expects ACLs. In order to get it to run on a pure capability system, you have to do things like create a POSIX subsystem. At which point, you've just thrown away the whole reason to use capabilities in the first place.

[+] Findecanor|3 months ago|reply

The big problem is that you'd need to be able to change permissions over time. With ACLs that is simple and direct: if you have the access right, you just change the ACL. Traditional capabilities last forever, unless there is some sort of support for revoking already issued capabilities, and those mechanisms tend to be far from straightforward.

Some systems have revocation as a core feature, but a cascading revocation (every delegation as a branch in a tree, and revoke a whole subtree of delegated capabilities) is often complex and takes time, especially if they are on disk. There have also been protocols (for EROS-like OS:es) for setting up systems with additional capabilities to revoke individual capabilities but they are even more complex IMHO. So, in most capability systems the only way to revoke capabilities to a resource is to remove the resource itself.
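To make the delegation tree concrete, here is a rough sketch of a revocable forwarder (often called the caretaker pattern) with cascading revocation. It is an in-memory toy: the `Caretaker` and `Revoked` names are made up for illustration, and it ignores the on-disk and timing costs mentioned above.

```python
class Revoked(Exception):
    """Raised when a revoked capability is used."""


class Caretaker:
    """Revocable forwarder standing between a holder and a target object."""
    def __init__(self, target, parent=None):
        self._target = target
        self._alive = True
        self._children = []
        if parent is not None:
            parent._children.append(self)

    def delegate(self):
        """Hand out a new capability that dies when this one does."""
        self._check()
        return Caretaker(self._target, parent=self)

    def revoke(self):
        """Cut off this capability and the whole subtree delegated from it."""
        stack = [self]
        while stack:
            node = stack.pop()
            node._alive = False
            node._target = None
            stack.extend(node._children)

    def call(self, method, *args):
        """Forward a method call to the target if still valid."""
        self._check()
        return getattr(self._target, method)(*args)

    def _check(self):
        if not self._alive:
            raise Revoked("capability has been revoked")


# Delegation tree: root -> alice -> bob, and root -> carol.
log = []
root = Caretaker(log)
alice = root.delegate()
bob = alice.delegate()
carol = root.delegate()

bob.call("append", "from bob")
alice.revoke()                      # also revokes bob, but not carol
carol.call("append", "from carol")
```

Revoking `alice` walks her subtree, so `bob` stops working while `carol` and `root` are unaffected. The walk is proportional to the number of delegations, which hints at why this gets slow once the tree is large or persistent.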

In CHERI, where every pointer is a capability, revocation of capabilities into a memory object relies on what is effectively a parallel garbage collector process that finds all pointers to revoked objects and overwrites them with an invalid pointer that traps on use. [0]
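A very loose model of that sweep, assuming a flat list of memory words where some words are tagged capabilities. The `Cap`, `sweep`, and `load` names are invented for this sketch and it does not reflect how the real CHERI revoker is structured.

```python
from dataclasses import dataclass


@dataclass
class Cap:
    base: int
    length: int
    valid: bool = True   # stands in for the hardware tag bit


def sweep(memory, revoked_ranges):
    """Invalidate every stored capability whose base lies in a revoked range."""
    cleared = 0
    for word in memory:
        if isinstance(word, Cap) and word.valid:
            if any(lo <= word.base < hi for lo, hi in revoked_ranges):
                word.valid = False
                cleared += 1
    return cleared


def load(cap, offset):
    """Dereference a capability; traps if it was revoked or is out of bounds."""
    if not cap.valid:
        raise MemoryError("capability was revoked")
    if not 0 <= offset < cap.length:
        raise IndexError("out of bounds")
    return cap.base + offset


# Memory holds plain data mixed with capabilities into two allocations.
stale = Cap(0x1000, 16)
interior = Cap(0x1008, 8)
live = Cap(0x2000, 32)
memory = [7, stale, live, "data", interior]

# Free the first allocation, then sweep before its addresses are reused.
cleared = sweep(memory, [(0x1000, 0x1010)])
```

The key property is that the sweep touches every place a pointer could be stored, which is why it resembles a garbage collector pass.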

In the fantasy OS of my mind, ACLs have instead been promoted to "access-control trees" that include a "grant option", allowing a user to grant the permission she has to someone else. But once the first user's permissions are revoked, the sub-tree of re-granted permissions gets revoked as well. I think that could be achieved with existing file system ACLs, with added topology info and enforcement by the OS. Then actual capabilities would be created first when a file is opened, as file handles, but unlike Unix file handles they could be revoked, be revoked in a cascading manner, and revoked automatically if the underlying ACT gets changed.

Authorization Certificates (as in X.509) are a type of distributed cryptographic capabilities, but require complex distribution of "revocation lists". In recent years, new types of distributed "authorization tokens" have been introduced, such as "Biscuits" [1].

[0] https://www.semanticscholar.org/paper/Cornucopia-Reloaded%3A...

[1] https://www.biscuitsec.org

[+] ryanjshaw|3 months ago|reply

It’s bizarre to me that not one megawealthy tech nerd has thrown 8 figures at some smart people in an attempt to solve the capabilities-based OS UX problem. The payoff would be remarkable.

[+] mikewarot|3 months ago|reply

The thing that worries me about WASM is this exact conflict between compatibility with ACLs and security. It's like handing over your banking account authorization for every possible financial transaction, even if all you want to do is buy an ice cream cone. In a real-world capability-based system (aka a wallet with cash), you hand them a $5 bill and wait for change.

[+] EGreg|3 months ago|reply

I guess you must really love Capnproto then: https://github.com/iguazio/go-capnproto2

[+] ahlCVA|3 months ago|reply

There is also a relatively modern capability-based kernel in the L4 family of microkernels, called Fiasco.OC: https://os.inf.tu-dresden.de/fiasco/overview.html

There are also a bunch of components for building a functional userspace (such as L4Re or Genode).

[+] NooneAtAll3|3 months ago|reply

what does L4 mean here?

[+] retrac|3 months ago|reply

I've written a little bit before about KeyKOS/GNOSIS, which is the capability operating system used by Tymshare to host their timesharing language services on IBM mainframes, in the 70s and 80s. From a comment 3 years ago I'll just repost the relevant part:

> KeyKOS (developed by Tymshare for their commercial computing services in the 1970s) - A capability operating system. If everything in UNIX was a file, then everything in KeyKOS was a memory page and capabilities (keys) to access those pages. The kernel has no state that isn't calculated from values in the virtual memory storage. The system snapshots the virtual memory state regularly. There are subtle consequences from this. Executing processes are effectively memory-mapped files that constantly rewrite themselves, with only the snapshots being written out. Snapshotting the virtual memory state of the system snapshots everything -- including the state of running processes. There's no need for a file system, just a means to map names to sets of pages, which is done by an ordinary process. After a crash, processes and their state are internally consistent, and continue running from their last snapshot. For those who are intrigued, there's a good introduction, written in 1979, by the system's designers available here: http://cap-lore.com/CapTheory/upenn/Gnosis/Gnosis.html (It was GNOSIS before being renamed KeyKOS.) And a later document written in the 90s aimed at UNIX users making the case: http://cap-lore.com/CapTheory/upenn/NanoKernel/NanoKernel.ht... Some work on capability systems continues, but it seems the lessons learned have largely been forgotten.

The core abstraction is simpler than the Unix process model or that of many other operating systems. Processes have keys which access virtual memory pages. All of storage including persistent secondary storage is just one big pool of virtual memory pages. These can be shared between processes. That's all that's necessary to implement things like filesystems and networking which are often thought to require special handling. A filesystem is just names and addresses of pages in storage. Give a process a capability to do shared memory with a process that maintains such a structure. I find the emphasis on minimizing process and kernel state, such that processes can be snapshot and frozen at any time and are inherently persistent, handled as the set of the relevant pages, to be genius. Though the architecture does have the classic microkernel/nanokernel performance penalties, as have been long debated.
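As a rough illustration of "everything is pages plus keys", here is a toy model in which a name directory is just another holder of pages and a whole-store snapshot captures it along with everything else. `PageStore`, `Key`, and `Directory` are hypothetical names; the real KeyKOS structures are considerably richer than this.

```python
import copy


class PageStore:
    """Single pool of pages standing in for all memory and disk."""
    def __init__(self):
        self.pages = {}
        self._next_id = 0
        self._checkpoint = ({}, 0)

    def allocate(self, content):
        page_id = self._next_id
        self._next_id += 1
        self.pages[page_id] = content
        return Key(self, page_id)

    def snapshot(self):
        """Record the state of every page at once."""
        self._checkpoint = (copy.deepcopy(self.pages), self._next_id)

    def crash_and_restart(self):
        """Lose everything since the last snapshot and resume from it."""
        saved_pages, saved_next = self._checkpoint
        self.pages = copy.deepcopy(saved_pages)
        self._next_id = saved_next


class Key:
    """Capability to one page."""
    def __init__(self, store, page_id):
        self._store = store
        self._page_id = page_id

    def read(self):
        return self._store.pages[self._page_id]

    def write(self, content):
        self._store.pages[self._page_id] = content


class Directory:
    """A 'filesystem' as an ordinary process: names mapped to pages.
    Its table lives in a page too, so snapshots cover it automatically."""
    def __init__(self, store):
        self._store = store
        self._table = store.allocate({})

    def bind(self, name, key):
        table = dict(self._table.read())
        table[name] = key._page_id
        self._table.write(table)

    def names(self):
        return sorted(self._table.read())

    def lookup(self, name):
        return Key(self._store, self._table.read()[name])


store = PageStore()
directory = Directory(store)
notes = store.allocate("draft 1")
directory.bind("notes", notes)
store.snapshot()

notes.write("draft 2")                       # not yet snapshotted
directory.bind("scratch", store.allocate("temp"))
store.crash_and_restart()
```

After the simulated crash, both the page contents and the name table roll back together to the same consistent point, without the directory doing anything special.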

[+] kragen|3 months ago|reply

Did they actually deploy GNOSIS at Tymshare? I hadn't heard that. I thought that the reason they spun out Key Logic was that Norm hadn't convinced Tymshare management to deploy GNOSIS.

For example, in https://conservancy.umn.edu/server/api/core/bitstreams/a39e5... p. 37, he says:

> That was their [Derwent's] idea. I thought it was very clever and we realized that we couldn’t do it with our current software but that software like that could be written. And KeyKOS was the outcome of that. Tymshare and another company, Key Logic, did not succeed in making that commercial. It would’ve been a high security system with novel features.

Later in the interview, he says Tymshare timesharing on the 370 (the IBM machine) started out on VM/CMS.

If you have conflicting information, I'd love to see it!

[+] pyrolistical|3 months ago|reply

https://en.wikipedia.org/wiki/Capability-based_security

It’s like sharing a Google Doc link. You configure the link to be read-only or read/write.

Now imagine you can create as many links as you want with all possible permission combinations. Then you have a capability-based system.
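The link analogy can be sketched in a few lines, assuming unguessable tokens stand in for share links. The `Document` class and its method names are made up for this example and are not any real sharing API.

```python
import secrets


class Document:
    """Toy document whose share links each carry their own set of rights."""
    def __init__(self, text=""):
        self.text = text
        self._links = {}   # unguessable token -> frozenset of rights

    def make_link(self, *rights):
        """Mint a new link granting exactly the listed rights."""
        token = secrets.token_urlsafe(16)
        self._links[token] = frozenset(rights)
        return token

    def read(self, token):
        self._require(token, "read")
        return self.text

    def write(self, token, text):
        self._require(token, "write")
        self.text = text

    def _require(self, token, right):
        # Possession of a valid token is the only thing that is checked.
        if right not in self._links.get(token, frozenset()):
            raise PermissionError(right)


doc = Document("hello")
read_only = doc.make_link("read")
read_write = doc.make_link("read", "write")
doc.write(read_write, "edited")
```

Nothing here asks who the caller is; whoever holds `read_only` can read and whoever holds `read_write` can also write.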

[+] Hexayurt|3 months ago|reply

Waterken was the same kind of logic, applied at web API scale.

https://shiftleft.com/mirrors/www.hpl.hp.com/techreports/201...

The failure of this system and the HP ESpeak system are what left the gap which the blockchain smart contract model filled.

I have complex thoughts about that.

[+] Hexayurt|3 months ago|reply

Specifically: a globally visible distributed database is a fantastic resource for managing namespaces, as demonstrated by DNS and SSL Certificate Authorities.

But when we start essentially doing _transactions_ by writing into such a database, it starts to look like buying a domain name every time you want to make a credit card payment.

There is an architectural problem here.

[+] iberator|3 months ago|reply

Intel did this in 1981 with the iAPX 432. Super interesting and SUPER complex (just check out the documentation of the CPU architecture), which is why it failed hard.

A flat memory model always wins against a Star Trek-like architecture that no one understands.

[+] gnufx|3 months ago|reply

1970s-ish capability systems with support in hardware/firmware include CAP, Flex, System/38, Plessey System 250 (which a former colleague worked on) -- the last two commercial; see https://en.wikipedia.org/wiki/Capability-based_security.

I'd like to think their time has come, given vulnerabilities I see.

[+] silasdavis|3 months ago|reply

Most of the links seem to be broken on https://www.capros.org/overview.html

[+] contrarian1234|3 months ago|reply

Most of my "wtf is going on" moments on Linux have to do with permissions. I loathe the industry move to even more security. I want a more Emacs-like experience. Multiuser systems have become the exception and most people have a personal computer with one user. Dealing with evil apps is a losing battle b/c the attack surface is too large.

I think the counter argument to more security is Distro Repos. When was the last time you apt-get'ed some software and had your documents stolen?

If you add blocks then you need to somehow communicate to the user when it's failing and that's hard... You see the shitshow that is Android security where apps have mysterious access to some directories and not others and it's impossible to understand what's going on. Maybe capabilities will work better, it's unclear to me.

[+] iberator|3 months ago|reply

Just link statically compiled emacs into /sbin/init and you are done

[+] krautburglar|3 months ago|reply

Absolutely! Most of it is there to protect their moats from us, not us from “hackers”.

[+] mikewarot|3 months ago|reply

Why is it that every Capability based system seems to be a toolkit for running a single program instead of an OS ready for daily use? Is it just me?

[+] naasking|3 months ago|reply

Capability-based operating systems are sufficiently dissimilar to standard ACL operating systems that ordinary software cannot be directly ported without losing some or many of the capability advantages. Furthermore, they are typically very security focused, and so they've spent a lot of time researching security-focused interfaces and idioms for end users, rather than just re-implementing the hodge-podge of poorly thought out user interfaces that seem to reintroduce the same security vulnerabilities again and again, e.g. CSRF is just the "confused deputy" attack known since the 1980s.

I suggest reading some of their stuff [1], it's pretty interesting and accessible.

[1] The EROS Trusted Window System, https://srl.cs.jhu.edu/pubs/SRL2003-05.pdf

[+] spencerflem|3 months ago|reply

Check out Genode Sculpt for a vision of a workable desktop!

It’s capable of dynamic flows, adding and removing programs, and has ports of Chromium and VirtualBox. The devs daily drive it :)

[+] wmf|3 months ago|reply

A lot of OS projects develop the kernel then run out of steam. It's especially hard for capabilities because there's no established standard like Unix/Posix to copy. Capability OSes are still a research topic.

[+] kragen|3 months ago|reply

It's just you. seL4, CheriBSD, etc., do not fit your description. Neither did KeyKOS itself. You're presumably looking at research prototypes.

59 comments