This article is just confused and wrong. Some examples:
"The Socket API, local IPC, and shared memory pretty much assume two programs"
There's no truth to this whatsoever - it's an utterly absurd statement because:
* The socket API as applied to networking is empirically designed for arbitrary numbers of peers -- there are unlimited real world examples. The socket API as applied to local IPC is of course similarly capable.
* Shared memory - mmap() is about as generic of a memory sharing interface as one could hope for. It certainly works just fine with arbitrary numbers of clients. There are again countless examples of using shared memory with arbitrary participants -- for example all the various local file databases: sqlite, leveldb, berkeley db, etc.
"We should be able to assume that the data we want exists in main memory without having to keep telling the system to load more of it."
Yes, this is exactly what mmap() already does. mmap() is part of POSIX. Use it!
"Kode Vicious, known to mere mortals as George V. Neville-Neil,"
Oh, ok.
> * The socket API as applied to networking is empirically designed for arbitrary numbers of peers -- there are unlimited real world examples. The socket API as applied to local IPC is of course similarly capable.
I think "the [BSD] socket API" was designed for a thousand or so peers (FD_SETSIZE). That's why the API evolved. Now (2023) there are a lot of APIs and different choices you can make (blocking, polling readiness, SIGIO, AIO, IOCP, io_uring), none of which is best at everything, but the non-POSIX APIs are much faster than the POSIX API -- especially the ones that are harder to mix with the POSIX API.
> "We should be able to assume that the data we want exists in main memory without having to keep telling the system to load more of it."
> Yes, this is exactly what mmap() already does. mmap() is part of POSIX. Use it!
This is not what mmap() does, but the opposite. mmap() sets up page tables for demand paging. Those little page faults actually trigger a check (the "demand") for whether the page is in memory (in Linux, the page cache), an update to the page tables to point to where it is, and a return -- or kick off a read I/O to the backing store. These page faults add up. What the author is looking for is mremap() on Linux and mach_vm_remap() on Darwin, and definitely not anything in POSIX.
> * Shared memory - mmap() is about as generic of a memory sharing interface as one could hope for. It certainly works just fine with arbitrary numbers of clients. There are again countless examples of using shared memory with arbitrary participants -- for example all the various local file databases: sqlite, leveldb, berkeley db, etc.
But none of those "local file databases" can handle hundreds of thousands of clients, and they aren't great even for the low hundreds(!). That's why most "big" database vendors avoid "just" using mmap(), and instead have to perform complex contortions involving things like mach_vm_remap/mremap/O_DIRECT/sigaction+SIGSEGV. userfaultfd and memfd are other examples of recent evolutions in these APIs.
You are looking for truth? Evolving APIs are evidence that people are unhappy with these interfaces, and these newer APIs are better at some things than the old, demonstrating that the old APIs are not ideal.
So we have evidence our (programming) model is not ideal, are we to be like Copernicus and look for a better model (better APIs)? Or are we to emulate Tolosani and Ingoli?
Kids today: want to come along, throw away everything they don't understand, and rebuild what was already there but worse. Accomplishing negative impact.
Yeah, KV is GNN's vehicle for excessively controversial (i.e., what I might call "bad") takes.
I don't think he's wrong that the POSIX synchronous IO model isn't a great fit for modern hardware, though. He doesn't really go into it much but (Windows) NT's async-everything IOCP model really seems to be the winner, as far as generic abstractions that have stood the test of time.
I really disagree with the sockets bit. Sockets are designed for networking, with a particular focus on IP. Not IPC.
I for one think that the 'absurd thing' is that IPC is not built into OS as a core feature. That, and process isolation. Both sockets and shared memory are quite problematic and the challenge of 'true IPC' that works nicely with threads etc. is real.
Completely agree. In particular, POSIX is built around the inheritance of file descriptors by their children, which means that it is extraordinarily easy to have sockets going between multiple processes. Moreover, it's entirely possible to send file descriptors over other sockets (SCM_RIGHTS). POSIX has robust IPC. I'm currently messing around with IPC on Windows... and wow, at the end of the day, even in 2023, UNIX et al are simply more advanced than Windows. It's unfortunate there's been absolutely zero groundbreaking discoveries or inventions in this field (OS dev), but the idea that we should throw away the state of the art simply because is just silly.
POSIX IPC has withstood the test of time. There is no other system that offers as rich a set of primitives.
There are some legitimate issues here, and some ranting.
First, memory models. The author seems to be arguing for some way to talk about data independent of where it's stored. The general idea is that data is addressed not with some big integer address, but with something that looks more like a pathname. That's been tried, from Burroughs systems to LISP machines to the IBM System/38, but never really caught on.
All of those systems date from the era when disks were orders of magnitude slower than main memory, and loading, or page faulting, took milliseconds. Now that there are non-volatile devices maybe 10x slower than main memory, architectures like that may be worth looking at again. Intel tried with their Optane products, which were discontinued last year. It can certainly be done, but it does not currently sell.
The elephant in the room on this is not POSIX. It's C. C assumes that all data is represented by a unique integer, a "pointer". Trying to use C with a machine that does not support a flat memory model is all uphill.
Second, interprocess communication. Now, this is a Unix/Linux/Posix problem. Unix started out with almost no interprocess communication other than pipes, and has improved only slightly since. System V type IPC came and went. QNX type interprocess calls came and went. Mach type interprocess calls came and went. Now we have Android-type shared memory support.
Multiple threads on multiple CPUS in the same address space work fine, if you work in a language that supports it well. Go and Rust do; most other popular languages are terrible at it. They're either unsafe, or slow at locking, or both.
Partially shared memory multiprocessors are quite buildable but tough to program. The PS3's Cell worked that way. That was so hard to program that their games were a year late. The PS4 went back to a vanilla architecture. Some supercomputers use partially shared memory, but I'm not familiar with that space.
So those are the two big problems. So far, nobody has come up with a solution to them good enough to displace vanilla flat shared memory. Both require drastically different software, so there has to be a big improvement. We might see that from the machine learning community, which runs relatively simple code on huge amounts of data. But what they need looks more like a GPU-type engine with huge numbers of specialized compute units.
Related to this is the DLL problem. Originally, DLLs were just a way of storing shared code. But they turned into a kind of big object, with an API and state of their own. DLLs often ought to be in a different protection domain than the caller, but they rarely are. 32-bit x86 machines had hardware support, "call gates", for that sort of thing, but it was rarely used. Call gates and rings of protection have mostly died out.
That's sort of where we are in architecture. The only mainstream thing that's come along since big flat memory machines is the GPU.
> Some supercomputers use partially shared memory, but I'm not familiar with that space.
Some supercomputers do have some shared memory architecture, but a whole lot more use Message Passing Interface (MPI) for distributed memory architecture. Shared memory starts to make less sense when your data can be terabytes or larger in size. It is a lot more scalable to just avoid a shared memory architecture and assume a distributed memory one. It becomes easier to program assuming that each thread just does not have access to the entire data set and has to send data back and forth between threads (pass messages).
>> Multiple threads on multiple CPUS in the same address space work fine, if you work in a language that supports it well. Go and Rust do; most other popular languages are terrible at it.
I find OpenMP for C and C++ to be simple and effective. You do have to write functions that are safe to run in parallel, and Rust will help enforce that. But you can write pure functions in C++ too, and dropping in a #pragma to use all your cores is trivial after that.
So this author is (rightfully) getting a lot of hate. But, "What if we replaced POSIX?" is an interesting question to me. Most people interact with POSIX through the "good parts." But, once you need to write C, which happens when you need to make low-level calls to the OS, it starts to get a little annoying. The biggest annoyance is related to memory management. A lot of the OS APIs have manual free functions, e.g. you call `create_some_os_struct` and then `free_some_os_struct`. Similarly, you end up writing a lot of your own call/free functions. This is because you need to write C to talk to the OS, but your other code is probably not in C and may not have access to the same libc your C uses. So, you need to provide an escape hatch back into your C code to free any allocated memory.
Another annoyance is that passing data between C and another language is hard. For instance, if you want to pass a Swift string in to C, you need to be careful that Swift doesn't free the string while C is using it. The "solution" to this is to have explicit methods in Swift that take a closure which guarantee the data stays alive for the duration of that closure. On the C side, you need to copy the data so that Swift can free the string if you need to keep it longer than that one block. Going from C to Swift is also a pain.
A cool thought is: what if the OS provided better memory management? What if it had a type of higher level primitive so that memory could be retained across languages? For instance, if I pass a string from C to Go, why do I need to copy it on the Go side? Why can I not ask the OS to retain the memory for me? Perhaps we need retain / release instead of malloc and free. Anyway, just a random thought.
The problem with this thought is that malloc/free are not OS primitives, they are strictly concepts that make sense to your own program. Languages like Swift and Go never use these calls at all, for example. When Swift "frees" a string that was still being referenced from C, it's very likely not the OS that will mess with it, but other parts of the Swift program.
The way programs actually interact with the OS for memory allocation is using sbrk() or mmap() (or VirtualAlloc() in the case of Windows) to get a larger piece of memory, and then managing themselves at the process level.
And having the OS expose a more advanced memory management subsystem is a no-go in practice because each language has its own notions of what capabilities are needed.
So POSIX is both the libraries as well as the general system design. If you want to eschew all the POSIX libraries on most *NIX systems today (at least the open source ones), you can simply ... do that. In particular, the Linux kernel (and the BSDs are similar) make no assumption as to how you're managing user memory. You can call mmap to map pages and allocate memory as you like.
In fact, languages such as Go widely disregard libc (IIUC) and just roll their own thing. They still benefit from the POSIX semantics built in to the kernels that go programs run on.
At the end of the day, the main interface between POSIX kernels and userspace is a 32-bit integer (the file descriptor).
To those bashing the author as uninformed -- this is George V. Neville-Neil. Member of FreeBSD Core Team who wrote the book on FreeBSD. He might know a thing or two about POSIX! [1]
It's a bad article because it's too vague and doesn't clearly relate to the questioner's problem, not because the author doesn't have the proper pedigree.
One could certainly write good articles about why the POSIX API is too limiting. For example: the filesystem API is awful in many ways. I'll try to be a bit more specific (despite having only a few minutes to write this):
* AFAICT, it has very few documented guarantees. It doesn't say sector writes are atomic, which would be very useful [1]. (Or even that they are linear as described in that SQLite page, but the SQLite people assume it anyway, and they're cautious folks, so that's saying a lot.) And even the guarantees that I think its language does provide, like fsync guaranteeing that all previously written data to that file has reached permanent storage, systems such as Linux [2] and macOS have failed to deliver. [3]
* It doesn't provide a good async API. io_uring is my first real hope for this but isn't in POSIX.
* IO operations are typically uninterruptible (NFS with a particular mount option being a rare exception). Among other problems, it means that a process that accesses a bad sector will get stuck until reboot!
* It doesn't have a way to plumb through properties you'd want for a distributed filesystem, such as deadlines and trace ids.
* It provides just numeric error codes, when I'd like to get much richer information back. Lots of it in distributed-filesystem cases; even in local cases, something like where in particular path traversal failed. I actually once saw (but can't find in a very quick search attempt) a library that attempted to explain POSIX errors after the fact, by doing a bunch of additional operations to narrow down the cause. Besides being inherently racy, that just shouldn't be necessary. We should get good error messages by default.
It's a great article, and it raises many major issues with our current model of computing. But it's obviously triggering, and lots of people are rushing to defend their comfort zone.
Think outside the box people ... "files", what a charming but antiquated concept; "processes" and thus "IPC", how quaint!
- a windows-first programmer who sees interoperability and composition as an encumbrance that satisfies “nerds who like to do weird shit that i don’t understand in bash”
reading from stdin isn’t challenging, nor writing to stdout. if someone can’t imagine why that might be useful, then i’d argue their journey as a software engineer is either at its end, or right at its beginning.
at the cost of potentially sounding inflammatory, “get good”.
> reading from stdin isn’t challenging, nor writing to stdout
Every time I dabble in C, I need to look up what method I need to use these days. getline? scanf? Do I need to allocate a buffer? What about freeing it, is it safe to do so from another thread? What about Unicode support, can I just use a char array or do I need a string library for proper support? What's a wchar_t again and why is it listed in this example I found online? How do I use strtok to parse a string again?
Sure, these things become trivial with experience, but they're not easy. Other languages make them easier so we know it can be done, yet the POSIX APIs insist on using the more difficult version of everything for the sake of compatibility and programmer choice.
(Modern) C++ makes the entire process easier but there are still archaic leftovers you need to deal with on *nix if you want to interact with APIs outside what the C++ standard provides. At that point, you're back to POSIX APIs, *nix magic file paths, and *nix ioctls. Gone are your exceptions, your unique_ptrs, and your std::string; back are errno and pointers.
Obviously an exaggeration, but when's the last time you checked the return value of printf? I know I don't. And that's not even a memory safety bug, just basic logic. I hope nobody trusts those guys around malloc and free :)
All of which is perfectly compatible with his being incompetent, or wrong about this in particular. (For what it's worth, I don't think he is incompetent.) But what he easily demonstrably isn't is "Windows-first", and I suggest that any mental process that led you to that conclusion needs reexamining.
I'm confused as to what exactly is wrong with the notion of _jobs_ and _files_? Scheduling is hard, but modern operating systems are definitely set up to do it. I think we could probably do a better job of using realtime features and maintaining update-latency benchmarks, but so many of the cycles on my PCs/mobiles are wasted doing god damned animations and updating the screen without interaction that I don't think this is really the main issue.
Programming is basically always just a matter of loading data, transforming data, and then putting it somewhere. The simplest record keeping systems do that, and the fanciest search algorithms do that. Decode/encode/repeat.
EDIT: The beauty of UNIX to me is the interoperability of the text stream. Small components working together. Darwinesque survival of the fittest command.
The beauty of UNIX to me is the interoperability of the text stream
What interoperability? Look at the man page for any simple Unix utility (such as `ls`), and count up how many of the listed command line flags are there only to structure the text stream for some other program. "Plain text" is just as interoperable as plain binary. "Just use plain text" is the Original Sin of Unix.
The Unix Philosophy, as stated by Peter Salus [1] is
1. Write programs that do one thing and do it well
2. Write programs that work together
3. Write programs to handle text streams because text is a universal interface.
The problem is that, in practice, you can only pick two of those. If you want to write programs that work together, and do so using plain text, then, in addition to doing its ostensible task, each program is going to have to provide a facility to format its text for other programs, and have a parser to read the input that other programs provide, contradicting the dictum to "do one thing and do it well".
If you want programs that do one thing and do it well, and programs that work together, then you have to abandon "plain text", and enforce some kind of common data format that programs are required to read and output. It might be JSON. Or it might be some kind of binary format (like what PowerShell uses). But there has to be some kind of structure that allows programs to interchange data without each program having to deal with the M x N problem of having to deal with every other program's idiosyncratic "plain text" output format.
Maybe it was written as a parody and we just don't get the joke? (Not POSIX, the article. POSIX survives fine outside of UNIX in embedded systems and somewhat on Mac OS and Windows.)
The idea that your computer would be faster or easier to use if it didn't have any animations seems untrue. Providing physicality is good! Helps you understand how things are changing between two different states.
Plan 9 enhances all of these good Unix traits. Even the universal text stream: it adds support for arrays / lists, that is, streams with elements larger than one byte.
It is hard to believe that this bunch of drivel was actually available from acm.org. If a 16 year old programmer came to me with this nonsense, I might take the time to gently point them in a few directions. Being on the acm.org site ... unforgivable.
And not even the obvious faults, pointed out by others here. There's the question of an apparent complete ignorance of OS research, for example systems that rely on h/w memory protection and so can put all tasks (and threads) into a single address space. But what's the actual take home from such research? The take home is that these ideas have, generally speaking, not succeeded, and that to whatever extent they do get adopted, it is incremental and often very partial. If you don't understand why most computing devices today do not run kernels or applications that look anything like the design dreams of 1990-2010 (to pick an arbitrary period, but a useful one), then you really don't understand enough about computers to even write a useless article like this one.
That's a pretty bizarre answer to what was a pretty reasonable question. I don't even see how the question and answer are related honestly. Surely the question is more about finding a better storage format for the initial ingestion or data storage and has little to nothing to do with POSIX.
Most languages have a way to slurp a file into memory in a single function call, after all. The fact that files exist shouldn't be a barrier here.
They’re using python and C/C++ to speed up the slow bits. Can’t imagine anything more portable than that without even having to know what posix is doing under the library abstractions.
Software is so huge that it would take you a lifetime of programming from different perspectives to get a grip on what it really is. So we are all doomed to experience POSIX through whatever programming experience we end up getting deep in.
I feel the underlying problem most software is that it's just too damn complex, so you can't fit enough of it in your head to design it how you think it should go. An average person can't go "Oh, I think the kernel should be able to do this" and then go whip it up and having an experiment running in a little bit. That's an esoteric corner full of tons of specialized and arcane knowledge that, truth be told, is completely invented. And half of that invention is workarounds for other bad inventions.
I dunno enough to just pronounce doom on POSIX, but I do feel like the rickety C way of doing things (everything centered around intricately-compiled machine code executables, incredibly dainty fragile eggshells that shatter and spill their entire guts of complexity on the world) underpins a ton of the problem.
The number of years you would need to just read, let alone grok all the hundreds of millions of lines of code that run on our system is just beyond human lifetimes now.
Well that was far less controversial than the comments here suggested.
I read a call for innovation, the general thrust of which is based on the (obvious?) argument that if you build every new system to be compatible with the old one you limit the capabilities of the new.
Of course, the customer requesting that compatibility gets what they want -- an easier time building/porting software.
Alternate reading: ~POSIX (or just having a standard) is so useful that everyone wants it everywhere all of the time and I want something better.
Yay - I too want something better. It will be initially difficult, completely incompatible. But I hope that, someday, only computer historians will discuss "files" and "ports".
> there could be better primitives for communication than a byte stream. Why can't I just send another process a chunk of memory?
shm APIs have made this possible for decades.
> the posix byte stream API is bizarrely based on a flat buffer instead of a ring buffer, which is just the obviously wrong data structure IMO.
so obviously wrong that the terabytes of existing POSIX-compatible code is broken, or limited, or something?
> Too much of the threading API is preempted threads, when cooperative scheduling (aka yielding) is so much easier to implement and understand.
if you want/need co-routines use them, and leave threads for the domains of programming where preemption is a critical part of the model.
ps. this sounds a bit more personally critical than I intend. I'm only trying to point out flaws that I see with these 3 points, not trying to suggest anything about you as the person who made them.
> - there could be better primitives for communication than a byte stream. Why can't I just send another process a chunk of memory?
Isn't this just
write(fd, buffer, size_of_buffer)?
I'm not sure why everyone talks about byte streams being the basis here. If fd is a socket in SOCK_DGRAM mode, then buffer is received wholesale, never split. Bytestreams are not the fundamental abstraction. Files and sockets are.
> - the posix byte stream API is bizarrely based on a flat buffer instead of a ring buffer, which is just the obviously wrong data structure IMO.
Once again, this depends on the mode of the socket. you can put a Unix socket into a mode where it starts dropping packets when the queue is full.
> - Too much of the threading API is preempted threads, when cooperative scheduling (aka yielding) is so much easier to implement and understand.
Cooperative scheduling is part of POSIX: getcontext, setcontext, makecontext, and swapcontext. (They were marked obsolescent in POSIX.1-2001 and removed in POSIX.1-2008, but glibc and the BSDs still ship them.)
It's time for these people to get off their high elephant and write or fund a substitute, instead of doing nothing and whinging constantly just for the sake of gathering imaginary internet points.
The author of this piece is a long time FreeBSD contributor[1]. Just maybe that person has relevant experience guiding their thoughts? He certainly is not "doing nothing and whinging constantly". Posix is not a religion - it's ok to look at other ideas, to listen to critiques. I mean Linux does it all the time - io_uring, all the ebpf stuff, a dozen forms of kernel bypass suggest to me that the posix interfaces aren't sufficient, or right for modern hardware, at least not in all cases.
On the other hand, I see quite a few projects challenging The Way Things Are Done (Rust, NeoVim, Fish, OilShell, etc) to which there is a lot of kicking and screaming that things are fine. No need to change.
You don't have to use just the POSIX-based APIs. There are io_uring, xdp, bpf, and more provided by Linux, for instance, exposing many more capabilities to userspace beyond POSIX-compliant APIs. You don't have to use sockets to do networking, you can get asynchronous local disk IO. What is the author waiting for, if it's the POSIX standard being too restrictive that's the root of all of their problems?
Kind of meta, but wow, "get the elephant off our neck" might be the first example of mixing three metaphors together. The only mention of "neck" that I can find is in the title, and the article only mentions getting it off our "backs", which is mixed with the metaphor of "the elephant in the room". The only explanation I can think of for why "neck" would be used is the "albatross around your neck" metaphor.
Based on the other responses here, it sounds like the author should keep in mind that you can't get your elephant off your neck and eat it too, but maybe I shouldn't open that can of worms lest I attract some early birds.
Funny that the author calls out Windows as being separate somehow. Hasn't the last 20 years of Windows been synonymous with adding more Posix-compatibility?
> We should be able to assume that the data we want exists in main memory without having to keep telling the system to load more of it. APIs for managing threads and access to shared memory should be re-thought with defaults created for many-core systems, and new schedulers have to be built to handle the fact that memory is not all one thing. Modern systems have fast caches near CPU, main memory, and flash memory. Soon we'll have even more memory, disaggregated, which is faster than disk but slower than main RAM.
Isn't this what Intel was trying to do with Optane? What happened with that? It seems like a great idea.
People love parroting the meme of "Unix is old therefore it must be bad, we need something newer and modern, redesigned from the ground up" but it's always hard to get a concrete proposal from these folks. This article does a lot of hand waving about what's wrong with Unix and what would be better.
For example, the part about how it would be nice to assume data exists in main memory, instead of accessing it through the filesystem? That's existed for nearly 40 years, it's called mmap, and every modern operating system implements it. You can trivially write a wrapper that automatically mmaps opened files if you want, but creating a MAP_FILE memory mapping is one of the easiest things you can do in C.
Likewise the stuff about how IPC, sockets, and shared memory assume that a resource is only going to be shared between two programs. The whole statement is weird because all of those things can be used by more than two programs, and it's not clear what the author means by saying they're designed for a single receiver/sender.

The Berkeley sockets API is admittedly a bit intimidating at first but that's because it covers a lot of functionality that people actually need. It can handle UDP sockets, TCP sockets, but also other protocols and more exotic things like netlink sockets, sctp sockets, and can be extended to things like John Ousterhout's proposal about replacing TCP, as evidenced by the fact that John Ousterhout actually implemented Homa on Linux.

Using shared memory on Unix isn't exactly simple but that's because it's doing something complicated. When multiple processes want to share memory there needs to be some coordination to establish what needs to be mapped, to grant privileges, etc. The SysV shared memory APIs suck but on Linux things are much better now with the memfd system calls.

And the vast majority of people who need things like sockets or shared memory are going to be writing code in a higher level language anyway, using libraries that abstract the low level details. The article implies that it's a problem that people need to use these abstractions in the first place, but it seems asinine to me when there isn't an actual proposal of how things would be better, and the hand waving completely ignores things like the fact that realistically low-level C APIs are needed to have any hope of writing interfaces that can be accessed by multiple languages.
Furthermore the design of Unix allows new system calls to be added. If you have an idea for something that is way better than sockets you're free to implement it and add new system calls for it. This is exactly why there are many ways to do the same thing on Linux (e.g. establish shared memory regions), because people come up with new and better ideas and the ones that are actually performant, useful, and secure are the ones that get merged and added to Linux. Maybe there really are some brilliant ideas for new ways of doing things that need a ground-up redesign and can't be bolted onto Unix, but if that's the case people should be able to clearly elucidate what those ideas are and WHY they won't work with Unix.
One final thing I'll say about Unix is that for all its warts, it's very fast. This is a point that's been made a million times before, e.g. in "Worse Is Better" and "The UNIX-Haters Handbook". There have been lots of interesting alternatives to Unix developed, but the reason none of them have gained traction is that at the end of the day no one wants to use something that is newer, less well known, less well tested, and significantly slower. Plan9 and Fuchsia are cool, but they are much slower than Linux and it's not clear if it's really possible to fix them. Users want their computer to run quickly, they want their phone to run quickly, and they want good battery life. Big companies want their applications to run quickly and spend as little on hardware as possible. All of these things mean that pretty much any Unix alternative is a non-starter unless it is at least close to the performance of existing Unix implementations, including Linux and macOS I suppose. A big explanation of the success of Unix has been that it has co-evolved alongside modern hardware. It's hard to do something that is radically different and takes advantage of modern hardware when you're writing a general purpose operating system (there are absolutely exceptions though for operating systems serving niche use cases).
Weird how Windows is mentioned as one of the two programming models and then discarded. Obviously people have long-standing objections to commercial operating systems and Microsoft in particular, but it would be interesting to hear whether the many non-POSIX-y parts of Windows provide a better or worse model.
[+] [-] throwaway09223|3 years ago|reply
"The Socket API, local IPC, and shared memory pretty much assume two programs"
There's no truth to this whatsoever - it's an utterly absurd statement because:
* The socket API as applied to networking is empirically designed for arbitrary numbers of peers -- there are unlimited real world examples. The socket API as applied to local IPC is of course similarly capable.
* Shared memory - mmap() is about as generic of a memory sharing interface as one could hope for. It certainly works just fine with arbitrary numbers of clients. There are again countless examples of using shared memory with arbitrary participants -- for example all the various local file databases: SQLite, LevelDB, Berkeley DB, etc.
"We should be able to assume that the data we want exists in main memory without having to keep telling the system to load more of it."
Yes, this is exactly what mmap() already does. mmap() is part of POSIX. Use it!
"Kode Vicious, known to mere mortals as George V. Neville-Neil,"
Oh, ok.
[+] [-] geocar|3 years ago|reply
I think "the [BSD] socket API" was designed for a thousand or so peers (FD_SETSIZE). That's why the API evolved. Now (2023) there are a lot of APIs and different choices you can make (blocking, polling readiness, SIGIO, AIO, IOCP, io_uring), none of which is best at everything, but the non-POSIX APIs are much faster than the POSIX API -- especially the ones that are harder to mix with the POSIX API.
> "We should be able to assume that the data we want exists in main memory without having to keep telling the system to load more of it."
> Yes, this is exactly what mmap() already does. mmap() is part of POSIX. Use it!
This is not what mmap() does, but the opposite. mmap() sets up page tables for demand paging. Those little page faults actually trigger a check (the "demand") to see whether the page is in memory (what Linux calls the page cache), update the page tables to point to where it is, and return (or kick off a read IO operation to the backing store). These page faults add up. What the author is looking for is mremap() on Linux and mach_vm_remap() on Darwin, and it is definitely not in POSIX.
> * Shared memory - mmap() is about as generic of a memory sharing interface as one could hope for. It certainly works just fine with arbitrary numbers of clients. There are again countless examples of using shared memory with arbitrary participants -- for example all the various local file databases: sqlite, leveldb, berkeley db, etc.
But none of those "local file databases" can handle hundreds of thousands of clients, and aren't great even for the low hundreds(!), that's why most "big" database vendors avoid "just" using mmap(), and instead have to perform complex contortions involving things like mach_vm_remap/mremap/O_DIRECT/sigaction+SIGSEGV. userfaultfd and memfd are other examples of recent evolutions in these APIs.
You are looking for truth? Evolving APIs are evidence that people are unhappy with these interfaces, and these newer APIs are better at some things than the old, demonstrating that the old APIs are not ideal.
So we have evidence our (programming) model is not ideal, are we to be like Copernicus and look for a better model (better APIs)? Or are we to emulate Tolosani and Ingoli?
[+] [-] loeg|3 years ago|reply
I don't think he's wrong that the POSIX synchronous IO model isn't a great fit for modern hardware, though. He doesn't really go into it much but (Windows) NT's async-everything IOCP model really seems to be the winner, as far as generic abstractions that have stood the test of time.
[+] [-] jasmer|3 years ago|reply
I for one think that the 'absurd thing' is that IPC is not built into the OS as a core feature. That, and process isolation. Both sockets and shared memory are quite problematic, and the challenge of 'true IPC' that works nicely with threads etc. is real.
[+] [-] anon291|3 years ago|reply
POSIX IPC has withstood the test of time. There is no other system that offers as rich a set of primitives.
[+] [-] Animats|3 years ago|reply
First, memory models. The author seems to be arguing for some way to talk about data independent of where it's stored. The general idea is that data is addressed not with some big integer address, but with something that looks more like a pathname. That's been tried, from Burroughs systems to LISP machines to the IBM System/38, but never really caught on.
All of those systems date from the era when disks were orders of magnitude slower than main memory, and loading, or page faulting, took milliseconds. Now that there are non-volatile devices maybe 10x slower than main memory, architectures like that may be worth looking at again. Intel tried with their Optane products, which were discontinued last year. It can certainly be done, but it does not currently sell.
The elephant in the room on this is not POSIX. It's C. C assumes that all data is represented by a unique integer, a "pointer". Trying to use C with a machine that does not support a flat memory model is all uphill.
Second, interprocess communication. Now, this is a Unix/Linux/Posix problem. Unix started out with almost no interprocess communication other than pipes, and has improved only slightly since. System V type IPC came and went. QNX type interprocess calls came and went. Mach type interprocess calls came and went. Now we have Android-type shared memory support.
Multiple threads on multiple CPUS in the same address space work fine, if you work in a language that supports it well. Go and Rust do; most other popular languages are terrible at it. They're either unsafe, or slow at locking, or both.
Partially shared memory multiprocessors are quite buildable but tough to program. The PS3's Cell worked that way. That was so hard to program that their games were a year late. The PS4 went back to a vanilla architecture. Some supercomputers use partially shared memory, but I'm not familiar with that space.
So those are the two big problems. So far, nobody has come up with a solution to them good enough to displace vanilla flat shared memory. Both require drastically different software, so there has to be a big improvement. We might see that from the machine learning community, which runs relatively simple code on huge amounts of data. But what they need looks more like a GPU-type engine with huge numbers of specialized compute units.
Related to this is the DLL problem. Originally, DLLs were just a way of storing shared code. But they turned into a kind of big object, with an API and state of their own. DLLs often ought to be in a different protection domain than the caller, but they rarely are. 32-bit x86 machines had hardware support, "call gates", for that sort of thing, but it was rarely used. Call gates and rings of protection have mostly died out.
That's sort of where we are in architecture. The only mainstream thing that's come along since big flat memory machines is the GPU.
[+] [-] atrettel|3 years ago|reply
Some supercomputers do have some shared memory architecture, but a whole lot more use the Message Passing Interface (MPI) for a distributed memory architecture. Shared memory starts to make less sense when your data can be terabytes or larger in size. It is a lot more scalable to just avoid a shared memory architecture and assume a distributed memory one. It becomes easier to program assuming that each process just does not have access to the entire data set and has to send data back and forth between processes (pass messages).
[+] [-] phkahler|3 years ago|reply
I find OpenMP for C and C++ to be simple and effective. You do have to write functions that are safe to run in parallel, and Rust will help enforce that. But you can write pure functions in C++ too, and dropping in a #pragma to use all your cores is trivial after that.
[+] [-] rcme|3 years ago|reply
Another annoyance is that passing data between C and another language is hard. For instance, if you want to pass a Swift string in to C, you need to be careful that Swift doesn't free the string while C is using it. The "solution" to this is to have explicit methods in Swift that take a closure which guarantee the data stays alive for the duration of that closure. On the C side, you need to copy the data so that Swift can free the string if you need to keep it longer than that one block. Going from C to Swift is also a pain.
A cool thought is: what if the OS provided better memory management? What if it had a type of higher level primitive so that memory could be retained across languages? For instance, if I pass a string from C to Go, why do I need to copy it on the Go side? Why can I not ask the OS to retain the memory for me? Perhaps we need retain / release instead of malloc and free. Anyway, just a random thought.
[+] [-] tsimionescu|3 years ago|reply
The way programs actually interact with the OS for memory allocation is using sbrk() or mmap() (or VirtualAlloc() in the case of Windows) to get a larger piece of memory, and then managing it themselves at the process level.
And having the OS expose a more advanced memory management subsystem is a no-go in practice because each language has its own notions of what capabilities are needed.
[+] [-] anon291|3 years ago|reply
In fact, languages such as Go widely disregard libc (IIUC) and just roll their own thing. They still benefit from the POSIX semantics built in to the kernels that go programs run on.
At the end of the day, the main interface between POSIX kernels and userspace is a 32-bit integer (the file descriptor).
[+] [-] justin66|3 years ago|reply
Precisely what about that seems right to you?
[+] [-] jsjohns2|3 years ago|reply
[1] https://www.amazon.com/Design-Implementation-FreeBSD-Operati...
[+] [-] scottlamb|3 years ago|reply
One could certainly write good articles about why the POSIX API is too limiting. For example: the filesystem API is awful in many ways. I'll try to be a bit more specific (despite having only a few minutes to write this):
* AFAICT, it has very few documented guarantees. It doesn't say sector writes are atomic, which would be very useful [1]. (Or even that they are linear as described in that SQLite page, but the SQLite people assume it anyway, and they're cautious folks, so that's saying a lot.) And even the guarantees I think its language does make, like fsync ensuring that all previously written data to that file has reached permanent storage, systems such as Linux [2] and macOS [3] have failed to provide.
* It doesn't provide a good async API. io_uring is my first real hope for this but isn't in POSIX.
* IO operations are typically uninterruptible (NFS with a particular mount option being a rare exception). Among other problems, it means that a process that accesses a bad sector will get stuck until reboot!
* It doesn't have a way to plumb through properties you'd want for a distributed filesystem, such as deadlines and trace ids.
* It provides just numeric error codes, when I'd like to get much richer stuff back. Lots of stuff in distributed filesystem cases. Even in local cases, something like where in particular path traversal failed. I actually saw once (but can't find in a very quick search) a library that attempted to explain POSIX errors after the fact by doing a bunch of additional operations to narrow it down. Besides being inherently racy, it just shouldn't be necessary. We should get good error messages by default.
[1] https://www.sqlite.org/atomiccommit.html
[2] https://wiki.postgresql.org/wiki/Fsync_Errors
[3] https://developer.apple.com/library/archive/documentation/Sy...
[+] [-] drpixie|3 years ago|reply
Think outside the box people ... "files", what a charming but antiquated concept; "processes" and thus "IPC", how quaint!
[+] [-] foxhill|3 years ago|reply
reading from stdin isn’t challenging, nor writing to stdout. if someone can’t imagine why that might be useful, then i’d argue their journey as a software engineer is either at its end, or right at its beginning.
at the cost of potentially sounding inflammatory, “get good”.
[+] [-] jeroenhd|3 years ago|reply
Every time I dabble in C, I need to look up what method I need to use these days. getline? scanf? Do I need to allocate a buffer? What about freeing it, is it safe to do so from another thread? What about Unicode support, can I just use a char array or do I need a string library for proper support? What's a wchar_t again and why is it listed in this example I found online? How do I use strtok to parse a string again?
Sure, these things become trivial with experience, but they're not easy. Other languages make them easier so we know it can be done, yet the POSIX APIs insist on using the more difficult version of everything for the sake of compatibility and programmer choice.
(Modern) C++ makes the entire process easier but there are still archaic leftovers you need to deal with on *nix if you want to interact with APIs outside what the C++ standard provides. At that point, you're back to POSIX APIs and *nix magic file paths with *nix ioctls. Gone are your exceptions, your unique_ptrs, and your std::string, back are errno and pointers.
[+] [-] ddulaney|3 years ago|reply
Unless you're Kernighan and Ritchie, who semi-famously wrote a buggy hello world program and used it to educate a generation of C programmers: https://blog.sunfishcode.online/bugs-in-hello-world/
Obviously an exaggeration, but when's the last time you checked the return value of printf? I know I don't. And that's not even a memory safety bug, just basic logic. I hope nobody trusts those guys around malloc and free :)
[+] [-] gjm11|3 years ago|reply
He cowrote a book about the innards of FreeBSD: https://www.oreilly.com/library/view/design-and-implementati... (ignore the first paragraph of the description, which is presumably a copy-and-paste error).
He has been on the FreeBSD Board of Directors: https://freebsdfoundation.org/blog/george-neville-neil-joins...
He's presented a bunch of papers at FreeBSD conferences: https://papers.freebsd.org/author/george-neville-neil/
All of which is perfectly compatible with his being incompetent, or wrong about this in particular. (For what it's worth, I don't think he is incompetent.) But what he easily demonstrably isn't is "Windows-first", and I suggest that any mental process that led you to that conclusion needs reexamining.
[+] [-] nixpulvis|3 years ago|reply
Programming is basically always just a matter of loading data, transforming data, and then putting it somewhere. The simplest record keeping systems do that, and the fanciest search algorithms do that. Decode/encode/repeat.
EDIT: The beauty of UNIX to me is the interoperability of the text stream. Small components working together. Darwinesque survival of the fittest command.
[+] [-] quanticle|3 years ago|reply
The Unix Philosophy, as stated by Peter Salus [1] is
1. Write programs that do one thing and do it well
2. Write programs that work together
3. Write programs to handle text streams because text is a universal interface.
The problem is that, in practice, you can only pick two of those. If you want to write programs that work together, and do so using plain text, then, in addition to doing its ostensible task, each program is going to have to provide a facility to format its text for other programs, and have a parser to read the input that other programs provide, contradicting the dictum to "do one thing and do it well".
If you want programs that do one thing and do it well, and programs that work together, then you have to abandon "plain text", and enforce some kind of common data format that programs are required to read and output. It might be JSON. Or it might be some kind of binary format (like what PowerShell uses). But there has to be some kind of structure that allows programs to interchange data without each program having to deal with the M x N problem of having to deal with every other program's idiosyncratic "plain text" output format.
[1]: https://en.wikipedia.org/wiki/Unix_philosophy
[+] [-] PaulDavisThe1st|3 years ago|reply
And not even the obvious faults, pointed out by others here. There's also the apparent complete ignorance of OS research: for example, systems that rely on hardware memory protection and so can put all tasks (and threads) into a single address space. But what's the actual take-home from such research? The take-home is that these ideas have, generally speaking, not succeeded, and that to whatever extent they do get adopted, it is incremental and often very partial. If you don't understand why most computing devices today do not run kernels or applications that look anything like the design dreams of 1990-2010 (to pick an arbitrary period, but a useful one), then you really don't understand enough about computers to even write a useless article like this one.
[+] [-] Blackthorn|3 years ago|reply
Most languages have a way to slurp a file into memory in a single function call, after all. The fact that files exist shouldn't be a barrier here.
[+] [-] thrownawaydad|3 years ago|reply
https://wiki.lesswrong.com/wiki/Chesterton%27s_Fence
[+] [-] titzer|3 years ago|reply
Software is so huge that it would take you a lifetime of programming from different perspectives to get a grip on what it really is. So we are all doomed to experience POSIX through whatever programming experience we end up getting deep in.
I feel the underlying problem with most software is that it's just too damn complex, so you can't fit enough of it in your head to design it how you think it should go. An average person can't go "Oh, I think the kernel should be able to do this" and then go whip it up and have an experiment running in a little bit. That's an esoteric corner full of tons of specialized and arcane knowledge that, truth be told, is completely invented. And half of that invention is workarounds for other bad inventions.
I dunno enough to just pronounce doom on POSIX, but I do feel like the rickety C way of doing things (everything centered around intricately-compiled machine code executables, incredibly dainty fragile eggshells that shatter and spill their entire guts of complexity on the world) underpins a ton of the problem.
The number of years you would need to just read, let alone grok all the hundreds of millions of lines of code that run on our system is just beyond human lifetimes now.
[+] [-] qwery|3 years ago|reply
I read a call for innovation, the general thrust of which is based on the (obvious?) argument that if you build every new system to be compatible with the old one you limit the capabilities of the new.
Of course, the customer requesting that compatibility gets what they want -- an easier time building/porting software.
Alternate reading: ~POSIX (or just having a standard) is so useful that everyone wants it everywhere all of the time and I want something better.
[+] [-] vitiral|3 years ago|reply
- there could be better primitives for communication than a byte stream. Why can't I just send another process a chunk of memory?
- the posix byte stream API is bizarrely based on a flat buffer instead of a ring buffer, which is just the obviously wrong data structure IMO.
- Too much of the threading API is preempted threads, when cooperative scheduling (aka yielding) is so much easier to implement and understand.
The author doesn't really bring up concrete alternatives like the above though, so it's hard to know what they are railing against
[+] [-] PaulDavisThe1st|3 years ago|reply
shm APIs have made this possible for decades.
> the posix byte stream API is bizarrely based on a flat buffer instead of a ring buffer, which is just the obviously wrong data structure IMO.
so obviously wrong that the terabytes of existing POSIX-compatible code is broken, or limited, or something?
> Too much of the threading API is preempted threads, when cooperative scheduling (aka yielding) is so much easier to implement and understand.
if you want/need co-routines use them, and leave threads for the domains of programming where preemption is a critical part of the model.
ps. this sounds a bit more personally critical than I intend. I'm only trying to point out flaws that I see with these 3 points, not trying to suggest anything about you as the person who made them.
[+] [-] anon291|3 years ago|reply
Isn't this just
write(fd, buffer, size_of_buffer)?
I'm not sure why everyone talks about byte streams being the basis here. If fd is a socket in SOCK_DGRAM mode, then buffer is received wholesale, never split. Bytestreams are not the fundamental abstraction. Files and sockets are.
> - the posix byte stream API is bizarrely based on a flat buffer instead of a ring buffer, which is just the obviously wrong data structure IMO.
Once again, this depends on the mode of the socket. you can put a Unix socket into a mode where it starts dropping packets when the queue is full.
> - Too much of the threading API is preempted threads, when cooperative scheduling (aka yielding) is so much easier to implement and understand.
Cooperative scheduling is part of POSIX. setcontext, makecontext, getcontext, and swapcontext.
[+] [-] sophacles|3 years ago|reply
[1] Here's his first commit: https://cgit.freebsd.org/src/commit/?id=026e67b69b612f90360a... with the most recent happening within the last year.
edit: fixed link
[+] [-] saghm|3 years ago|reply
Based on the other responses here, it sounds like the author should keep in mind that you can't get your elephant off your neck and eat it too, but maybe I shouldn't open that can of worms lest I attract some early birds.
[+] [-] Eisenstein|3 years ago|reply
Isn't this what Intel was trying to do with Optane? What happened with that? It seems like a great idea.
[+] [-] eklitzke|3 years ago|reply
For example, the part about how it would be nice to assume data exists in main memory, instead of accessing it through the filesystem? That has existed for nearly 40 years: it's called mmap, and every modern operating system implements it. You can trivially write a wrapper that automatically mmaps opened files if you want, but creating a MAP_FILE memory mapping is one of the easiest things you can do in C.
Likewise the stuff about how IPC, sockets, and shared memory assume that a resource is only going to be shared between two programs. The whole statement is weird because all of those things can be used by more than two programs, and it's not clear what the author means by saying they're designed for a single receiver/sender.
The Berkeley sockets API is admittedly a bit intimidating at first, but that's because it covers a lot of functionality that people actually need. It can handle UDP sockets and TCP sockets, but also other protocols and more exotic things like netlink sockets and SCTP sockets, and can be extended to things like John Ousterhout's proposal about replacing TCP, as evidenced by the fact that John Ousterhout actually implemented Homa on Linux.
Using shared memory on Unix isn't exactly simple, but that's because it's doing something complicated. When multiple processes want to share memory there needs to be some coordination to establish what needs to be mapped, to grant privileges, etc. The SysV shared memory APIs suck, but on Linux things are much better now with the memfd system calls.
And the vast majority of people who need things like sockets or shared memory are going to be writing code in a higher-level language anyway, using libraries that abstract the low-level details. The article implies that it's a problem that people need to use these abstractions in the first place, but it seems asinine to me when there isn't an actual proposal of how things would be better, and the hand-waving completely ignores things like the fact that realistically low-level C APIs are needed to have any hope of writing interfaces that can be accessed by multiple languages.
Furthermore the design of Unix allows new system calls to be added. If you have an idea for something that is way better than sockets you're free to implement it and add new system calls for it. This is exactly why there are many ways to do the same thing on Linux (e.g. establish shared memory regions), because people come up with new and better ideas and the ones that are actually performant, useful, and secure are the ones that get merged and added to Linux. Maybe there really are some brilliant ideas for new ways of doing things that need a ground-up redesign and can't be bolted onto Unix, but if that's the case people should be able to clearly elucidate what those ideas are and WHY they won't work with Unix.
One final thing I'll say about Unix is that for all its warts, it's very fast. This is a point that's been made a million times before, e.g. in "Worse Is Better" and "The UNIX Hater's Handbook". There have been lots of interesting alternatives to Unix developed, but the reason none of them have taken traction is that at the end of the day no one wants to use something that is newer, less well known, less well tested, and significantly slower. Plan9 and Fuchsia are cool, but they are much slower than Linux and it's not clear if it's really possible to fix them. Users want their computer to run quickly, they want their phone to run quickly, and they want good battery life. Big companies want their applications to run quickly and spend as little on hardware as possible. All of these things mean that pretty much any Unix alternative is a non-starter unless it is at least close to the performance of existing Unix implementations, including Linux and macOS I suppose. A big explanation of the success of Unix has been that it has co-evolved alongside modern hardware. It's hard to do something that is radically different and takes advantage of modern hardware when you're writing a general purpose operating system (there are absolutely exceptions though for operating systems serving niche use cases).
[+] [-] rm445|3 years ago|reply