top | item 16796531

Rationale: Or why am I bothering to rewrite nanomsg?

113 points | aleksi | 8 years ago | nanomsg.github.io | reply

58 comments

[+] elevation|8 years ago|reply
From the article:

> nanomsg has dozens of state machines, many of which feed into others, such that tracking flow through the state machines is incredibly painful.

> Worse, these state machines are designed to be run from a single worker thread.

Despite the negative tone, the author gives me the impression that nanomsg has a simple, consistent architecture that just needs to be documented better, or perhaps refactored.

State machines are useful for precisely the reasons the author states: their behavior and performance are easy to reason about, and state machines can enable concurrent processing of multiple tasks even when you're limited to a single execution thread.

The simplicity of state machines makes them useful for secure code or embedded processes where debug visibility can be poor. Embedded environments like microprocessors also benefit from the single-thread concurrency, but this can also be handy on more capable OSs if you want to cut the latency of fork() or pthread_create().

State machines are a really useful tool; there are worse complaints you could make about a code base.

[+] im_down_w_otp|8 years ago|reply
I came to the same conclusion more or less.

Does Go provide meaningful abstractions for writing and managing state machines? Using gen_fsm or gen_statem in Erlang is a critical tool in writing software in a way which _must_ follow some known protocol correctly. Likewise using Session-Types in Rust to go one better than what I get in Erlang (i.e. compile-time assurances vs. runtime ones).

So I was left thinking that Go may be missing meaningful abstractions or facilities for state machine modelling? Or, perhaps more of nanomsg needs to adhere even further to using state machines to define and operate its internal machinery (e.g. the `inproc` race-i-ness mentioned)?

In either case I was confused as to how the conclusion was that state machines were making everything more confusing and less deterministic. Because the point of them is the opposite of that. However, in languages or tools which have poor support for working with them I'm sure ad-hoc interaction with them can be obfuscating and confusing.

[+] blattimwind|8 years ago|reply
I have to admit, after all these years, that I take everything coming from that general direction with a huge grain of salt. Crossroads I/O was supposed to be the great zmq successor and failed entirely, nanomsg was supposed to be an even better redesign of zmq and failed and now nanomsg-ng is supposed to be an even better design iteration on nanomsg.

Meanwhile the old/bad/bloated/poorly designed zmq just kept working fine all these years and even got a bunch of useful new features along the way.

[+] tel|8 years ago|reply
Because ZMQ is vastly better managed from a human POV. The energy required to make a project like this succeed surpasses the individual.
[+] oneweekwonder|8 years ago|reply
> Meanwhile the old/bad/bloated/poorly designed zmq

Hah, zmq evolved because the developers felt amqp became too bloated.

[+] nitwit005|8 years ago|reply
> Sadly, this initial effort, while it worked, scaled incredibly poorly — even so-called "modern" operating systems like macOS 10.12 and Windows 8.1 simply melted or failed entirely when creating any non-trivial number of threads. (To me, creating 100 threads should be a no-brainer, especially if one limits the stack size appropriately. I’m used to be able to create thousands of threads without concern.

Both work just fine with hundreds of threads, and both offer built-in thread pools, for that matter. I have 345 processes running on OS X right now.

[+] _kp6z|8 years ago|reply
I think this is some kind of psychological projection/rationalization because the statement "Having been well and truly spoiled by illumos threading (and especially illumos kernel threads)" is actually quite laughable.
[+] dmitrygr|8 years ago|reply

   > even so-called "modern" operating systems
   > like macOS 10.12 and Windows 8.1 simply
   > melted or failed entirely when creating
   > any non-trivial number of threads. (To
   > me, creating 100 threads should be a no-brainer
This is nonsense. NT kernel happily creates thousands of threads. Just made a little test app to try. No issues at all.
[+] farazbabar|8 years ago|reply
Modern operating systems are very efficient at scheduling threads with negligible workloads, as compared to threads performing under load. I suspect that a lot of toy-example test code, and the default workload of the thousands of threads an operating system has running at any given time, is simply not CPU-intensive enough to matter.

Compared to this, a library designed to perform actual work using hundreds of threads on a modern OS with, let's say, 8 CPU cores simply won't work, due to the constant context-switching overhead. Consider the library called LMAX Disruptor as an example of this behavior, where developers need to configure the number of threads to be equal to the number of available CPUs if they wish to do busy-wait processing for inbound messages on the ring buffer. Chronicle Q actually uses JNI to pin message-processing threads to individual CPU cores, to make sure operating system scheduling does not end up flushing the L1/L2 caches and that message processing for a given queue continues to occur on the same thread/CPU combo.

So it depends on your use-case and application.

[+] theincredulousk|8 years ago|reply
Of course no offense, but the author does seem to be somewhere shortly after the first peak on the Dunning-Kruger curve. Even truly embedded systems with an OS like QNX don't "melt" with 100 threads. Obviously he "doesn't know what he doesn't know" in expertise terms, and blamed a mistake on the operating systems.

That, combined with the other extremely dubious reasoning - e.g. (see my other very long comment) how replacing state machines with callback functions is a good idea (tm). There is also the comment re: "how bad Windows APIs are" and how IO-completion ports are '_actually_ pretty nice'. This is the kind of comment someone makes if they're worried the "real" coders won't take them seriously if they use anything but Arch Linux.

Makes you think he is reasoning about things that he just isn't an expert in yet (embedded systems, operating system APIs, long-term framework design). Nothing wrong with that though - everyone has to build up experience somehow - but maybe a bit early to be on the front page of HN.

[+] laythea|8 years ago|reply
Can confirm. 155 processes, 1770 threads, and 60999 handles. According to Task Manager.
[+] api|8 years ago|reply
I hear the claim pretty often that threads won't scale and that even modern OSes can't handle many threads. It makes me wonder if people are doing something different, since I've found that threads are fine.
[+] jesseb|8 years ago|reply
I raised an eyebrow at that as well. My workstation running Windows 10 is currently chugging along with 236 Processes, 3564 Threads and 154915 Handles, which is pretty typical of this system.
[+] nfoz|8 years ago|reply
Do people generally consider nanomsg to be a "failed experiment"? Is anyone using it for their projects?

The author's tone makes it seem like I shouldn't use nanomsg. nanomsg in turn makes it seem like I shouldn't use ZeroMQ.

So what would people recommend me to use right now for a project? Are these issues all that serious?

My intended use would be for a simple friendly pub-sub API for programs to talk to each other, locally or across a network.

[+] rumcajz|8 years ago|reply
Original author of zmq/nanomsg here.

After all those years dealing with the problem of implementing network protocols I believe that this entire tangle of problems exists because we are dealing with something like 35 years of legacy in two different but subtly interconnected areas: concurrency/parallelism and network programming APIs.

The area of concurrency/parallelism started quite reasonably with the idea of processes. But then, at some point, people felt that processes are too heavy-weight and introduced threads (I'm still trying to find out who the culprit is, but it looks like they've covered their tracks well.) When even threads became too heavy-weight, people turned to all kinds of callback-driven architectures, state-machine-driven architectures, coroutines, goroutines etc. Now we have all of those, and we are supposed to make them work together flawlessly, which is a doomed enterprise from the beginning.

On the network programming side, BSD sockets (introduced in 1983) are the only common ground we have. They are long past their expiry date and they don't adapt to many use cases, but there's no alternative. There are more modern APIs out there but, AFAICS, none of them provides enough added value on top to become the new universal standard.

It should also be said that the creation of new universal APIs is hindered by a host of weird network protocol designs out there in the wild. The API designer faces a dilemma: either they go for a sane API and rule at least some weird protocols out, or they try to support everything and end up with one mess of an API. Not a palatable choice to make.

Then there's the area where the two problems above interact. Originally, you were supposed to listen for incoming TCP connections, fork a new process for each one, and access the socket from a simple single-threaded program with no state machines, using only blocking calls. Today, you are supposed to have a pool of worker threads, listen on file descriptors using poll/epoll/kqueue, then schedule the work onto the worker pool by hand. This raises the complexity of any network protocol implementation by a couple of orders of magnitude. You also get a lot of corner cases, undefined behaviour (especially at shutdown) and weird performance characteristics, and I am not even speaking of the increased attack surface.

All in all, it's a miracle that with the tools we have we are able to write any network applications at all.

These days I am working on attacking the issue on both fronts. On the concurrency side it's http://libdill.org -- essentially not very interesting, just a reimplementation of goroutines for C; however, what's worth looking at is the idea of "structured concurrency", a system for managing the lifetimes of coroutines in a systematic manner: http://libdill.org/structured-concurrency.html

On the other front, network programming, I am trying to put together a proposal for a revamp of the BSD socket API. The goal is to make it possible to layer many protocols on top of each other, as well as one alongside the other. It's a work in progress, so take it with a grain of salt: https://raw.githubusercontent.com/sustrik/dsock/master/rfc/s...

[+] spacenick88|8 years ago|reply
So for me the really big question in all of this is "Are threads really too heavyweight?". This obviously needs the constraint "on a sane, modern OS".

For me the most sane C (non-datagram) networking model, at least on Linux, is threads, each calling accept concurrently (afaik few people know this is supported), and then handling the accepted connection until it is closed. For systems where you only want to handle a fixed number of connections, like databases, you keep your number of threads fixed; for others, you start a new thread once every other thread is already handling a connection. It gets rid of thread pools (since you only do pthread_create() when your number of concurrent connections increases), async- and callback-hell, and makes all your handling code linear.

Every other day I keep seeing "Blabla uses async epoll so handles 10k connections" but a) what serious work can you do with 10k connections i.e. 125 kB/s per connection @ 10 Gbit/s. b) at what cost to readability/maintainability of code and c) are you sure you haven't just moved your bottleneck to something else? Also I've never seen any benchmark showing how this actually beats threads on a modern Linux box.

As for shitty OSs I say fuck them

[+] willtim|8 years ago|reply
> it's a miracle that with the tools we have we are able to write any network applications at all

Perhaps it is time to start using a higher-level language? One with a runtime that provides green threads?

[+] e12e|8 years ago|reply
Looks interesting, especially the Zerotier transport. Although I wonder why that's needed/what it means: with Zerotier you already have ip4/6 connectivity - what's the benefit of burying down below that?

Would that mean a "wireguard transport" would make sense as a default secure transport?

My other concern is that this starts to sound very big - have you been able to maintain clear modularisation of the code?

[+] kej|8 years ago|reply
>with Zerotier you already have ip4/6 connectivity - what's the benefit of burying down below that?

LibZT [1] provides a socket-like programming interface without requiring the full ZeroTier software and its system-wide virtual interfaces. I could see wanting to use something like nanomsg on top of that even though it's not an actual socket implementation.

[1] https://github.com/zerotier/libzt

[+] daurnimator|8 years ago|reply
> OpenSSL has it’s own struct BIO for this stuff, and I could not see an easy way to convert nanomsg's usock stuff to accomodate the struct BIO.

It's actually quite easy to write a custom BIO to work with your own state machines and buffering. This could have been a 1-day project....

[+] theincredulousk|8 years ago|reply
This is at least partially ill-advised. It comes off as an expertly done, but same old, refactoring project that could be titled "I didn't understand this and would understand it better if it were designed based on my personal preferences". This is reinforced by the probability, approaching 1, that nobody needs another "six of one, half dozen of the other" message queue framework, and by the alluded-to belief that the C++ library is somehow too bloated for embedded environments. While that was at one time true, no reasonably modern embedded system that requires multi-threading to "100s" of threads, or uses 100s of live sockets for message queue I/O, has to avoid C++ for being too heavy. This is just ZeroMQ alternative #N - not anything objectively better, and certainly not "nano" for embedded systems. The "one true messaging framework" is a unicorn - everyone feels like it should exist but nobody can make it.

> But for many cases this is not necessary. A simple callback mechanism would be far better, with the FDs available only as an option for code that needs them. This is the approach that we have taken with nng.

Replacing a state machine with callbacks... something something something you're gonna have a bad time. Esp. considering the gripes are about readability, following control flow, and race conditions. Callbacks are objectively worse for all of those things. Control flow is hard to read in state machine frameworks because the primary flow is dictated by something like "nextState(thisState, action)", so you can't follow it with code lookup.

The problem here (and almost always) is a lack of documentation or visualization (or picking the wrong abstraction level for the formally defined states). The beauty is that the definition is almost by default naturally easy to parse (tables of states in a header file, etc.). It takes some extra effort, but it's a one-shot task to write something that generates Graphviz state charts or something similar from the state table definition. You could write a custom dot syntax generator from a C-style table definition in what, three hours? Doxygen already does this for much more complicated stuff. Googling reveals this is nothing new: https://gist.github.com/freqlabs/24d88ad8e687891c970a69f16f1...

All that said, State Machines are (currently) the one true abstraction for a given program because that is what a computer is, and every program is, to begin with. If you're not using them explicitly, it just means you have a poorly defined/documented state machine. Maybe someday there will be a better model of computing, or a better way to model programs. For now, the human brain isn't getting any better at keeping track of computer programs, and anything more than single-threaded functional-style code is almost certainly not any "absolute" improvement in readability.

I firmly believe that visualization has become necessary due to complexity, and it is past time to embrace it. There is some stigma that visualization is for fakers - "real programmers only use a bare text editor" - or that it is for children learning programming, or non-engineering folks that need pictures because they're dumb. To be a bit hyperbolic, if we want any chance at keeping up with "the machines", we're going to need a better general-purpose, more workable abstraction than text files. There is no more canonical example of this than the issues pointed out here - state machines are the right abstraction for complex systems, but complex state machines are incredibly difficult to follow in source code.

[+] Matthias247|8 years ago|reply
> Replacing a state machine with callbacks... something something something you're gonna have a bad time

Having written quite a few networking systems, I fully agree. Especially since in this text it's coupled with replacing "a single-threaded state machine" with "a multithreaded system which uses callbacks". IMHO the latter is almost guaranteed to be a recipe for all kinds of multithreading issues, from race conditions to deadlocks to memory issues. And even if the author of the library gets it right, the issues often arise at the library-user level (they didn't expect where the callback would be executed, and accessed their data without the necessary synchronization).

The described model of nanomsg sounds very sane compared to that. Pure multithreaded systems without callbacks (like in Go) also work reasonably well. But even those do not work without state machines in all situations. In all cases where state is manipulated by more than one code path, it's usually most reasonable to handle this in some kind of state machine which runs on a single thread, instead of trying to fight it with dozens of fine-grained locks. Just as an example: the HTTP/2 implementation in Go uses a state machine which runs on a goroutine per connection, and user code, the frame reader and the frame writer communicate with that state machine via messaging.

[+] adrianratnapala|8 years ago|reply
> There is some stigma that visualization is for fakers - "real programmers only use a bare text editor"

That stigma would have gone away long ago if the actually existing visual languages were less wretched. (I'm looking at you LabView!). No doubt there are better languages than LabView, but while I think a visual language can be good, creating one will involve many as-yet-unknown unknowns. So even good attempts will be toys at first.

A good stepping stone would be visual analysis of textual programs. I want a debugger for a data-flow graph where following edges answers the question "what caused this value to be what it is". And I want to visualise that graph as well as possible, so that I can get an overview of the different possible causes.

[+] tuukkah|8 years ago|reply
> All that said, State Machines are (currently) the one true abstraction for a given program because that is what a computer is, and every program is, to begin with.

I don't understand these claims. To me, a computer "is" processors that step through memory locations interpreting them as operations and operands - not a state machine. Equally, hardly any program is a state machine.

Is a Haskell program a state machine, or "poorly defined"? I'd suggest other models of computation such as typed lambda calculus provide a much better basis for defining and documenting computer programs.

[+] hinkley|8 years ago|reply
Aren’t there FST implementations that solve the problem of following the next() logic? I know I’ve heard of ones based on enumerated types but that is not the only way to solve this problem.
[+] larrik|8 years ago|reply
I think you meant "Rationale"
[+] senatorobama|8 years ago|reply
Can someone explain what's so special about these libraries that makes them "better" than raw sockets?
[+] braywill|8 years ago|reply
They're not better than raw sockets, they're simply an abstraction layer on top of them.

For example, building a message queuing service using raw sockets that works on Windows, Mac OS, and Linux is quite the undertaking. With ZeroMQ (and I'm assuming nanomsg as well), it's quite simple.

ZeroMQ and nanomsg are like raw socket toolboxes.

[+] aidenn0|8 years ago|reply
Berkeley-style TCP sockets let you connect to/listen on an endpoint and transfer bytes.

That's pretty cool, but what if you want to send messages rather than raw bytes? What if you want to do publish/subscribe rather than send/receive to a single endpoint? &c.

ZeroMQ implements simple things like that, which otherwise tend to be re-implemented differently, in an ad-hoc manner.

[+] dguaraglia|8 years ago|reply
One of the features I recall liking about the ZMQ "family" of implementations is that they handle a lot of the error handling/reconnection for you.

The ZMQ guide is actually a great document to read in general terms, I recommend reading it if you are curious about the architectural rationale (which you could, actually, re-implement in your own raw socket protocol): http://zguide.zeromq.org/page:all