What Color is Your Function? (2015)

109 points | thinkloop | 8 years ago | journal.stuffwithstuff.com

45 comments

[+] miracle2k|8 years ago|reply
See also this reply: The Function Colour Myth (https://lukasa.co.uk/2016/07/The_Function_Colour_Myth/).
[+] Shoothe|8 years ago|reply
I don't see how this reply addresses the core point; it just argues that async/await is useful (which is fine, but not novel by itself).

It is possible to work around the "color" by using generators [0]; that way one algorithm can be used in both synchronous and asynchronous ways. Async/await, by contrast, is tightly bound to the underlying Promises.

[0]: https://curiosity-driven.org/promises-and-generators#decoupl...
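Roughly, the decoupling trick looks like this: the algorithm is a generator that yields descriptions of what it needs, and separate drivers satisfy those requests either synchronously or asynchronously. (A minimal sketch; `getUserName` and the `resource` key are made-up illustrations, not from the linked article.)

```javascript
// A color-free "algorithm" written as a generator: it yields the
// resources it needs, without caring how they get fetched.
function* getUserName(id) {
  const user = yield { resource: `/users/${id}` };
  return user.name;
}

// Synchronous driver: answers each yielded request from a plain lookup table.
function runSync(gen, lookup) {
  let step = gen.next();
  while (!step.done) {
    step = gen.next(lookup[step.value.resource]);
  }
  return step.value;
}

// Asynchronous driver: answers each yielded request via a promise-returning fetcher.
async function runAsync(gen, fetcher) {
  let step = gen.next();
  while (!step.done) {
    step = gen.next(await fetcher(step.value.resource));
  }
  return step.value;
}
```

The same `getUserName` generator runs unchanged under both drivers; only the driver has a color.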

[+] skrebbel|8 years ago|reply
I agree with the article but not with the editorialization in the title.

In the last 2 years I've mostly used Elixir and JavaScript. In Elixir, generally all functions are synchronous. In JavaScript, there are the typical two colors of functions that the author describes: synchronous and asynchronous.

Elixir is nicer by far for a lot of parallel programming. Our backend can handle large loads on a single VM simply because of this.

However, the world is asynchronous. It's very nice that Elixir abstracts it away, but sometimes that's not what you want and then Elixir makes things significantly harder than JavaScript.

For example, imagine that my backend gets many parallel requests for mildly different resources. Each resource needs to query some other system (some 3rd party API, a DB, whatever), but it turns out that many of them will do exactly the same request to that other system even though the user requests they're handling are all pretty much unique. A simple solution is to cache the requests every second/minute/whatever and combine them all. This is nontrivial because I want to cache the running requests, not the responses. If there's one request running, I don't want to launch an identical request.

In Elixir, I have to carefully design a GenServer to make this work properly. Or maybe use a library such as the fantastic `memoize`[0], which is great for this and can even do it without spawning any additional processes at all. But either way, I have to add complexity - either complexity that I write or complexity that I import. `memoize`'s core caching code is amazing, but not trivial[1].

In JavaScript, I can just cache a bunch of promises in a Map or an object. If the promise exists and hasn't expired, don't do a new request. Else, do the request and cache the promise. Then, in either case, await the promise and use the result. It's 10 lines of boring code.
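Those ten boring lines might look something like this (a sketch, assuming a promise-returning `fetchResource` and a TTL in milliseconds; the names are illustrative):

```javascript
// In-flight request coalescing: cache the *promise*, not the response,
// so concurrent callers for the same key share one outstanding request.
const cache = new Map(); // key -> { promise, expires }

function cachedFetch(key, fetchResource, ttlMs = 60_000) {
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.promise;
  const promise = fetchResource(key); // the request starts exactly once
  cache.set(key, { promise, expires: Date.now() + ttlMs });
  return promise;
}
```

Because the promise goes into the map before it resolves, a second caller arriving mid-flight gets the same in-progress request rather than launching a duplicate.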

I still like Elixir better for backends and I'm happy that our backend isn't Node. But I'm writing this to underline that explicit support for asynchronicity can be a feature too.

[0] https://github.com/melpon/memoize [1] https://github.com/melpon/memoize/blob/master/lib/memoize/ca...

[+] mercer|8 years ago|reply
I'm very much a beginner at this, so apologies if I'm missing something obvious, but I find this an interesting scenario to think about (so even just to learn how you would do it in Elixir would be nice).

Why not track the fetching processes in another managing process (or ETS)? Whenever a new request comes in, you check if there's a process already busy fetching that particular resource. If there is, ask it to send the results back to this request process too, or use whatever storage keeps this data to satisfy both requests.

In practice I'd probably be using some kind of managing process anyways, to deal with queuing multiple requests to 3rd party provider (because rate limiting and whatnot), and I'd perhaps also have some storage and 'inform requesting party of result' logic/process in place too, so I'd not be adding too much extra complexity.

Is this a valid approach to begin with? Or does this expose me as very much an Elixir/OTP beginner :)? And if the approach works, what complexity am I missing compared to keeping track of promises in a map? Or am I overestimating how much complexity you're talking about?

My first taste of doing some 'real' stuff was exactly this - caching responses and dealing with rate limiting and whatnot - and I found that quite nice to do in Elixir, but this particular 'problem' strikes me as one I'll probably run into fairly soon.

[+] he0001|8 years ago|reply
> However, the world is asynchronous.

Can you give me examples of sources that explain this concept? I really don’t think the world is asynchronous, because if it were, the Big Bang could happen now. Aren’t you mixing up the word ‘asynchronously’ with ‘independently’?

[+] Piezoid|8 years ago|reply
Facebook built Haxl[0], a library in Haskell, precisely for this use case: batching, caching, and parallelizing requests against external sources.

[0] https://github.com/facebook/Haxl

[+] throwaway13337|8 years ago|reply
Async web servers are more complex than thread per connection ones.

I think that's the big thing the author is communicating.

It's true.

And most types of development don't need async I/O. It's worth talking about.

The author makes the mistake of calling this a language problem and not a web server problem.

With Python, Java, C#, and hell, even JS, you can have a thread-per-connection web server if you choose to. This will avoid many async woes.

It's just not trendy.

[+] eecc|8 years ago|reply
Except I remember my times as a sysadmin, putting out fires on Tomcats brought to their knees by one poorly performing query and a few impatient users hitting F5.

Yes, async is indeed more complex but it is a measurable improvement over previous - thread based - designs, and it is trivial to hit scenarios where it will make a difference.

Just like anything though, it must be applied sensibly: not every single call must be decomposed into a choreography of async collaborators (yes, listen to that, it’s “architecture talk” - like that dark language used in Mordor that even Gandalf is scared to pronounce). Ever worked on a DDD/CQRS/ES project? ;)

[+] xtrapolate|8 years ago|reply
> "It's just not trendy."

The "thread-per-connection" model doesn't scale. It's simply not resource-efficient in any way.

[+] pjc50|8 years ago|reply
Ultimately the problem is "the system needs to retain enough state per connection to resume executing the 'next' step of the communication".

This state is currently split into different pieces and spread around the system. The operating system maintains the TCP state, for example. If you use threads to partition connections, the operating system looks after the instruction pointer to resume a non-running thread at. The program (whatever language) will usually have some kind of stream state and maybe buffers.

It's all a question of where you want to put the abstraction boundaries. Heavy threads? Light threads? Async? Continuations? Or go the other way and have process-per-core with userland drivers and TCP state, where each process is just listening for interrupts and handles everything from there?

[+] 2sk21|8 years ago|reply
Every time I have to wade into Node, I feel like they are reinventing the early days of programming with cooperative multi-tasking.

[+] bennofs|8 years ago|reply
With green threads, I think you can have async I/O but still program with threads. When the current thread makes an async call, suspend the green thread (cheap) and yield back to the event loop. Threads are woken up when their async call finishes. This avoids the overhead of one real OS thread per connection.
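A toy version of that suspend/resume machinery can be sketched in JavaScript, using generators as the cheap suspendable "threads" and a scheduler that wakes them when their pending call settles. (This is an illustration of the idea, not how real green-thread runtimes like Erlang's or Go's are implemented; `spawn` is a made-up name.)

```javascript
// Each generator function is a cheap "green thread". Yielding a promise
// suspends the thread; the scheduler resumes it with the result once
// the promise settles, so the body reads like straight-line blocking code.
function spawn(genFn) {
  const gen = genFn();
  return new Promise((resolve, reject) => {
    function resume(value) {
      let step;
      try { step = gen.next(value); } catch (e) { return reject(e); }
      if (step.done) return resolve(step.value);
      Promise.resolve(step.value).then(resume, reject); // wake on completion
    }
    resume(undefined);
  });
}

// Looks synchronous inside, but never blocks the event loop:
const result = spawn(function* () {
  const a = yield Promise.resolve(1); // "async call" suspends here
  const b = yield Promise.resolve(2); // and here
  return a + b;
});
```

This is essentially what async/await desugars to; the difference with real green threads is that they suspend at the runtime level, so no function needs a special color at all.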
[+] hliyan|8 years ago|reply
This seems like a good faith comment. I'm not really sure why it's getting downvoted. Someone care to shed some light?
[+] hyperion2010|8 years ago|reply
Just the other day I was looking over my notes from (horror of horrors) 3 years ago on core problems to avoid when designing a language, and this article came up. Still just as good a read as I remembered, made all the more pertinent by the intervening time in which I have repeatedly cursed python's implementation for the massive friction burns it has caused me. Probably needs a (2015) on it for context, but I consider it timeless!
[+] mcv|8 years ago|reply

  > Wanna know one that doesn’t? Java.
Unless you use Vert.x of course.

A year ago I helped a friend on a big project in Java for which he was using Vert.x. Suddenly I found myself doing JavaScript-style programming in Java. Convenient for calls to other services (a payment provider, for example), but indeed a bit cumbersome if you're used to Java being synchronous.

[+] wokwokwok|8 years ago|reply
Please add a 2015 tag, this is old and has previously been extensively discussed.
[+] kabes|8 years ago|reply
Fibers (coroutines) solve most of the criticism for JS. Although they're not part of the language and you need to extend V8, they're a viable option for web servers. I believe that's how the Meteor framework was doing it years before async/await or even promises were a thing.
[+] egnehots|8 years ago|reply
And a great sage would say that it all boils down to category theory.