
Haskell's Missing Concurrency Basics (2016)

114 points | DanielRibeiro | 7 years ago | snoyman.com

26 comments


dnautics|7 years ago

How does erlang/elixir do it? I've never really had any problems.

SlySherZ|7 years ago

Elixir newbie here. If I remember correctly, IO is implemented as a process, which means that different write requests are processed sequentially, in the order they arrive at the process. The following link has more information: http://erlang.org/doc/apps/stdlib/io_protocol.html

leshow|7 years ago

They have a single process which manages IO; you communicate w/ it by passing messages. There is never any contention because it's never shared. Of course, there are trade-offs involved with this decision, but it's a really nice architecture.
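That Erlang-style single-writer pattern translates directly to Haskell using nothing but base's `Control.Concurrent`: one forked thread owns stdout and drains a `Chan`, and everyone else only sends messages. A minimal sketch (the worker counts and message strings are made up for illustration):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.Chan (newChan, readChan, writeChan)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Monad (forM_, replicateM_)

    main :: IO ()
    main = do
      chan <- newChan
      done <- newEmptyMVar
      -- The dedicated "IO process": the only thread that touches stdout.
      _ <- forkIO $ do
             replicateM_ 6 (readChan chan >>= putStrLn)
             putMVar done ()
      -- Three workers communicate with it purely by message passing.
      forM_ [1 .. 3 :: Int] $ \w ->
        forkIO $ forM_ [1 .. 2 :: Int] $ \i ->
          writeChan chan ("worker " ++ show w ++ " message " ++ show i)
      takeMVar done    -- wait until all six messages are printed

Because only the printer thread ever calls `putStrLn`, lines can never interleave; the order between workers is still nondeterministic, just as with Erlang's io server.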

chriswarbo|7 years ago

I had some sympathy for this situation, until I saw that the concurrency was being specified via a function called `mapConcurrently`.

IMHO this is perfectly acceptable behaviour for a `map` function, since that name has gained the connotation that its purpose is to transform one 'collection' (Functor; whatever) into another, by pointwise, independent applications of the given function. Providing a function/action which breaks this independence (by writing to the same handle) undermines this implicit meaning. Heck, I'd consider it a code smell to combine interfering actions like this using a non-concurrent `map` function; I would prefer to define a separate function to make this distinction explicit, e.g.

    -- Like 'map', but function invocations may interfere with each other (you've been warned!)
    runAtOnce = map

When using `map` functions (which is a lot!) I subconsciously treat them as if they will be executed concurrently, in parallel, in any order. Consider that even imperative languages like JavaScript provide a separate `forEach` function, to prevent "abuses" of `map`. Even Emacs Lisp, not the most highly regarded language, provides separate `mapcar` and `mapc` functions for this reason.

With that said, I recognise that there's a problem here; but the problem seems to be 'mapping a self-interfering function'. If we try to make it non-interfering, we see that the interference is due to the use of a shared global value (`stdout`); another code smell! Whilst stdout is append-only, it's still mutable, so I'd try to remove this shared mutable state. Message passing is one alternative, where we have each call/action explicitly take in the handle, then pass it along (either directly, or via some sort of "trampoline", like an MVar). This way we get the "concurrent from the outside, single-threaded on the inside" behaviour of actor systems like Erlang. In particular, it's easy to make sure the handle only gets passed along when we're 'finished' with it (i.e. we've written a complete "block" of output).
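The MVar "trampoline" amounts to a few lines in practice: keep the handle inside an `MVar` and only write while holding it (base's `withMVar` does the take/put pairing for you, with exception safety thrown in). A sketch, with made-up worker blocks:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newMVar, newEmptyMVar, putMVar, takeMVar, withMVar)
    import Control.Monad (forM_)
    import System.IO (stdout, hPutStrLn)

    main :: IO ()
    main = do
      box   <- newMVar stdout             -- the handle is the shared token
      dones <- mapM (const newEmptyMVar) [1 .. 3 :: Int]
      forM_ (zip [1 .. 3 :: Int] dones) $ \(w, done) -> forkIO $ do
        -- withMVar takes the handle out, runs the action, and puts it
        -- back, so each thread emits a complete two-line block atomically.
        withMVar box $ \h -> do
          hPutStrLn h ("worker " ++ show w ++ " start")
          hPutStrLn h ("worker " ++ show w ++ " end")
        putMVar done ()
      mapM_ takeMVar dones                -- wait for every block

A thread blocked on the MVar simply can't write mid-block, so the handle really does get passed along only between complete "blocks" of output.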

divs1210|7 years ago

Thread-unsafe `println` is one of Clojure's quirks too!

masklinn|7 years ago

Interesting. I think it's thread-safe in Rust, because one of the common performance improvements for console applications with lots of output is to acquire the relevant stream's lock explicitly (and perform all writes against a never-released guard); otherwise the lock is acquired and dropped on every write: https://doc.rust-lang.org/src/std/io/stdio.rs.html#448-461

heavenlyhash|7 years ago

I'm kind of surprised to hear that writing to stdout is a source of concurrency problems in a language that's considered to be functional.

Surely if you can pass your IO handles to all functions that need them, you can decide on a mutexing/buffering strategy at the top of your program, wrap the standard IO interface with a delegate that does so, and pass it on. Then, for all libraries called thereafter to use it consistently isn't just a no-brainer, it's an outright given, isn't it? There's no global (impure, non-functional) handle to stdout, is there?

foldr|7 years ago

Haskell's being functional is pretty much irrelevant here. The process has one stdout. If functions that write to file handles don't acquire a lock, then the output of different threads will get mixed up.

mitchty|7 years ago

Being functional doesn't mean interaction with the outside world is going to lose its difficulty. Stdout being buffered basically means you have to sort out how to get consistent output just like in most other languages.

Even if you do as you say, you can still bypass it by writing to stdout directly outside of your stable buffering mechanism. But at that point the language isn't to be blamed here.

clord|7 years ago

Use an STM channel or some other lock and put your messages for the shared resource (the terminal UI) through that channel. There's no way to automatically figure out what granularity the programmer expects from the output, so make them specify it. Haskell makes specifying that staggeringly easy compared to other languages.
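A sketch of that, assuming the (non-base) `stm` package is available: one forked thread owns the terminal and drains a `TQueue`, and the "granularity" is simply whatever you choose to enqueue as a single message — here a two-line block per job, made up for illustration:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Concurrent.STM (atomically, newTQueueIO, readTQueue, writeTQueue)
    import Control.Monad (forM_, replicateM_)

    main :: IO ()
    main = do
      q    <- newTQueueIO
      done <- newEmptyMVar
      _ <- forkIO $ do               -- sole owner of the terminal
             replicateM_ 3 (atomically (readTQueue q) >>= putStr)
             putMVar done ()
      forM_ [1 .. 3 :: Int] $ \i ->
        forkIO . atomically . writeTQueue q $
          -- one queue element = one atomic unit of output (two lines)
          "job " ++ show i ++ " start\njob " ++ show i ++ " done\n"
      takeMVar done

If you want a different granularity — per line, per paragraph, per whole report — you just change what one queued message contains; nothing else moves.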

bitL|7 years ago

How can you get a performant language if your I/O granularity is 1 character? :-O

EDIT: this is an honest question, I was shocked to read what was in the article.

chriswarbo|7 years ago

Strings in Haskell are one of the language's sore points. It's something that's mostly a non-issue for those using Haskell day to day, but may be surprising to newcomers.

Haskell's built-in string type is a list of characters. This is mostly for historical reasons, but it's also handy in education (installing extra packages is a barrier for learners; list processing is common in introductory courses, but lists are polymorphic/generic in their element type; lists of characters are a nice concrete type, which follows on easily from "hello world"); there are also arguments about the theoretical elegance of linked lists, KISS for the builtins, whether there's consensus on what the best alternative is, etc.

Anyone who cares about Haskell performance will have hit this early on, and will be using a different string implementation, as mentioned in the article. In particular there's ByteString for C-like arrays of bytes, and there's Text, a similar packed representation specialized for Unicode text (it tracks a character encoding rather than raw bytes). In fact, the lazy ByteString variant isn't a single contiguous array: it's a list of "chunks", where each chunk contains a pointer to an array, an offset and a length; this speeds up many operations, e.g. we can append ByteStrings by adding chunks to the list (pointing at existing arrays), and take substrings just by manipulating the offsets and lengths. This is all perfectly safe and predictable since the data is immutable, whereas languages which allow mutation might prefer to copy the data to reduce aliasing.
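For example, with the (non-base) `text` package the packed representation is what you work with directly; a trivial sketch, with made-up string contents:

    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.Text as T
    import qualified Data.Text.IO as TIO

    main :: IO ()
    main = do
      let name = "world" :: T.Text
      -- Text is a packed array under the hood, so this gets written
      -- out in large chunks rather than one character at a time.
      TIO.putStrLn ("hello, " <> T.toUpper name)

The API mirrors the list-based `Prelude` functions (`T.map`, `T.lines`, `T.toUpper`, ...), so switching over is mostly a matter of changing imports.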

The other aspect is the buffering mode of the handle, which is discussed a little in the article and its comments (e.g. line-based buffering, etc.).
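Concretely, the buffering mode is a per-handle setting you can pick once at startup; a base-only sketch:

    import System.IO

    main :: IO ()
    main = do
      -- LineBuffering flushes on each '\n'; BlockBuffering batches
      -- writes into a buffer, so even String output isn't emitted
      -- character by character.
      hSetBuffering stdout (BlockBuffering Nothing)
      mapM_ putStrLn ["one", "two", "three"]
      hFlush stdout   -- flush explicitly rather than relying on exit

Note the default differs by context (line-buffered to a terminal, block-buffered to a pipe or file), which is one reason programs behave differently when their output is redirected.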

bru|7 years ago

Title is missing a (2016).