acmiyaguchi | 2 years ago

The language is not a limiting factor here. Python is an excellent scripting language, and works plenty fine in distributed computation. The Python interface to Spark is a wrapper on the underlying Scala API. You don't lose out on performance when you're building up a lazy chain of computation that's executed by an engine written in a more performant language.
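To make the "lazy chain of computation" point concrete, here is a minimal sketch in plain Python. It is not Spark's or Fugue's API: `LazyFrame`, `local_engine`, and `double` are hypothetical names invented for illustration. The idea it shows is that the chained calls only record a plan, and the work happens inside whatever engine is handed the plan at the end. In a real system that engine would be the JVM or native code rather than a Python function.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Toy stand-in for a lazy DataFrame: it records a plan and runs nothing."""
    source: Tuple[Any, ...]
    plan: Tuple[Tuple[str, Callable], ...] = ()

    def map(self, fn: Callable) -> "LazyFrame":
        # Append a step to the plan; no data is touched here.
        return LazyFrame(self.source, self.plan + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyFrame":
        return LazyFrame(self.source, self.plan + (("filter", pred),))

    def collect(self, engine: Callable) -> list:
        # Hand the whole plan to the engine, which decides how to execute it.
        return engine(self.source, self.plan)


def local_engine(source, plan):
    """Hypothetical engine: runs the recorded plan step by step, locally."""
    rows = iter(source)
    for kind, fn in plan:
        rows = map(fn, rows) if kind == "map" else filter(fn, rows)
    return list(rows)


calls = []  # records when double() is actually invoked


def double(x):
    calls.append(x)
    return x * 2


frame = LazyFrame((1, 2, 3, 4)).map(double).filter(lambda x: x > 4)
calls_before_collect = list(calls)   # still empty: nothing has executed yet
result = frame.collect(local_engine)
print(calls_before_collect, result)
```

Building `frame` costs almost nothing on the Python side, which is why the speed of the host language matters little as long as the functions in the plan can be run by the engine itself.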

Fugue is a layer to abstract out these distributed computation backends, and it looks like a nice programming interface.

kvnkho|2 years ago

Well said! Python can push down to other languages like Rust and C to speed things up. Python can serve as a great end-user interface.

chrisjc|2 years ago

Trying to wrap my head around Fugue and the comment explaining how Python is a good wrapper.

Does Fugue take advantage of each sublayer that already uses Arrow?

crabbone|2 years ago

Python is not excellent in any domain it's used in. And, yes, it is a problem with the language, on which I'll comment later.

First, present-day users of Python need to understand how and why Python came under the spotlight. There was always a standoff between programmers who created unimaginative huge programs full of drool and red tape, and programmers who wanted more freedom of expression, fewer strings attached. The latter group was usually the more savvy one.

It's very similar to how an art student might spend months studying a model, using a whole bunch of pencils starting from 10B and ending in 10H, various chalks, charcoal sticks and so on... and would still produce a... "study of a model #80907", which is ugly, anatomically incorrect and just boring. And then there's an accomplished artist who can just stick her finger into the chimney, grab some soot, and in a matter of minutes make a great drawing, which will be lively, expressive, you name it.

So... Python, and Perl before it, were the soot. The junk languages a more experienced programmer would go to just to show those boring Java programmers "how it's done". But the Python programmers who came in the next wave thought that soot was a good tool for learning how to make good drawings. And, today, we have academies full of students trying very hard to draw models with materials and instruments which are very inappropriate for the task. (If you know anything about art education today, the example isn't that big a stretch from what happened to it around the 70s-80s.)

---

I don't care if Python is glue code for Scala or C or Rust: it doesn't matter. Python, as a language, is inadequate for dealing with concurrency. It needs to remove a bunch of stuff before it can start adding stuff that can be used to that end. It's a language with a lot of mutation semantics which are hard to interpret / implement correctly (what would that even mean?) in a distributed context. It's a language with a lot of implicit stuff going on that is somewhat useful (but not useful enough) if you want quick and dirty "sketch" quality code, but will be devastating in a distributed context.
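A rough sketch of the mutation point, under stated assumptions: a `pickle` round trip stands in for shipping state to a remote worker, and `counter`, `worker_task`, and the partitions are made-up names, not any framework's API. Code that looks like an in-place update only mutates the worker's copy, so the result has to be returned and merged explicitly.

```python
import pickle

# State that lives on the "driver" side.
counter = {"seen": 0}


def worker_task(payload: bytes, rows):
    """Pretend worker: it only ever sees a deserialized copy of the state."""
    state = pickle.loads(payload)
    for _ in rows:
        state["seen"] += 1   # mutates the copy, not the driver's dict
    return state


payload = pickle.dumps(counter)
partitions = ([1, 2, 3], [4, 5])
partials = [worker_task(payload, part) for part in partitions]

# The driver's dict was never touched by the "in-place" updates.
driver_seen = counter["seen"]

# The only reliable way to get the count is to merge what workers return.
total_seen = sum(p["seen"] for p in partials)
print(driver_seen, total_seen)
```

In single-process Python the same loop would have updated `counter` directly, which is the kind of mismatch the paragraph above is pointing at.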

Things like decorators, context managers, imperative loops with break and continue, the error handling mechanism, threading -- all of this must go before Python can start to think about becoming a decent language for distributed systems. And probably more: I would need to research this in much more depth to tell for sure whether things like method calls, for example, would work well.

It's a waste of time to try to fit Python into distributed computation today. You will either have to put a humongous effort into purging the better half of the language (making all of the famed support libraries useless), or you will end up with a defective hodge-podge mess (which is all those famed support libraries are, including those which aim to do distributed programming in Python).

goodwanghan|2 years ago

I guess you want to say MPI and C++ are better for distributed computation?