Dask satisfies exactly these requirements, just in the Python ecosystem. A pattern I have used at multiple workplaces over the last decade is to start a Dask cluster (ten years ago it would have been an ipyparallel cluster) on whatever node is doing the computation, then spin up new nodes and connect them as I need more capacity. This gives dynamic, effectively unlimited scalability with almost no overhead or code debt - the Dask interfaces are great even without distributed computing.
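A minimal sketch of that pattern, assuming Dask's distributed scheduler; the addresses, worker counts, and array sizes are illustrative, not from my actual setups:

```python
# Sketch: start a scheduler plus local workers on the primary compute node,
# then point extra machines at it as capacity is needed.
from dask.distributed import Client, LocalCluster
import dask.array as da

# On the primary node: spin up a local cluster and attach a client.
cluster = LocalCluster(n_workers=4, threads_per_worker=2)
client = Client(cluster)
print(cluster.scheduler_address)  # e.g. tcp://10.0.0.5:8786 (illustrative)

# On any additional machine, join the same cluster from the shell:
#   $ dask worker tcp://10.0.0.5:8786
# New workers are picked up automatically; the client code does not change.

# Work spreads across whatever workers have joined by the time it runs.
x = da.random.random((50_000, 50_000), chunks=(5_000, 5_000))
print(x.mean().compute())
```

The point is that nothing about the code commits to a cluster size up front: the same script runs on one laptop or on a dozen borrowed machines.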
When I wasn't allowed to use containers, I would sneakily add code to other machines so they would join my Dask cluster - I connected any and all computing devices I could get my hands on.
One company pushed us toward Databricks and Spark, and I never got it - why would we commit to a cluster size before we even started the computation?