Dask satisfies exactly these requirements, just in the Python ecosystem. A pattern I have used at multiple workplaces over the last decade is to start a Dask cluster (ten years ago it would have been an ipyparallel cluster) on whatever node is doing the computation, then spin up new nodes and connect them as I need more capacity. This gives dynamic, effectively unlimited scalability with almost no overhead or code debt - the Dask interfaces are great even without distributed computing.
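A minimal sketch of that pattern, assuming Dask's distributed scheduler; the addresses, worker counts, and array sizes are illustrative, not from my actual setups:

```python
# Sketch: start a scheduler plus local workers on the primary compute node,
# then point extra machines at it as capacity is needed.
from dask.distributed import Client, LocalCluster
import dask.array as da

# On the primary node: spin up a local cluster and attach a client.
cluster = LocalCluster(n_workers=4, threads_per_worker=2)
client = Client(cluster)
print(cluster.scheduler_address)  # e.g. tcp://10.0.0.5:8786 (illustrative)

# On any additional machine, join the same cluster from the shell:
#   $ dask worker tcp://10.0.0.5:8786
# New workers are picked up automatically; the client code does not change.

# Work spreads across whatever workers have joined by the time it runs.
x = da.random.random((50_000, 50_000), chunks=(5_000, 5_000))
print(x.mean().compute())
```

The point is that nothing about the code commits to a cluster size up front: the same script runs on one laptop or on a dozen borrowed machines.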
When I wasn't allowed to use containers, I would sneakily add code to other machines so they would join my Dask cluster - I connected any and all computing devices I could get my hands on.
One company pushed us toward Databricks and Spark, and I never got it - why would we commit to a cluster size before we even started the computation?