I had similar issues with the Dask scheduler a few months ago. The docs say it encourages depth-first behavior in the computation graph, but in my case a large ETL task kept running out of memory because the scheduler first tried to load all the input files into memory before moving on to the next stage.