It's a very different system. Hadoop was structured to compute over massive data sets, especially at a time when hardware constraints forced moving compute to the data to optimize overall performance. (That's less true today; networks are sometimes faster than disk access!) In Ray, when you annotate a function to make it a remote task or annotate a class to make it a remote actor, the data passed to the function, or the data and object state for the actor, are serialized together and moved to the node where execution will occur. Then results (or object/actor state) are retrieved with the `ray.get` function. So the "chunks" of data are relatively small and much more fine-grained than in Hadoop.
Jupe|6 years ago
I like this design, it seems much more "manageable" from a mere human's perspective.