The test uses 4 partitions (4 threads) and is not single-threaded. Of course, distributing over network will have network costs which can be significant for sub-second queries but will become insignificant for larger data and more involved queries. The Spark execution plan will be exactly same on laptop or in cluster, and these specific queries will scale about linearly.
No comments yet.