plamb | 9 years ago
Our impression was that when Databricks published the billion-rows-in-one-second-on-a-laptop benchmark, readers were pretty awed by that result. We wanted to show that when you combine an in-memory database with Spark so that it shares the same JVM and block manager, you can squeeze even more performance out of Spark workloads, over and above Spark's internal columnar storage. Any analytics that require multiple trips to a database will be affected by this design choice: e.g., workloads on a Spark + Cassandra analytics cluster will be significantly slower, barring some fundamental changes to Cassandra.
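The round-trip point can be sketched with a toy model. This is not the actual benchmark, just an illustration: the class names, the 1 ms latency figure, and the query counts are all assumptions, standing in for an external store (Cassandra over the network) versus an in-process store (an in-memory database sharing Spark's JVM).

```python
import time

ROUND_TRIP_S = 0.001  # assumed 1 ms network round trip per query (illustrative)

class ExternalStore:
    """Toy stand-in for a networked database: each lookup pays a
    simulated round trip (network latency + serialization)."""
    def __init__(self, data):
        self._data = data

    def get(self, key):
        time.sleep(ROUND_TRIP_S)  # simulated per-query round-trip cost
        return self._data[key]

class InProcessStore:
    """Toy stand-in for a colocated in-memory store: lookups are
    plain memory accesses in the same address space."""
    def __init__(self, data):
        self._data = data

    def get(self, key):
        return self._data[key]

def run_queries(store, keys):
    """Sum the values for all keys, timing the whole pass."""
    start = time.perf_counter()
    total = sum(store.get(k) for k in keys)
    return total, time.perf_counter() - start

data = {i: i * i for i in range(200)}
keys = list(range(200))

total_ext, t_ext = run_queries(ExternalStore(data), keys)
total_mem, t_mem = run_queries(InProcessStore(data), keys)

assert total_ext == total_mem  # same answer either way
print(f"external: {t_ext:.3f}s  in-process: {t_mem:.3f}s")
```

With 200 queries at an assumed 1 ms each, the external store's round trips alone cost on the order of 0.2 s, while the in-process pass is effectively free; the gap only widens as the number of trips per analysis grows.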