(no title)
plamb | 8 years ago
You can think of GemFire/GridGain as an apples-to-apples comparison. Both are "enterprise" in-memory data grids that were originally intended for managing data in low-latency OLTP applications and later added analytics/OLAP features. Geode/Ignite are the open source counterparts of these two IMDGs and are also a good apples-to-apples comparison. (Hazelcast also has enterprise/OSS versions, which I would compare accordingly.)
I can't speak to the current comparison between these systems, but I can compare them to SnappyData. SnappyData deeply integrates GemFire with Spark to bring high concurrency, high availability and mutability to Spark applications. The usual way to enable "database-like" features in Spark is to pair it with a datastore over a connector (Cassandra, Hive, MySQL, Mongo etc.); SnappyData takes that integration a step further. In Snappy, the database (GemFire) and the Spark executors share the same block manager and VM, so the systems no longer communicate over a "connector." This, along with our database optimizations, provides the best performance for Spark applications in what I like to call the "Spark Database Ecosystem."
As such, comparing SnappyData to GemFire/Hazelcast/GridGain does not make much sense unless you are trying to use Spark in conjunction with these systems. In that case, the main difference I would point out is that SnappyData will necessarily perform better, since any of them would need to use a connector to interact with Spark. The better comparison would be between SnappyData and Ignite, as Ignite contains a direct Spark abstraction called "IgniteRDD." That said, the majority of the comparisons/benchmarks we've run have been against MemSQL+Spark and Cassandra+Spark, so I don't have much to say about Ignite vs SnappyData.
User manigandham mentions SnappyData's Approximate Query Processing feature (called the Synopses Data Engine), which is unique within this space, but discussing it would take this too far afield.