
Cassandra driver for Spark

28 points | tjake | 11 years ago | github.com

3 comments

anko | 11 years ago

I'm really interested in Spark, but I know next to nothing about Hadoop. What's the best way for me to get started?

pkolaczk | 11 years ago

You don't really need to know anything about Hadoop MapReduce to start using Spark. Spark has its own, more powerful "map-reduce".

You do need to be familiar with one of the storage platforms supported by Spark; currently these are the Hadoop Distributed File System (HDFS) and Apache Cassandra. The easiest way to play with Cassandra is:

1. Grab a copy of DSE (DataStax Enterprise, free for testing and development) and install it (download here: http://www.datastax.com/download)

2. Launch 'cqlsh', create a Cassandra keyspace and a table, and insert a few rows into the table

3. Launch 'dse spark' and query your data with, e.g., sc.cassandraTable("keyspace", "table").toArray

Doing it with Apache Cassandra (not DSE) is going to be slightly harder because, besides installing Cassandra, you'll have to set up a standalone Spark cluster (see the Spark docs) and then follow the instructions in the driver's README.md.
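
To make steps 2 and 3 a bit more concrete, here is a rough sketch of what a first session might look like. The keyspace "test", the table "kv" and its columns are made-up names for illustration, and the import line is an assumption: the package name has changed between driver releases, and the 'dse spark' shell may already import it for you, so check the README.md for your version.

```scala
// Paste into the Scala shell started by 'dse spark'; `sc` (the SparkContext)
// is assumed to be predefined there.
//
// Hypothetical data, created beforehand in cqlsh (names are illustrative):
//   CREATE KEYSPACE test WITH replication =
//     {'class': 'SimpleStrategy', 'replication_factor': 1};
//   CREATE TABLE test.kv (key text PRIMARY KEY, value int);
//   INSERT INTO test.kv (key, value) VALUES ('a', 1);
//   INSERT INTO test.kv (key, value) VALUES ('b', 2);

// Assumption: this package name matches your driver version (see README.md).
import com.datastax.spark.connector._

// Expose the Cassandra table as a Spark RDD of rows.
val rows = sc.cassandraTable("test", "kv")

// Pull all rows to the shell and print them (fine for a handful of rows only).
rows.collect().foreach(println)

// Spark's own "map-reduce": extract the int column, then sum it.
val total = rows.map(row => row.getInt("value")).reduce(_ + _)
println(total)
```

collect() does the same job as toArray in step 3 and is the name that newer Spark versions keep. Once this works, other RDD operations such as filter can be chained in the same way as map and reduce.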

bonchibuji | 11 years ago

You could also look at Shark, which is basically Hive on Spark.

http://shark.cs.berkeley.edu/