ivansmf | 10 years ago

I wrote last year's benchmark. The clusters are completely different, and so is the workload. Last year's cluster had 300 VMs, which was a much higher price point, and the workload was write-only. This benchmark uses YCSB workloads A and B, which we thought matched the usage we'll have on BigTable. The cluster is much smaller as well. I shared my scripts from last year; it is pretty easy (although a bit expensive) to repro the numbers. Let me check if we can share this year's benchmark scripts as well.

bbromhead|10 years ago

I'm pretty surprised about the difference in latency, though. Throughput, as you say, will be different due to the number of nodes.

For any given replication factor in Cassandra, overhead remains pretty much the same irrespective of whether you have 300 or 3 nodes. So should the latency.
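To make that concrete, here is a rough sketch of the arithmetic, not anything taken from the benchmark scripts. It assumes the standard Cassandra quorum rule (floor(RF / 2) + 1); the function names are made up for illustration. The point is that the number of replica acknowledgements a coordinator waits for depends only on the replication factor and the consistency level, so cluster size never enters the calculation.

```python
def quorum(replication_factor: int) -> int:
    """Standard quorum size: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_awaited(replication_factor: int, consistency: str) -> int:
    """Acknowledgements the coordinator waits for before replying.

    Hypothetical helper for illustration; note that cluster size
    is not a parameter at all.
    """
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[consistency]


# Same replication factor and consistency level on a 3-node and a 300-node cluster.
for cluster_size in (3, 300):
    print(cluster_size, replicas_awaited(3, "QUORUM"))
```

With a replication factor of 3 at QUORUM, both cluster sizes wait on 2 replicas per request. This ignores real-world effects such as compaction load or network topology, which can still move latency.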

On top of that both BigTable and Cassandra use SSTables to store the data on disk (with all the compactiony goodness that goes with them), so I'm even more surprised that the difference in latency is so huge.

Would love to see the scripts for the benchmarks! I don't want to take away from a great product launch and I'm sure BigTable kicks arse in certain areas that Cassandra doesn't... I'm just surprised at the differences in latency.