top | item 29207807


maslam | 4 years ago

Databricks broke the record by 2x and is 10x more cost effective, in an audited benchmark. Snowflake should participate in the official, audited benchmark. Customers win when businesses are open and transparent…



mst|4 years ago

Databricks and Snowflake should pay an independent third party to re-run these. In-house benchmarks by either company don't count when the results are this different.

cmhill|4 years ago

Databricks didn't run the Snowflake comparison in-house. Their article says: "These results were corroborated by research from Barcelona Supercomputing Center, which frequently runs TPC-DS on popular data warehouses. Their latest research benchmarked Databricks and Snowflake, and found that Databricks was 2.7x faster and 12x better in terms of price performance."

jiggawatts|4 years ago

Audited how? If you look at the Snowflake response, the numbers posted by Databricks look outright faked or otherwise false.

rxin|4 years ago

There's an official TPC process to audit and review the benchmark process. This debate can most easily be settled by everybody participating in the official benchmark, like we (Databricks) did.

The official review process is significantly more complicated than just offering a static dataset that's been highly optimized for answering the exact set of queries. It includes data loading, data maintenance (insert and delete data), sequential query test, and concurrent query test.

You can see the description of the official process in this 141 page document: http://tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v3....
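For anyone wondering how those phases roll up into one number: as I understand the TPC-DS v2/v3 spec, the headline QphDS@SF metric divides the scaled query count by the geometric mean of four timing terms (power test, both throughput tests, both data maintenance runs, and load). The sketch below encodes the formula as I recall it; the function names and the example timings are hypothetical, and the linked document is the authoritative definition.

```python
import math


def qphds_raw(sf, streams, t_power, t_tt1, t_tt2, t_dm1, t_dm2, t_load):
    """Unfloored QphDS@SF, all times in hours.

    ASSUMED formula (recalled from TPC-DS v2/v3; verify against the spec):
      Q    = streams * 99
      T_PT = T_Power * streams
      T_TT = T_TT1 + T_TT2
      T_DM = T_DM1 + T_DM2
      T_LD = 0.01 * streams * T_Load
      QphDS@SF = floor(SF * Q / (T_PT * T_TT * T_DM * T_LD) ** (1/4))
    """
    q = streams * 99
    t_pt = t_power * streams
    t_tt = t_tt1 + t_tt2
    t_dm = t_dm1 + t_dm2
    t_ld = streams * t_load / 100  # same as 0.01 * streams * t_load
    # Fourth root of the product = geometric mean of the four timing terms.
    return sf * q / (t_pt * t_tt * t_dm * t_ld) ** 0.25


def qphds(*args):
    # The assumed formula reports the floored value.
    return math.floor(qphds_raw(*args))


# Hypothetical timings: same query phases, but the load takes twice as long.
base = qphds_raw(1000, 4, 1.0, 1.0, 3.0, 2.0, 2.0, 100.0)
slow_load = qphds_raw(1000, 4, 1.0, 1.0, 3.0, 2.0, 2.0, 200.0)
print(base / slow_load)  # about 1.19, i.e. 2 ** 0.25
```

Because every phase enters through the fourth root, a slow load or maintenance phase pulls the score down even when the queries themselves are fast, which is why a run on a pre-loaded, query-optimized dataset isn't directly comparable to a full audited run.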

Consider the following analogy: Professional athletes compete in the Olympics, and there are official judges and a lot of stringent rules and checks to ensure fairness. That's the real arena. That's what we (Databricks) have done with the official TPC-DS world record. For example, in data warehouse systems, data loading, ordering and updates can affect performance substantially, so it’s most useful to compare both systems on the official benchmark.

But what’s really interesting to me is that even Snowflake’s self-reported number ($267) is still more expensive than Databricks’ numbers ($143 on spot and $242 on demand). This is despite Databricks’ cost being calculated on our enterprise tier, while Snowflake used their cheapest tier without any enterprise features (e.g. disaster recovery).
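A quick sketch of the arithmetic behind that comparison, using only the three dollar figures quoted above. The `compare` helper is hypothetical, and the figures are not normalized for the tier difference just mentioned.

```python
# Dollar figures quoted above (cost of one benchmark run).
SNOWFLAKE_SELF_REPORTED = 267.0
DATABRICKS_SPOT = 143.0
DATABRICKS_ON_DEMAND = 242.0


def compare(snowflake_cost: float, databricks_cost: float) -> tuple[float, float]:
    """Return (Snowflake/Databricks cost ratio, fractional saving vs Snowflake)."""
    ratio = snowflake_cost / databricks_cost
    saving = (snowflake_cost - databricks_cost) / snowflake_cost
    return ratio, saving


for label, cost in [("spot", DATABRICKS_SPOT), ("on demand", DATABRICKS_ON_DEMAND)]:
    ratio, saving = compare(SNOWFLAKE_SELF_REPORTED, cost)
    print(f"{label}: Snowflake costs {ratio:.2f}x as much ({saving:.1%} saving)")
```

This prints a 1.87x ratio (46.4% saving) for spot and 1.10x (9.4% saving) for on demand, so the gap on the self-reported numbers is much narrower on demand than on spot.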

Edit: added link to audit process doc

maslam|4 years ago

Hey jiggawatts - TPC is the official way to audit benchmarks in the database industry. They’ve been around for a bit, but let me know if you want more info, I’m happy to share more about them.

Spivak|4 years ago

The results are so wildly different that either Snowflake or Databricks is wrong or outright lying.