Databricks broke the record by 2x) and is 10x more cost effective, in an audited benchmark. Snowflake should participate in the official, audited benchmark. Customers win when businesses are open and transparent…
Databricks and snowflake should pay an independent third party to re-run these. In-house benchmarks by either company don't count with results this different.
Databricks didn't run the Snowflake comparison in-house. From their article it says: "These results were corroborated by research from Barcelona Supercomputing Center, which frequently runs TPC-DS on popular data warehouses. Their latest research benchmarked Databricks and Snowflake, and found that Databricks was 2.7x faster and 12x better in terms of price performance."
There's an official TPC process to audit and review the benchmark process. This debate can be easiest settled by everybody participating in the official benchmark, like we (Databricks) did.
The official review process is significantly more complicated than just offering a static dataset that's been highly optimized for answering the exact set of queries. It includes data loading, data maintenance (insert and delete data), sequential query test, and concurrent query test.
Consider the following analogy: Professional athletes compete in the Olympics, and there are official judges and a lot of stringent rules and checks to ensure fairness. That's the real arena. That's what we (Databricks) have done with the official TPC-DS world record. For example, in data warehouse systems, data loading, ordering and updates can affect performance substantially, so it’s most useful to compare both systems on the official benchmark.
But what’s really interesting to me is that even the Snowflake self-reported numbers ($267) are still more expensive than the Databricks’ numbers ($143 on spot, and $242 on demand). This is despite Databricks cost being calculated on our enterprise tier, while Snowflake used their cheapest tier without any enterprise features (e.g. disaster recovery).
Hey jiggawatts - TPC is the official way to audit benchmarks in the database industry. They’ve been around for a bit, but let me know if you want more info, I’m happy to share more about them.
mst|4 years ago
cmhill|4 years ago
jiggawatts|4 years ago
rxin|4 years ago
The official review process is significantly more complicated than just offering a static dataset that's been highly optimized for answering the exact set of queries. It includes data loading, data maintenance (insert and delete data), sequential query test, and concurrent query test.
You can see the description of the official process in this 141 page document: http://tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v3....
Consider the following analogy: Professional athletes compete in the Olympics, and there are official judges and a lot of stringent rules and checks to ensure fairness. That's the real arena. That's what we (Databricks) have done with the official TPC-DS world record. For example, in data warehouse systems, data loading, ordering and updates can affect performance substantially, so it’s most useful to compare both systems on the official benchmark.
But what’s really interesting to me is that even the Snowflake self-reported numbers ($267) are still more expensive than the Databricks’ numbers ($143 on spot, and $242 on demand). This is despite Databricks cost being calculated on our enterprise tier, while Snowflake used their cheapest tier without any enterprise features (e.g. disaster recovery).
Edit: added link to audit process doc
maslam|4 years ago
Spivak|4 years ago