rxin | 4 years ago
Consider 4 queries. Two run for 1 sec, and the other two for 1000 sec. If we look at the arithmetic mean, we are really only taking the large queries into account. But improving the geometric mean would require improving all queries.
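A rough sketch of the arithmetic-mean half of that claim, using the four runtimes from the example. The variable and function names are made up for illustration, and "halving" is just an arbitrary choice of speedup:

```python
from statistics import fmean

# Runtimes from the example, in seconds: two short queries, two long ones.
baseline = [1.0, 1.0, 1000.0, 1000.0]
short_halved = [0.5, 0.5, 1000.0, 1000.0]  # only the short queries get 2x faster
long_halved = [1.0, 1.0, 500.0, 500.0]     # only the long queries get 2x faster


def relative_gain(before, after):
    """Fractional reduction in the arithmetic mean (hypothetical helper)."""
    return 1.0 - fmean(after) / fmean(before)


short_gain = relative_gain(baseline, short_halved)
long_gain = relative_gain(baseline, long_halved)

print(f"halving the short queries: {short_gain:.4%} lower arithmetic mean")
print(f"halving the long queries:  {long_gain:.4%} lower arithmetic mean")
```

Under these assumptions the short queries move the arithmetic mean by well under a tenth of a percent, while the long ones move it by roughly half.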
Note that I'm on the opposite side (Databricks cofounder here), so when I say that Snowflake didn't make a mistake here, you should trust me :)
bjornsing|4 years ago
No. Improving the geometric mean only requires reducing the product of their execution times. So if you can make the two 1 sec queries execute in 0.5 sec at the expense of the two 1000 sec queries taking 1800 sec each, then that’s an improvement in terms of geometric mean.
So… kind of QED. The geometric mean is not easy to reason about.
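The counterexample can be checked numerically. This is only a sketch: the list names are illustrative, and the numbers are the ones quoted above, all in the same time unit:

```python
from statistics import fmean, geometric_mean

# Four query runtimes before and after the hypothetical trade-off.
before = [1.0, 1.0, 1000.0, 1000.0]
after = [0.5, 0.5, 1800.0, 1800.0]

gm_before = geometric_mean(before)
gm_after = geometric_mean(after)
am_before = fmean(before)
am_after = fmean(after)

print(f"geometric mean:  {gm_before:.2f} -> {gm_after:.2f}")
print(f"arithmetic mean: {am_before:.2f} -> {am_after:.2f}")
print(f"total runtime:   {sum(before):.0f} -> {sum(after):.0f}")
```

The geometric mean goes down (about 31.6 to about 30) while the arithmetic mean and the total runtime both go up substantially.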
ttmahdy|4 years ago
One of the benefits of the geometric mean is that all queries have "equal" weight in the metric. This keeps vendors from focusing on the long-running queries and ignoring the short-running ones. It is one way to balance long and short query performance.
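One way to read "equal" weight, sketched under the assumption that it means the same relative speedup on any single query moves the metric by the same factor. The runtimes and the helper name are illustrative only:

```python
from statistics import geometric_mean

# Illustrative runtimes: two short queries, two long ones (same time unit).
baseline = [1.0, 1.0, 1000.0, 1000.0]


def gm_ratio_after_halving(times, index):
    """Return new_gm / old_gm when only times[index] is made 2x faster."""
    changed = list(times)
    changed[index] /= 2
    return geometric_mean(changed) / geometric_mean(times)


ratios = [gm_ratio_after_halving(baseline, i) for i in range(len(baseline))]

for runtime, ratio in zip(baseline, ratios):
    print(f"halving the {runtime:g} query scales the geometric mean by {ratio:.4f}")
```

Every ratio comes out to 0.5 ** (1 / 4), regardless of whether the halved query was short or long.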
A similar concept is applied in TPC-DS across data load, the single-user run (Power), the multi-user run (Throughput), and data maintenance (concurrent deletes and inserts).
Check clause 7.6.3.1 in the TPC-DS spec at http://tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v3....