steepben | 1 year ago

Yes, it's a large chunk, but not everything! Marc had a comment on Bluesky about this:

> Many SQL aggregations are monotonic operations (e.g. MAX, SUM, etc) that can be partially completed on each node and then post-merged. Some (e.g. DISTINCT) can be transformed into monotonic ops with some effort. Some aren't possible to do this way. (Ref on monotonicity: arxiv.org/pdf/1901.01930)

The benefit of this is that a lot more work is done _close_ to the data. Bandwidth in data centers keeps growing, but latency isn't improving at the same rate. Reducing the number of round trips between the query processor (QP) and storage greatly improves overall query latency, even if the storage nodes have to do more work.
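
To make the "partially completed on each node and then post-merged" idea from the quote concrete, here is a minimal sketch in Python. The names `Partial`, `partial_aggregate`, and `merge` are hypothetical and not taken from any real system; the sketch only assumes that each storage node can scan its own rows and return a small summary in a single round trip.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Partial:
    """Hypothetical per-node summary covering MAX, SUM, COUNT and DISTINCT."""
    max_value: Optional[int]
    total: int
    count: int
    distinct: FrozenSet[int]


def partial_aggregate(rows):
    """Runs next to the data: one local scan, one small result sent back."""
    return Partial(
        max_value=max(rows, default=None),
        total=sum(rows),
        count=len(rows),
        distinct=frozenset(rows),
    )


def merge(a, b):
    """Runs on the query processor: an associative, commutative combine."""
    maxes = [m for m in (a.max_value, b.max_value) if m is not None]
    return Partial(
        max_value=max(maxes, default=None),
        total=a.total + b.total,
        count=a.count + b.count,
        distinct=a.distinct | b.distinct,
    )


# Illustrative data only: three storage nodes, each holding a slice of a column.
node_rows = [[3, 7, 7], [1, 9], [7, 2, 2]]
partials = [partial_aggregate(rows) for rows in node_rows]
result = reduce(merge, partials)

print(result.max_value)             # MAX
print(result.total)                 # SUM
print(result.total / result.count)  # AVG, derived from SUM and COUNT
print(len(result.distinct))         # COUNT(DISTINCT ...)
```

Because `merge` is associative and commutative, the partials can arrive in any order and be combined pairwise. Note that the DISTINCT state here is the full set of values seen on each node, so it can grow as large as the data itself. That is one way to read the "with some effort" caveat in the quote.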

zokier | 1 year ago

> The benefit of this is that a lot more work is done _close_ to the data.

But isn't that fundamentally at odds with the central idea of disaggregation?

> At a fundamental level, scaling compute in a database system requires disaggregation of storage and compute. If you stick storage and compute together, you end up needing to scale one to scale the other, which is either impossible or uneconomical.

So either you can get good perf by doing the work close to data, or get good scalability by separating compute and data. But I can't see how you can do both.