(no title)
tango12 | 2 years ago
Hm…ignoring the specific design decision here on using json agg - I think the open question is - is that specific sentiment practically still the truth in 2023? Is it worth scaling app servers or is it better to use Postgres well and scale a managed PG offering?
Do we need to spend time writing boring code when we could instead just use DBs better? And then instead we can spend time where it matters in the app server.
> And in most situations you want to "expand" the data as close as possible to the final location.
What do you mean by “expand”?
From a perf pov: Imagine you’re fetching 100 authors and each had 1000 articles, would you rather a) fetch mxn Cartesian product or b) ask the DB to send you a json aggregation where each author has a 1000 articles? There’s a significant db perf, intermediate layer ser/de, and network latency difference right?
sgarland|2 years ago
I would be surprised if the DB performed better with an aggregation than just returning the results. Assuming you have reasonable indices, it’s a trivial join.
Network latency, maybe. Assuming 20 bytes per name, with no compression that’s an extra 2 MB.
You could write the query with a CONCAT as a subquery as a poor man’s aggregation, but then you’re very much at the mercy of the planner deciding to do all of that in one pass. I’m not sure that it would. On mobile, or else I’d check.
tango12|2 years ago
But for the DB to be sending less data over the network means that the DB is doing less CPU work in helping process that data.
Benchmarking should be straightforward for a particular use-case.
ElectricalUnion|2 years ago
That is a very good question indeed, it probably depends on whatever compute or the DB offering is cheaper to scale for you; from my very anecdotal experience using both, usually DB is much more expensive to scale up.
> What do you mean by “expand”?
Getting more data thru the pipes, getting closer to the limit of the network pipes.
That doesn't sound like a realistic scenario to me - pulling Y articles from X authors.
If you're not pulling directly from articles, and you're pulling from X authors, X>1, you most likely don't actually want the articles, more likely you want instead some stat of each article like article score, subject, tags or comment counts. Then I don't see neither the JOIN expansion or the JSON ser/de being a serious issue at all compared to the DB retrieval costs.