thesz | 6 days ago
Imagine three joins over three queries A, B and C: the first join J1 joins A and B, the second join J2 joins A and C, and the third join J3 joins J1 and J2. Note that I said "queries," not "tables" - A, B and C can be complex things one would not want, or be able, to compute more than once. Forget about compute: A, B and C can be quite complex even to write down, and the user may really not want to repeat themselves. Look at TPC-DS: the subqueries in its "with" sections are quite complex.
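To make the shape concrete, here is a minimal sketch of that join graph as SQL CTEs, run through Python's built-in `sqlite3`. The tables `t_a`, `t_b`, `t_c`, their columns and the sample rows are made up for illustration; in a real case A, B and C would be far more complex subqueries.

```python
import sqlite3

# Hypothetical toy tables standing in for the inputs of A, B and C.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_a (id INTEGER, x TEXT);
    CREATE TABLE t_b (id INTEGER, y TEXT);
    CREATE TABLE t_c (id INTEGER, z TEXT);
    INSERT INTO t_a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO t_b VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO t_c VALUES (1, 'c1');
""")

# A is written once but consumed by both J1 and J2, and J3 consumes both
# of those: the dependencies form a graph, not a linear pipeline.
query = """
WITH
  A  AS (SELECT id, x FROM t_a),  -- stand-in for a complex subquery
  B  AS (SELECT id, y FROM t_b),
  C  AS (SELECT id, z FROM t_c),
  J1 AS (SELECT A.id AS id, A.x AS x, B.y AS y
         FROM A JOIN B ON A.id = B.id),
  J2 AS (SELECT A.id AS id, C.z AS z
         FROM A JOIN C ON A.id = C.id),
  J3 AS (SELECT J1.id AS id, J1.x AS x, J1.y AS y, J2.z AS z
         FROM J1 JOIN J2 ON J1.id = J2.id)
SELECT id, x, y, z FROM J3 ORDER BY id
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Only the row with `id = 1` survives, since it is the only one present in all three inputs. Whether the engine evaluates A once or twice is up to its planner; the sketch only shows that A is written down once.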
This is why pipeline replacements for SQL are more or less futile efforts. They simplify the simple part and avoid touching the complex one.
I think that something like Verse [1] is more or less the way to go. Not Verse itself, but functional logic programming as an idea, where you can have first-class data producers and an effect system to specify transactions.
data_ders|6 days ago
> SQL is not a pipeline, it is a graph.
Maybe it's both? And maybe there will always be hard-to-express queries in SQL, and that's ok?
The RDBMS's relational model is certainly a graph, and joins accordingly introduce complexity.
For me, just as creators of the internet regret that subdomains come before domains, I really wish we could go back in time and have `FROM` be the first clause and not `SELECT`. This is much more intuitive and lends itself to the idea of a pipeline: a table scan (FROM) that is piped to a projection (SELECT).
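A rough sketch of what I mean by that ordering, in plain Python rather than SQL. The helpers `scan`, `where` and `select` and the `users` data are hypothetical names for illustration, not any library's API.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]

def scan(table: Iterable[Row]) -> Iterator[Row]:
    """FROM: produce every row of the source table."""
    yield from table

def where(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """WHERE: keep only the rows that satisfy the predicate."""
    return (r for r in rows if pred(r))

def select(rows: Iterable[Row], *cols: str) -> Iterator[Row]:
    """SELECT: project each row down to the named columns."""
    return ({c: r[c] for c in cols} for r in rows)

# Hypothetical sample data.
users = [
    {"name": "ada", "age": 36},
    {"name": "bob", "age": 12},
    {"name": "eve", "age": 41},
]

# Data flows in evaluation order: scan, then filter, then project.
result = list(select(where(scan(users), lambda r: r["age"] >= 18), "name"))
print(result)
```

Each stage has exactly one input here, which is why this reads as a pipeline; a stage with two inputs, such as a join, is where the graph shape comes back.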
thesz|6 days ago
Yes, there will always be hard-to-express queries. The question is how far can we go?
snthpy|5 days ago
I haven't seen anyone make the point about graphs before. FWIW PRQL allows defining named subqueries that can be reused, like J1 and J2 in your example.