top | item 36872444

(no title)

mikea1 | 2 years ago

> the pipelined nature of PRQL really maps much better to how people should think about queries

I disagree. Database engines take SQL and transform it into an execution plan that takes into consideration database metadata (size, storage, index analytics, etc.). Queries should be thought of with a _set based_ instead of _procedural_ approach to maximize the benefits of this abstraction - diving into the implementation details to guide the execution plan formation only when necessary.

Also, the pipeline approach could be achieved with common table expressions (CTEs), right?

That said, I think PRQL looks promising because it is a solid attempt to make RDBMS development more approachable. I also like that `from` comes before `select`: it is far more readable. A solid and modern IDE experience for PRQL could be a "killer app".

discuss

order

wodenokoto|2 years ago

I disagree. I find it extremely hard to reason about large queries as set transformations, whereas it is much easier to break it down to "first this, then that". And this is long before I've even started writing my first line of SQL.

So let me write it procedurally and have the optimization engine fix it for me, just like how it fixes my SQL.

Even SQL queries are often better understood procedurally. Take this one [1]:

    SELECT article, dealer, price
    FROM   shop s1
    WHERE  price=(SELECT MAX(s2.price)
                  FROM shop s2
                  WHERE s1.article = s2.article)
    ORDER BY article;
That inner WHERE clause doesn't make sense in my opinion, unless you think of it procedurally as for each row in s1, ask do a search for the highest price amongst all items that share article number.

[1] https://dev.mysql.com/doc/refman/8.0/en/example-maximum-colu...

hn_throwaway_99|2 years ago

Completely agree, thanks for putting it better than I could have, with an excellent example. Correlated subqueries like the example you give, or similarly lateral joins in postgres, fundamentally are treated like for loops by DB engines anyway.

Semi-related, but the example you give is also why I love Postgres' "DISTINCT ON" functionality (I don't know if other DBs have something similar) - it makes it so much easier to reason about these "give me the 'first' one from each group" type queries without having to resort to correlated subqueries.

slt2021|2 years ago

queries like these are best suited for window functions, although I am not sure Mysql supports it:

  SELECT article, dealer, price FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY article ORDER BY price DESC) as rnk
    FROM   shop s1
  ) sub 
  WHERE sub.rnk=1
  ORDER BY article; 
this query will be a single pass over table without loops/joins

tgulacsi|2 years ago

This is the "set based" approach for the MAX: there does not exist a bigger element:

  SELECT article, dealer, price
    FROM   shop s1
    WHERE  NOT EXISTS (SELECT 1 FROM shop s2 
                         WHERE s2.price > s1.price AND
                               s2.article = s2.article)
    ORDER BY article;

imtringued|2 years ago

Unpopular opinion.

The uncorrelated example should have been rewritten with a CTE and should have been aliased as 'article_max_price' as if it was a computed property and where price = amp.price