cmdlineluser | 2 years ago

Nice article.

In Python, I have been finding Polars nicer to use:

  (purchases
     .filter(pl.col("amount") <= pl.col("amount").median().over("country") * 10)
     .group_by("country")
     .agg(total = (pl.col("amount") - pl.col("discount")).sum())
  )

Not as compact as the R example, but it gets a bit closer than the pandas approach.

- https://pypi.org/project/polars/

- https://github.com/pola-rs/polars/

d0mine|2 years ago

Why not SQL for purely declarative queries? Here's an LLM-hallucinated SQL query for the Polars example:

    SELECT country, SUM(amount - discount) AS total
    FROM purchases
    WHERE amount <= (
        SELECT MEDIAN(amount) * 10
        FROM purchases
        WHERE country = purchases.country
    )
    GROUP BY country;

It might just be a matter of familiarity, but SQL seems the most straightforward and easiest to understand to me.

anakaine|2 years ago

Probably because the article wasn't about comparing against SQL or any other database, but looked specifically at the R vs. Python debate?

d0mine|2 years ago

It looks like the LLM hallucinated a query that doesn't group by country to get the median. Here's the version generated after asking it to fix that:

    SELECT p.country, SUM(p.amount - p.discount) AS total
    FROM purchases p
    JOIN (
        SELECT country, MEDIAN(amount) * 10 AS median_amount
        FROM purchases
        GROUP BY country
    ) m ON p.country = m.country
    WHERE p.amount <= m.median_amount
    GROUP BY p.country;
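
For anyone who wants to see the difference between the two queries without a database, here is a minimal standard-library Python sketch. The sample rows and the `totals` helper are made up for illustration; they are not from the article. It assumes that in the first query the unqualified `country` and `purchases.country` both resolve to the inner table, so the subquery effectively returns one global median, while the fixed query uses a per-country median.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample rows: (country, amount, discount)
purchases = [
    ("US", 10.0, 1.0),
    ("US", 12.0, 0.0),
    ("US", 500.0, 5.0),   # outlier: more than 10x the US median
    ("DE", 100.0, 10.0),
    ("DE", 120.0, 0.0),
    ("DE", 110.0, 5.0),
]

def totals(rows, per_country=True):
    """Sum (amount - discount) per country, keeping only rows whose
    amount is at most 10x the median (per-country or global)."""
    # Collect amounts per country so each group's median can be computed.
    amounts_by_country = defaultdict(list)
    for country, amount, _ in rows:
        amounts_by_country[country].append(amount)

    # Median over all rows, used when per_country is False.
    global_median = median([amount for _, amount, _ in rows])

    result = defaultdict(float)
    for country, amount, discount in rows:
        if per_country:
            threshold = median(amounts_by_country[country]) * 10
        else:
            threshold = global_median * 10
        if amount <= threshold:
            result[country] += amount - discount
    return dict(result)

print(totals(purchases))                     # per-country median (fixed query)
print(totals(purchases, per_country=False))  # global median (first query, as assumed)
```

With this data the 500.0 US purchase is dropped under the per-country median but kept under the global one, so the US totals differ between the two calls.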

wodenokoto|2 years ago

You get into a lot of other problems that are straightforward in pandas/R but very difficult in SQL.