top | item 45127717

(no title)

lucasyvas | 5 months ago

I understand the user pool comment but don’t understand why you wouldn’t be able to have a rust layer that’s the same as the Python one API-wise.

I say this as a user of neither - just that I don’t see any inherent validity to that statement.

If you are saying Rust consumers want something lower level than you’re willing to make stable, just give them a higher level one and tell them to be happy with it because it matches your design philosophy.

discuss

order

bobbylarrybobby|5 months ago

The issue with Rust is that as a strict language with no function overloading (except via traits) or keyword arguments, things get very verbose. For instance, in python you can treat a string as a list of columns as in `df.select('date')` whereas in Rust you need to write `df.select([col('date')])`. Let's say you want to map a function over three columns, it's going to look something like this:

``` df.with_column( map_multiple( |columns| { let col1 = columns[0].i32()?; let col2 = columns[1].str()?; let col3 = columns[3].f64()?; col1.into_iter() .zip(col2) .zip(col3) .map(|((x1, x2), x3)| { let (x1, x2, x3) = (x1?, x2?, x3?); Some(func(x1, x2, x3)) }) .collect::<StringChunked>() .into_column() }, [col("a"), col("b"), col("c")], GetOutput::from_type(DataType::String), ) .alias("new_col"), ); ```

Not much polars can do about that in Rust, that's just what the language requires. But in Python it would look something like

``` df.with_columns( pl.struct("a", "b", "c") .map_elements( lambda row: func(row["a"], row["b"], row["c"]), return_dtype=pl.String ) .alias("new_col") ) ```

Obviously the performance is nowhere close to comparable because you're calling a python function for each row, but this should give a sense of how much cleaner Python tends to be.

quodlibetor|5 months ago

> Not much polars can do about that in Rust

I'm ignorant about the exact situation in Polars, but it seems like this is the same problem that web frameworks have to handle to enable registering arbitrary functions, and they generally do it with a FromRequest trait and macros that implement it for functions of up to N arguments. I'm curious if there are were attempts that failed for something like FromDataframe to enable at least |c: Col<i32>("a"), c2: Col<f64>("b")| {...}

https://github.com/tokio-rs/axum/blob/86868de80e0b3716d9ef39...

https://github.com/tokio-rs/axum/blob/86868de80e0b3716d9ef39...