item 35432982


brahbrah | 2 years ago

So something like this?

    import polars as pl
    def add(df1, df2, meta_cols, val_cols=None):
        # default to all non-meta columns if val_cols is None
        val_cols = val_cols or [c for c in df1.columns if c not in meta_cols]
        # join on meta cols; df2's clashing value columns get the "_right" suffix
        joined = df1.join(df2, on=meta_cols, how="inner", suffix="_right")
        # add val cols, return df with all meta and val cols selected
        return joined.select(*meta_cols, *[(pl.col(c) + pl.col(c + "_right")).alias(c) for c in val_cols])

In theory, I think that's fine. In practice, though, it adds a lot of visual noise to your models: every operation needs at least your meta columns, and possibly your value columns too. If the dimensionality of your data changes, you have to update every place you've listed them. Defining the meta columns as a constant helps a bit, but that's only really maintainable at the module level; once you start passing dfs around, you also have to pass the column lists along with each df. There's also the problem that you'd have to call functions instead of using standard operators like `+`.

One thing that would be nice is being able to set an "index" on a Polars dataframe (forgive me, I understand the aversion to the word). Not a real index, just a list of columns designated as "metadata columns". It wouldn't change any internal state of the data; it would only change how certain operations behave. For example, if an index is set, `+` would do the join-and-add from above rather than the current standard `+` operation.

In any case, I realize this is a major philosophical divergence from the Polars way of thinking, so I'm more just shooting the shit than offering real suggestions.
