(no title)
ryanmonroe | 4 years ago
mutate(df, b = .env$a + 1)
And if you have a string (contained in a_var) which identifies a variable you can do mutate(df, b = .data[[a_var]] + 1)
You could argue these feel clumsy, but I wouldn’t say it’s “hard” to do either of these things with dplyr.
krumbie|4 years ago
However, both patterns are another special case how identifiers are resolved in the expression. Aren't `.env` and `.data` both valid variable and column names? So what happens if I have a column named `.data`?
Another example, which is the reason why we chose the `:column` style to refer to columns in `DataFramesMeta.jl` and `DataFrameMacros.jl`:
What happens if you have the expression `mutate(df, b = log(a))`. Both `log` and `a` are symbols, but `log` is not treated as a column. Maybe that's because it's used in a function-like fashion? Maybe because R looks at the value of `log` and `a` in their scope and sees that `log` is a function an `a` isn't?
In Julia DataFrames, it's totally valid to have a column that stores different functions. With the dplyr like syntax rules it would not be possible to express a function call with a function stored in a column, if the pattern really is that function syntax means a symbol is not looked up in the dataframe anymore.
In Julia DataFrameMacros.jl for example, if you had a column named `:func` you could do `@transform(df, :b = :func(:a))` and it would be clear that `:func` resolves to a column.
This particular example might seem like a niche problem, but it's just one of these tradeoffs that you have to make when overloading syntax with a different meaning. I personally like it if there's a small rule set which is then consistently applied. I'd argue that's not always the case with dplyr.
ryanmonroe|4 years ago
Personally I'll happily take not being able to use those as column names if it means I can avoid always typing : before every in-data variable, but your comment gave me a better understanding of why it would be bad for some other person or scenario, perhaps where short term ease-of-use is lower on the list of priorities.
For your second example, it doesn't come up in R because a data frame column cannot be a function. Columns must be vectors (including lists) and you could have a vector where one or all elements are functions, but the column itself cannot not be a function (functions are not vectors), so there's no ambiguity there. To call a function stored in your data frame you'd have to access an element of the column, and any access method, e.g. `[[` or `$` would make the resulting set of characters invalid as the name of an object (without backticks, which would then disambiguate the intent)
Separate from dplyr, in R when you use `(` to call a function it searches only for functions by that name.pdeffebach|4 years ago