column names of aggregated DataFrame with anonymous functions #1276
Comments
I was able to reproduce this:
julia> using DataFrames
julia> d = DataFrame(g = [1,1,2,2], v=1:4)
4×2 DataFrames.DataFrame
│ Row │ g │ v │
├─────┼───┼───┤
│ 1 │ 1 │ 1 │
│ 2 │ 1 │ 2 │
│ 3 │ 2 │ 3 │
│ 4 │ 2 │ 4 │
julia> println(names(aggregate(d, :g, x->sum(x))))
Symbol[:g, Symbol("v_#1")]
julia> println(names(aggregate(d, :g, x->sum(x))))
Symbol[:g, Symbol("v_#3")]
The names of the columns are the function identifiers. Here's another fresh session showing the identifiers of the anonymous functions, which you'll see match the column names and are again reproducible:
julia> x->sum(x)
(::#1) (generic function with 1 method)
julia> x->sum(x)
(::#3) (generic function with 1 method)
The previously used lambda syntax had no relation (aside from order) to the actual functions used to create the data, and hence it was removed. Unfortunately, it is not yet possible to extract the original code of an anonymous function from its identifier (see JuliaLang/julia#2625 (comment)). In principle, if that is added as a language feature, the identifiers of the anonymous functions will provide both a stable identifier and a way to recover the function associated with that identifier. Currently, only the "stable identifier" part is supported, while the lambda syntax could support neither.
If you would like the columns to have specific names, you simply need to use named functions. For example, in the case you provided, using an anonymous function isn't recommended and you can simply pass sum directly:
julia> names(aggregate(d, :g, sum))
2-element Array{Symbol,1}:
:g
:v_sum
Alternatively, you can retain the lambda naming by giving your anonymous function that name:
julia> λ1(x) = sum(x)
λ1 (generic function with 1 method)
julia> names(aggregate(d, :g, λ1))
2-element Array{Symbol,1}:
:g
:v_λ1
You are correct that the documentation for that section is out of date; it will be corrected after #1252 is merged. The |
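To make the named-function advice above concrete, here is a minimal sketch of how a column suffix can be derived from a function's name. It is not DataFrames' actual `_fnames` implementation: `colname` and `total` are hypothetical names introduced only for illustration, and the sketch assumes Julia 1.0 or later, where `nameof` is available for functions.

```julia
# Hypothetical helper (not part of DataFrames): build a name of the form
# "<column>_<function name>", in the style of the output shown above.
colname(col::Symbol, f::Function) = Symbol(col, "_", nameof(f))

# A named function carries a readable, predictable name.
total(x) = sum(x)

named_result   = colname(:v, total)   # suffix comes from the name `total`
builtin_result = colname(:v, sum)     # suffix comes from the name `sum`

# An anonymous function only has a compiler-generated name, so the suffix
# depends on how many anonymous functions were defined earlier in the session.
anon_result = colname(:v, x -> sum(x))

println(named_result)
println(builtin_result)
println(anon_result)
```

The point of the sketch is only that a name-based suffix is as readable as the function's own name, which is why binding the function to a name gives predictable columns.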
The presence of a |
You got the point - the To change it to something else (e.g. |
So you mean, just use |
Closing as #1576 fixed this.
When using anonymous functions to generate an aggregated DataFrame, the column names are not reproducible and are inconvenient for further usage:
This code produces:
although the last two commands are identical.
The documentation promises names like v_\lambda1 at this point. I am using Julia v0.6.0.
IMO the function _fnames in https://github.com/JuliaData/DataFrames.jl/blob/master/src/other/utils.jl should be adjusted for new versions of Julia.