Functions for column- and row-wise processing #956
@cjprybol, do you think we cover this pretty well now?
I think these would be nice to have; thank you for the suggestion, @abbradar! I couldn't think of an obvious way to do this without writing a for-loop, which I imagine would be alienating for users who prefer to use
So, 👍! If you are still interested in opening a PR with these changes, @abbradar, please do so!
I'm not sure this would be a good idea, as these functions would encourage users to write vectorized code when an in-place element-wise operation would be possible. In that regard, Julia is much more powerful than R and pandas (which require vectorized functions for performance), so we don't need to implement the same APIs. Can you give examples of cases where you would like to use these functions? That would help us see what the best API would be for doing this both conveniently and efficiently.
@cjprybol I have very little time on my hands lately, but I'm interested. Just don't expect anything for at least a month :D (if anybody else wants to implement this, I'm of course okay with that!) @nalimilan I can't find an example right now, but I bet it was something similar to:
except I don't like for-loops. P.S. Note that my Julia is a bit rusty now, so this code might not work, but I hope you get the idea.
Thanks for the example. For this kind of use case, a more efficient in-place version can currently be written like this (assuming the columns are already floating point):

```julia
df = DataFrame(a=[1.0, 2.0], b=[3.0, 4.0])
# eachcol yields (name, column) pairs here, hence col[2]
foreach(col -> scale!(col[2], 1/sum(col[2])), eachcol(df))
# Or:
foreach(col -> col[2] .= col[2] ./ sum(col[2]), eachcol(df))
```

Of course, it's not ideal. I guess something like

```julia
# In-place
mapcol!(col -> scale!(col, 1/sum(col)), df)
mapcol!(col -> col .= col ./ sum(col), df)
# Copying
mapcol(col -> col / sum(col), df)
```

So maybe that would be useful. It's annoying that the in-place version is longer than the copying version, which means people will probably use the latter by default even if it's less efficient. Yet another approach would be to use the

```julia
# In-place
broadcast!(/, df, df, colwise(sum, df))
df .= df ./ colwise(sum, df)
# Copying
df2 = df ./ colwise(sum, df)
```

Both approaches could be implemented at the same time, since each has its merits.
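The proposals above use an older DataFrames.jl API. As a rough modern counterpart, assuming DataFrames.jl 1.x (where `eachcol` iterates the column vectors themselves rather than name/column pairs), `mapcols` and `mapcols!` cover the copying and in-place cases, and `eachcol` can be used to overwrite the existing columns directly. This is a sketch for comparison, not the API proposed in this thread:

```julia
using DataFrames

df = DataFrame(a=[1.0, 3.0], b=[2.0, 2.0])

# Copying: mapcols applies the function to each column and returns a new DataFrame;
# df itself is left unchanged.
df2 = mapcols(col -> col ./ sum(col), df)

# In-place at the DataFrame level: mapcols! stores the returned vectors back into df3.
df3 = copy(df)
mapcols!(col -> col ./ sum(col), df3)

# In-place at the column level: eachcol yields the stored column vectors,
# so a broadcasting assignment overwrites them instead of creating new columns.
df4 = copy(df)
foreach(col -> col ./= sum(col), eachcol(df4))
```

Here `mapcols` plays the role of the copying `mapcol` idea, while the `eachcol` loop is the variant that reuses the existing column storage.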
It seems like a
This is also useful because it reduces the urge to treat a DataFrame like a matrix, since I will try to put together a PR. Though I wonder if this implementation is most efficient with an
Maybe define
Let's move this discussion to #1459. Are you giving the go-ahead for
I'm not sure. I think I'd prefer that we develop a comprehensive API proposal for this whole issue of functions operating on columns vs. on rows, and whether we want to use the same API as matrices or something completely different. That's very similar to the question of whether
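On the matrix side of that question, Base Julia already separates copying and in-place column/row processing: `mapslices` returns a new array, while `eachcol`/`eachrow` yield views that can be modified in place. A short sketch for comparison, using a plain `Matrix` and no DataFrames.jl (assumes Julia 1.1 or later for `eachcol` on arrays):

```julia
A = [1.0 2.0;
     3.0 2.0]

# Copying: mapslices calls f on each column (dims=1) or each row (dims=2)
# and assembles the returned vectors into a new matrix.
colnorm = mapslices(c -> c ./ sum(c), A; dims=1)
rownorm = mapslices(r -> r ./ sum(r), A; dims=2)

# In-place: eachcol yields views into B, so `./=` overwrites B's columns.
B = copy(A)
foreach(c -> c ./= sum(c), eachcol(B))
```

A `DataFrame` API modeled on this is one option; the alternative mentioned here is an API designed specifically for tables.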
We now have
Now that
Hi,
It would be nice to have a family of functions for row- and column-wise processing of a `DataFrame`. This can be useful, e.g., for various normalizing operations. What I envision: …, where `f` is `Array -> Array`. They would be trivial to implement, but also very useful. If we agree on the details (i.e., the names and whether there is general interest in this), I can provide a PR. Similar functions exist in R and Python's pandas, but a `DataFrame` is always two-dimensional, so I think two distinct functions would serve us better.

Example implementation of `mapcol!` to show my idea (credits to Ismael-VC from Julia's Gitter room, where I asked whether such a function already exists):

Last but not least, I'm a newcomer and may have just missed an existing way to do this. If so, I apologize!
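A minimal sketch of the proposed in-place column- and row-wise functions, assuming DataFrames.jl 1.x. `mapcol!` is the name used in this proposal and `maprow!` is a hypothetical row-wise counterpart; neither is a DataFrames.jl function. As proposed, `f` maps a vector to a vector of the same length:

```julia
using DataFrames

# Hypothetical helper (not DataFrames.jl API): replace every column with f(column).
function mapcol!(f, df::DataFrame)
    for name in names(df)
        df[!, name] = f(df[!, name])
    end
    return df
end

# Hypothetical helper (not DataFrames.jl API): replace every row with f(row values).
# Assumes each column's element type can hold the values f returns.
function maprow!(f, df::DataFrame)
    for row in eachrow(df)
        newvals = f(collect(row))        # vector with this row's values
        for (j, v) in enumerate(newvals)
            row[j] = v                   # write back into column j
        end
    end
    return df
end

df = DataFrame(a=[1.0, 3.0], b=[2.0, 2.0])
mapcol!(c -> c ./ sum(c), df)            # each column now sums to 1

dr = DataFrame(a=[1.0, 3.0], b=[3.0, 1.0])
maprow!(r -> r ./ sum(r), dr)            # each row now sums to 1
```

Since a `DataFrame` stores its data by column, the row-wise version writes back one cell at a time and is likely slower than the column-wise one.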
EDIT: A little bikeshedding: perhaps we want them to be named `apply*`, not `map*`.