[FEA] preserve_order for apply_rows and merge #4997
Comments
We've had numerous conversations about sorting in joins and every time we've come to the conclusion that we do not want to sort by default. This will not change without strong community feedback from a large group of users, so if you require sorted output from joins I would suggest you pass

See #1781

re: I'll keep an eye open for apply_rows, ended up backing out the most recent code that had the issue again (0.13).

That is not true... What was asked of the Pandas developers is what was the expected behavior of combining defining both join column keys as well as index level(s) which Pandas seemed to have very strange behavior for. The sorting has always consistently been on the join keys in the order you specified. Given there's no reproducer and I don't see a way that the order is not being preserved with

Just did a quick test for the
=>
Some reason I thought it was doing
Is your feature request related to a problem? Please describe.
x.apply_rows and x.merge(how='left') do not preserve the order of x. This often causes bugs, and the workarounds cost performance. We often want to take an output column and append it to another DataFrame, and so have to undo the damage first. Presumably, it would be much faster to do the op deeper within the library.

The current merge does follow some really twisted sort semantics from pandas, but they have nothing to do with any of our actual uses across a lot of code, and appear to be just sucking up developer time for everyone at this point.

Describe the solution you'd like
Add a keyword argument preserve_order with default True. Speed demons can explicitly set preserve_order=False if they're OK with adding non-determinism to their output.

Sorted:
x.merge(y, how='left', on='id')
Unsorted:
x.merge(y, how='left', on='id', preserve_order=False)

Same thing for apply_rows.
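
For reference, the kind of workaround this request would make unnecessary can be sketched roughly as below. This is only an illustration written against pandas, on the assumption that the same DataFrame calls apply to the library discussed here. The helper name merge_preserving_left_order and the temporary column name are hypothetical, and preserve_order itself is the proposed argument, not an existing one.

```python
import pandas as pd


def merge_preserving_left_order(left, right, **merge_kwargs):
    """Left-merge that returns rows in the original row order of `left`.

    Hypothetical helper for illustration; not part of any library API.
    """
    order_col = "__row_order__"  # assumed not to clash with a real column name
    # reset_index returns a new frame, so the caller's `left` is not modified
    tagged = left.reset_index(drop=True)
    tagged[order_col] = range(len(tagged))
    merged = tagged.merge(right, how="left", **merge_kwargs)
    # mergesort is stable, so multiple matches for one left row keep the
    # relative order the merge produced them in
    merged = merged.sort_values(order_col, kind="mergesort")
    return merged.drop(columns=order_col).reset_index(drop=True)


x = pd.DataFrame({"id": [3, 1, 2], "val": ["c", "a", "b"]})
y = pd.DataFrame({"id": [1, 2, 3], "score": [10, 20, 30]})
out = merge_preserving_left_order(x, y, on="id")
```

The extra column, the sort, and the drop are the overhead the request refers to; doing the bookkeeping inside the merge would avoid the separate pass.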
Describe alternatives you've considered
sort_by, but it makes order preservation harder to capture.