Don't guarantee left join ordering. #19576

Open
ritchie46 opened this issue Nov 1, 2024 · 4 comments
Comments

@ritchie46
Member

Description

The documentation currently makes the guarantee quoted below. This shouldn't have been guaranteed; it should have been left as an implementation detail.

"A left join preserves the row order of the left DataFrame."

Link

No response

@ritchie46 added the documentation label (Improvements or additions to documentation) Nov 1, 2024
@ritchie46 added this to the 2.0.0 milestone Nov 1, 2024
@s-banach
Contributor

s-banach commented Nov 1, 2024

Hoo boy, this one is going to break some code.

@orlp
Collaborator

orlp commented Nov 1, 2024

A preserve_order parameter should be added, defaulting to None, which can be set to "left" or "right".

@s-banach Without breaking this promise the streaming join will be slow by default, because you can't do a partitioned join if you must preserve order. Or at least, it would require a slow re-combining and re-sorting step afterwards.
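To illustrate the point above, here is a minimal pure-Python sketch of a partitioned hash left join. It is not Polars' implementation: the function name `partitioned_left_join`, the `(key, value)` row layout, and the `key % n_partitions` partitioning scheme are assumptions made for illustration only. It shows that rows come out grouped by partition rather than in left order, and that recovering left order needs an extra row index plus a sort over the combined output.

```python
from collections import defaultdict


def partitioned_left_join(left, right, n_partitions=2):
    """Left join of (key, value) rows, processed one hash partition at a time.

    Hypothetical helper for illustration; integer keys are assumed so that
    `key % n_partitions` gives a deterministic partition.
    """
    # Tag each left row with its original position so order can be restored later.
    left_parts = defaultdict(list)
    for idx, (key, lval) in enumerate(left):
        left_parts[key % n_partitions].append((idx, key, lval))

    # Per partition, build a hash table from the right side.
    right_parts = defaultdict(lambda: defaultdict(list))
    for key, rval in right:
        right_parts[key % n_partitions][key].append(rval)

    out = []
    for p in range(n_partitions):
        table = right_parts[p]
        for idx, key, lval in left_parts[p]:
            # Unmatched left rows are kept with a null right value.
            for rval in table.get(key, [None]):
                out.append((idx, key, lval, rval))
    return out


left = [(3, "a"), (2, "b"), (1, "c"), (4, "d")]
right = [(1, "x"), (2, "y"), (3, "z")]

# Output follows partition order, not the order of `left`.
unordered = partitioned_left_join(left, right)

# Restoring left order requires re-combining and sorting on the row index.
restored = sorted(unordered, key=lambda row: row[0])
```

In this sketch the partitions could be processed independently (for example in parallel or out of core), which is where the speed comes from; the final `sorted` call is the extra re-sorting step that an order guarantee would force on every join.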

And if order must be preserved, we also can't switch which side of the join is the build side and which is the probe side in streaming. That's something we'd like to be able to do in the future, as you'd much rather have a small build side.
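The build/probe trade-off can be sketched in plain Python as well. Again, this is an assumption-laden illustration rather than Polars' streaming engine: `left_join_build_smaller` is a hypothetical name, and rows are simple `(key, value)` tuples. When the hash table is built on the left side, matches are emitted in the order the right side is probed, and unmatched left rows can only be appended once probing has finished, so the left row order is lost.

```python
from collections import defaultdict


def left_join_build_smaller(left, right):
    """Left join of (key, value) rows that builds the hash table on the smaller input.

    Hypothetical helper for illustration only.
    """
    if len(right) <= len(left):
        # Build on right, probe with left: output follows left order.
        table = defaultdict(list)
        for key, rval in right:
            table[key].append(rval)
        out = []
        for key, lval in left:
            for rval in table.get(key, [None]):
                out.append((key, lval, rval))
        return out

    # Build on left, probe with right: matches come out in right order.
    table = defaultdict(list)
    for idx, (key, lval) in enumerate(left):
        table[key].append((idx, lval))
    matched = set()
    out = []
    for key, rval in right:
        for idx, lval in table.get(key, []):
            matched.add(idx)
            out.append((key, lval, rval))
    # Unmatched left rows are only known after the whole probe side is consumed.
    for idx, (key, lval) in enumerate(left):
        if idx not in matched:
            out.append((key, lval, None))
    return out


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(3, "z"), (9, "q"), (1, "x"), (8, "r")]

# Left is smaller here, so it becomes the build side and left order is not kept.
result = left_join_build_smaller(left, right)

# With a smaller right side, the right is built and left order is kept.
result_small_right = left_join_build_smaller(left, right[:1])
```

Both calls return the same set of rows a left join should produce; only the row order differs depending on which side was chosen for the build.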

@orlp
Collaborator

orlp commented Nov 4, 2024

I think we can already add this preserve_order parameter and implement it before 2.0 hits.

@ritchie46
Member Author

Yes, it should be called maintain_order then. We already use that name.
