Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not pin pandas down to patch level #3371

Closed
h-vetinari opened this issue Aug 24, 2021 · 15 comments · Fixed by #6211
Closed

Do not pin pandas down to patch level #3371

h-vetinari opened this issue Aug 24, 2021 · 15 comments · Fixed by #6211
Assignees
Labels
bug 🦗 Something isn't working dependencies 🔗 Issues related to dependencies External Pull requests and issues from people who do not regularly contribute to modin P1 Important tasks that we should complete soon

Comments

@h-vetinari
Copy link

From conda-forge/modin-feedstock#21:

@h-vetinari: Regarding the pandas pin, I understand that it is tempting to pin against the exact version being tested, but it is not scalable (what if another library pins to pandas 1.3.1, then skips to 1.3.3 in the next update?, etc). Pandas follows SemVer pretty strictly (so even minor-releases should stay fully backward compatible), and IMO pinning down to patch-level is an antipattern that should be removed upstream.

@devin-petersohn: Understandable, would you open an issue to discuss this on the Modin repo?

Xref #3369

@gshimansky
Copy link
Collaborator

These are PRs that updated pandas 1.3.0->1.3.1: #3290 and 1.3.1->1.3.2: #3343. There were some functional changes and changes in tests. Making sure that modin code works with all 1.3.x may require making it more flexible and running CI testing for all of the pandas patch-level releases.

@h-vetinari
Copy link
Author

Fair enough, where there are actual pandas-bugs that affect the test suite, patch releases make a "difference". I put this in quotes, because most times, the failing tests are just skipped or worked around, so mostly the test suite was just pretending to be compatible.

In any case, I think the least that should be done is to leave the pin open, i.e. pandas>=1.3.2. And I'm not even speaking about the environment_dev.yml here, but about the packaging in conda-forge.

@devin-petersohn
Copy link
Collaborator

Correct me if I'm wrong, but this only really becomes a problem in one of two situations:

1.) Someone wants to pin both a Modin and a pandas version to incompatible versions
2.) There is no compatible version of Modin that supports a pinned version of pandas

Is there anything else?

Fair enough, where there are actual pandas-bugs that affect the test suite, patch releases make a "difference". I put this in quotes, because most times, the failing tests are just skipped or worked around, so mostly the test suite was just pretending to be compatible.

@h-vetinari This may be a little bit of a harsh interpretation of what's happening. For regressions in pandas, rather than introduce those regressions in Modin we skip those specific tests until the release that fixes it. I would not say this is pretending to be compatible.

And I'm not even speaking about the environment_dev.yml here, but about the packaging in conda-forge.

Yes, but I believe you're also saying to remove the pin from the setup.py.

A decent portion of our users are installing from deprecated Python versions (e.g., Python 3.5), and are installing an older version of Modin. We want to ensure that those users still have a library that works on install, and leaving the pin open with a >= would not allow for that.

@h-vetinari
Copy link
Author

I did not mean to be harsh, my apologies. Phrased differently, the differences in compatibility between patch versions are miniscule (for a library following SemVer). I would argue miniscule enough that it makes no difference in practice (though users should of course also strive to be up-to-date with their versions)

The point that libraries pinning so tightly does not work on an ecosystem level when many others is your point 1.; multiplied by other libraries also pinning pandas to patch-level. The resolver is already one of the most painful parts of working with environments, and such restrictions could ultimately end up making modin uninstallable in combination with other packages.

Yes, but I believe you're also saying to remove the pin from the setup.py.

That's not necessary. I'm only speaking about the conda-forge packaging (and the warning on start-up; we could also patch this directly in conda-forge). Something like: - pandas >=1.3.0,<1.4

@devin-petersohn
Copy link
Collaborator

Something like: - pandas >=1.3.0,<1.4

@h-vetinari That would be fine with me starting in pandas 1.4 (at least I don't see any immediate issues). I think we can probably patch out the warning message as well. Would this cause any problems with Omnisci @modin-project/modin-omnisci ?

@gshimansky
Copy link
Collaborator

Just two points if we are going to support all Pandas patchlevel versions.

  1. We will need CI to test all Pandas patchlevel version in a range individually to make sure that they all work, not the latest one, e.g. 1.3.0, 1.3.1, 1.3.2, etc.
  2. Be ready to work on older releases if necessary. For example if we move on on release 0.11, but suddenly Pandas releases version 1.3.3 that doesn't work with 0.10.2, we will need to return to 0.10.2 to fix this.

@RehanSD
Copy link
Collaborator

RehanSD commented Oct 12, 2022

Hi @gshimansky @h-vetinari @vnlitvinov and @devin-petersohn has this issue been resolved?

@RehanSD RehanSD added Needs more information ❔ Issues that require more information from the reporter P3 Very minor bugs, or features we can hopefully add some day. labels Oct 12, 2022
@h-vetinari
Copy link
Author

No this issue has not been resolved. Modin still pins pins pandas down to patch level.

@h-vetinari
Copy link
Author

h-vetinari commented Nov 14, 2022

We will need CI to test all Pandas patchlevel version in a range individually to make sure that they all work, not the latest one, e.g. 1.3.0, 1.3.1, 1.3.2, etc.

This is overkill and not scalable (especially for a library/framework). You are not responsible for pandas-bugs, pandas is (and your users should be free to have the bugs in pandas fixed independently of which version you test in CI; if pandas releases a new patch release tomorrow, you're actively preventing your users from getting those fixes, because you claim you need to vet them for them first, which is paternalistic at best).

A more reasonable approach (more coverage, less CI jobs) would be to test against pandas 1.{x=current}.{z=latest for x} and 1.{y=current-1}.{z=latest for y}, and carry a dependency of pandas >=1.4,<1.6 (since x=5 ATM). And that's already a POV that distrusts pandas (and robs users of their agency), since pandas promises SemVer stability (and so even new minor releases should be fine), but that at least would be a more reasonable trade-off between (actual) stability and intrusiveness, because the API surface can change a fair bit between minor releases.

Still, realistically no-one is going to want to use 1.5.1 if 1.5.3 is available. If they so choose regardless, then they should be free to keep the pieces live with the minimal1 fallout - "consenting adults" and all that. The upshot is that not pinning so hard is what would allow modin to live peacefully alongside other libraries and their dependencies (which it currently doesn't), without creating a monstrous CI matrix here.

Footnotes

  1. since modin gets tested (and made compatible) reasonably quickly against whatever is the newest pandas (patch) version

@mvashishtha
Copy link
Collaborator

@h-vetinari I agree with you and we have consensus from a few modin contributors (including @vnlitvinov and @devin-petersohn offline) that we should stop pinning to patch version. So I think at this point we're ready for a PR.

@mvashishtha mvashishtha added dependencies 🔗 Issues related to dependencies P1 Important tasks that we should complete soon and removed Needs more information ❔ Issues that require more information from the reporter P3 Very minor bugs, or features we can hopefully add some day. labels Nov 15, 2022
@vnlitvinov
Copy link
Collaborator

I wonder if just running with >=1.5.3 would work for now, it seems to be the minimal change (and would allow users to upgrade), or should we just state >=1.5,<1.6? opinions @h-vetinari @devin-petersohn

@h-vetinari
Copy link
Author

1.5.3 doesn't exist yet, I just used that version to make my point.

I'd still prefer to just specify pandas down to only the minor version, but even so, >=1.5.1,<1.6 would be an improvement (though somewhat marginal; covering 1.4 and/or 1.6.dev would be much more substantial - though to be sure I'd already be much happier with the mentioned loosening)

@vnlitvinov
Copy link
Collaborator

Loosening the requirements to >=1.4,<1.7 (this seems to be what you're asking for) would most likely require us to do some amount of compatibility code, as we're using some internal functions (yes, we shouldn't, but, as we're trying to mimic pandas API 100%, it's usually better to use some internal function than to copy-paste its whole body), and some interfaces diverge a little (an example would be changing the default value of some parameter from None to no_default).

@h-vetinari
Copy link
Author

Fair enough if supporting more than one pandas minor version is too much effort for now (considering how closely modin sticks to pandas); again, >=1.5.1,<1.6 would already be an improvement.

@devin-petersohn
Copy link
Collaborator

At some point we have made some backwards progress on this. I agree with what has been said here and propose the following concrete steps:

  • Loosen the requirements as @h-vetinari mentions >=1.5.1,<1.6 (is 1.5.0 a problem?)
  • Take out the exception for pandas version compatibility and change it back to a warning (this probably shouldn't have been changed)
  • Consider loosening the pin even more (>=1.0?) and just stick with the warning, letting users choose to upgrade (or not)

@anmyachev anmyachev added the External Pull requests and issues from people who do not regularly contribute to modin label Apr 19, 2023
@mvashishtha mvashishtha self-assigned this May 25, 2023
mvashishtha pushed a commit to mvashishtha/modin that referenced this issue May 25, 2023
mvashishtha pushed a commit to mvashishtha/modin that referenced this issue Jun 8, 2023
YarShev pushed a commit that referenced this issue Jun 9, 2023
Co-authored-by: Anatoly Myachev <[email protected]>
Signed-off-by: mvashishtha <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🦗 Something isn't working dependencies 🔗 Issues related to dependencies External Pull requests and issues from people who do not regularly contribute to modin P1 Important tasks that we should complete soon
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants