PEP 518 build requirements cannot be overridden by user #4582
Some reasons why the current behavior is bad for Numpy especially:
|
I responded on the scipy issue before seeing this one: scipy/scipy#7309 (comment). The most correct solution to the ABI issue is to build against the lowest supported numpy version. Building against the currently installed version is a hack that will fail in a number of cases; I mentioned one of them there, but another is that due to pip's wheel caching feature, if you first install scipy into an environment that has the latest numpy, and then later install scipy into an environment that has an older numpy, pip won't even invoke scipy's build system the second time, it'll just install the cached build from last time. |
Yes, the ABI issue can indeed be handled by specifying the earliest supported numpy version. |
The lowest supported version is normally Python-version dependent (right now numpy 1.8.2 is the lowest supported, but clearly not for Python 3.6, because 1.8.2 predates Python 3.6 by a long time). So the specification will then have to be something like this:
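A sketch of what that per-Python-version pinning could look like in pyproject.toml, using PEP 508 environment markers (the exact numpy versions here are made up purely for illustration):

[build-system]
# hypothetical: build against the oldest numpy that supports each Python version
requires = [
    "setuptools",
    "wheel",
    "numpy==1.8.2; python_version < '3.5'",
    "numpy==1.9.3; python_version == '3.5'",
    "numpy==1.12.1; python_version >= '3.6'",
]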
I have the feeling not many projects are going to get this right .... |
That still leaves a question of what to do for a not-yet-released Python version. Would you do:
or
I'd suspect the first one, but either way you have to guess whether or not an existing version of numpy is going to work with a future Python version. You have to think about it though, if you don't specify anything for Python 3.7 then a build in an isolated venv will break (right?). So then you'd have to cut a new release for a new Python version. |
I guess the not-yet-released-Python issue is sort of the same as anything else about supporting not-yet-released-Python. When developing a library like (say) scipy, you have to make a guess about how future-Python will work, in terms of language semantics, C API changes, ... and if it turns out you guess wrong then you have to cut a new release? I'm not sure there is a really great solution beyond that. Something that came up during the PEP 518 discussions, and that would be very reasonable, was the idea of having a way for users to manually override some build-requires when building. This is one situation where that might be useful. |
It's a little different in this case - here we use E.g. if Python 3.7 breaks |
I don't think there's any way to do "use earliest compatible version" with pip; would it be something useful in this situation? |
@pradyunsg I think in principle yes. Are you thinking about looking at the PyPI classifiers to determine what "earliest compatible" is? |
TBH, I'm not really sure how this would be done. For one, I don't think we have anything other than the PyPI Classifiers for doing something like this, and I'm skeptical of using those for determining if pip can install a package... |
Yeah that's probably not the most robust mechanism. |
There is a way to specify earliest compatible python version in package metadata. Not the trove classifiers – those are just informational. The IPython folks worked this out because they needed to be able to tell pip not to try to install new IPython on py2. The problem with this though is that old numpy packages can't contain metadata saying that they don't work with newer python, because by definition we don't know that until after the new python is released. (Also I think the current metadata might just be "minimum python version", not "maximum python version".) |
The current metadata is not a minimum or maximum, but a full version specifier (Requires-Python), which supports the usual comparison operators such as >=, <, ==, and !=.
Maximum versions that don't represent a version that already exists are basically impossible to get right except by pure chance. You pretty much always end up either under- or over-specifying things. I had always presumed that a wheel built against Numpy would change its requirements to reflect the version it was built against, so that the dependency solver then (theoretically, until #988 is solved) handles things to ensure there are no version-incompatibility-related segfaults. I think the worst case here is you end up installing something new that depends on Numpy and end up having to also install a new Numpy, because now you have something that requires the newer version. Am I missing something? |
@dstufft: the specific concern here is how to handle build requires (not install requires) for downstream packages like scipy that use the numpy C API. The basic compatibility fact that needs to be dealt with is: if you build scipy against numpy 1.x.y, then the resulting binary has a runtime requirement for numpy >= 1.x.y. In the past, this has been managed in one of two ways:
1. Experts build the official scipy wheels, manually picking an appropriate numpy (normally the oldest release that scipy still supports on that Python version) to build against.
2. End users build scipy from source themselves, where the old setuptools-based path implicitly builds against whatever numpy is already installed in their environment. |
But with pyproject.toml and in general, we're increasingly moving towards a third, hybrid scenario, where pip is automatically building scipy wheels on end user machines for installation into multiple environments. So it's the second use case above, but technically its implementation acts more like the first, except now the expert's manual judgement has been replaced by an algorithm. The policy that experts would traditionally use for building a scipy wheel was: install the latest point release of the oldest numpy that meets the criteria (a) scipy still supports it, and (b) it works on the python version that this wheel is being built for. This works great when implemented as a manual policy by an expert, but it's rather beyond pip currently, and possibly forever... And @rgommers is pointing out that if we encode it manually as a set of per-python-version pins, and then bake those pins into the scipy sdist, the resulting sdists will only support being built into wheels on python versions that they have correct pins for. Whereas in the past, when a new python version came out, if you were doing route (1) then the expert would pick an appropriate numpy version at the time of build, and if you were doing route (2) then you'd implicitly only ever build against numpy versions that work on the python you're installing against. That's why having at least an option for end users to override the pyproject.toml requirements would be useful: if you have a scipy sdist whose pins don't cover your setup, you could still tell pip which numpy to build against and get a working wheel. |
@njsmith I don't understand why it's bad for SciPy to implicitly get built against a newer NumPy though. When we install that freshly built SciPy, anything already installed will still work fine, because NumPy requirements are >= constraints and a newer NumPy still satisfies them, and we'll just install a newer NumPy when we install that freshly built SciPy, to satisfy the constraint that SciPy's wheel will have for a newer NumPy. |
Sorry to butt in here, but are we? I don't see that at all as what's happening. I would still expect the vast majority of installs to be from published wheels, built by the project team's experts (your item 1). The move to pyproject.toml and PEP 517 allows projects to use alternative tools for those builds, which hopefully will make those experts' jobs easier as they don't have to force their build processes into the setuptools mould if there's a more appropriate backend, but that's all.

It's possible that the changes we're making will also open up the possibility of building their own installation to people who previously couldn't, because the setuptools approach was too fragile for general use. But those are people who currently have no access to projects like scipy at all. And it's possible that people like that might share their built wheels (either deliberately, or via the wheel cache). At that point, maybe we have an issue, because the wheel metadata can't encode enough of the build environment to distinguish such builds from the more carefully constructed "official" builds. But surely the resolution for that is simply to declare such situations as unsupported ("don't share home-built wheels of scipy with other environments unless you understand the binary compatibility restrictions of scipy").

You seem to be saying that |
To be clear, I understand why it's bad for that to happen for a wheel you're going to publish to PyPI, because you want those wheels to maintain as broad a compatibility as possible. But the wheels that pip is producing implicitly are generally just going to get cached in the wheel cache for this specific machine. |
That's the whole point of this issue: a wheel built on a user's system can now easily be incompatible with the numpy already installed on the same system. This is because of build isolation - pip will completely ignore the one already installed, and build a scipy wheel against a new numpy that it grabs from pypi in its isolated build env. So if an older numpy is installed, the scipy wheel pip just built won't work with it. Hence @njsmith points out that an override to say something like "build against the numpy I already have installed" would be needed. |
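As an aside: current versions of pip do have one blunt instrument for that particular wish - disabling build isolation entirely, so the build sees whatever numpy (and other build dependencies) the environment already has. It only works if the build dependencies are already installed, and it gives up the other benefits of isolation:

pip install --no-build-isolation scipy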
@rgommers But why can't pip just upgrade the NumPy that was installed to match the newer version that the SciPy wheel was just built against? I'm trying to understand the constraints where you're able to install a new version of SciPy but not a new version of NumPy. |
It can, but currently it won't. The numpy requirement in the freshly built scipy wheel is still the usual loose >= constraint rather than a pin to the version it was actually built against, so pip considers the already-installed (older) numpy good enough and doesn't upgrade it.
For the majority of users this will be fine. Exceptions are regressions in numpy, or (more likely) not wanting to upgrade at that point in time due to the extra regression testing required. |
Agreed that in general we are not moving in that direction. That third scenario is becoming more prominent though as we move people away from plain setup.py-based installs that implicitly build against whatever is already in the environment.
Totally agreed that PEP 517 and the direction things are moving in is a good one. The only thing we’re worried about here is that regression for build isolation - it’s not a showstopper, but at least needs an override switch for things in the pyproject.toml |
For SciPy and other things that link against NumPy it probably should be, right? I understand that in the past it was probably painful to do this, but as we move forward it seems like that is the correct thing to happen here (independent of what is decided in pip), since a SciPy that links against NumPy X.Y needs NumPy>=X.Y and X.Y-1 is not acceptable.
To be clear, I'm not explicitly against some sort of override flag. Mostly just trying to explore why we want it, to see if there's a better solution (because in general more options add conceptual overhead, so the fewer we have the better - but obviously not to the extreme where we have no options). One other option is for people who can't/won't upgrade their NumPy to switch to building using the build tool directly and then provide that wheel using find-links or similar. I'm not sure which way I think is better, but I suspect that maybe this might be something we would hold off on, and wait and see how common a request it ends up being to solve this directly in pip. If only a handful of users ever need it, then maybe the less user-friendly but more powerful/generic mechanism of "directly take control of the build process and provide your own wheels" ends up winning. If it ends up being a regular thing that is fairly common, then we figure out what sort of option we should add. |
Yeah, scipy and other packages using the numpy C API ought to couple their numpy install-requires to whichever version of numpy they're built against. (In fact numpy should probably export some API saying "if you build against me, then here's what you should put in your install-requires".) But that's a separate issue. The pyproject.toml thing is probably clearer with some examples though. Let's assume we're on a platform where no scipy wheel is available (e.g. a raspberry pi).

Scenario 1
Before pyproject.toml: this fails with an error, "You need to install numpy first". User has to manually install numpy, and then scipy. Not so great.
After pyproject.toml: scipy has a build-requires on numpy, so pip automatically installs numpy into the isolated build environment and the build just works.

Scenario 2
Before pyproject.toml: scipy is automatically built against the installed version of numpy, all is good.
After pyproject.toml: scipy is automatically built against whatever version of numpy is declared in pyproject.toml. If this is just an unpinned numpy, that means the newest release, and the resulting wheel won't work with the older numpy that's actually installed. OTOH, you can configure pyproject.toml with per-python-version == pins to the oldest supported numpy, which avoids the incompatibility but brings back the pin-maintenance problem discussed above.

Scenario 3
Before pyproject.toml: You have to manually install numpy, and you might have problems if you ever try to downgrade numpy, but at least in this simple case all is good.
After pyproject.toml: If scipy uses an unpinned numpy build requirement, you're in essentially the same position - built against the newest numpy, and fragile if numpy is ever downgraded. OTOH, if scipy uses == pins to the oldest supported numpy, the resulting wheel stays compatible with a wide range of numpy versions, at the cost of maintaining those pins.

Summary
Scipy and similar projects have to pick how to do version pinning in their pyproject.toml, and none of the choices behaves well in all of these scenarios. |
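For concreteness, the "loose" configuration being contrasted in these scenarios is simply an unpinned build requirement (a sketch only; the pinned alternative is the marker-based example shown earlier in the thread):

[build-system]
# unpinned: the isolated build environment always gets the newest numpy release
requires = ["setuptools", "wheel", "numpy"]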
Maybe we should open a separate issue specifically for the idea of a
|
I don't think we need a new issue; I think this issue is fine. I'll just update the title, because the current title isn't really meaningful I think. |
Agreed that the original title was not meaningful, but there are two conceptually distinct issues here. The first is that pyproject.toml currently causes some regressions for projects like scipy – is there anything we can/should do about that? The second is that hey, user overrides might be a good idea for a few reasons; one of those reasons is that they could mitigate (but not fully fix) the first problem. Maybe the solution to the first problem is just that we implement user overrides and otherwise live with it, in which case the two discussions collapse into one. But it's not like we've done an exhaustive analysis of the scipy situation and figured out that definitely user overrides are The Solution, so if someone has a better idea then I hope they'll bring it up, instead of thinking that we've already solved the problem :-) |
@njsmith It's interesting to me that you think that I suspect that for a hypothetical It also suffers from the same problem that a lot of our CLI options like this tend to hit, which is there isn't really a user friendly way to specify it. If you have At some point the answer becomes "sorry your situation is too complex, you're going to have to start building your own wheels and passing them into --find-links" but at a basic level parameterizing options by an individual package inside the entire set of packages is still somewhat of an unsolved problem in pip (and so far each attempt to solve it has been met with user pain). So part of my... hesitation, is that properly figuring out the right UX of such a flag is non trivial and if we don't get the UX to be better than the base line of building a wheel and chucking it into a wheelhouse then it's a net negative. |
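A sketch of the "build your own wheel and feed it back via find-links" workaround mentioned above, using flags that exist in current pip (this assumes the build dependencies, including the numpy you want to build against, are already installed in the environment):

# build scipy against the numpy already installed, without build isolation
pip wheel --no-build-isolation --no-deps --wheel-dir ./wheelhouse scipy
# later installs pick up the locally built wheel from the wheelhouse
pip install --find-links ./wheelhouse scipy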
Regarding the part of this problem that is blocked by the lack of a proper dependency resolver for pip: the beta of the new resolver is in pip 20.2 and we aim to roll it out in pip 20.3 (October) as the default. So if the new resolver behavior helps this problem (or makes it worse) now would be a good time to know. |
I think we're hitting this issue as well. We have an in-house package whose code is compatible with NumPy version 1.11.2 and up. We need to maintain some legacy remote production environments where we can't upgrade NumPy beyond 1.11.2, but in other environments we want to stay up to date with the newest NumPy. In our package, we migrated to using a pyproject.toml with:
[build-system]
requires = ["Cython", "numpy", "setuptools>=40.8.0", "wheel>=0.33.6"]
When building the package for the legacy environment, we use a constraints file that pins numpy to 1.11.2; for modern environments we have one that just allows a recent numpy. When running tests in CI for our package, we do the equivalent of either
pip install --constraint constraints.legacy.txt --editable .
pytest
or
pip install --constraint constraints.new.txt --editable .
pytest
However, in both cases the newest NumPy available is installed and compiled against, and running our package in the old environment fails miserably. What we would like pip to do is respect the pinned versions from the constraints files when it installs build dependencies into the isolated build environment. |
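For reference, constraints files in a setup like the one described are just pin lists; something along these lines (the file names follow the comment above, and the "new" lower bound is made up for illustration):

# constraints.legacy.txt
numpy==1.11.2

# constraints.new.txt
numpy>=1.17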
To be clear, pip never supported overriding dependencies anywhere, either build or run-time. The "trick" people used to use depends on a quirky behaviour of pip's current (soon legacy) dependency resolver that should (eventually) go away. In that sense, it makes perfect sense that requirements specified on the command line do not override build dependencies in the isolated build environment.

Stepping back from the specific request of overriding build dependencies, the problem presented in the top post can be avoided by adding additional logic to how build dependencies are chosen: when a package specifies something like numpy both as a build and as a runtime dependency, pip could choose the build-time version to match the one selected at runtime. There is more than one way to solve the build ABI issue, and introducing dependency overriding for it feels like falling into the XY problem trap to me. Dependency overriding is a much more general problem, and whether that should be possible (probably yes at some point, since pip is progressively making the resolver stricter, and people will need an escape hatch eventually) is an entirely separate issue, covered in other discussions. |
+1 this is a healthy idea in general, and I don't see serious downsides. Note that for |
Something like the situations discussed here has happened today -- setuptools has started rejecting invalid metadata and users affected by this have no easy workarounds. @jaraco posted #10669, with the following design for a solution.
|
Some more thoughts I’ve had during the past year on this idea. Choosing a build dependency matching the runtime one is the easy part; the difficult part is that the runtime dependency version may change during resolution, i.e. when backtracking happens. And when that happens, pip will need to also change the build dependency, because there’s no guarantee the newly chosen runtime dependency has ABI compatibility with the old one. And here’s where the fun part begins. By changing the build dependency, pip will need to rebuild that source distribution, and since there’s no guarantee the rebuild will have the same metadata as the previous build, the resolver must treat the two builds as different candidates. This creates a weird these-are-the-same-except-not-really problem that’s much worse than PEP 508 direct URLs, since those builds likely have the same name, version (these two are easy), source URL (!) and wheel tags (!!). It’s theoretically all possible to implement, but the logic would need a ton of work.
And to come back to the “change the build dependency” thing. There are fundamentally two cases where an sdist’s build dependencies need to be overridden:
|
I'm not sure I agree with that. Yes, it's technically true that things could now break - but it's a corner case related to the ABI problem, and in general
A few thoughts I've had on this recently:
|
I agree it should mostly work without the rebuilding part, but things already mostly work right now, so there is only value in doing anything for this use case if we can go beyond mostly and make things fully work. If a solution can’t cover that last mile, we should not pursue it in the first place, because it wouldn’t really improve the situation meaningfully. I listed later in the previous comment the two scenarios where people generally want to override metadata. The former case is the one that “mostly works” right now, and IMO we should either not do anything about it (because what we already have is good enough), or pursue the fix to its logical destination and fix the problem entirely (which requires the resolver implementation I mentioned). The latter scenario is one where we don’t currently have even a “mostly works” solution, unlike the former, so there’s something to be done - but I’m also arguing that that something should not be built entirely into pip. |
Looking at this issue and the similar one reported in #10731, are we looking at this from the wrong angle? Fundamentally, the issue we have is that we don't really support the possibility of two wheels, with identical platform tags, for the same project and version of that project, having different dependency metadata. It's not explicitly covered in the standards, but there are a lot of assumptions made that wheels are uniquely identified by name, version and platform tag (or more explicitly, by the wheel filename). Having scipy wheels depend on a specific numpy version that's determined at build time violates this assumption, and there's going to be a lot of things that break as a result (the pip cache has already been mentioned, as has portability of the generated wheels, but I'm sure there will be others). I gather there's an If we want to properly address this issue, we probably need an extension to the metadata standards. And that's going to be a pretty big, complicated discussion (general dependency management for binaries is way beyond the current scope of Python packaging). Sorry, no answers here, just more questions 🙁
|
I think being able to provide users with a way to say "I want all my builds to happen with setuptools == 56.0.1" is worthwhile, even if we don't end up tackling the binary compatibility story. That's useful for bug-for-bug compatibility, ensuring that you have deterministic builds, and more.

I think the "fix" for the binary compatibility problem is a complete rethink of how we handle binary compatibility (which is a lot of deeply technical work), which needs to pass through our standardisation process (which is a mix of technical and social work). And I'm not sure there's either appetite or interest in doing all of that right now. Or if it would justify the churn budget costs. If there is interest and we think the value is sufficient, I'm afraid I'm still not quite sure how tractable the problem even is and where we'd want to draw the line of what we want to bother with. I'm sure @rgommers, @njs, @tgamblin and many other folks will have thoughts on this as well. They're a lot more familiar with this stuff than I am.

As for the pip caching issue, I wonder if there's some sort of cache busting that can be done with build tags in the wheel filename (generated by the package). It won't work for PyPI wheels, but it should be feasible to encode build-related information in the build tag, for the packages that people build themselves locally. This might even be the right mechanism to try using the existing semantics of build tags toward solving some of the issues. Regardless, I do think that's related but somewhat independent of this issue. |
To be clear, build tags are a thing in the existing wheel file format: https://www.python.org/dev/peps/pep-0427/#file-name-convention |
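To make the cache-busting idea concrete (purely illustrative - neither pip nor any build backend does this today): a locally built wheel could record the numpy it was built against in the optional build tag, e.g. scipy-1.8.0-1numpy121-cp39-cp39-linux_x86_64.whl. PEP 427 only requires the build tag to start with a digit and treats it as a tie-breaker, so wheels built against different numpy versions would at least get distinct filenames and therefore distinct cache entries.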
@pfmoore those are valid questions/observations I think - and a lot broader than just this build reqs issue. We'd love to have metadata that's understood for SIMD extensions, GPU support, etc. - encoding everything in filenames only is very limiting.
This is true, but it's also true for runtime dependencies - most users won't know how that works or if/when to override them. I see no real reason to treat build and runtime dependencies in such an asymmetric way as is done now.
Agreed. It's not about dependency management of binaries though. There are, I think, 3 main functions of PyPI: (1) being the authoritative index of package names, versions and release metadata, (2) distributing pre-built binaries (wheels), and (3) distributing sdists, which pip will happily try to build from source on end users' machines.
This mix of binaries and from-source builds is the problem, and in particular - also for this issue - (3) is what causes most problems. It's naive that we expect that from-source builds of packages with complicated dependencies will work for end users. This is obviously never going to work reliably when builds are complex and have non-Python dependencies. An extension of metadata alone is definitely not enough to solve this problem. And I can't think of anything that will really solve it, because even much more advanced "package manager + associated package repos" where complete metadata is enforced don't do both binary and from-source installs in a mixed fashion.
I have an interest in, and some budget for, thoroughly documenting all the key problems that we see for scientific & data-science/ML/AI packages in the first half of next year, so that we're at least on the same page about what the problems are and can discuss which ones may be solvable and which ones are going to be out of scope.
agreed |
I agree that being able to override build dependencies is worthwhile, I just don't think it'll necessarily address all of the problems in this space (e.g., I expect we'll still get a certain level of support questions from people about this, and "you can override the build dependencies" won't be seen as an ideal solution - see #10731 (comment) for an example of the sort of reaction I mean).
Hmm, yes, we might be able to use them somehow. Good thought.
I think it's a significant issue for some of our users, who would consider it justified. The problem for the pip project is how we spend our limited resources - even if the packaging community develops such a standard, should pip spend time implementing it, or should we work on something like lockfiles, or should we focus on critically-needed UI/UX rationalisation and improvement - or something else entirely?
Agreed. This is something I alluded to in my comment above about "UI/UX rationalisation". I think that pip really needs to take a breather from implementing new functionality at this point, and tidy up the UI. And one of the things I'd include in that would be looking at how we do or don't share options between the install process and the isolated build environment setup. Sharing requirement overrides between build and install might just naturally fall out of something like that. But 🤷, any of this needs someone who can put in the work, and that's the key bottleneck at the moment.
|
/cc @s-mm since her ongoing work has been brought up in this thread! |
I think this is relevant, as we (well, mostly @alalazo and @becker33) wrote a library and factored it out of Spack -- initially for CPU micro-architectures (and their features/extensions), but we're hoping GPU ISAs (compute capabilities, whatever) can also be encoded. The library is archspec.
We have gotten some vendor contributions to archspec. More here if you want the gory details: archspec paper |
Happy to talk about how we've implemented "solving around" already-installed stuff and how that might translate to the pip solver. The gist of that is in the PackagingCon talk -- we're working on a paper on that stuff as well and I could send it along when it's a little more done if you think it would help. I think fixing a particular package version isn't actually all that hard -- I suspect you could implement that feature mostly with what you've got. The place where things get nasty for us is binary compatibility constraints -- at the moment, we model the following on nodes and can enforce requirements between them:
The big thing we are working on right now w.r.t. compatibility is compiler runtime libraries for mixed-compiler (or mixed compiler version) builds (e.g., making sure libstdc++, openmp libraries, etc. are compatible). We don't currently model compilers or their implicit libs as proper dependencies, and that's something we're finally getting to. I am a little embarrassed that I gave this talk on compiler dependencies in 2018 and it took a whole new solver and too many years to handle it. The other thing we are trying to model is actual symbols in binaries -- we have a research project on the side right now to look at verifying the compatibility of entry/exit calls and types between libraries (a la libabigail or other binary analysis tools). We want to integrate that kind of checking into the solve. I consider this part pretty far off, at least in production settings, but it might help to inform discussions on binary metadata here. Anyway, yes, we've thought about a lot of aspects of binary compatibility, versioning, and what's needed as far as metadata quite a bit. Happy to talk about how we could work together/help/etc. |
Thanks @tgamblin. I finally read the whole paper - looks like amazing work. I'll take any questions/ideas elsewhere to not derail this issue; it certainly seems interesting for us though, and I would like to explore if/how we can make use of it for binaries of NumPy et al. |
FTR one workaround that hasn't been mentioned in the thread is supplying a constraints file via the PIP_CONSTRAINT environment variable; unlike --constraint on the command line, the environment variable is also picked up when pip installs the build dependencies into the isolated build environment. |
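A sketch of that workaround (the file name and the pin are illustrative; an absolute path is safest because the build dependencies are installed from a temporary working directory):

# constraints.txt contains the pins to respect at build time, e.g. numpy==1.22.4
PIP_CONSTRAINT="$PWD/constraints.txt" pip install .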
If the target computer already has a satisfactory version of numpy, then the build system should use that version. Only if the version is not already installed should pip use an isolated environment.
Related: scipy/scipy#7309