Deprecate --extra-index-url argument because of vulnerable definition #9612
Comments
More info: The CVE is currently marked as "disputed", because this is the "intended behavior". I suppose the remarks by I wonder... given the fact that we have now seen large-scale exploitation of this problem, including at companies supposedly employing the world's best engineers, would this be a good time to re-evaluate that position?
My comments on that bug are about Red Hat's reaction to this. I think the place to fix the issue is here, and that downstream redistributors shouldn't patch and alter pip's behavior.
Many seem to simplify Alex’s research to “ A better approach to the issue, in my opinion, is to
Also note that few (none?) of the core pip maintainers are in a position to do any of the above (except the first, which almost anyone can do if they want to, pip maintainers or not), since the issue is, by its nature, in direct conflict with the tool that is designed to encourage community code sharing.
Actually I've not proposed deletion for My opinion is that it's really important to show users security warnings about the vulnerable design of
I suggest that anyone who is genuinely interested in improving things in this area start by doing some user research into how people use pip's existing index options. Until we have that underlying information, we cannot realistically find a good solution that handles how people want to use pip while avoiding the situation where we make a change to "fix" this issue and, in the process, break other workflows that weren't vulnerable to the problem in the first place (e.g., they read the documentation and acted on it...). In particular, spewing out warnings on every use of
The documentation in no way explicitly indicates that you're about to set yourself up for a huge security issue. Sure, with careful analysis one might understand the implications of one's actions, but the article by Mr. Birsan makes clear that, in practice, many people are not able to connect the dots. This is true both for the location linked above and for the inline

In an attempt to move the discussion forward, I'd say the underlying problem with using

This is a good fit for when there are multiple repositories which are trusted to be basically mirrors of each other, perhaps differing in details such as the availability of wheels or expected repository uptime. Unfortunately, using

In terms of a solution, I would say

*mixing fully private and fully public repositories is not the only use case that breaks the implicit assumptions of
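To make the "trusted mirrors" assumption concrete, here is a simplified model (not pip's actual code) of how candidates from `--index-url` and `--extra-index-url` end up in one pool: every index is treated as an equivalent source, and the highest version wins regardless of which index served it. `Candidate`, `pick_candidate`, `parse_version`, the index URLs and the version numbers are all hypothetical, and version parsing is reduced to dotted integers instead of full PEP 440.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple[int, ...]  # simplified: dotted integers only, not full PEP 440
    index: str                # URL of the index that served this candidate


def parse_version(text: str) -> tuple[int, ...]:
    # Hypothetical stand-in for real version parsing; handles "1.2.3"-style strings.
    return tuple(int(part) for part in text.split("."))


def pick_candidate(name: str, indexes: dict[str, dict[str, list[str]]]) -> Candidate:
    """Pool candidates from every index and return the highest version.

    `indexes` maps an index URL to {project name: [version strings]}.
    No index is preferred over another, so this is only safe when all
    indexes are trusted mirrors of each other.
    """
    candidates = [
        Candidate(name, parse_version(version), url)
        for url, projects in indexes.items()
        for version in projects.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no candidates found for {name!r}")
    return max(candidates, key=lambda c: c.version)


indexes = {
    # passed as --index-url in this example (the private index)
    "https://internal.example/simple": {"my_super_secret_package": ["1.2.0"]},
    # passed as --extra-index-url (public PyPI, carrying the attacker's upload)
    "https://pypi.org/simple": {"my_super_secret_package": ["99.0.0"]},
}
best = pick_candidate("my_super_secret_package", indexes)
print(best.index, best.version)  # https://pypi.org/simple (99, 0, 0)
```

In this model, swapping which URL is the "main" index and which is the "extra" one does not change the result, which is why treating the extra index as a fallback is a misreading.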
As a data point in the above-mentioned "user research", the discussion about unsquatting casatools shows that some people (in this case, a PyPA contributor), even after the problem has been pointed out to them, persist in believing that the "extra" index is a fallback that will only be used when no package is found in the "main" index. I would take that as evidence that the existing documentation (including the names of the parameters) is insufficiently clear.
Maybe it's also good to enumerate the legitimate use cases more explicitly. One that has not been explicitly mentioned above yet is something like piwheels. Note that (provided that you trust piwheels) the implicit assumptions above hold, because all piwheels does is mirror your packages for an architecture that is often not supported on PyPI.
PRs with documentation improvements are always welcome 🙂
I guess this feature is used a lot to combine a private index with the public PyPI. But like @vanschelven said, it's a bad idea to mix sources of different trust levels (the same thing you do with a virtual repository that relays PyPI and adds packages).
The main designed usage of I think the main misunderstanding people have is to think
I wonder whether it is possible to come up with good heuristics to detect misuse of 2 questions remain:
Just to clarify: this would remediate the issue when configuring the "main index" to be the private one and the "extra index" to be the public one.
I thought about having a heuristic as well, but didn’t manage to come up with workable logic (hopefully someone else will be able to). The main issue is that the “extra file index” and “extra name index” usages are almost in direct conflict with each other by design. I think it’s a good idea to emit a warning (and probably turn it into an error after a deprecation period) when a package exists only on an extra index and not the main one, but then we’ll need to offer an alternative for that usage. I’m wondering whether it’d make sense to turn
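A minimal sketch of the "warn when a package exists only on an extra index and not the main one" heuristic, assuming the set of (already normalized) project names on each index is known up front. `check_index_origin`, `IndexOriginWarning` and the `strict` flag (standing in for the later "turn it into an error" step) are hypothetical; pip has no such function or option.

```python
import warnings


class IndexOriginWarning(UserWarning):
    """Hypothetical warning category for projects served only by extra indexes."""


def check_index_origin(
    name: str,
    main_index: set[str],
    extra_indexes: dict[str, set[str]],
    strict: bool = False,
) -> set[str]:
    """Return the extra-index URLs that serve `name`.

    Warns (or raises, when `strict` is True, e.g. after a deprecation period)
    if `name` is served by an extra index but is absent from the main index.
    """
    on_extras = {url for url, names in extra_indexes.items() if name in names}
    if on_extras and name not in main_index:
        message = (
            f"{name!r} was found only on extra index(es) {sorted(on_extras)}; "
            "a same-named project with a higher version uploaded to the main "
            "index would be installed instead."
        )
        if strict:
            raise RuntimeError(message)
        warnings.warn(message, IndexOriginWarning, stacklevel=2)
    return on_extras


# Private package configured on an extra index, public PyPI as the main one:
# this emits an IndexOriginWarning.
check_index_origin(
    "my_super_secret_package",
    main_index={"requests"},
    extra_indexes={"https://internal.example/simple": {"my_super_secret_package"}},
)
```

As noted in the thread, this only flags the "private package on an extra index" layout; it cannot tell that layout apart from a legitimate "extra file index" setup without further information.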
+1 on this
While this fixes the issue of "vulnerable definition", I would still argue that it could be beneficial to introduce something that also allows using multiple indexes without the risk of the aforementioned dependency confusion. I get that this is the intended behavior, but tbh it does not feel like a good solution.
As we’ve said (multiple times), we are not opposed to having that, but few/none of the existing pip maintainers are in a position to propose a good design for it. And from what we see, people who have done significant research on the topic prefer to do this outside of pip (building an in-house index and avoiding direct reliance on PyPI), and we are willing to follow their advice and assist by accepting documentation updates that clearly point users in the right direction.
So... I agree with the idea here -- getting rid of the ability to tell pip to use multiple indexes. We'd push this problem to folks who wanna do anything involving multiple indexes/cascading/whatever to tools like devpi, Artifactory etc., who are in a significantly better position than us to make specific opinionated choices. In other words, this simplifies the model for what-component-is-responsible here -- pushing the problem of managing a "hierarchy of indexes" to the tools that are already built for solving those issues and can make opinionated choices. We have the hammer of "it is a security concern" -- our position can then be "we can't keep this around in its current form, and there are better tools for your workflows". So... thoughts? :)
NOPE. This is more complexity -- I'd prefer to have less. :)
In principle I agree with this. More than that, I think it's an excellent idea. It fits in with my general preference to move self-contained functionality that's proven to be problematic and/or specialised out of pip and into tools dedicated to handling those cases well.

But it's a non-trivial breakage, and it will without doubt cause quite a lot of complaints from users who are happily using the current functionality without problems (they are either using it as intended, or they have mitigated or don't care about the risks). I have no idea how we can publicise this sufficiently well to avoid fallout; we've already used up a lot of user goodwill with all of the (necessary) churn around the new resolver and the desupport of Python 2.

I sort of feel that this is something where we should be reaching out to the user community to get their views. We could put up a survey asking for opinions (maybe with options like "hard break", "add warnings but retain the existing options", "do nothing, users are happy to take the view that it's user error and not pip's issue"). But I don't know how to contact the "user resource group" that got set up for the UX work, and I've no idea how we'd extend that group to include views from some of the "big corporates" caught by the dependency confusion work (Apple, Microsoft, etc.). Nor do I have a real feel for what we'd do with the results beyond a heavy-handed "do whatever gets the most votes". @nlhkabu @ei8fdb Any views on this?
Definitely. Given that the current approach is getting misunderstood (and, as a result, misused), I have little confidence that something more complex will be clearer.
Consolidating this into #8606. |
What's the problem this feature will solve?
According to the research of Alex Birsan, internal infrastructure can be attacked by publishing a package with the same name to the public index (PyPI).
For example, if I use the package `my_super_secret_package`, which is available only in my internal repo, everything is OK. But once an attacker creates a package with the same name on the public index with a higher version, my installation is compromised by external code :(

Describe the solution you'd like
The idea is to deprecate the `--extra-index-url` argument and print danger warnings each time this argument is used. You can always use `--index-url` with your own proxy pip installation (one that will not return the publicly available versions of your internal packages).
Also, it's possible to add a new argument, `--internal-index-url` (or something like this), with clearer and more secure semantics: `--index-url` for package resolving strategy.

Alternative Solutions
It's possible to rethink the `--extra-index-url` behaviour for more secure semantics, but my understanding is that it is incorrect to change the semantics of existing parameters. It's always better to introduce a new one for a transparent transition.
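One way to read the proposed safer semantics, whether implemented by your own proxy behind `--index-url` or by a new option such as the suggested `--internal-index-url`, is "a project name that exists internally is never looked up publicly". A minimal sketch, assuming each index is modelled as a dict of project names to version strings; `candidate_versions` and `normalize` are hypothetical helpers, not pip APIs. Names are normalized as described in PEP 503, so spelling variants like `My-Super.Secret_Package` cannot sidestep the check.

```python
import re


def normalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_" and "." collapse to "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def candidate_versions(
    name: str,
    internal: dict[str, list[str]],
    public: dict[str, list[str]],
) -> tuple[str, list[str]]:
    """Return (source, versions) for `name` under "internal names win" semantics.

    Any project that exists on the internal index is served exclusively from
    it, so a same-named public upload is never considered, whatever its version.
    """
    key = normalize(name)
    internal_by_key = {normalize(n): v for n, v in internal.items()}
    if key in internal_by_key:
        return "internal", list(internal_by_key[key])
    public_by_key = {normalize(n): v for n, v in public.items()}
    return "public", list(public_by_key.get(key, []))


internal = {"my_super_secret_package": ["1.2.0"]}
public = {"my_super_secret_package": ["99.0.0"], "requests": ["2.31.0"]}
print(candidate_versions("My-Super.Secret_Package", internal, public))  # ('internal', ['1.2.0'])
print(candidate_versions("requests", internal, public))                 # ('public', ['2.31.0'])
```

The key design choice is that the source is decided by name before any versions are compared, so a higher-versioned public upload can no longer win.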