Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer upper bounds when resolving/backtracking #13017

Closed

Conversation

notatallshaw
Copy link
Member

@notatallshaw notatallshaw commented Oct 14, 2024

Fixes: #12993
Fixes: #12990
Fixes: #12430
Fixes: #13030

This PR is built on top of #12982 so that the unit tests can be expanded, either that PR can be reviewed first, or this PR can supplant that PR.

I have developed some benchmark scripts to ensure that changes to pip's resolution algorithm don't regress common real world requirements: https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks.

I plan to keep building out more scenarios, you can see the current ones so far here: https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks/tree/main/scenarios

Upon testing this PR compared to pip 24.2 I see one small regressions and two big improvements:

Difference for scenario scenarios/problematic.toml - autogluon:
    	Success: False -> True.
    	Failure Reason: Build Failure -> None.

Difference for scenario scenarios/problematic.toml - boto3-urllib3-transient:
    	Number of packages processed: 869 -> 871

Difference for scenario scenarios/big-packages.toml - apache-airflow-all:
    	Number of requirements processed: 593 -> 592
    	Number of packages processed: 681 -> 661

The fact that autogluon can resolve is a big improvement, apache-airflow[all] gets a noticeable improvement in how many packages it has to process (and this has real time improvement, as the number of packages processed can have O(n^2) complexity) , and a scenario involving boto3 and urllib3 as transient requirements gets a small regression in having to process 2 more packages.

I am hoping to find more real world scenarios where this has a noticeable difference, but I think these results are sufficient to show this approach is a net positive.

@notatallshaw
Copy link
Member Author

notatallshaw commented Oct 14, 2024

Very tentatively adding this to the 24.3 milestone on the basis of:

But I understand if no maintainer is available to review.

@notatallshaw
Copy link
Member Author

notatallshaw commented Oct 15, 2024

Added more problematic scenarios in: https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks/blob/main/scenarios/problematic.toml

And found this also fixes #12430 (which was merged into another issue, but the specific resolution the user had is now solved by this).

@potiuk
Copy link
Contributor

potiuk commented Oct 15, 2024

I do not know pip resiolution internals - but the rules explained make sense and might improve a number of cases indeed.

@notatallshaw
Copy link
Member Author

notatallshaw commented Oct 18, 2024

I took a look to see whether it made any difference to put upper bound preference above or below backtracking cause preference, and at least in the scenarios I currently have in https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks/blob/main/scenarios it didn't make any significant difference (there was a very slight regression of apache-airflow-beam putting it below, as it visited 1 extra package).

So I consider this good in its current position, and if I find a scenario in the future, or a user reports one, where it does make a significant difference, then it can be changed.

@notatallshaw
Copy link
Member Author

Found a minor improvement, in acryl-datahub[all] which has over 300 total dependencies, it visited 1 less requirement, 6 less packages, and produced a slightly better solution: notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks#2 (comment)

@sbidoul
Copy link
Member

sbidoul commented Oct 26, 2024

While this looks very reasonable I'd prefer to have another resolver expert (which I am not, unfortunately) to look into this. So postponing.

@sbidoul sbidoul modified the milestones: 24.3, 25.0 Oct 26, 2024
@notatallshaw
Copy link
Member Author

While this looks very reasonable I'd prefer to have another resolver expert (which I am not, unfortunately) to look into this. So postponing.

I knew this one was pretty unlikely but I thought I'd give it a shot since the recent real world issues raised that this solves.

@notatallshaw
Copy link
Member Author

notatallshaw commented Nov 10, 2024

Going to make a single follow up PR once #13001 lands, I'll comment here once done.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 26, 2024
@notatallshaw notatallshaw removed this from the 25.0 milestone Dec 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.