
feat: only instantiate conda prefix for pypi solve on request #3029

Merged — 28 commits, Feb 4, 2025

Conversation

@nichmor (Contributor) commented Jan 29, 2025

Intro

Pixi has had PyPI dependency support for a while now; we provide it by integrating with uv. As you might be aware, Python has the concept of source dependencies, and some older source distributions do not contain static metadata. In general, when static metadata is available, a Python interpreter is only needed at install time, not at resolve time. However, when a project uses a setup.py, an interpreter is needed during the resolve, because the metadata has to be built first.

Additionally, projects like https://github.com/facebookresearch/detectron2 even need certain libraries (other than just Python) to be available during the solve. With pixi, one can make use of the instantiated conda prefix for this. This is one of the benefits compared to a PyPI-only solve: we can install the system dependencies we need. However, this is only necessary in a select number of cases. What we did, though, was instantiate a conda prefix whenever an environment contained PyPI dependencies.

If your environments were big, this could take a significant amount of time.

What changed?

This PR changes that behavior: only when a source distribution (i.e. a git, non-editable path, or sdist dependency) is encountered do we actually instantiate the prefix. If you have no source dependencies, no prefix is instantiated before the PyPI solve. You can further enforce this by adding no-build = true to your [pypi-options]. Also, when a previously built sdist is cached, i.e. the built metadata is available, we skip instantiation as well.
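For an environment that should never build source distributions (and therefore never needs a conda prefix for the solve), the opt-in looks like this minimal manifest fragment:

```toml
[pypi-options]
no-build = true
```

With this set, pixi should refuse to build an sdist rather than instantiate a prefix, so the fast path is guaranteed.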

So we expect a much faster solve for manifests with few source dependencies and heavy environments; the verification can be seen in the comments below. This is only the first step, though: there are also repositories like rerun-io/rerun that do have numerous source dependencies but only require Python for most of them. A follow-up PR could instantiate only a partial prefix on solve; this requires some user input, as we need to know which packages need to be available.

While this changes a lot of the pre-solve behavior, I (@tdejager) expect that not much should change in practice and that most cases should become faster. We could also enforce no-build = true in our CI setups that have no source dependencies, so that we are sure we never build when we do not need to. This, again, is out of scope for this PR.

How to test it?

You can use the pixi.toml below. Running pixi install should install only the default env and py39, because the latter requests an sdist.

Do not forget to clean the uv cache first! Otherwise the entire installation of py39 may be skipped.

[project]
authors = ["nichmor <[email protected]>"]
channels = ["conda-forge"]
description = "Add a short description here"
name = "partial-env"
platforms = ["osx-arm64"]
# platforms = ["linux-64"]
version = "0.1.0"

[tasks]

[dependencies]
python = "*"
rust = "*"
pytorch = "*"

[pypi-dependencies]
boltons = "*"
# pillow = { git = "https://github.com/python-pillow/Pillow.git" }
# sdist = "*"



[feature.py39.dependencies]
python = "3.9"

[feature.py39.pypi-dependencies]
boltons = "*"
sdist = "*"

[feature.py310.dependencies]
python = "3.10"

[feature.py310.pypi-dependencies]
boltons = "*"

[feature.py311.dependencies]
python = "3.11"

[feature.py311.pypi-dependencies]
boltons = "*"



[environments]
py39 = ["py39"]
py310 = ["py310"]
py311 = ["py311"]

@ruben-arts ruben-arts added ⏩ performance An issue related to performance 🐍 pypi Issue related to PyPI dependencies labels Jan 30, 2025
@ruben-arts (Contributor) commented Jan 30, 2025

Just a quick test of this PR 😉

gh repo clone skops-dev/skops
cd skops
pixi g i pixi==0.40.3 -e pixi40_3 --expose pixi40_3=pixi
hyperfine -r 3 -w 1 -p 'pixi clean && rm -f pixi.lock ' 'pixi list' 'pixi40_3 list'

Results in:

Benchmark 1: pixi list
  Time (mean ± σ):     17.414 s ±  3.818 s    [User: 7.603 s, System: 15.594 s]
  Range (min … max):   14.126 s … 21.601 s    3 runs
 
Benchmark 2: pixi40_3 list
  Time (mean ± σ):     41.604 s ±  7.046 s    [User: 11.093 s, System: 297.530 s]
  Range (min … max):   33.746 s … 47.359 s    3 runs
 
Summary
  pixi list ran
    2.39 ± 0.66 times faster than pixi40_3 list

I think we can still make this way faster, but we need to figure out why it still takes time (it's spent on building the editable).

@tdejager tdejager added the test:extra_slow Run the extra slow tests label Jan 31, 2025
src/lock_file/build_dispatch.rs — outdated review comments (resolved)
let non_isolated_packages = OnceCell::new();
let python_env = OnceCell::new();

let pixi_build_dispatch = PixiBuildDispatch::new(
Review comment (Contributor):

Because the build dispatch can now panic when getting the interpreter, we might need to catch this and re-throw it as something nicer. I guess we could just wrap that around the actual resolve call; ditto for the LookaheadSolver.
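The OnceCell snippet above defers the expensive prefix instantiation until a source distribution actually needs it. A minimal, self-contained sketch of that idea (the `resolve` function and the stub prefix path are hypothetical, not the PR's actual API):

```rust
use std::cell::OnceCell;

/// Hypothetical sketch of the lazy-prefix idea: walk the PyPI dependencies
/// and instantiate the conda prefix only when a package needs a build.
/// Returns (packages resolved, prefix instantiations performed).
fn resolve(needs_build: &[bool]) -> (usize, usize) {
    let prefix: OnceCell<String> = OnceCell::new();
    let mut instantiations = 0;
    let mut resolved = 0;
    for &needs in needs_build {
        if needs {
            // `get_or_init` runs the closure at most once, no matter how
            // many source distributions the solve encounters.
            prefix.get_or_init(|| {
                instantiations += 1; // stands in for the expensive conda install
                "/stub/.pixi/envs/default".to_string()
            });
        }
        resolved += 1;
    }
    (resolved, instantiations)
}

fn main() {
    // No source distributions: the prefix is never instantiated.
    assert_eq!(resolve(&[false, false, false]), (3, 0));
    // Several sdists: the prefix is instantiated exactly once.
    assert_eq!(resolve(&[false, true, true]), (3, 1));
}
```

The panic concern in the comment applies at the `get_or_init` call site: if instantiation can fail, the fallible variant has to be wrapped so the error surfaces from the resolve call instead.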

src/project/mod.rs — review comment (resolved)
tests/integration_rust/add_tests.rs — outdated review comment (resolved)
@tdejager tdejager requested a review from baszalmstra February 1, 2025 12:58
@tdejager (Contributor) commented Feb 1, 2025

@baszalmstra I did an initial review myself and pointed out some improvements. Would be good if you can take a look.

Also, note to self (@tdejager): let's test this on the rerun-io repo as well; in that case some prefixes should still be instantiated, since it uses a lot of source dependencies.

@tdejager tdejager marked this pull request as ready for review February 1, 2025 13:01
@tdejager (Contributor) commented Feb 1, 2025

OK, all seems good on https://github.com/rerun-io/rerun: pixi install succeeds, and a heavily cached comparison (I ran install a couple of times) of pixi clean && rm pixi.lock && pixi install takes:

This PR: 16 s
Latest release: 32 s

Timings cover just the pixi install part.

So no performance regressions there. @jleibs this might be interesting for you guys as well :)

@tdejager tdejager changed the title feat: add first implementation of lazy dispatching feat: only instantiate python prefix when needed Feb 3, 2025
@tdejager tdejager changed the title feat: only instantiate python prefix when needed feat: only instantiate conda prefix for pypi solve on request Feb 3, 2025
@ruben-arts (Contributor) left a review comment:
Left some small remarks! Looking forward to this change! 🚀

pub(crate) async fn update(
    &self,
    records: Arc<PixiRecordsByName>,
) -> miette::Result<CondaPrefixUpdated> {
Review comment (Contributor):
Please make an error type

@tdejager (Contributor) commented Feb 4, 2025
Sorry, I will not be able to do this in this PR, as it would require refactoring a lot of miette types in update; we need to do that separately. I could take the messages from miette for update_prefix_conda and find_installed_packages and put them in an error type, but I do not believe that to be advantageous at this point.
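For reference, a minimal sketch of what such a dedicated error type could look like once the refactor happens (the enum name and variants here are hypothetical, not the PR's API; the PR keeps miette::Result as discussed):

```rust
use std::fmt;

// Hypothetical error type for the prefix-update path; variant names are
// illustrative, covering the two fallible steps mentioned in the thread.
#[derive(Debug)]
pub enum CondaPrefixUpdateError {
    PrefixUpdate(String),
    InstalledPackages(String),
}

impl fmt::Display for CondaPrefixUpdateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::PrefixUpdate(msg) => write!(f, "failed to update conda prefix: {msg}"),
            Self::InstalledPackages(msg) => write!(f, "failed to query installed packages: {msg}"),
        }
    }
}

impl std::error::Error for CondaPrefixUpdateError {}

fn main() {
    let err = CondaPrefixUpdateError::PrefixUpdate("network down".to_string());
    assert_eq!(err.to_string(), "failed to update conda prefix: network down");
}
```

Because it implements std::error::Error, such a type would still convert cleanly into miette's diagnostics at the boundary, which is why deferring the refactor is low-risk.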

let build_context = self.build_context.clone();
let group_name = group_name.clone();
let has_existing_packages = !installed_packages.is_empty();
let python_status = environment::update_prefix_conda(
Review comment (Contributor):
Isn't it a good idea to move this function here? Looks like it's in the wrong location now.

@tdejager (Contributor) commented Feb 4, 2025
I tried doing this refactor, but it exploded into a major refactor. You've noticed a good point for sure, and we should do that refactor afterwards, but I think it needs to stay out of scope for this PR.

src/lock_file/update.rs — outdated review comment (resolved)
@ruben-arts (Contributor) commented:

I've continued testing and found nothing wrong. I've got one question.

It's now faster even when I do have sdist as a dependency in my pypi-dependencies. Any idea why that would be?

@tdejager (Contributor) commented Feb 4, 2025

> I've continued testing, and found nothing wrong. I've got one question.
>
> It's now faster even when you do have sdist as a dependency in my pypi-dependencies. Any idea why that would be?

It could be that the sdist is cached for that version; if so, no instantiation of the prefix happens during the solve at all. Unsure what your test parameters were :)

@tdejager tdejager requested review from ruben-arts and removed request for baszalmstra February 4, 2025 13:57
@tdejager tdejager merged commit a9f4927 into prefix-dev:main Feb 4, 2025
30 checks passed
@tdejager tdejager mentioned this pull request Feb 6, 2025