Add build string when resolving with --lock-file from environment #363

Open
2 tasks done
tdejager opened this issue Feb 24, 2023 · 14 comments

Comments

@tdejager

Checklist

  • I added a descriptive title
  • I searched open requests and couldn't find a duplicate

What is the idea?

It seems that the build field, which I assume corresponds to the build_string, is not filled in when resolving an environment.yml file with mamba/micromamba/conda.

...snip..
package:
- category: main
  dependencies: {}
  hash:
    md5: d7c89558ba9fa0495403155b64376d81
    sha256: fe51de6107f9edc7aa4f786a70f4a883943bc9d39b3bb7307c04c41410990726
  manager: conda
  name: _libgcc_mutex
  optional: false
  platform: linux-64
  url: https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2
  version: '0.1'
- category: main
  dependencies: {}
  hash:
    md5: ff9f73d45c4a07d6f424495288a26080
    sha256: 8f6c81b0637771ae0ea73dc03a6d30bec3326ba3927f2a7b91931aa2d59b1789
  manager: conda
  name: ca-certificates
  optional: false
  platform: linux-64
  url: https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2022.12.7-ha878542_0.conda
  version: 2022.12.7
- category: main
  dependencies: {}
  hash:
    md5: 7aca3059a1729aa76c597603f10b0dd3
    sha256: f6cc89d887555912d6c61b295d398cff9ec982a3417d38025c45d5dd9b9e79cd
  manager: conda
  name: ld_impl_linux-64
  optional: false
  platform: linux-64
  url: https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.40-h41732ed_0.conda
  version: '2.40'

None of the locked packages have the build attribute. Moreover, conda_solver.py:183, where the structure is created, does not fill it in at all. I think the mamba JSON output does return this.

Why is this needed?

Because it is an identifying feature for a package.

What should happen?

The code should be modified to fill in the build string where possible. We could retroactively extract it from the URL if we wanted to, but I guess filling it in when the structure is created would be more correct.
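For illustration, here is a minimal sketch of the URL-based fallback mentioned above. It assumes the usual `name-version-build` filename layout with a `.tar.bz2` or `.conda` extension; `parse_conda_url` and `ParsedPackage` are hypothetical names, not part of conda-lock.

```python
from typing import NamedTuple
from urllib.parse import urlparse

# Archive extensions seen in the URLs above.
_EXTENSIONS = (".tar.bz2", ".conda")


class ParsedPackage(NamedTuple):
    name: str
    version: str
    build: str


def parse_conda_url(url: str) -> ParsedPackage:
    """Split a conda package URL into name, version and build string."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    for ext in _EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError(f"not a conda package URL: {url}")
    # Package names may contain hyphens, so split from the right.
    name, version, build = stem.rsplit("-", 2)
    return ParsedPackage(name, version, build)
```

Splitting from the right matters for names such as `ld_impl_linux-64`, which contain a hyphen themselves.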

Additional Context

We are using the lock files at prefix. If you want, we could take a stab at a PR :)

@maresb
Contributor

maresb commented Feb 24, 2023

Sounds good to me, perhaps under build_name? Note that this involves a (non-breaking, I think) change to the unified lockfile spec.

@mariusvniekerk, now that we're post-graduation, is there a procedure for maintaining and updating the spec?

The initial brainstorming of the spec occurred on mamba-org/mamba#1209.

@tdejager
Author

Ah, I assumed that this attribute:

build: Optional[str] = None

was meant for this. Maybe it's meant for something else?

@maresb
Contributor

maresb commented Feb 24, 2023

Ah, indeed! Given that, from my perspective, I think this should be straightforward to get merged.

@tdejager
Author

tdejager commented Feb 26, 2023

There is another thing I should have mentioned before: conda packages also have a build number, but that is not included in the format at all.

We could just put it in the build attribute (it's at the end of the build string) but it's better to be explicit I suppose.

WDYT? @maresb
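To make the trade-off concrete, here is a small sketch of recovering the build number from the end of a build string. `build_number_from_build` is a hypothetical helper, and it assumes the number, when present, is the final `_`-separated token.

```python
from typing import Optional


def build_number_from_build(build: str) -> Optional[int]:
    """Return the trailing integer of a build string, or None if absent."""
    # Take the text after the last underscore (or the whole string).
    tail = build.rsplit("_", 1)[-1]
    return int(tail) if tail.isdecimal() else None
```

A build string such as `conda_forge` (from the `_libgcc_mutex` URL above) has no trailing number, so this returns `None` for it. That is one argument for storing the build number explicitly.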

@maresb
Contributor

maresb commented Feb 26, 2023

We get the build number from micromamba; see #338 (comment). The format from Conda is a bit different, though. I don't remember the details off the top of my head.

@maresb
Contributor

maresb commented Feb 26, 2023

Conda/Mamba:

{
  "base_url": "https://conda.anaconda.org/conda-forge",
  "build_number": 0,
  "build_string": "pyhd8ed1ab_0",
  "channel": "conda-forge",
  "dist_name": "pip-23.0.1-pyhd8ed1ab_0",
  "name": "pip",
  "platform": "noarch",
  "version": "23.0.1"
}

Micromamba:

{
  "build": "pyhd8ed1ab_0",
  "build_number": 0,
  "build_string": "pyhd8ed1ab_0",
  "channel": "https://conda.anaconda.org/conda-forge/noarch",
  "constrains": null,
  "depends": [
    "setuptools",
    "wheel",
    "python >=3.7"
  ],
  "fn": "pip-23.0.1-pyhd8ed1ab_0.conda",
  "license": "MIT",
  "md5": "8025ca83b8ba5430b640b83917c2a6f7",
  "name": "pip",
  "sha256": "e1698cbf4964cd60a2885c0edbc654133cd0db5ac4cb568412250e577dbc42ad",
  "size": 1366466,
  "subdir": "noarch",
  "timestamp": 1676670714,
  "track_features": "",
  "url": "https://conda.anaconda.org/conda-forge/noarch/pip-23.0.1-pyhd8ed1ab_0.conda",
  "version": "23.0.1"
}

So build_number should be straightforward.
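As a rough sketch of how both record shapes above could be handled in one place: `extract_build_info` is a hypothetical helper, not existing conda-lock code. It relies only on the keys visible in the two JSON samples.

```python
from typing import Any, Dict


def extract_build_info(record: Dict[str, Any]) -> Dict[str, Any]:
    """Pull build and build_number out of a dry-run package record.

    Conda/Mamba records in the sample above only carry "build_string",
    while Micromamba records carry both "build" and "build_string".
    """
    build = record.get("build") or record.get("build_string")
    return {"build": build, "build_number": record.get("build_number")}
```

Both samples would then yield the same `build` and `build_number` values for `pip`.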

Also on my wishlist for a long time has been the package upload date, or timestamp. It exists in the repodata, but is for some reason not included by Conda in the dry-run json.

@baszalmstra

For me, it would be ideal if the conda-lock file contains all the identifying properties that enable matching a MatchSpec. This would mean that all properties found in the repodata should also be present in the conda-lock file, since MatchSpec can technically match against any of those properties.

I would like to have this feature so we can check if MatchSpecs in an environment.yml file already match a conda-lock file. If they do, we can skip the solving process altogether.
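A heavily simplified sketch of such a check follows. It is not conda's MatchSpec: it only supports an exact name plus optional glob patterns for version and build, and `matches` is a hypothetical function name.

```python
from fnmatch import fnmatchcase
from typing import Mapping, Optional


def matches(
    locked: Mapping[str, object],
    name: str,
    version: Optional[str] = None,
    build: Optional[str] = None,
) -> bool:
    """Check a locked package against a simplified name/version/build spec.

    version and build are glob patterns; None means "no constraint".
    """
    if locked.get("name") != name:
        return False
    for key, pattern in (("version", version), ("build", build)):
        if pattern is None:
            continue
        value = locked.get(key)
        # A field missing from the lock file cannot satisfy a constraint.
        if value is None or not fnmatchcase(str(value), pattern):
            return False
    return True
```

Note that a spec constraining `build` can never be confirmed against a locked entry that lacks the field, which is why the missing properties matter for skipping the solve.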

After reviewing model.py and repodata.json, I found some diverging or missing fields:

  • In conda lock, platform is equivalent to subdir in repodata.json.
  • In repodata.json, platform refers to something else (win, linux, etc.), which is probably fine.
  • arch is missing but can be derived from subdir.
  • build_number is missing.
  • license and license_family are missing, which seems important because it means you could upload different binaries that only differ by license.
  • timestamp is also missing.
  • Some additional fields like preferred_env, package_type, and date can also be present according to the Conda model, but I have never seen them used.
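As an example of deriving the missing values, here is a sketch that splits a subdir into platform and arch. `split_subdir` is a hypothetical helper, and the alias table (`64` to `x86_64`, `32` to `x86`) is an assumption about the values repodata.json typically uses; it should be checked against real repodata.

```python
from typing import Optional, Tuple

# Assumed mapping from the subdir suffix to the repodata "arch" value.
_ARCH_ALIASES = {"64": "x86_64", "32": "x86"}


def split_subdir(subdir: str) -> Tuple[Optional[str], Optional[str]]:
    """Derive (platform, arch) from a subdir such as "linux-64"."""
    if subdir == "noarch":
        # noarch packages are not tied to a platform or architecture.
        return None, None
    platform, _, suffix = subdir.partition("-")
    if not suffix:
        raise ValueError(f"unexpected subdir: {subdir!r}")
    # Suffixes without an alias (e.g. "arm64") are used as-is.
    return platform, _ARCH_ALIASES.get(suffix, suffix)
```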

Although not all properties may be exposed by Conda/Mamba, we should consider adding these fields to the model. WDYT?

@maresb
Contributor

maresb commented Feb 26, 2023

I'm in favor of adding all the available data.

One potential challenge is that we work with PyPI dependencies, and I'm not sure if or how much we use the MatchSpec for this purpose.

Another challenge is the divergence between conda and libmamba. Indeed, this weekend I've been trying to improve the reliability of the conda-lock CI tests against apparent race conditions. One annoyance about Micromamba is the lack of pkgs_dirs in micromamba info --json. I'd like to be able to locate repodata.json so that I get access to missing data, and it would make things much easier if I could do this with Micromamba. Do either of you have ideas for how to compute pkgs_dirs without Conda?

Prefix looks like a very exciting venture!!! I'm really looking forward to what comes out of it. Also, I really wish I knew Rust! 😄

@tdejager
Author

Thanks! @maresb :) We are as well! 😄

Do you think we would need to introduce a new version if we add any extra fields? I suppose the version is still at 1 but I'm unsure when you want to bump it.

I've asked about the pkgs_dirs on our zulip :)

@maresb
Contributor

maresb commented Feb 27, 2023

Thanks!!!

To clarify, are you asking about the lockfile version? If so, then my understanding (meaning what I say should be verified with people like Marius and Wolf) is that the version integer is a major version in the semantic-versioning sense, so backwards-compatible changes don't require an increment. My understanding is also that adding a new optional field is backwards compatible. So I believe that as long as nothing is renamed we can stay at 1.
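The backwards-compatibility argument can be sketched with a toy model. This uses a plain dataclass with hypothetical names (`LockedPackage`, `load_package`) rather than the real conda-lock model, and it assumes new fields default to `None`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class LockedPackage:
    name: str
    version: str
    url: str
    # New optional fields default to None, so older entries still load.
    build: Optional[str] = None
    build_number: Optional[int] = None


def load_package(entry: Dict[str, Any]) -> LockedPackage:
    """Build a LockedPackage, ignoring keys this model does not know."""
    known = {f.name for f in fields(LockedPackage)}
    return LockedPackage(**{k: v for k, v in entry.items() if k in known})
```

An entry written before the fields existed loads with `build=None`, and an entry that includes them loads with the values set, so neither direction requires a version bump under this reading.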

AFAIK there is no formal spec, so it feels a bit silly to be discussing as if there were one. But we should probably formalize it so that versions and changes are concrete.

@tdejager
Author

Ah okay! Yeah that is what I meant :)

@wolfv

wolfv commented Feb 27, 2023

@maresb it's maybe not as nice but you can get this as YAML:

$ micromamba config list pkgs_dirs
pkgs_dirs:
  - /Users/wolfv/micromamba/pkgs
  - /Users/wolfv/.mamba/pkgs
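A minimal sketch of turning that output into a Python list without a YAML dependency follows. `parse_pkgs_dirs` is a hypothetical helper that only handles the simple unquoted list layout shown above; a real YAML parser would be more robust.

```python
from typing import List


def parse_pkgs_dirs(output: str) -> List[str]:
    """Extract the pkgs_dirs entries from the command output shown above."""
    dirs: List[str] = []
    in_section = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped == "pkgs_dirs:":
            in_section = True
        elif in_section and stripped.startswith("- "):
            dirs.append(stripped[2:].strip())
        elif stripped:
            # Any other non-empty line ends the list.
            in_section = False
    return dirs
```

The input would be the captured standard output of `micromamba config list pkgs_dirs`.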

@wolfv

wolfv commented Feb 27, 2023

It should be quite trivial to add this info to the info --json output as well :)

@maresb
Contributor

maresb commented Feb 27, 2023

Awesome, thanks for the tip!!! This will be an enormous help.
