Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefetching pip dependencies with Cachi2 #295

Merged
merged 19 commits into from
Feb 11, 2025
Merged

Prefetching pip dependencies with Cachi2 #295

merged 19 commits into from
Feb 11, 2025

Conversation

ccronca
Copy link
Collaborator

@ccronca ccronca commented Feb 5, 2025

Continuation of #285

This PR updates the "pull request" pipeline to prefetch pip dependencies.

  • Renamed requirements.txt to requirements.in, which better matches its purpose. It lists direct dependencies that are manually managed
  • requirements.txt contains the full set of dependencies (direct + transitive) and is generated using pip-compile
  • Konflux recommends using hashes for pip dependencies, but that’s not possible due to py-nessus-pro, which can’t be obtained this way (pip doesn’t support hashing for version-controlled repos). This means that dependencies won't be verified against the expected hash value

requirements.txt Outdated
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When generating this file using --generate-hashes, a comment will be added containing the following warning:

# WARNING: pip install will require the following package to be hashed.
# Consider using a hashable URL like https://github.com/jazzband/pip-tools/archive/SOMECOMMIT.zip
py-nessus-pro @ git+https://github.com/sfowl/py-nessus-pro.git@41abc315c0e7e1320bb60732830eaf8e8ed3ba84

This will cause Cachi2 to fail with the following error:

2025-02-05 12:19:29,188 ERROR PackageRejected: Hash is required, dependency does not specify any: py-nessus-pro @ git+https://github.com/sfowl/py-nessus-pro.git@41abc315c0e7e1320bb60732830eaf8e8ed3ba84
Error: PackageRejected: Hash is required, dependency does not specify any: py-nessus-pro @ git+https://github.com/sfowl/py-nessus-pro.git@41abc315c0e7e1320bb60732830eaf8e8ed3ba84

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a vague intention to submit the changes in my fork of py-nessus-pro upstream, but it fell off my radar. I'll create a ticket to follow up on it. If that's successful, then we can go back to consuming this from pypi.

pyproject.toml Outdated
@@ -1,2 +1,13 @@
[project]
name = "rapidast"
version = "0.0.1"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cachi2 reads the project version when parsing pip dependencies, so I added a placeholder value. As far as I can tell, this version is only used in the generated Cachi2 SBOM when referencing the rapidast Python package.

    {
      "name": "rapidast",
      "purl": "pkg:pypi/[email protected]?vcs_url=git%2Bssh://git%40github.com/RedHatProductSecurity/rapidast.git%40f1bdcc969e21164c8ee70ca9bc70ad47826e5099",
      "version": "0.0.1",
      "properties": [
        {
          "name": "cachi2:found_by",
          "value": "cachi2"
        }
      ],
      "type": "library"
    },

If necessary, we could set a more meaningful version (e.g., based on releases), but it will require some maintenance as 'Cachi' doesn't seem to support dynamic versions. Another option is to omit the version entirely, Cachi2 would assign a default one, but this will make the file invalid since version is a required property

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If necessary, we could set a more meaningful version (e.g., based on releases), but it will require some maintenance Just wondering what the maintenance would include, aside from updating the version in this file per a release?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly that! 😃 My point is that this version isn't currently being used for anything, so there's no need to add an extra step to the release process, even if it's automated.

Copy link
Collaborator Author

@ccronca ccronca Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing offline, we decided that using a dummy version might give end-users the impression that the project is not properly managed. To address this, I've updated it to use a dynamic version instead. However, since Cachi2 does not support dynamic versions, the BOM generated by Cachi2 will not include any version. This will be transparent to end-users, as they will only see the project.toml file pointing to the latest Git tag via a dynamic version. Since this BOM is not used anywhere in Rapidast, this change should have no impact.

We could set this to a valid version, but that would require managing these versions as part of the release process, even though we are not publishing Rapidast as a Python package.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the SBOM made by konflux I see versions based on commits like this:

      "name": "rapidast",
      "externalRefs": [
        {
          "referenceLocator": "pkg:pypi/rapidast?vcs_url=git%2Bhttps://github.com/RedHatProductSecurity/rapidast%401572b0df8bfa0854dd32f1573e8153cc1854278d",

Which seems good to me, but not sure if you intended something more like a git tag? (e.g. like a Go pseudo-version)

Copy link
Collaborator Author

@ccronca ccronca Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, this change should only impact the Rapidast package version in the SBOM generated by Cachi2, which won’t display any version for the Rapidast python package because Cachi2 doesn't support dynamic version in project.toml

Konflux uses Syft to generate the SBOM (while also merging the one from Cachi2), so the Rapidast package should show up there but without a version.

Cachi2-generated SBOM using a hardcoded version vs. a dynamic version:

    {
      "name": "rapidast",
      "purl": "pkg:pypi/[email protected]?vcs_url=git%2Bssh://git%40github.com/RedHatProductSecurity/rapidast.git%40f1bdcc969e21164c8ee70ca9bc70ad47826e5099",
      "version": "0.0.1",
      "properties": [
        {
          "name": "cachi2:found_by",
          "value": "cachi2"
        }
      ],
      "type": "library"
    },
    {
      "name": "rapidast",
      "purl": "pkg:pypi/rapidast?vcs_url=git%2Bssh://git%40github.com/RedHatProductSecurity/rapidast.git%400abaa8937a8192cff581ed3214c5f6489c39c43a",
      "properties": [
        {
          "name": "cachi2:found_by",
          "value": "cachi2"
        }
      ],
      "type": "library"
    },


This will regenerate `requirements.txt` with the newly added dependency and its pinned versions.

### Build dependencies
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cachi2 suggest installing a pre-commit hook to automate this process. We can implement it if needed, although these dependencies should not change frequently

containerize/Containerfile Outdated Show resolved Hide resolved
@ccronca ccronca marked this pull request as ready for review February 5, 2025 20:44
@ccronca ccronca requested a review from a team as a code owner February 5, 2025 20:44
@sfowl
Copy link
Collaborator

sfowl commented Feb 10, 2025

Do similar changes need to be made in the e2e tests, since they also do a pip install? I suppose this isn't an option for the github unit tests, as they can't leverage the cachi2 prefetched components.

@ccronca
Copy link
Collaborator Author

ccronca commented Feb 10, 2025

Do similar changes need to be made in the e2e tests, since they also do a pip install? I suppose this isn't an option for the github unit tests, as they can't leverage the cachi2 prefetched components.

I'm not familiar with E2E tests, but from what I know, it shouldn’t really be necessary. The buildah task works with Cachi2 by injecting variables and swapping out dependency locations with the prefetched artifacts, and this all happens within buildah itself. So, it shouldn’t affect the E2E tests.

But if the question is whether we should do prefetching in the E2E tests, I’d say yes. It would help make sure the test environment matches what we expect. The key thing is integrate Cachi2 with the E2E testing task to inject the variables and replace the dependencies with the prefetched artifacts

Copy link
Collaborator

@sfowl sfowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ccronca ccronca merged commit caa1c80 into development Feb 11, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants