Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support for plugins #12985

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

facutuesca
Copy link

@facutuesca facutuesca commented Oct 1, 2024

This is a PR with an initial implementation of the plugin support discussed in #12766.

High-level description

Plugins will be detected and loaded via registered entrypoints. For example, a Python package that contains the following in pyproject.toml:

[project.entry-points."pip.plugins"]
example-distinspector = "pip_plugin_example"

will register the example-distinspector plugin by providing the pip_plugin_example module object. This module object will be loaded by pip and its methods will be called at different places while pip runs.

The methods that will be called (and the places where they will be called from) depend on the plugin type. For now, this implementation only supports a single plugin type: dist-inspector, which should provide three functions:

def plugin_type() -> str:
    return "dist-inspector"

def pre_download(url: str, filename: str, digest: str) -> None:
    # contract: `pre_download` raises `ValueError` to terminate
    # the operation that intends to download `filename` from `url`
    # with hash `digest`
    pass

def pre_extract(dist: Path) -> None:
    # contract: `pre_extract` raises `ValueError` to terminate
    # the operation that intends to unarchive `dist`
    pass

These functions (provided by the plugin in module pip_plugin_example) will be called by pip right before downloading or extracting a distribution file.

A repository with an example plugin package is here.

Implementation details

  • I focused on putting most of the plugin logic in new, separate files (models/plugin.py and utils/plugins.py). The only change to existing files is adding a function call in the specific places the plugin should run (before download and before extraction).
  • Plugin loading is conservative: if any of the expected functions is missing from the loaded module, we log a warning and skip using the plugin.
  • Since plugins should only raise ValueError, in case of other exception types I put a defensive check that logs a warning and converts the exception to a ValueError. This means that plugins that don't follow the contract will still work (with a warning). If we want to avoid that and just abort when a plugin misbehaves, this check needs to be changed.

Open questions

  • Are the places where we call the hooks (pre-download and pre-extract) correct?
  • Would loading the plugins on-demand (during the first call to a plugin hook) be better than loading them (always) during startup?

TODO

  • Once discussion is done, add tests and docs

cc @woodruffw

from pathlib import Path
from typing import Iterator, List

from pip._vendor.pygments.plugin import iter_entry_points
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be using an undocumented helper function from pygments here. It could disappear without warning. If we need the functionality we should implement our own copy. Although isn't this available in importlib.metadata anyway?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! I copied the logic to this file. The reason why we need it is because the API for EntryPoints changed in Python 3.10, so the way to get entry points matching a specific group name depends on the Python version:

groups = entry_points()
if hasattr(groups, "select"):
    # New interface in Python 3.10 and newer versions of the
    # importlib_metadata backport.
    return groups.select(group=group_name)
else:
    assert hasattr(groups, "get")
    # Older interface, deprecated in Python 3.10 and recent
    # importlib_metadata, but we need it in Python 3.8 and 3.9.
    return groups.get(group_name, [])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so sick of the instability of the importlib.{metadata, resources} APIs, but we have to work with what we have, I guess :-(

Thanks for doing this.

continue
plugin = plugin_from_module(entrypoint.name, module)
if plugin is not None:
_loaded_plugins.append(plugin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a great fan of running complex code at import time. Couldn't we have a load_plugins() function that gets called at a suitable point in pip's initialisation process? Or better still, load plugins on demand, because we don't want to impact performance loading plugins we don't need.

Copy link
Author

@facutuesca facutuesca Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I moved it into a load_plugins and called it right before command.main(cmd_args).

For loading plugins on demand, maybe we can change utils.plugins.plugin_pre_download_hook and utils.pugins.plugin_pre_extract_hook to check if plugins have been loaded, and load them if not.
Then the plugins should only be loaded when the first hook is called (and never if the code path does not include any hooks).
WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'll do for now, but let's keep on-demand loading as an option, and do some proper testing at some point. I'm very aware that uv has demonstrated that performance is an important characteristic in an installer, and I want to make sure we avoid making pip slower and slower as we add features.

@pfmoore
Copy link
Member

pfmoore commented Oct 1, 2024

I'm still not clear on the status of the dist-inspector plugin type. Is it intended as an actual, supported plugin type for pip, and more to the point, is it intended to be the only supported plugin type (until someone submits PRs providing other plugin types)? What evidence do we have that this is plugin type is important? And where are the use cases (issues, feature requests) providing evidence that people need this plugin type and will use it?

I'm still trying to make the point that we shouldn't be adding a plugin mechanism in isolation. Rather, we should have a real world use case that justifies being implemented as a plugin, and we should create a plugin mechanism in support of implementing a solution for that use case. Specifically, with a concrete use case, we can decide whether it warrants something as general as a plugin approach, or if a builtin feature specific to that use case would be a better solution.

At the moment, this still feels like a solution looking for a problem.

@facutuesca
Copy link
Author

facutuesca commented Oct 1, 2024

I'm still not clear on the status of the dist-inspector plugin type. Is it intended as an actual, supported plugin type for pip?
....
Rather, we should have a real world use case that justifies being implemented as a plugin

Yes, it's intended as an actual plugin type for pip. One of the motivations behind the dist-inspector plugin type (that I'm aware of) is enabling verification of PEP 740 attestations during package installation. Since verifying these attestations requires sigstore-python, doing it inside pip is not possible (sigstore-python depends on pyca/cryptography, which AFAIK can't be vendored in).

Instead, we could use a plugin (of type dist-inspector) with a pre_download hook that downloads the provenance file for the artifact about to be downloaded, and verifies that provenance against the artifact's digest. If verification fails, we can interrupt the installation process before the user even downloads the artifact.

(edit: Support for downloading the PEP-740 attestations from PyPI is almost complete. Once it's finished, I'll create a repo with a plugin that does the above)

is it intended to be the only supported plugin type (until someone submits PRs providing other plugin types)?

Unless there is a strong need for other types, I think starting with just one is a good idea to keep things simple

@pfmoore
Copy link
Member

pfmoore commented Oct 1, 2024

One of the motivations behind the dist-inspector plugin type (that I'm aware of) is enabling verification of PEP 740 attestations during package installation.

Thank you. That had been mentioned before, but I'd forgotten. The sigstore dependency is a good example of why a plugin is needed.

@facutuesca facutuesca force-pushed the add-plugin-support branch 2 times, most recently from de5dc8a to 0b0e276 Compare October 22, 2024 14:36
@facutuesca
Copy link
Author

facutuesca commented Nov 8, 2024

After some thought, I think having plugin "types" would cause more pain than necessary. For example, let's say at some point someone comes up with a valid usecase for a "distribution inspector" plugin that requires a hook different than the ones defined originally for dist-inspector (pre-download and pre-extract).
That means pip would have to either:

  • Change the definition of dist-inspector to include the new hook (potentially breaking existing plugins).
  • Create another plugin type that is also a "dist inspector", but with an extra hook and a different name (increasing complexity and confusion).

Given that, I propose that instead of having plugin types, each plugin reports which hooks it wants to implement:

def provided_hooks() -> List[str]:
    return ["pre_download"]

def pre_download() -> None:
   ...

That way, when pip loads the plugin, it checks which hooks it wants to be connected to, and calls it when appropriate. This allows for plugins with a more minimal interface (they only need to implement the hooks they use), and frees pip from having to add, maintain or change plugin types, which is just an abstraction on top of a subset of supported hooks.

I've pushed a commit that makes that change.

I've also created a repo for a plugin that verifies PEP-740 attestations before installing packages. It implements only the pre-download hook, and uses sigstore-python to verify those packages that were uploaded with an attestation: https://github.com/trailofbits/pip-plugin-pep740.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants