Datasets via entry points #517

Open
wants to merge 8 commits into
base: master

seanmacavaney
Collaborator

This work-in-progress allows datasets to be defined in external packages through the entry point pyterrier.dataset_provider.

The existing datasets have been moved into two providers: builtin (for the default set of datasets) and irds (for the ones provided by ir-datasets).

This sets the stage for a few things:

  • Allowing other packages to provide datasets without the current workaround (first import the package, then have the import add the datasets to pyterrier.datasets.DATASET_MAP).
  • Dropping the dependency on ir-datasets (by allowing it to provide the pyterrier-compatible interface itself)
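
To make the entry point mechanism concrete, here is a minimal sketch of how providers registered under the pyterrier.dataset_provider group could be discovered with the standard library. Only the group name comes from this PR; `discover_providers`, `my_provider`, and `my_package.datasets:MyProvider` are hypothetical names for illustration, and the actual logic in pyterrier/datasets/_core.py may differ.

```python
from importlib.metadata import EntryPoint, entry_points

# Group name taken from the PR description.
GROUP = "pyterrier.dataset_provider"


def discover_providers(eps=None):
    """Return a dict mapping provider name -> EntryPoint, without loading any of them.

    Hypothetical helper. If `eps` is None, the installed packages are scanned
    (the `group=` keyword requires Python 3.10+).
    """
    if eps is None:
        eps = entry_points(group=GROUP)
    return {ep.name: ep for ep in eps}


# An external package would declare its provider in its own metadata, for
# example in pyproject.toml (hypothetical package and class names):
#
#   [project.entry-points."pyterrier.dataset_provider"]
#   my_provider = "my_package.datasets:MyProvider"
#
# The equivalent entry point is built by hand here so the sketch runs
# without installing anything.
example = EntryPoint(
    name="my_provider",
    value="my_package.datasets:MyProvider",
    group=GROUP,
)

providers = discover_providers([example])
# Calling providers["my_provider"].load() would import my_package.datasets
# and return MyProvider; it is not called here because that package is made up.
```

With this approach the external package never has to be imported by the user first; installing it is enough for its provider to be found.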

A side benefit is that dataset objects don't need to be created until needed. This could reduce the import time of the core pyterrier package.
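
The deferred creation described above could look roughly like the following sketch, which stores factories and only builds a dataset object on first access. `LazyDatasetRegistry`, `make_demo`, and the `demo` dataset are hypothetical names, not the classes introduced by this PR.

```python
class LazyDatasetRegistry:
    """Hypothetical registry that builds dataset objects only when requested."""

    def __init__(self):
        self._factories = {}  # name -> zero-argument callable that builds the dataset
        self._cache = {}      # name -> dataset object that has already been built

    def register(self, name, factory):
        # Registering is cheap: the factory is stored but not called.
        self._factories[name] = factory

    def get(self, name):
        # Build on first access, then reuse the cached object.
        if name not in self._cache:
            self._cache[name] = self._factories[name]()
        return self._cache[name]


created = []  # records which factories have actually run


def make_demo():
    created.append("demo")
    return {"name": "demo"}  # stand-in for a real dataset object


registry = LazyDatasetRegistry()
registry.register("demo", make_demo)
created_before_access = list(created)  # still empty: nothing built at registration

dataset = registry.get("demo")        # the factory runs here
same_dataset = registry.get("demo")   # served from the cache, factory not re-run
```

Because registration does no work, any cost of constructing datasets moves from import time to the first lookup.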

The diff looks messy due to moving stuff around. The key new bits are in pyterrier/datasets/_core.py.

seanmacavaney changed the title from "WIP: Datasets via entry points" to "Datasets via entry points" on Jan 30, 2025