Derive list of classifiers from a public, version-controlled source #3786

Closed
waldyrious opened this issue Apr 22, 2018 · 1 comment · Fixed by #7582
Labels
developer experience (Anything that improves the experience for Warehouse devs) · feature request · needs discussion (a product management/policy issue maintainers and users should discuss)

Comments

@waldyrious

Per discussions in #1300, in particular the following comments, quoted below for convenience:

ncoghlan:

I think @timofurrer's question does raise an interesting UX question: would a file in the specifications section of the PyPUG and/or some other PyPA repo that anyone can submit a PR to be a better primary data source for this information than the PyPI database?

Then PyPI would just be a consumer of that file (presenting it via the web URL), rather than the source of the official list.

jonparrott:

I agree with @ncoghlan - we should have some canonical location for this that's easy to modify.

Could we pull these classifiers out of the warehouse database and just store them in a datafile in this repository?

Alternatively, I could see us establishing a new project that holds the canonical list that Warehouse and PyPUG depend on.

Related: #3028

dstufft:

Easy to modify isn't exactly ideal here, there are a few different types of modifications you can make:

  • Addition: This is easy to cope with, since it's purely addition, Warehouse can simply add it.
  • Deletion: This is less easy to cope with, because there are really two kinds of deletion possible:
    • Deletion where we want to expunge the record from all releases. This is technically easy, but unlikely to actually be what we want (and it would make the PyPI metadata and the package metadata disagree, which is undesirable)
    • Deletion where we simply want to disallow new uploads containing the classifier, but still want to retain it for historical record.
  • Rename: This is hard to deal with, because you don't want to suddenly start rejecting previous versions of the classifier, it would break people's uploads for little reason, but you want to treat the old and the new name as equivalent.

This also makes a simple text file poorly suited for it, because you can't really differentiate between deletion to expunge, deletion to block, and renames. In addition, internally in Warehouse (and legacy PyPI) the trove classifiers are represented as rows in a database that we foreign key against, so something that we depend on isn't going to be a workable solution unless we do something janky like try to automatically reconcile our database against that dependency (which then starts to get into all of the problems I listed above with having to figure out what sort of change was made).
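To make that ambiguity concrete, here is a minimal Python sketch of diffing two snapshots of a flat classifier list. The function name `diff_classifier_lists` and the classifier strings are hypothetical, chosen only for illustration. A rename surfaces as one removal plus one seemingly unrelated addition, and nothing in the diff says whether the removal should expunge the old name or merely block it for new uploads.

```python
def diff_classifier_lists(old: list[str], new: list[str]) -> tuple[set[str], set[str]]:
    """Compare two snapshots of a flat classifier list (hypothetical helper).

    Returns (added, removed). A plain set difference cannot tell a rename
    from an unrelated deletion plus an addition, and it cannot say whether
    a removal means "expunge from all releases" or "block new uploads".
    """
    old_set, new_set = set(old), set(new)
    return new_set - old_set, old_set - new_set


# Hypothetical classifiers: "Framework :: Foo" is renamed to "Framework :: Foo Bar".
old = ["Framework :: Foo", "License :: OSI Approved :: MIT License"]
new = ["Framework :: Foo Bar", "License :: OSI Approved :: MIT License"]

added, removed = diff_classifier_lists(old, new)
print(added)    # {'Framework :: Foo Bar'}
print(removed)  # {'Framework :: Foo'}
```

Any tool reconciling a database against such a file would have to guess the intent behind each removal, which is the problem described above.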

Beyond all of that though, regardless of whether we call the list in some other location or the list inside the DB the "canonical" location, practically speaking, PyPI is going to be the de facto canonical location in every way that anyone actually cares about (since 99% of the time, what someone cares about when looking at classifiers is whether PyPI will accept them or not).

Ultimately, I think the canonical location being on PyPI makes things easier to maintain and manage and it allows us to provide a better user experience for end users as well. It lets us put structured data in the database, while providing a UI to actually manage it that glosses over the details of actually managing that structured data. It also lets us tailor the contents of the list we give people to the context in which we give it. For example, in documentation we would almost certainly exclude any legacy aliases for renamed classifiers or deleted classifiers that we still have the record for but are no longer accepting, but for an API endpoint that something like flit might call, we'd want to include all of the classifiers we are currently accepting (legacy alias or not) but none of the ones that we are not. I imagine there'd even be an API that reports all classifiers past and present and their current status.
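The context-dependent lists described in that comment could be modeled roughly as below. This is a hypothetical Python sketch, not Warehouse's actual schema: the `Status` values, the `alias_of` field, and the three view functions are assumptions chosen to mirror the three audiences mentioned (documentation, an upload-time API for a tool like flit, and a full-history API).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional


class Status(Enum):
    ACTIVE = "active"    # current, canonical name
    ALIAS = "alias"      # legacy name of a renamed classifier, still accepted
    BLOCKED = "blocked"  # deleted: kept for history, rejected in new uploads


@dataclass(frozen=True)
class Classifier:
    name: str
    status: Status
    alias_of: Optional[str] = None  # canonical name; set only for ALIAS rows


def documentation_view(rows: Iterable[Classifier]) -> List[str]:
    """Names to show in docs: no legacy aliases, no blocked classifiers."""
    return sorted(c.name for c in rows if c.status is Status.ACTIVE)


def upload_api_view(rows: Iterable[Classifier]) -> List[str]:
    """Everything an uploader may currently use, legacy aliases included."""
    return sorted(c.name for c in rows if c.status is not Status.BLOCKED)


def full_history_view(rows: Iterable[Classifier]) -> Dict[str, str]:
    """Every classifier, past and present, mapped to its current status."""
    return {c.name: c.status.value for c in sorted(rows, key=lambda c: c.name)}


# Hypothetical data: one rename (leaving an alias behind) and one blocked classifier.
rows = [
    Classifier("Framework :: Foo Bar", Status.ACTIVE),
    Classifier("Framework :: Foo", Status.ALIAS, alias_of="Framework :: Foo Bar"),
    Classifier("Topic :: Obsolete", Status.BLOCKED),
]
print(documentation_view(rows))  # ['Framework :: Foo Bar']
print(upload_api_view(rows))     # ['Framework :: Foo', 'Framework :: Foo Bar']
```

Keeping one table with a status column, rather than separate lists, means every view is a filter over the same rows, so the views cannot drift out of sync.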

(...)

waldyrious:

@brainwane, @di and all: is there a place where the suggestion made by @ncoghlan above:

would a file in the specifications section of the PyPUG and/or some other PyPA repo that anyone can submit a PR to be a better primary data source for this information than the PyPI database?

...could be tracked, e.g. by opening a new issue? Or has that already been decided against?

FWIW, I still think that a public and collaborative ("PR-able") data source would be preferable to a private database table. At least the table definitions for recreating the database could be made available in a repo somewhere, similar to https://noc.wikimedia.org, and for similar reasons.

ncoghlan:

I withdrew the basic suggestion of a flat text file based on Donald's comments at #1300 (comment)

That doesn't rule out the possibility of a "classifier log" format though, that tracks the possible operations as a series of historical events:

  • addition of new classifiers
  • renaming of classifiers
  • prohibition of a classifier in new uploads (rare)
  • removal of a classifier from all published metadata records (incredibly rare due to the resulting inconsistency with the artifact's internal metadata)

The way to pursue the idea further would be as a new issue proposing to derive the contents of the classifier table from a source controlled log of classifier changes, and then after discussing a suitable design with the Warehouse devs, working on a PR to actually implement that.
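A minimal sketch of what such a log and its replay might look like, assuming a hypothetical event format of `(operation, *arguments)` tuples and a simplified in-memory dict standing in for Warehouse's database table. The operation names `add`, `rename`, `block`, and `expunge` are illustrative labels for the four operations listed above, not an existing format. Replaying the log in order yields each classifier's current status, which is exactly the information a plain list cannot carry.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (operation, *arguments), applied in order.
Event = Tuple[str, ...]
# Table row: name -> (status, canonical name if the row is an alias).
Table = Dict[str, Tuple[str, Optional[str]]]


def replay(log: List[Event]) -> Table:
    """Rebuild the current classifier table from a log of changes."""
    table: Table = {}
    for op, *args in log:
        if op == "add":
            (name,) = args
            table[name] = ("active", None)
        elif op == "rename":
            old, new = args
            table[new] = ("active", None)
            table[old] = ("alias", new)  # old name keeps being accepted
            # Re-point earlier aliases so each alias names the current canonical name.
            for name, (status, target) in list(table.items()):
                if status == "alias" and target == old:
                    table[name] = ("alias", new)
        elif op == "block":
            (name,) = args
            table[name] = ("blocked", None)  # retained, but rejected for new uploads
        elif op == "expunge":
            (name,) = args
            table.pop(name, None)  # the rare case: drop the record entirely
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return table


log = [
    ("add", "Framework :: Foo"),
    ("add", "Topic :: Obsolete"),
    ("rename", "Framework :: Foo", "Framework :: Foo Bar"),
    ("block", "Topic :: Obsolete"),
]
state = replay(log)
print(state["Framework :: Foo"])  # ('alias', 'Framework :: Foo Bar')
```

Because a rename leaves an `alias` row behind, uploads that still use the old name keep working, and reconciling a database against the log never has to guess what kind of change was made.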

@waldyrious
Author

Pinging the authors of the comments above: @ncoghlan, @jonparrott, @dstufft.

@brainwane added the feature request, developer experience, and needs discussion labels on Apr 22, 2018
@di closed this as completed in #7582 on Apr 3, 2020