-
Notifications
You must be signed in to change notification settings - Fork 255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(Potentially) tweak build tag return values for utils.parse_wheel_filename() #389
Comments
Regarding performance, since the curent implementation does not validate the tags at all anyway, maybe we can just return everything between version and |
If we're considering performance to this level, we could also say we shouldn't parse the version into a As a data point, I just scanned all of the wheels on a slightly out of data local copy of all the PyPI data, and I see 1.4M wheels that don't use build tags, and 1700 that do. So I don't think parsing build tags is a problem in practice 😉 (Disclaimer - I did this by counting dashes, and there were some wheels with 3 dashes and one with 6, which suggests that at least some of the 5-dashes cases may be malformed wheel names, which means the number actually using build tags could be even lower). |
Thanks for that data point, @pfmoore ! It sounds like just going ahead and doing the build tag parsing is the right move as they are seemingly very rare. Then I'm advocating for BuildTag = typing.NewType("BuildTag", Union[Tuple[], Tuple[int, str]]) See https://github.com/pypa/packaging/blob/main/packaging/utils.py#L15 for an example of us already using type aliases for structured/formatted types. If that works for people I can do a PR to update the code for that. |
Works for me. |
I practically have a PR ready with the proposed change; should have the PR opened before the end of the day. |
PR is up for review! |
Right now,
packaging.utils.parse_wheel_filename()
returnsNone
for a build tag if it isn't defined,str
if it is. There are two potential ways of tweaking this.(See #387 for background.)
Return an empty string
One possible change is simply return the empty string instead of
None
. This just normalizes the return type.Return a tuple
Another option is to return a tuple. Pip uses
()
as a default and(int, str)
if there is a build tag (as per PEP 427). The default could also be(0, '')
to get similar sorting results as PEP 427 outlines. But the key point here is to automatically do the build tag parsing as appropriate to get people results.Perf consideration
Is it worth spending the (literal) energy on parsing build tags every time? For instance, if one was to parse every wheel file name in a large Simple repository index (e.g. https://download.pytorch.org/whl/torch_stable.html) and it happened to use build tags extensively, would it be worth parsing things ahead of time, or is it better to punt until absolutely necessary?
I think this comes down to whether build tags take precedence over wheel tags in wheel selection, or are they more of a tiebreaker when wheel tags are the same? E.g. is
pkg-2021.1-cp39-cp39-manylinux.whl
a better fit thanpkg-2021.1-42-py3-none-any.whl
, or the other way around if you want the best fit of a wheel? If wheel tags take precedence (typically), then doing the parsing work should only be done if necessary. But if a wheel tag (typically) takes precedence then pre-emptively doing it for a simpler API makes sense to me.The text was updated successfully, but these errors were encountered: