(Potentially) tweak build tag return values for utils.parse_wheel_filename() #389

brettcannon · 2021-01-26T19:17:20Z

Right now, packaging.utils.parse_wheel_filename() returns None for a build tag if it isn't defined, str if it is. There are two potential ways of tweaking this.

(See #387 for background.)

Return an empty string

One possible change is simply return the empty string instead of None. This just normalizes the return type.

Return a tuple

Another option is to return a tuple. Pip uses () as a default and (int, str) if there is a build tag (as per PEP 427). The default could also be (0, '') to get similar sorting results as PEP 427 outlines. But the key point here is to automatically do the build tag parsing as appropriate to get people results.

Perf consideration

Is it worth spending the (literal) energy on parsing build tags every time? For instance, if one was to parse every wheel file name in a large Simple repository index (e.g. https://download.pytorch.org/whl/torch_stable.html) and it happened to use build tags extensively, would it be worth parsing things ahead of time, or is it better to punt until absolutely necessary?

I think this comes down to whether build tags take precedence over wheel tags in wheel selection, or are they more of a tiebreaker when wheel tags are the same? E.g. is pkg-2021.1-cp39-cp39-manylinux.whl a better fit than pkg-2021.1-42-py3-none-any.whl, or the other way around if you want the best fit of a wheel? If wheel tags take precedence (typically), then doing the parsing work should only be done if necessary. But if a wheel tag (typically) takes precedence then pre-emptively doing it for a simpler API makes sense to me.

The text was updated successfully, but these errors were encountered:

uranusjr · 2021-01-27T08:48:50Z

Regarding performance, since the curent implementation does not validate the tags at all anyway, maybe we can just return everything between version and .whl unparsed, and provide an additional function to parse it (maybe directly into Tag instances)?

pfmoore · 2021-01-27T09:19:22Z

If we're considering performance to this level, we could also say we shouldn't parse the version into a Version object, or the tags into a set of Tags objects, as that has a cost. IMO, we should parse all of the data into user-friendly forms, and applications that are doing significant amounts of bulk parsing should consider whether they want to do something lower-level. That may mean that this function isn't appropriate for all use cases, but that's fine by me.

As a data point, I just scanned all of the wheels on a slightly out of data local copy of all the PyPI data, and I see 1.4M wheels that don't use build tags, and 1700 that do. So I don't think parsing build tags is a problem in practice 😉 (Disclaimer - I did this by counting dashes, and there were some wheels with 3 dashes and one with 6, which suggests that at least some of the 5-dashes cases may be malformed wheel names, which means the number actually using build tags could be even lower).

brettcannon · 2021-01-27T18:57:45Z

Thanks for that data point, @pfmoore ! It sounds like just going ahead and doing the build tag parsing is the right move as they are seemingly very rare.

Then I'm advocating for () to be the default value (for easy not checks), we do the parse eagerly, and we introduce the following:

BuildTag = typing.NewType("BuildTag", Union[Tuple[], Tuple[int, str]])

See https://github.com/pypa/packaging/blob/main/packaging/utils.py#L15 for an example of us already using type aliases for structured/formatted types.

If that works for people I can do a PR to update the code for that.

pfmoore · 2021-01-27T19:00:05Z

Works for me.

brettcannon · 2021-01-27T21:31:03Z

I practically have a PR ready with the proposed change; should have the PR opened before the end of the day.

brettcannon · 2021-01-28T01:20:40Z

PR is up for review!

brettcannon added enhancement packaging.utils labels Jan 26, 2021

This was referenced Jan 26, 2021

Add functions for parsing wheel and sdist filenames #387

Merged

New release with ability to load 10.9 Universal2 wheels on ARM #385

Closed

brettcannon mentioned this issue Jan 28, 2021

Change the return type for the build number portion of utils.parse_wheel_filename() #390

Merged

brettcannon closed this as completed in #390 Jan 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(Potentially) tweak build tag return values for utils.parse_wheel_filename() #389

(Potentially) tweak build tag return values for utils.parse_wheel_filename() #389

brettcannon commented Jan 26, 2021

uranusjr commented Jan 27, 2021

pfmoore commented Jan 27, 2021

brettcannon commented Jan 27, 2021

pfmoore commented Jan 27, 2021

brettcannon commented Jan 27, 2021

brettcannon commented Jan 28, 2021

(Potentially) tweak build tag return values for utils.parse_wheel_filename() #389

(Potentially) tweak build tag return values for utils.parse_wheel_filename() #389

Comments

brettcannon commented Jan 26, 2021

Return an empty string

Return a tuple

Perf consideration

uranusjr commented Jan 27, 2021

pfmoore commented Jan 27, 2021

brettcannon commented Jan 27, 2021

pfmoore commented Jan 27, 2021

brettcannon commented Jan 27, 2021

brettcannon commented Jan 28, 2021