Order of architectures in platform tags on macOS #381

Open
ronaldoussoren opened this issue Dec 31, 2020 · 10 comments

Comments

@ronaldoussoren
Contributor

The order of architectures in the result of tags.mac_platforms() does not take the type of Python installation into account and prefers non-universal wheels over universal wheels. I'd prefer behaviour where tags.mac_platforms() prefers universal wheels over non-universal wheels.

An example: when running on an Intel Mac, "pip install foo" will prefer a wheel with tag "macosx_10_9_x86_64" over a wheel with tag "macosx_10_9_universal2". I'd prefer to install the universal2 wheel instead.
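A minimal sketch of that selection, assuming an installer simply takes the first supported tag for which a wheel exists. The helper `pick_wheel` and both tag lists are hypothetical and hand-written for illustration; they are not produced by packaging.tags or pip.

```python
def pick_wheel(available, preference):
    """Return the first tag in `preference` that has a wheel in `available`."""
    available = set(available)
    for tag in preference:
        if tag in available:
            return tag
    return None  # no compatible wheel at all


# Wheels published by the hypothetical project "foo".
available = ["macosx_10_9_x86_64", "macosx_10_9_universal2"]

# Simplified ordering as described above: architecture-specific first.
current_order = ["macosx_10_9_x86_64", "macosx_10_9_universal2"]
# Ordering proposed in this issue: universal2 first.
proposed_order = ["macosx_10_9_universal2", "macosx_10_9_x86_64"]

print(pick_wheel(available, current_order))   # macosx_10_9_x86_64
print(pick_wheel(available, proposed_order))  # macosx_10_9_universal2
```

Either ordering still falls back to the other wheel when only one of the two is published; the order only matters when both exist.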

The primary motivation of this is that installation of "universal2" wheels makes it possible to use tools like py2app and pyinstaller to create application bundles that support both CPU architectures ("Universal 2 binaries" in Apple speak).

A disadvantage is slightly higher disk usage when universal2 wheels are used.

As an aside: In the long run I'd hope that most projects currently providing mac wheels will provide "universal2" wheels, but I expect that there will be "x86_64" wheels as well for the foreseeable future because only fairly recent versions of pip support universal2 wheels.

@henryiii
Contributor

henryiii commented Jan 1, 2021

I really prefer the current, "most specific" wheel selection. It is easy to reason about, and gives control to the library authors. If a library provides both a universal wheel and a specific wheel, and I'm on that specific architecture, most of the time I don't want the universal wheel. The universal wheel is only wanted in very specific cases, like py2app and pyinstaller, which could/should have a special mode to get only universal wheels; most "normal" packaging with pip would be negatively affected. Keep in mind that some package stacks, like PyTorch, TensorFlow, etc., are in the GB range, and Apple's default disk sizes are not that large.

It's also how other parts of package selection work: I can provide a non-platform-specific wheel and then optimized platform-specific ones, knowing that pip will fall back on the non-platform-specific one when it can't get an optimized one.

TL;DR: I'm strongly in favor of this being an option, but not the default. And I'm strongly against having no way to opt out if it does become the default.

@uranusjr
Member

uranusjr commented Jan 1, 2021

I think this is really down to different usages. Sometimes you want the most specific tag for the current platform, and sometimes the most permissive. There really is no “right” ordering.

Ideally packaging.tags should provide a mechanism to allow users to sort the tags how they want, but without such flexibility (and additional maintenance burden), the current implementation is the most useful for currently available downstream tools.
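As a rough illustration of what such a user-controlled ordering could look like, here is a sketch that re-sorts a list of platform tag strings with a caller-supplied key. Nothing here is an existing packaging.tags API: `reorder_tags`, `parse_mac_tag` and both key functions are hypothetical names, and the input list is hand-written.

```python
def parse_mac_tag(tag):
    """Split 'macosx_11_0_universal2' into ((11, 0), 'universal2')."""
    _prefix, major, minor, arch = tag.split("_", 3)
    return (int(major), int(minor)), arch


def universal2_first(tag):
    """Newest OS version first; within a version, universal2 before the rest."""
    (major, minor), arch = parse_mac_tag(tag)
    return (-major, -minor, arch != "universal2")


def specific_first(tag):
    """Newest OS version first; within a version, universal2 after the rest."""
    (major, minor), arch = parse_mac_tag(tag)
    return (-major, -minor, arch == "universal2")


def reorder_tags(tags, key):
    """Return the tags re-sorted by a caller-supplied key (sorted() is stable)."""
    return sorted(tags, key=key)


default_order = [
    "macosx_11_0_x86_64",
    "macosx_11_0_universal2",
    "macosx_10_16_x86_64",
    "macosx_10_16_universal2",
]

print(reorder_tags(default_order, universal2_first))
# ['macosx_11_0_universal2', 'macosx_11_0_x86_64',
#  'macosx_10_16_universal2', 'macosx_10_16_x86_64']
```

A real mechanism would also have to deal with interpreter and ABI tags, which this sketch ignores.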

@ronaldoussoren
Contributor Author

Right. There are multiple scenarios here: not just installing for the current machine, but also installing for deployment targets. But there's also an ease-of-use argument: with universal2 wheels you can "pip download ..." libraries and install them on other systems without having to think about the CPU architecture.

From past experience there's another reason I prefer universal wheels: years back we had "intel" wheels with 32-bit and 64-bit code. The 64-bit code was preferable in general, except for a couple of packages that did not (yet) support 64-bit on macOS (mostly GUI code). In those scenarios it was very convenient that you could just install everything as a fat binary and launch python in 32-bit mode where needed without having to reinstall dependencies.

BTW, in general I'd expect that the size of binaries is not that relevant in the grand scheme of things, but TensorFlow definitely shows that this is not a general rule. The entire package is about 700 MB, 600 MB of which are shared libraries.

@henryiii
Contributor

henryiii commented Jan 1, 2021

> In those scenarios it was very convenient that you could just install everything as a fat binary and launch python in 32-bit mode where needed without having to reinstall dependencies.

I personally would never forcibly install a package that doesn't support the current architecture in a "shared" environment; I'd make a new environment "venv_x86_64" and pip install all the packages there in x86 mode. This will also work if a single dependency was not able to produce universal wheels, but had one of each, which prioritizing universal wheels and sharing the environment will not.

For TensorFlow, won't the 600 MB shared libraries all need to be universal as well to support a universal wheel? I'm assuming this will be roughly double, though have no idea how close this is to being accurate. I'm not against universal2 by any means, I just don't think it's the one and only answer. Packaging systems, like Homebrew and conda are not going Universal2, but are shipping arch-specific, as they always have.

Currently, shipping Universal2 + x86_64 is likely ideal for most packages. In a couple of years, maybe just Universal2. At some point, I'd expect to see universal2 becoming as irritating and undesirable as 32/64 fat binaries are today, and arm64 will be the most common wheel (probably around the time Python starts providing an official arm64-only installer).

@ronaldoussoren
Contributor Author

> > In those scenarios it was very convenient that you could just install everything as a fat binary and launch python in 32-bit mode where needed without having to reinstall dependencies.

> I personally would never forcibly install a package that doesn't support the current architecture in a "shared" environment; I'd make a new environment "venv_x86_64" and pip install all the packages there in x86 mode. This will also work if a single dependency was not able to produce universal wheels, but had one of each, which prioritizing universal wheels and sharing the environment will not.

I mostly agree. But the scenario I described was years back, when virtualenv wasn't as well known as it is now. I've since switched to virtualenvs for everything and minimal software in the global installation (basically just pip and pipx).

I do expect that there are still a lot of users that don't use virtual environments though.

> For TensorFlow, won't the 600 MB shared libraries all need to be universal as well to support a universal wheel? I'm assuming this will be roughly double, though have no idea how close this is to being accurate. I'm not against universal2 by any means, I just don't think it's the one and only answer. Packaging systems, like Homebrew and conda are not going Universal2, but are shipping arch-specific, as they always have.

The sheer size of the binaries in TensorFlow is a good reason for not using universal2 wheels there unless you really need them. I knew TensorFlow is a large package, but until I looked at it in detail I'd expected that most of the size would be in data files and not in code.

> Currently, shipping Universal2 + x86_64 is likely ideal for most packages. In a couple of years, maybe just Universal2. At some point, I'd expect to see universal2 becoming as irritating and undesirable as 32/64 fat binaries are today, and arm64 will be the most common wheel (probably around the time Python starts providing an official arm64-only installer).

That will take some time; Macs tend to live relatively long, especially outside of business environments. That's one reason the installers on Python.org still support macOS 10.9: there are users "stuck" on that version of macOS for various reasons, and supporting it in the installer is fairly easy.

New Intel Macs are still sold today, so it will be at least 3 years until all Intel Macs are out of AppleCare coverage. And systems tend to be used long after that; my previous system is about 6 years old by now and would have continued working fine if I wasn't lusting for the new hotness ;-)

For my own packages, in particular PyObjC, I currently ship x86_64 and universal2 wheels. Both are built and tested on an Apple Silicon system (and thanks to Rosetta I can test both architectures on that system). I expect to switch to universal2-only in June or so.

@henryiii
Contributor

henryiii commented Jan 1, 2021

> I do expect that there are still a lot of users that don't use virtual environments though.

But most of those users are unlikely to try mixing architectures manually as you described, I'd expect! And installing a non-working package into a shared environment should not be expected to be "recommended" for those sorts of users.

Is the Python 3.8.2 that ships with macOS 11 ARM CLT a universal build, or is it ARM only? Or does it even ship in that download at all? Even scarier, is there a Python 2.7 installed? I'd hope not, but it's still here on macOS 11 Intel. If there is a Python 3 in the CLT, then the version of Pip included in that (19.2.3 for me) will likely be very common on macOS for the next 12 months. I'd expect the CLT Python to be the most popular, followed by the Homebrew Python, with the manual download clocking in after that. Homebrew will update the Pip version pretty quickly, but the CLT one likely won't update till just before the next OS release, I expect.

@brettcannon
Member

#161 may help with this. As @uranusjr pointed out, people have asked to make packaging.tags more configurable, but that's a big API design project and one no one has been willing to tackle (and make sure it can easily be maintained).

@ronaldoussoren
Contributor Author

ronaldoussoren/pyobjc#370 is from a PyObjC user that runs into this: the user is on x86_64 where pip selects the x86_64 wheel for pyobjc, but would like to use the universal2 wheel because they are redistributing to various machines (Munki is a macOS management tool).

At the very least I'd love to see a command-line flag that makes it possible to prefer (or even require) universal2 wheels when installing.

@uranusjr
Member

uranusjr commented Jul 2, 2021

I would say they should not use pip to do this, or maybe pip should have a mechanism to use --platform et al. to grab the universal2 wheel. Changing the tag order to satisfy this use case seems wrong to me.
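For reference, a sketch of what the --platform route might look like with "pip download". The helper `build_pip_download_cmd` is a hypothetical name, "foo" is the placeholder project from the example earlier in this thread, and the command is only constructed here, not executed. pip requires --only-binary=:all: (or --no-deps) when --platform is given.

```python
import sys


def build_pip_download_cmd(requirement, platform_tag, dest):
    """Build a 'pip download' command line restricted to one platform tag."""
    return [
        sys.executable, "-m", "pip", "download",
        "--only-binary=:all:",       # only wheels, never source distributions
        "--platform", platform_tag,  # override the platform pip would detect
        "--dest", dest,              # directory to place the wheels in
        requirement,
    ]


cmd = build_pip_download_cmd("foo", "macosx_10_9_universal2", "wheels")
print(" ".join(cmd[1:]))
# -m pip download --only-binary=:all: --platform macosx_10_9_universal2 --dest wheels foo

# To actually run it (needs network access):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Note that with an explicit --platform, a dependency that only publishes x86_64 wheels would presumably not be found, so this is an opt-in for the redistribution case rather than a drop-in replacement for a plain install.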

@ronaldoussoren
Contributor Author

I, obviously, don't agree, especially on the "they should not use pip" part. There are two clear use cases here:

  1. Installing software for my machine
  2. Installing software that I can distribute to other systems running the same OS

For the first use case either behaviour works fine, and the current behaviour is slightly better. For the second only my proposed behaviour works, or you'd have to stop using pip. That is why I propose preferring universal2 wheels, that way installing "just works" for all users at a cost of slightly higher disk usage for some users.
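For the second use case, a redistribution tool could at least check what it got. A minimal sketch, assuming that a wheel is usable on both CPU architectures when its platform tag is "any" or ends in "_universal2"; the function names are hypothetical and the filenames are made up.

```python
def platform_tags_of(wheel_filename):
    """Return the platform tags encoded in a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {wheel_filename!r}")
    stem = wheel_filename[: -len(".whl")]
    # The platform tag is the last dash-separated field; a dot separates
    # multiple tags in a compressed tag set.
    return stem.rsplit("-", 1)[1].split(".")


def runs_on_both_mac_archs(wheel_filename):
    """True for pure-Python ('any') and universal2 wheels (assumption above)."""
    return any(
        tag == "any" or tag.endswith("_universal2")
        for tag in platform_tags_of(wheel_filename)
    )


for name in [
    "foo-1.0-cp39-cp39-macosx_10_9_x86_64.whl",
    "foo-1.0-cp39-cp39-macosx_10_9_universal2.whl",
    "bar-2.1-py3-none-any.whl",
]:
    print(name, runs_on_both_mac_archs(name))
# foo-1.0-cp39-cp39-macosx_10_9_x86_64.whl False
# foo-1.0-cp39-cp39-macosx_10_9_universal2.whl True
# bar-2.1-py3-none-any.whl True
```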

For my own projects I'll stop providing single-architecture wheels to ensure all users get wheels they can use. And for my own use I'll probably start using a local package repository just to ensure that the right wheels are available (but that's more because the number of wheels supporting macOS/arm64 at all is still depressingly small).
