Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc/stdenv/cross-compilation.chapter.md: explain tuples #180030

Closed
wants to merge 9 commits into from
Closed

doc/stdenv/cross-compilation.chapter.md: explain tuples #180030

wants to merge 9 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Jul 3, 2022

Description of changes

This commit adjusts the documentation for the config field of {host,build,target}Platform in the following ways:

  1. It clarifieds that stdenv uses multiarch tuples, which are the same thing as autoconf tuples except for two cases involving 32-bit architectures where autoconf tuples are ambiguous.

  2. It mentions and gives examples of the 5-field form for tuples, which include both a libc and an abi.

  3. It documents that this field should be canonicalized, and explains where to find the canonicalization rules.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

This commit adjusts the documentation for the `config` field of
`{host,build,target}Platform` in the following ways:

1. It clarifieds that stdenv uses multiarch tuples, which are the same
   thing as autoconf tuples except for two cases involving 32-bit
   architectures where autoconf tuples are ambiguous.

2. It mentions and gives examples of the 5-field form for tuples,
   which include both a `libc` and an `abi`.

3. It documents that this field should be canonicalized, and explains
   where to find the canonicalization rules.
@ghost ghost mentioned this pull request Jul 3, 2022
13 tasks
@github-actions github-actions bot added the 8.has: documentation This PR adds or changes documentation label Jul 3, 2022
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jul 3, 2022
Copy link
Member

@sternenseemann sternenseemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes, I wouldn't document it in this way, by referring to Debian's multiarch triples, but the intention is good. If you're interested, I have a note somewhere on my laptop from when I researched this topic, maybe that's helpful for your purposes.

doc/stdenv/cross-compilation.chapter.md Outdated Show resolved Hide resolved
@@ -44,7 +44,14 @@ The exact schema these fields follow is a bit ill-defined due to a long and conv

`config`

: This is a 3- or 4- component shorthand for the platform. Examples of this would be `x86_64-unknown-linux-gnu` and `aarch64-apple-darwin14`. This is a standard format called the "LLVM target triple", as they are pioneered by LLVM. In the 4-part form, this corresponds to `[cpu]-[vendor]-[os]-[abi]`. This format is strictly more informative than the "Nix host double", as the previous format could analogously be termed. This needs a better name than `config`!
: This is a 3-, 4-, or 5- component shorthand for the platform. Examples of this would be `x86_64-unknown-linux-gnux32`, `aarch64-apple-darwin14`, and `mips64el-unknown-linux-muslabin32`. This is a standard format called the "[multiarch tuple](https://wiki.debian.org/Multiarch/Tuples)", as [pioneered by autoconf](https://www.gnu.org/software/autoconf/manual/autoconf-2.65/html_node/System-Type.html#System-Type), [disambiguated to create multiarch](https://wiki.debian.org/Multiarch/Tuples#Used_solution), and adopted by LLVM. In the 5-part form, this corresponds to `[cpu]-[vendor]-[os]-[libc][abi]`; note that there is no hyphen separating the `[libc]` field from the `[abi]` field. This format is strictly more informative than the "Nix host double", as the previous format could analogously be termed. This needs a better name than `config`!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd honestly leave Multiarch out of this. It is yet another system that has not necessary anything to do with us, since it is what Debian does. autoconf OTOH is used for figuring out builtins.currentSystem, so it makes sense why it is a point of reference for us.

Another problem about multiarch is that we actually don't do multiarch (in the sense of the directory layout) and getting a multiarch (i.e. multilib) gcc to work is a struggle (if it's even possible?), so using the term may lead to confusion.

Note also that autoconf itself calls its (1), 2, 3 or 4 component platform strings target triplets, canonical system type or canonical name.

Considering the string to be up to 5 components is a bad idea, especially for documentation purposes. Both nixpkgs and autoconf only allow up to 4 dash-separated components. While autoconf surely globs on the 4th components (in fact it probably does on all of them), considering it as (sometimes) multifielded is a stretch. Nixpkgs treats it as an opaque string and this is, in my opinion, the correct interpretation, even though some of those strings relate to each other via some structure. The used libc is derived from the ABI in nixpkgs and never directly derived from the platform string.

Nix host double is a bad term, it'd be better to call them Nix systems or Nix system doubles for consistency. It's also not the “previous format”, but very much used today.

Copy link
Author

@ghost ghost Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're interested, I have a note somewhere on my laptop from when I researched this topic, maybe that's helpful for your purposes.

Yes, I am interested in that.

I'd honestly leave Multiarch out of this.

Okay.

Note also that autoconf itself calls its (1), 2, 3 or 4 component platform strings target triplets

I think is slightly bonkers to call a four-component thing a "triple".

Both nixpkgs and autoconf only allow up to 4 dash-separated components

Unfortunately the last two components are not separated by a dash.

The used libc is derived from the ABI in nixpkgs

This is not true! Both mips64el-linux-gnuabin32 and mips64el-linux-muslabin32 use the same ABI: n32 yet they use different libcs. If the libc were derived from the ABI then this would be impossible!

This is exactly the sort of confusion that comes from people pretending that the last two components are really one big component.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nix host double is a bad term, it'd be better to call them Nix systems or Nix system doubles for consistency. It's also not the “previous format”, but very much used today.

FWIW I didn't write that sentence, it was preexisting text. I have however updated it with your recommendation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true! Both mips64el-linux-gnuabin32 and mips64el-linux-muslabin32 use the same ABI: n32 yet they use different libcs. If the libc were derived from the ABI then this would be impossible!

This is exactly the sort of confusion that comes from people pretending that the last two components are really one big component.

As you can see, nixpkgs does not consider these ABIs the same:

nix-repl> :p (lib.systems.elaborate { config = "mips64el-linux-gnuabin32"; }).parsed.a
{ _type = "abi"; abi = "n32"; name = "gnuabin32"; }

nix-repl> :p (lib.systems.elaborate { config = "mips64el-linux-muslabin32"; }).parsed.
{ _type = "abi"; abi = "n32"; name = "muslabin32"; }

The libc is derived in after parsing the triple itself by looking at the ABI, but you can of course also theoretically override this (lib.systems.elaborate { config = "mips64el-linux-muslabin32"; libc = "glibc"; } (where the ABI would stay musl and even isMusl == true)). This is why I think it makes more sense to treat the ABI component as some opaque string which we can sometimes extract more information out of (n32 ABI, musl “ABI”).

Overall I guess it is a bit of a problem that we conflate ABI and libc in nixpkgs in a weird way which is also not done by autoconf (the third component for autoconf can either be os or kernel-system which is much vaguer (e.g. linux-musl)).

I think is slightly bonkers to call a four-component thing a "triple".

Yes, but I'm not sure about a good alternative that is also in use; LLVM's triples are also not always three component strings.

Yes, I am interested in that.

https://gist.github.com/sternenseemann/a00d91b8e58cca3e18792771483b4c25

Copy link
Author

@ghost ghost Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see, nixpkgs does not consider these ABIs the same:

That is an egregious nixpkgs bug which really should be fixed.

n32 is an ABI and so is x32; gnuabin32, muslabin32, gnuabix32, and muslabix32 are not ABIs.

musl “ABI”).

Musl isn't an ABI... they maintain a list of ABIs they support and links to what they use as the definition for each ABI.

ABIs exist even when there is no libc around:

  • A statically-linked no_std rust binary will have no libc involved at all (neither "musl" nor "gnu" nor anything else), yet it still has to pick an ABI if it wants to make system calls into the kernel.
  • When enabling seccomp (for example for the nix sandbox) a process needs to declare what ABIs it will use, in order to enable system call filtering. This is totally independent of libc choice.
  • file can easily detect the ABI of a statically-linked ELF binary (it's part of the magic bytes), but not which libc (if any) it includes

(the third component for autoconf can either be os or kernel-system which is much vaguer (e.g. linux-musl)).

As far as I can tell autoconf considers any part matching the regex e?abi[^-]*$ to be a comment field. As I mentioned elsewhere, arm-unknown-linux-eabicrazypants is a valid autoconf-name.

I think is slightly bonkers to call a four-component thing a "triple".

Yes, but I'm not sure about a good alternative that is also in use

canonical?

doc/stdenv/cross-compilation.chapter.md Outdated Show resolved Hide resolved
@ghost ghost marked this pull request as draft July 11, 2022 06:52
@ghost ghost closed this Jul 27, 2022
@sternenseemann
Copy link
Member

Got this closed by accident? I think we should still do this, I just don't have a lot of capacity to think about this atm.

@ghost
Copy link
Author

ghost commented Jul 27, 2022

Sure; I'm still interested in this I just no longer thing it will get resolved with a single PR.

I just don't have a lot of capacity to think about this atm.

Totally understand. When you do have a bit of time I would really appreciate it if you could take a look at #180964. It looks big and complicated but it's pretty easy to review. There are simple instructions for verifying that it does not change any behavior, so reviewing basically boils down to deciding it if it is a net improvement in readability (which I hope is obvious, but maybe not).

If #180964 merges then there are a bunch of one-liner corrections (which do change behavior) to that logic that I would like to submit. Those can be submitted/reviewed independently and concurrently. The control-flow restructuring in #180964 makes the need for those corrections (and their effects) much more obvious.

@ghost ghost reopened this Jul 27, 2022
Copy link
Member

@Ericson2314 Ericson2314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides my one comment looks goods. Thanks for opening this, and thank you @sternenseemann for taking care of the rest.

@ghost ghost marked this pull request as ready for review October 2, 2022 00:18
@ghost ghost requested a review from fricklerhandwerk as a code owner October 2, 2022 00:18
Copy link
Contributor

@fricklerhandwerk fricklerhandwerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly recommend using the correct data types for the things you want to describe. Use callouts, use lists if you have multiple (similar) items. This is much easier to read.

Avoid long blocks of text. The manuals are already very dense, which makes them to read. Reference manuals are for looking things up, and information should be arranged accordingly to make that as easy as possible.

Always link to definitions of terms, otherwise readers will have to either guess or do their own research to find out what you mean.

Here are some general guidelines for writing documentation in the Nix ecosystem.

doc/stdenv/cross-compilation.chapter.md Outdated Show resolved Hide resolved
Comment on lines +50 to +51

This field should be *canonicalized*. The rules for canonicalizing a tuple are kept in the `config.sub` file in the source code for `gnu-config`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this means. Is "should be canonicalized" a recommendation to the user? What is gnu-configand where is this config.subfile? I suppose you mean this file?

Unless you want to clarify this, I'd just leave it out and fix later if the need arises.

Suggested change
This field should be *canonicalized*. The rules for canonicalizing a tuple are kept in the `config.sub` file in the source code for `gnu-config`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you moved that line to the end, which seems sensible. I still don't know what it means, so once you address the original comment, for me this is ready to merge.

Adam Joseph and others added 2 commits April 24, 2023 05:29
@ofborg ofborg bot added the 6.topic: cross-compilation Building packages on a different platform than they will be used on label Apr 24, 2023
@ghost ghost closed this Aug 5, 2023
@ghost ghost deleted the pr/manual/tuples branch August 5, 2023 20:56
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: cross-compilation Building packages on a different platform than they will be used on 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants