Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC 0107] Nixpkgs version attribute usage normalization #107

Closed
wants to merge 19 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
182 changes: 182 additions & 0 deletions rfcs/0107-version-normalization.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,182 @@
---
feature: nixpkgs_version_normalization
start-date: 2021-09-24
author: Anderson Torres
co-authors:
shepherd-team:
shepherd-leader:
related-issues:
---

# Summary
[summary]: #summary

Normalize the `version` attribute used in Nixpkgs' expresions.

# Motivation
[motivation]: #motivation

Nowadays, the most commonly used format for the `pname` and `version` attributes
along the Nixpkgs' expressions is:

- For stable releases:
- `pname` reflects the software name;
- `version` reflects the released version.
- For unstable releases:
- `pname` reflects the software name;
- `version` is a string following the format `unstable-YYYY-MM-DD`, where
`YYYY-MM-DD` denotes the date when the code was generated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not quite right. The date is the date of the commit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am avoiding to use VCS (especially git)-loaded vocabulary.

Do you think "commit" is a better wording regardless of it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use the date of the change or when the change was published?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wording «date when the change was made or published» sounds better than code being generated to me, too.


This is a simple and easy-to-understand format.

However, it does not map very well with the Nix function
`builtins.parseDrvName`:

```example
# expected: { name = "mpv"; version = "0.35.15"; }
nix-repl> builtins.parseDrvName "mpv-0.35.15"
builtins.parseDrvName "mpv-0.35.15"
{ name = "mpv"; version = "0.35.15"; }

# expected: { name = "mpv"; version = "unstable-2021-05-03"; }
nix-repl> builtins.parseDrvName "mpv-unstable-2021-05-03"
builtins.parseDrvName "mpv-unstable-2021-05-03"
{ name = "mpv-unstable"; version = "2021-05-03"; }
```

It happens because the `version` attribute returned by `builtins.parseDrvName`
starts with a digit. This is not a bug, but a deliberate design decision of Nix
specification; therefore we should strive to follow it, neither circumventing
nor ignoring it.

That being said, `version` attribute should start with a digit.

Furthermore, `version` attribute should follow the semantics for upgrading as
stated in `nix-env` manpage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this motivation a bit lacking. From my experience there is quite strong consensus that nix-env and builtins.parseDrvName have some inherent flaws (which have already caused all sorts of problems, see NixOS/nixpkgs#114995, NixOS/nixpkgs#118481 and NixOS/nixos-homepage#672) and we should be getting rid of them as soon as feasible.

In that sense the RFC solves a problem that would be disappear anyway. Since the new Nix CLI would introduce a replacement fornix-env which hopefully doesn't exhibit these flaws.

On the other hand having a solution for the meantime could be interesting, since the new nix profile subcommand seems to heavily rely on flakes which means it can't be a stable replacement for nix-env in Nix 2.4. This is just speculation though, I'm not sure what @edolstra has planned here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the BDFL said there is no flaw in parseDrvName. It just splits name and version from a given string, returning a {name, version} set.

I don't think the examples given above point to a flaw in it.


This document describes a format suitable to fix this issue while striving for
simplicity.

# Detailed design
[design]: #detailed-design

## Terminology

_Disclaimer_: whereas some of the terms enumerated below are borrowed from
[pkgsrc guide](https://www.netbsd.org/docs/pkgsrc/) plus a bit of terminology
employed by git, this document aims to be general, not being overly attached to
them.

- Program denotes the piece of software to be fetched and processed via Nixpkgs.
- This term makes no distinction about the immediate or intended uses of the
program; it can range from non-interactive programs to full GUI
applications, even including filesets intended as input to other programs
(such as firmwares and fonts).
- Team denotes the maintainers of the program.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to map to the above linked guide? Otherwise if it is only relevant to this RFC we should replace team entirely with maintainer.

- This term makes no distinction among the various models of organization of a
team; it ranges from a solitary programmer to a business company - and
everything inbetween.
- Source denotes the origin of the program.
- This term makes no distinction among precompiled, binary-only or high-level
code distributions.
- Snapshot denotes the source of the program taken at a point of the time.
- Labeled snapshot denotes any snapshot explicitly labeled by its team.
- Unlabeled snapshot denotes any snapshot not explicitly labeled by its team.
- Release denotes any distributed snapshot, as defined by its team.
- Branch denotes a logical sequence of snapshots, as identified by the program's
team;
- Usually these branches are denoted by names such like `stable`, `master`,
`unstable`, `trunk`, `experimental`, `staging` etc.;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or by major version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A major version is more akin to a labeled snapshot, not a logical sequence of them. Indeed, major versions don't necessarily follow a continuous sequence. An example is Zig: between 0.8.0 and 0.8.1 there are many small, discrete snapshots.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These only differ in patchlevel anyway.

I mean that PostgreSQL has branches 13.x and 14.x and some more.

Linux kernel, annoyingly, has stable branches differing just in minor version (although major.minor is just formatting for Linux kernel versions). So maybe a branch can be called by a version prefix instead of a name?


## Design

- For a labeled snapshot:
- `version` should be constituted of the version of the snapshot, as defined
by the upstream project, without any alphabetical characters (e.g. "v",
"rel") prepending it.
- Alphabetical characters after the first should be maintained, except
AndersonTorres marked this conversation as resolved.
Show resolved Hide resolved
optionally those clearly used as separators, in which case they are
replaced by dots (emulating a typical semver).
AndersonTorres marked this conversation as resolved.
Show resolved Hide resolved

- For an unlabeled snapshot:
- `version` should be constituted of a concatenation of the elements below in
this order:
- the version of the latest labeled snapshot, as defined above;
AndersonTorres marked this conversation as resolved.
Show resolved Hide resolved
- If the project never released a labeled snapshot, `0_0` should be used
as default.
Copy link

@ghost ghost Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the change from 0_0 to i.j impact lexicographical ordering?
I'm thinking most (all?) systems probably use ASCII values to sort, but it'd be good to both research this and explicitly consider it.

Copy link
Member Author

@AndersonTorres AndersonTorres Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version ordering is the same as used by Nix itself, being registered in the nix-env manpage.

I am treating it as previous knowledge here.

The idea of 0_0 is that, being an uncommon numbering scheme (underscores are not employed in Nixpkgs version attrs, as far as I remember), it will not clash anything.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about systems other than Nix as well, as other programs operating with nixpkgs data may be using the version attribute for their own purposes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see in the nix repl 0_0 sorts as greater ('newer') than for example 0.1.

Copy link
Contributor

@kevincox kevincox Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also against 0_0. It is not very intuitive for readers. If we need to put a number to satisfy parseDrvName I think we should just use 0 or something more obvious like 0unreleased

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"0_0" is [0, "_", 0] thus smaller than "0.0" but larger than "0"

Lexicographic comparisons applies to string components after splitting.

"0.pre" seems to be smaller than anything not starting with "0.pre.pre" (dot after zero is optional). I hope that "0.pre.pre" upstream version will not be a frequent issue.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the man page does not mention this but both . and - are special-cased separators that don't become part of any component! I had assumed the . was part of the evaluation.

Copy link

@milahu milahu Feb 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

holy cow. just use sort -V aka sort --version-sort

printf "%s\n" 0.0.0 0.0.1 0.0 0_0 1.0 0.1 2.0 \
1.0-unstable-2023-02-04 1.0+unstable-2023-02-04 \
1.0-unstable.2023-02-04 1.0+unstable.2023-02-04 \
| sort -V

0.0
0.0.0
0.0.1
0.1
0_0
1.0
1.0+unstable-2023-02-04
1.0+unstable.2023-02-04
1.0-unstable-2023-02-04
1.0-unstable.2023-02-04
2.0

nix should produce the same sort order, otherwise i would call that a nix bug

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nix does not produce the same order, pre is special-cased to sort earlier than in sort -V, and this is not a bug but a reasonable decision, compatibility with which we want to keep to avoid globally-breaking changes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah, makes sense

builtins.sort (a: b: (builtins.compareVersions a b) < 0) [ "000" "00" "0_0" "0" "0.0" "1.0" "1.0pre" "1.0-pre" "1.0post" "1.0-post" "1.0-unstable-2023-02-04" "1.0-unstable.2023-02-04" ]

[ "000" "00" "0" "0_0" "0.0" "1.0pre" "1.0-pre" "1.0" "1.0post" "1.0-post" "1.0-unstable-2023-02-04" "1.0-unstable.2023-02-04" ]

- the string `+unstable=YYYY-MM-DD`, where `YYYY-MM-DD` denotes the date
the mentioned unlabeled snapshot was distributed.

- Some atypical programs have special considerations:
- Linux kernel modules:
- Besides the rules established above for typical snapshots (whether labeled
or unlabeled), `version` shoud have appended `+linux=XX.XX.XX`, where
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that done right now? Is there already a consensus for kernel modules that we need to change as little as possible?

Copy link
Member Author

@AndersonTorres AndersonTorres Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consensus, as far as I know, is to append +kernel=<kernel version employed to build this module>. I just switched it to linux, aiming to keep some generality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make any difference for us if we would keep kernel instead of linux? Maybe someone that often works on kernel modules could give some insight here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, no. Someone suggested it was to disambiguate with other possibilities like kfreebsd...

`XX.XX.XX` is the corresponding Linux kernel version used to build the
aforementioned module.

# Examples and Interactions
[examples-and-interactions]: #examples-and-interactions

Some useful examples:

- Bochs is a typical Sourceforge-hosted project; its labeled snapshots can be
fetched from tarballs obtained via URLs like
<https://sourceforge.net/projects/bochs/files/bochs/2.6.11/>; for this
example, we have `pname = "bochs"; version = "2.6.11";`.

- MPV is a typical Github-hosted program; its labeled snapshots can be fetched
from tarballs obtained via URLs like
<https://github.com/mpv-player/mpv/releases/tag/v0.33.1>; for this example, we
get rid of the `"v"` prepended to the version tag: `pname = "mpv"; version =
"0.33.1";`.

- SDL2 is hosted on Github; its latest labeled version can be downloaded from
<https://github.com/libsdl-org/SDL/releases/tag/release-2.0.14>; therefore we
have `pname = "SDL2"; version = "2.0.14";`.

_However_, this labeled version was released December 21, 2020, while the
latest change was done in May 28, 2021. Therefore, for this particular
unlabeled releases of SDL2, we have `pname = "SDL2"; version =
"2.0.14+unstable=2021-05-28";`.

- Cardboard is a typical Gitlab-hosted program; it has no labeled release yet,
therefore we use `0.0.0` as default stable version; the latest commit was made
AndersonTorres marked this conversation as resolved.
Show resolved Hide resolved
on May 10, 2021; therefore, we have `pname = "cardboard"; version =
"0_0+unstable=2021-05-21";`.

AndersonTorres marked this conversation as resolved.
Show resolved Hide resolved
- The Linux drivers for Realtek rtl8192eu can be fetched from a Github page,
<https://github.com/Mange/rtl8192eu-linux-driver>. It has no labeled release;
the latest code is from May 12, 2021. Supposing e.g. it was built for Linux
kernel version 5.10.01, we therefore have `pname = "rtl8192eu"; version =
"0_0+unstable=2021-05-12+linux=5.10.01";`.

# Drawbacks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repology compatibility and formats used by other distributions needs investigation here.

If we break compatibility with the version formats present on Repology, we create more noise for maintainers using this as a tool and may even impede some automation which uses repology as a data source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talk about this with the Repology maintainer. He thinks it is a good format, however he will not implement it because it is not the prevalent format in Nixpkgs.

[drawbacks]: #drawbacks

The main drawback is the conversion of the already existent expressions which
does not follow the format proposed here. It can require a degree of manual
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that we would have another extremely big task to convert the entire tree to this. I think an automated way which is 95% correct from the current format should be preferred to make it very easy for us.

intervention. However, this task is easily sprintable and amenable to
automation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In some cases, the build system is passed the version, and might be confused by this format.
However, there are already precedents when Nixpkgs version and build system visible version differ.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it also cause issues with testVersion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, although for pre-releases I would expect any use of testVersion to need overrides anyway… Maybe I am too pessimistic.

Copy link
Member

@davidak davidak Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing leading "R" causes the test to fail.

[davidak@gaming:~/code/nixpkgs]$ nix build .#vapoursynth.tests.version --no-link --print-build-logs
warning: Git tree '/home/davidak/code/nixpkgs' is dirty
vapoursynth> Version string '59' not found!
vapoursynth> The output was:
vapoursynth> VapourSynth Video Processing Library
vapoursynth> Copyright (c) 2012-2022 Fredrik Mellbin
vapoursynth> Core R59
vapoursynth> API R4.0
vapoursynth> API R3.6
vapoursynth> Options: -
error: builder for '/nix/store/x8dngsnq6v6him83z8f574pprc6kwnbp-vapoursynth-59-test-version.drv' failed with exit code 1

Workaround

Created wanted format in the test.

    tests.version = testers.testVersion {
      package = vapoursynth;
      version = "R${version}";
    };

Adding "+date=2021-05-25" to version causes the test to fail.

[davidak@gaming:~/code/nixpkgs]$ nix build .#vapoursynth.tests.version --no-link --print-build-logs
warning: Git tree '/home/davidak/code/nixpkgs' is dirty
vapoursynth> Version string '59+date=2021-05-25' not found!
vapoursynth> The output was:
vapoursynth> VapourSynth Video Processing Library
vapoursynth> Copyright (c) 2012-2022 Fredrik Mellbin
vapoursynth> Core R59
vapoursynth> API R4.0
vapoursynth> API R3.6
vapoursynth> Options: -
error: builder for '/nix/store/m7ig1gqkd1wyn008d9b61xmhf2g17vz5-vapoursynth-59+date=2021-05-25-test-version.drv' failed with exit code 1;

Workaround

Overwrite the version in the test that the build should have.

    tests.version = testers.testVersion {
      package = vapoursynth;
      version = "R59";
    };

@AndersonTorres and everyone: Should we keep the leading "R" in this case where it is part of the official upstream release version as displayed in --version or document and use these workarounds?

Even the format with leading v is often part of the official version.

https://github.com/NixOS/nixpkgs/blob/350fd0044447ae8712392c6b212a18bdf2433e71/pkgs/tools/misc/adrgen/default.nix#L21-L25

So should we respect upstream choice or change the version format, because we don't like it?

# Alternatives
[alternatives]: #alternatives

The alternative is doing nothing. The impact of it is keeping the Nixpkgs
codebase confusing, less discoverable and incompatible with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add inconsistent to the list.

`builtins.parseDrvName`.

# Unresolved questions
[unresolved]: #unresolved-questions

- Allow some degree of freedom and extensibility for handling deviations, such
different, non-standard naming schemes or unusual releasing schedules
eventually employed by many teams.

- Interactions between `pname` and `version`, like multi-branch releases.

# Future work
[future]: #future-work

- Update expressions that do not follow this proposal.
- Update manuals and related documentation in order to reflect this proposal for
future expressions.