Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC 0119] Formalize testing for nixpkgs packages #119

Merged
merged 14 commits into from
Jun 30, 2022

Conversation

jonringer
Copy link
Contributor

@jonringer jonringer commented Dec 30, 2021

To better enable review software to run automatic testing, as well as help define when a PR is "tested enough"

rendered: https://github.com/NixOS/rfcs/blob/master/rfcs/0119-testing-conventions.md

@jonringer jonringer changed the title [0119] Formalize testing for nixpkgs packages [RFC 0119] Formalize testing for nixpkgs packages Dec 30, 2021
Comment on lines +67 to +68
- A quick trivial example (e.g. `<command> --help`) to demonstrate that one (or more)
of the programs were linked correctly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also testVersion introduced in NixOS/nixpkgs#121896.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to help mitigate regressions from appearing in release channels.

# Detailed design
[design]: #detailed-design
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe expand on what are the intended changes to the status quo?

passthru.tests is a name documented inside the manual, however nixosTests are recommended to be also put there.

(also, if sorting by resource consumption, maybe this split is not needed?)

Are we encouraged to compromise on something in the name of more test coverage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I wasn't sure what the status quo should be. My current thoughts are, "here is some addtional metadata you can add to ensure that people know how your package may break. Or add your package to the tests of other packages to ensure it's not broken."

@Ekleog
Copy link
Member

Ekleog commented Dec 31, 2021

FWIW, I think part of the statu quo comes from NixOS/nixpkgs#44439 (which switched to passthru.tests after that). That said, it'd probably be good to make it into a proper RFC, if only for documentation, now that we actually have a working RFC system! It'd also clarify which tests should go into which phase. (For instance, I'd maybe have put the --help into a full-blown test rather than an installCheckPhase, thinking that the environment would be closer to actual production environments this way)

That said, I'm not sure I feel really good about adding passthru.nixosTests: it'd require re-adjusting all automated tools like r-ryantm or ofborg to take this new attribute into consideration too. I'd probably feel better about this RFC adding a passthru.tests.nixos.<name> sublist. Though technically the previous RFC didn't allow such behavior, I think most tools would support such a change natively (at least my recollection of the changes needed to r-ryantm and ofborg are it should work naturally, as AFAIR nix recurses into the whole set and not a single layer when being asked to build a set)

Does that make sense? :)

- Running tests which include downstream dependencies.
- This avoids cyclic dependency issues for test suites.
- Running lengthy or more resource expensive tests.
- There should be a priority on making package builds as short as possible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you! 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I originally wrote this about sage, and sageWithTests. Which took even longer :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(looks sadly at NixOS/nix#3600 being unmerged) although for a given situation with a package, speeding up separate tests on their own would also not hurt

@jonringer
Copy link
Contributor Author

That said, I'm not sure I feel really good about adding passthru.nixosTests: it'd require re-adjusting all automated tools like r-ryantm or ofborg to take this new attribute into consideration too.

That shouldn't be much work. And be adopted slowly, much like passthru.tests.

I'd probably feel better about this RFC adding a passthru.tests.nixos. sublist. Though technically the previous RFC didn't allow such behavior, I think most tools would support such a change natively (at least my recollection of the changes needed to r-ryantm and ofborg are it should work naturally, as AFAIR nix recurses into the whole set and not a single layer when being asked to build a set)

Seems to be very similar to nixosTests. Also, nix-build will only build the top-level unless you do a recurseForDerivations. And that will only work if you do something like hydra-eval or nix-env -qa. If I do nix-build -A python3, It's not also going to build python3.pkgs, even though it's an attr.

Also, nix build will only build a derivation, nix-build will build a list, attr set, or derivation.

@7c6f434c
Copy link
Member

FWIW, I think part of the statu quo comes from NixOS/nixpkgs#44439 (which switched to passthru.tests after that). That said, it'd probably be good to make it into a proper RFC, if only for documentation, now that we actually have a working RFC system! It'd also clarify which tests should go into which phase.

I think the documentation proper, i.e. Nixpkgs manual, already describes quite a few of things about tests. So if we want to establish substantially new recommendation (even if on purely advisory level), we should say which parts are new, if we think the manual misses minor points of current practice, we could just fix it there, and if RFC is a better point of documentation than the manual, then maybe we should say it out loud so that this problem can be discussed on its own.

@Ericson2314
Copy link
Member

I add that checkPhase and installCheckPhase are always second-choice options, i.e. a separate derivation, no matter how tiny, is always preferable if it can be cleanly separated from the build.

@Ericson2314
Copy link
Member

Separately, I dislike the fact that nixosTests Currently violate the nixpkgs vs nixos layering. If we could make them them functions that are passed some nixos stuff I would prefer that. This is orthogonal to rest of the RFC, but if we are formalizing things maybe it's good time to think about this stuff.

@jonringer
Copy link
Contributor Author

I add that checkPhase and installCheckPhase are always second-choice options, i.e. a separate derivation, no matter how tiny, is always preferable if it can be cleanly separated from the build.

On the contrary, a small test suite which can quickly identify whether or not a package should be considered "correct" I think is highly desired. Until we have tooling in place to do something like, "please test everything as well", then I'm going to still insist on unit tests as a good practice.

Maybe in another RFC, we can standardize around making a passthru.tests.check and passthru.tests.installCheck the normal.

Separately, I dislike the fact that nixosTests Currently violate the nixpkgs vs nixos layering. If we could make them them functions that are passed some nixos stuff I would prefer that. This is orthogonal to rest of the RFC, but if we are formalizing things maybe it's good time to think about this stuff.

Sure... but nixpkgs is for packaging software, and nixos can be thought of as the practice of combining it. It's immensely useful to verify that it works as intended with other software.

It's a better user experience if we know that a nextcloud upgrade breaks the nextcloud+mysql integration. As that's one way that people are likely to leverage it.

The whole purpose of the RFC is to make changes with more certainty. From a committer's perspective, to change "I'm scared to press merge on something " to, "The build passed, the tests passed, the coupled packages passed, the nixos tests passed. I have a high degree of certainty that merging this is a good idea." Or even enable the ability to confidently do a "merge when CI passes".

@jonringer
Copy link
Contributor Author

Separately, I dislike the fact that nixosTests Currently violate the nixpkgs vs nixos layering.

meanwhile NixOS/nixpkgs#153551

@AndersonTorres
Copy link
Member

I add that checkPhase and installCheckPhase are always second-choice options, i.e. a separate derivation, no matter how tiny, is always preferable if it can be cleanly separated from the build.

It looks like a good idea!
I ponder about a tests.nix file and a passthru.tests = import ./tests.nix;.

# Alternatives
[alternatives]: #alternatives

Continue to use current ad-hoc conventions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rfcs/0119-testing-conventions.md Outdated Show resolved Hide resolved
`checkPhase` or `installCheckPhase` when packaging within nixpkgs.

Usage for `passthru.nixosTests.<name>`
- Reserved for tests utilitizing the nixosTest utilties.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem a meaningful distinction to me. Maybe we should flip this.

  • Reserved for tests that are more resource intensive or that require additional system features such as kvm.
    • Generally these are tests using the nixosTest utilities.

Of course this makes the name less meaningful, but we can bikeshed that later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to keep it consistent in plurality with the already used passthru.tests. Also pkgs.nixosTests is how it's exposed in nixpkgs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separate name makes testing everything slightly harder. Today, we can do nix-build -A postgresql.tests and expect everything to run. This won't be the case anymore.

This also has a runtime cost, whereas standardizing on pkg.tests.nixos does not, as the tests attr value won't be evaluated in normal use as a dependency of something. A separate attr has the overhead of an extra attr and an extra thunk, for each package. We've decided against a pkg.exe alias for this reason (abs path to mainProgram).

Usage for mkDerivations `installCheckPhase`:
- A quick trivial example (e.g. `<command> --help`) to demonstrate that one (or more)
of the programs were linked correctly.
- Assert behavior post installation (e.g. python's native extensions only get installed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "only" in this sentence is confusing me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native extensions only get installed. However, most test suites will consume the code in the build directory. So tests will fail because the compiled extensions will not be present.

I'm not sure how to word this better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe start with the phrase about the build directory?

None? This is opt-in behavior for package maintainers.

# Alternatives
[alternatives]: #alternatives
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One alternative is that we consider all dependent packages as tests. We can have dependent packages that are just tests for example a testFoobar package to test foobar.

Then a PR author would responsible that all dependents build (aka pass) or are marked broken.

The obvious issue here is that for packages with lots of dependents it becomes infeasible for the average author to run a tool that builds everything and marks failures as broken. I think it is worth mentioning this alternative because this RFC demonstrates a clean way to define an appropriate sample size. Then it is expected that nixpkgs-provided build resources can be used for the full build + mark as broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and this is the compromise with passthru.tests listing downstream dependencies. The idea is to list packages which are "likely to break with breaking changes".

For example, some packages may make use of many of systemd's features, however, other packages only really use libudev, which is much more stable. We could probably forego the libudev packages, and just list the packages which are using systemd's more advanced features.

How far should testing go?
- What consistitutes that "enough testing" was done to a package before a change was merged?

Should `<packaga>.passthru.tests` be flat?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote yes. If you need some sort of differentiation you can use a naming convention such as ownerSomething. For now these conventions can be per-package and if we see enough of these arise we can consider unifying them or adding more structure once we have clear use cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this means: use a list and not a set? I guess two reasons not to flatten:

  1. If the tests are slow, I'm sure it'll occasionally help someone to be able to target a specific one?
  2. If there's enough speculative value in having some ~well-known attrs dedicated to a specific purpose (as in bullet 2 of https://github.com/NixOS/rfcs/pull/119/files#r805415330)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant for it to be as a single attr set. I wanted to avoid: passthru.tests.scenarioA.variantB.test

Should `<packaga>.passthru.tests` be flat?
For packages which have extremes in resource usage when testing (e.g. pytorch), it may
be beneficial to have additional structure for the tests to denote expectations of resources
and ownership of testing for upstream packages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this differentiation can be solved by nixosTests vs tests.

@kevincox
Copy link
Contributor

I nominate myself and @Ericson2314 as shepherds.

@Profpatsch
Copy link
Member

Separately, I dislike the fact that nixosTests Currently violate the nixpkgs vs nixos layering. If we could make them them functions that are passed some nixos stuff I would prefer that. This is orthogonal to rest of the RFC, but if we are formalizing things maybe it's good time to think about this stuff.

Sure... but nixpkgs is for packaging software, and nixos can be thought of as the practice of combining it. It's immensely useful to verify that it works as intended with other software.

It's a better user experience if we know that a nextcloud upgrade breaks the nextcloud+mysql integration. As that's one way that people are likely to leverage it.

I think it’s fine in this context, because it is not provided to nixos, but nixos is used as an implementation detail to run the tests, which could in principle also be done by a different runtime.

In addition, due to the functional & lazy nature of nix it doesn’t actually incur a static dependency on nixpkgs in the derivation, so, there is no technical layering breakage either.

Comment on lines 42 to 44
Standardize `passthru.tests.<name>` and `passthru.nixosTests.<name>` as a mechanism of
more expensive but automatic testing for nixpkgs. As well as encourage the usage of
`checkPhase` or `installCheckPhase` when packaging within nixpkgs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused where to draw the line between what should be done in installCheckPhase and non-vm passthru.tests. When reviewing this has been a source of confusion, see: NixOS/nixpkgs#153496 as an example. (Reading this I'm no longer convinced the decision to move it from installCheckPhase was the right one :>)

cc @thiagokokada so that you can add your thoughts

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, installCheckPhase is similar to checkPhase, in what both means that if this test fail I don't even want this package to be build (that as far I had understand in the NixOS/nixpkgs#153496, that was the case, since that was a very basic functionality check for that package).

passthru.tests, AFAIK, allows the package to be build regardless and would only fail if you try to run the package tests itself. That may be an advantage for packages that has too many tests or the tests uses too much resources to run (e.g.: maybe the tests uses much more memory than the build phase). Both are not the case of NixOS/nixpkgs#153496 though, since the check had a minimal impact on total build time.

So yeah, between this and the fact that to use passthru.tests we had to resort to a clear hack, I would much prefer that installCheckPhase was kept.

Copy link

@thiagokokada thiagokokada Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, IMO, VM-less passthru.tests makes more sense for tests like the one described in this document: maybe we want to see how this particular package being build interacts with other packages, or some other kinda of integration testing.

But if the idea is test only the current package, installCheckPhase generally makes more sense to me (of course, there could be exceptions).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally we want expensive build steps spread among different derivations, but that's hard. Expensive tests are sometimes feasible to split, so splitting them is good when it doesn't create a mess.

(Cheap checks with no new deps do not follow this logic, of course)

Also, separated tests make it more practical to support packages with different level of «works» on different platforms, because we do not need to touch the main build causing expensive rebuild on the «happy» platforms to add partial support (and its tests) on the more exotic ones. There is no single «package is broken», in the black-and-white view we have never had a non-broken LibreOffice packages, for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separated tests also make it easier to debug failures, since you can easily access the package (which may build successfully but then fail the test). I think we should follow something like a "principle of least privilege", where it is best practice to always use the test that makes the fewest assumptions about its environment.

checkPhase assumes it is run within the build environment. That's a lot of assumptions. This increases build time for quick iteration, makes it harder to debug failures and also makes the test results less meaningful since the environment differs from the end-user's environments.

installCheckPhase has somewhat fewer assumptions than checkPhase, but still many of the same disadvantages. The results are more meaningful though, since the environment is more similar to the end-user's environment.

passthru.tests makes the fewest assumptions. If the tests pass in this "clean environment", they will likely pass in the user's environment as well (unless tests are actually broken by some environment variable or something similar). They can be run as-needed (but should be run (semi-)automatically in some reliable fashion). They make debugging easier.

passthru.nixosTests assumes access to a nixos VM, but does not assume a special build environment.

As a result, I'd suggest to pick passthru.tests > passthru.nixosTests > installCheckPhase > checkPhase. Practicality is king of course: If upstream assumptions make it trivial to run the test suite in checkPhase but hard to run them in passthru.tests, then its better to have some tests in checkPhase than to have no tests at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much ~ecosystem perspective on this, but I do have a few scattered thoughts from authoring and maintaining a small number of packages (broadly, my thinking has been aligning with timokau over time):

  1. I like passthru.tests at the front of the list. I can think of at least a few times I've been in the middle of some meaningful work and ended up derailed by an immediate or remote test or test-dependency failure:

    • I've started to see a check/test dependency on a linter or formatter as an antipattern in my own work. At least twice I've had a linter/formatter package turn up broken, or a bump in nixpkgs to trigger a rebuild that fails due to a lint/format check.

    • Recently I noticed that resholve's CLI arg parser dependency wasn't building on aarch64-darwin due to some failing tests. This was a pain since it was a platform I don't have readily available. I ended up overriding the dependency to disable checks for expediency, since resholve's own test suite (which I factored out into passthru.tests a while back) already exercises the CLI and should fail on borg/hydra if it did ultimately depend on whatever caused the test breaks.

  2. I've thought for a while that it would be nice to have a low-friction mechanism for collecting example uses (say, a nudge to submit them + a CLI/form/template for doing so? Examples themselves might just be a valid invocation of some CLI, a sample file that exercises some parser, etc.) from package users. It occurs to me now that if there was a ~standard passthru.tests.x attr for these, it could at least drive the nudge (i.e. example, if you open a nix-shell with something that has fewer than N?)

  3. I like prioritizing installCheckPhase over checkPhase. I have some minor insecurity about checkPhase providing false-confidence about something that installPhase or fixupPhase then breaks (especially since resholve presently runs in fixup).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've avoided installCheckPhase for fear of corrupting the outputs. NixOS/nixpkgs#143862 should counter that.

@spacekookie spacekookie added status: open for nominations Open for shepherding team nominations and removed status: new labels Jan 26, 2022
@spacekookie
Copy link
Member

A bit late to the party but this RFC is open for nominations :)

@lheckemann
Copy link
Member

Current nominations:

More nominations to move this RFC forward are welcome!

@7c6f434c
Copy link
Member

7c6f434c commented Feb 9, 2022

I nominate: @cole-h @Mic92 @Ekleog @Profpatsch @timokau @SuperSandro2000 (fully expecting most to say they do not have time right now)

@Mic92
Copy link
Member

Mic92 commented May 28, 2022

We are now in the Final Comment Period (FCP). Unless any major objections are raised, this will be accepted and merged in 10 days from now, on June 6th!

Please post any final comments on this pull request.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/rfc119-formalize-testing-for-nixpkgs-packages-is-now-in-fcp/19372/1

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important to omit passthru as described in the comment and my earlier comment https://github.com/NixOS/rfcs/pull/119/files#r857168303

Short process question, wouldn't it make sense for the rfc author to address all comments before entering the final comment period? Feel free to ignore this if the author just forgot to push.

# Detailed design
[design]: #detailed-design

Standardize `passthru.tests.<name>` as a mechanism of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The <name> syntax suggests that the namespace is flat, but it isn't. Both ofborg and nix-build respect the recurseForDerivations attribute (aka recurseIntoAttrs), which is a valuable behavior

  • for authors, adding an attribute is easier than merging attribute sets
  • for "testers", the computation of the set of attribute names is lazier, improving evaluation performance when they only run a specific test or set of tests

passthru is an implementation detail of mkDerivation that must not be part of this specification. You could mention it once as a note to guide package authors, and you can use it in examples with mkDerivation, but it must not be part of the package specification. (See also for an attempt to define "package" NixOS/nix#6507)

Suggested change
Standardize `passthru.tests.<name>` as a mechanism of
Standardize `<pkg>.tests` as a mechanism of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't $tests be populated in each build environment if this is true? the inclusion of passthru was because those get pruned, and is the current convention. (not to say that current conventions aren't flawed, and should be changed)

@kevincox kevincox added status: FCP in Final Comment Period and removed status: in discussion labels Jun 1, 2022
@edolstra
Copy link
Member

@Mic92 Since FCP has passed, do you want to go ahead and merge this, or does it need further discussion?

@kevincox
Copy link
Contributor

I see one unresolved comment. Maybe we should wait for a response or to integrate that into the RFC?

@roberth
Copy link
Member

roberth commented Jun 15, 2022

My objections against pkg.passthru.tests in favor of pkg.tests have not been discussed or implemented yet.

@jonringer
Copy link
Contributor Author

My objections against pkg.passthru.tests in favor of pkg.tests have not been discussed or implemented yet.

I'm not very knowledgeable about the implications of #92, so I'll defer to your expertise. I made a mention that it was previously passthru.tests as the convention.

more expensive but automatic testing for nixpkgs. As well as encourage the usage of
`checkPhase` or `installCheckPhase` when packaging within nixpkgs.

Criteria for `tests.<name>`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests.<name> suggests the type attrsOf package, whereas in practice we make use of nix-build's attrset traversal capability to build NixOS tests. They're approximately attrset trees with derivations at the leaves; elaborated in the GitHub suggestion below.

Nested attribute sets of packages have already been implied earlier in the discussion, but haven't made it explicitly into the document yet.
Noteworthy advantages of nested attribute sets

  • easier to write than merges (//)
  • not susceptible to name collisions
  • allows the selection of groups of derivations by path
  • lazier. // forces the computation of all the attribute names, which may be slow when an expression library is involved

If you don't want to go into the technical details, I guess you could roughly summarize this as "tools should align with nix-build's behavior", but it seems like some elaboration of that would be expected for "Formalize testing for nixpkgs packages".
The following is not a complete description of nix-build and allows some divergence (in the mathematical sense as well), as I believe defaulting recurseForDerivations to false is counter-intuitive. Furthermore, nix-build's behavior can be exploited to limit the default set of tests that a user can be expected run, whereas an asynchronous tool can run all tests.

Suggested change
Criteria for `tests.<name>`:
Criteria for `tests.<name>`:
- Each `test.<name>` should be a derivation, or an attribute tree with derivations at the leaves.
- `nix-build -A pkg.tests` only recurses into attrsets below `tests.<name>` that also have the attribute `recurseForDerivations = true`, as set by `recurseIntoAttrs`.
- Other tools may recurse into attrsets that do not define `recurseForDerivations`, but must adhere to `recurseForDerivations = false;`.
- Tools may allow the user to specify an attribute, in which case the traversal starts at that attribute, regardless of any `recurseForDerivations` in the specified attribute or its ancestry.
- Function-valued attributes should be ignored.

Copy link
Contributor Author

@jonringer jonringer Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nested attribute sets of packages have already been implied earlier in the discussion, but haven't made it explicitly into the document yet.
Noteworthy advantages of nested attribute sets

easier to write than merges (//)
not susceptible to name collisions
allows the selection of groups of derivations by path
lazier. // forces the computation of all the attribute names, which may be slow when an expression library is involved

I don't think it's going to be super common for the need to override tests in one package from another, which would require (//). The goal of this is to have a compromise between building everything, and building what's most likely to fail.

For example, if there's an update to openssl, then I would expect to run the downstream packages which have the most rebuilds (e.g. 1000+ rebuilds) and maybe some NixOS tests. But this will all be defined within openssl's expression, I can't think of use case where I would want to (//) more attrs into the tests attrset, I would just add them directly to openssl's passthru.tests.

From a PR reviewer's standpoint, if I see that someone has a PR updating openssl, I just want to be able to do nix build -f . openssl.tests and have enough faith that it will cover the most likely regressions. If there was a regression found later, then we can add that package to tests, and improve it over time.

My other issue with nested tests, is that these tests are meant to act more like a CI/CD. If any one of them fails, then that failure should be addressed. It's not going to be a "well the simple tests pass, but this one one failed, I guess it's fine". A test shouldn't exist if it's failure state isn't going to indicate that the changes cause regressions.

Copy link
Contributor Author

@jonringer jonringer Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I'm not opposed to nested attrsets from a technical perspective. More of a "if we get to the point where we feel it's necessary to start organizing the tests with nested attrsets", then we are probably adding a little too much complexity to the tests, and should keep the list of tests shorter to what can be expected of a PR reviewer can handle with average desktop compute resources.

Similarly, I think usage of (//) is also in this domain, and there's likely too much complexity being added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's going to be super common for the need to override tests in one package from another, which would require (//).

This is kind of what we're doing with nixosTests though. We're taking tests from one place and using it them in a package.

[> allows the selection of groups of derivations by path]

"if we get to the point where we feel it's necessary to start organizing the tests with nested attrsets"

This was a distraction. Neither of us care about organizing the tests into clever hierarchies.

I just want to throw whatever expression in there and expect all of it to run in nix-build, ofborg, etc.

By throwing out nested attrsets we need to start caring about the hierarchy, because we will have to flatten it.
If I change a test in nixosTests from a single test to a test matrix, that should have no ill effect on the packages that reference it. If we disallow nesting, it will.

My other issue with nested tests,

Well, this is about test selection again, viewing nested attrsets as a means only to that end, which it is not.

It's really about making NixOS test linking easy and robust.


I guess I could change NixOS/nixpkgs#176557 to always return attrsOf derivation; also when no matrix is defined: { test = drv; }, and flattening the attrsets when it has multiple axes. That would make the PR a breaking change rather than a refactor with added functionality, so I'll scope that out for the PR itself. I also don't want to be associated with such a needless breaking change.

Honestly though, I think it's a bad choice, as we're painting ourselves into a corner with this unnecessary restriction. Our tooling has no problem dealing with nested attrsets...

lazier. // forces the computation of all the attribute names, which may be slow when an expression library is involved

I've seen a developer question his life choices when iterating on expressions that had a false dependency on an expensive attrset's keys. The expression they iterated on had no dependency on those keys, but they had to be computed because they //-ed it to their packages attrset. This added 30 seconds to their cycle time, making unnecessary strictness take up the majority of their cycle time. They'd been passionate about Nix and they had plenty of experience, but weren't intimately familiar with the evaluator to figure out that // was ruining their workflow.

Please reconsider flattening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you were just concerned about maintaining a package, and you were just concerned about the long-term maintenance of that package? what would you want the "api" to be for the test suite?

I'm not advocating that nested attrsets should be disallowed, I'm saying they are a "code smell". Since nested attrsets already exist in nixosTests, then I think that's fine for that to carry over. And having a large test suite in nixosTests should probably not be frowned upon if the tests provide value.

I am saying that (//) and large test suite should be avoided, because the desire of this rfc to make PR acceptance more automated, and large testing requirements will get in the way of that.

@kevincox
Copy link
Contributor

FCP is complete. The RFC has been accepted!

@kevincox kevincox merged commit 220af8e into NixOS:master Jun 30, 2022
@jonringer jonringer deleted the testing-rfc branch June 30, 2022 18:29
@vcunat
Copy link
Member

vcunat commented Jul 3, 2022

Now the "rendered" link in the first post won't work anymore. I suggest to edit it, for UX of later visitors.

@jonringer
Copy link
Contributor Author

Yes, thank you :)

@infinisil infinisil added status: accepted and removed status: FCP in Final Comment Period labels Aug 23, 2023
KAction pushed a commit to KAction/rfcs that referenced this pull request Apr 13, 2024
* Formalize tests for nixpkgs packages

* Update rfcs/0119-testing-conventions.md

Co-authored-by: Kevin Cox <[email protected]>

* Update rfcs/0119-testing-conventions.md

* Update rfcs/0119-testing-conventions.md

Co-authored-by: Anderson Torres <[email protected]>

* Update rfcs/0119-testing-conventions.md

Co-authored-by: Anderson Torres <[email protected]>

* Update rfcs/0119-testing-conventions.md

* Update rfcs/0119-testing-conventions.md

* Update rfcs/0119-testing-conventions.md

* Add shepherds

* Update RFC post 30th Apr meeting

* Update rfcs/0119-testing-conventions.md

* passthru.tests -> tests

* typo

* Update rfcs/0119-testing-conventions.md

Co-authored-by: Robert Hensing <[email protected]>

Co-authored-by: Kevin Cox <[email protected]>
Co-authored-by: Anderson Torres <[email protected]>
Co-authored-by: Eelco Dolstra <[email protected]>
Co-authored-by: Robert Hensing <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.