Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make nix-build-if-changed a first class feature of nix #4364

Closed
Profpatsch opened this issue Dec 15, 2020 · 22 comments
Closed

Make nix-build-if-changed a first class feature of nix #4364

Profpatsch opened this issue Dec 15, 2020 · 22 comments

Comments

@Profpatsch
Copy link
Member

Is your feature request related to a problem? Please describe.

When using nix on CI, I want every nix-build/nix-store --realize to be a no-op if it’s already built. This means already built OR available on a substituter. If you run a naïve nix-build, it will download the full run time closure, which can be many megabytes of useless networking if all you want to do is verify whether the build succeeds.

One step further, if I want to generate a script with nix (e.g. a test suite), and then run it on CI (outside of the nix sandbox), it obviously needs to download the runtime closure. However, it does not have to download it if it didn’t change, because that means we ran the test before and the result won’t change (as long as all test dependencies are tracked by nix). Compare to how bazel handles test suites in a similar fashion.

Describe the solution you'd like

This requirement is so common that multiple people have written wrappers around nix independently of each other. The ones that I know of:

I think at this point it would make sense for nix to get one or more flags to make those scripts unnecessary. (They effectively have to parse nix-build --dry-run output to figure out what nix would do and then evaluate a second time.)

A simple --fetch=no/--fetch=if-changed/--fetch=always (default) flag to nix-build might be enough to get most of the advantages; it would only download the resulting path of it had to do a build, and it would signal (via exit code?) whether it actually did the build and download. This might confuse multiple separate issues though, so I don’t know if that’s the best semantics.

  • --fetch=always: current behavior`
  • --fetch=if-changed: the result will only be fetched if it had to be built (aka wasn’t either in the local store or completely substitutable). This is a good way to e.g. build test scripts if you want to run them as well.
  • --fetch=no: the result should never be fetched. You might think “but if I have to build I will have it in my store by design”, but that’s not true if you have a --builder configured because you don’t want to actually build things on CI. In that case, even though the builder builds the resulting path, all CI wants to know is if the store path can be successfully built, it does not need the result.

Describe alternatives you've considered

  • Keep the scripts and add some kind of primitive check to nix to expose whether it has already built the result, so that we don’t need to parse stderr. This would still duplicate evaluation time however!
    • A workaround would be: make evaluation completely cachable, so that we don’t need to re-evaluate between the two nix calls
  • Put the whole functionality of these scripts into nix, that is: don’t expose the information whether a build is already cached, but make it possible to run the result iff there was a build
    • I don’t think this is a viable solution, since we don’t have a nix-run in the old-style tools and the new-style command line is experimental, plus it would hide an important piece of information (already built) and bind it to a fixed action, so it’s less composable if somebody wants to use the information for some other integration goal.

cc @regnat

@Profpatsch
Copy link
Member Author

cc @Mic92

@Mic92
Copy link
Member

Mic92 commented Dec 15, 2020

cc @domenkozar @zimbatm @colemickens

@thufschmitt
Copy link
Member

Thanks for opening that, I also think that's a missing bit in the ecosystem.

A few remarks:

One step further, if I want to generate a script with nix (e.g. a test suite), and then run it on CI (outside of the nix sandbox), it obviously needs to download the runtime closure. However, it does not have to download it if it didn’t change, because that means we ran the test before and the result won’t change (as long as all test dependencies are tracked by nix). Compare to how bazel handles test suites in a similar fashion.

Technically bazel doesn't handle that as bazel tests are morally build actions too, so also sandboxed (but doesn't mean that Nix shouldn't support it of course ;) ). That sounds like worst-case scenario though, as when possible it would be nicer to have the test be a derivation itself − possibly disabling the sandbox just for it with sandbox-mode =relaxed and __noChroot.

--fetch=no: the result should never be fetched. You might think “but if I have to build I will have it in my store by design”, but that’s not true if you have a --builder configured because you don’t want to actually build things on CI. In that case, even though the builder builds the resulting path, all CI wants to know is if the store path can be successfully built, it does not need the result.

I guess that would be mostly the same as --store myBuilder (not if you have several builders of course, but the current Nix model requires a central scheduler which fetches all the build artifacts, and changing that would certainly be way too much effort)

They effectively have to parse nix-build --dry-run output to figure out what nix would do and then evaluate a second time

Fwiw, you can possibly be a slight bit smarter by

  1. getting the output path (with nix-instantiate blah.outPath)
  2. Look it up in the local store and each substituter with something like nix-store --query --hash $outPath --store $substituter

(But agreed, that's not getting you very far, and that's re-implementing some logic that Nix already has internally)

Keep the scripts and add some kind of primitive check to nix to expose whether it has already built the result, so that we don’t need to parse stderr.

Most new-style commands have a --json flag to output their result as parseable json. I guess it wouldn't be too much effort to have a nix-build --dry-run --json.

This would still duplicate evaluation time however!
A workaround would be: make evaluation completely cachable, so that we don’t need to re-evaluate between the two nix calls

You can cache most of the evaluation by first evaluating to a derivation and then doing all the operations on the drv file itself (or use the native caching provided by flakes once they stabilize)

@Ericson2314
Copy link
Member

Isn't there already an issue for this?

@zimbatm
Copy link
Member

zimbatm commented Dec 15, 2020

See #3946

@Profpatsch
Copy link
Member Author

That sounds like worst-case scenario though, as when possible it would be nicer to have the test be a derivation itself − possibly disabling the sandbox just for it with sandbox-mode =relaxed and __noChroot.

I disagree with this. nix sandboxing is very restrictive in what kind of things you can do. You shouldn’t need to adjust your tests to work inside a sandbox. For example, your tests could be doing their own sandboxing, but user namespaces (which nix uses) are not really nestable.

I guess that would be mostly the same as --store myBuilder (not if you have several builders of course, but the current Nix model requires a central scheduler which fetches all the build artifacts, and changing that would certainly be way too much effort)

I don’t think so, you shouldn’t need to download the final result.

You can cache most of the evaluation by first evaluating to a derivation and then doing all the operations on the drv file itself (or use the native caching provided by flakes once they stabilize)

My bad, Profpatsch/nix-build-if-changed.py doesn’t do evaluation twice, but evaluates once and then runs nix-store --realize twice on the evaluated drv.

@Ericson2314
Copy link
Member

This is at least most of the way fixed with #4387

@Profpatsch
Copy link
Member Author

Another edge-case I noticed in practice:

if your root (the drv you are building) is e.g. a writers.writeDash script, then it has allowSubstitutes set to false, which means it will always start building.

Ignoring all allowSubstitutes = false derivations until the first one that is substitutable sounds like a good default that we probably want for this feature.

@zimbatm
Copy link
Member

zimbatm commented Dec 22, 2020

Side rant; IMO allowSubstitutes is an anti-feature.

Even as an advanced user, I often got confused when seeing rebuilds in CI. And because all we get is the store path, I am now hunting down which nix code is producing those. I'd rather Nix being slow and consistent. Or at least have a clear explanation of why the rebuild is happening. Or have a --ignore-allow-substitues flag.

This also prompted me to write this forced-cache.nix script: https://github.com/Mic92/nix-build-uncached/#packages-with-allowsubstitutes--false

@domenkozar
Copy link
Member

I agree, it makes "why didn't my build substitute" more complex than it needs to be. It's also a slippery slope as there's no guarantee that substituting would be slower than building.

I think it's a poor workaround for #3379

@domenkozar
Copy link
Member

@zimbatm opened #4442

@picnoir
Copy link
Member

picnoir commented Apr 27, 2021

Depends on: #4747

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/github-actions-and-or-with-hercules-ci/13537/2

@Ericson2314
Copy link
Member

Ericson2314 commented Jun 9, 2021

#4387 was merged, so try something like

nix-build --store my-remote-store --builders 'auto - - 1 1'

with Nix master.

@nomeata
Copy link
Contributor

nomeata commented Jun 9, 2021

Is there documentation for this? https://github.com/NixOS/nix/pull/4387/files doesn’t touch the manual, and it is not clear to me if this was just a bugfix, or if docs are missing.

@Profpatsch
Copy link
Member Author

#4387 was merged, so try something like

nix-build --store my-remote-store --builders 'auto - - 1 1'

with Nix master.

yeah I’m sorry @Ericson2314, but anything that is not documented in depth in the manual, I cannot in good conscience call “implemented”. This is a general gripe I have with nix, it is a culture problem (no offense intended).

@Profpatsch
Copy link
Member Author

For example I stumbled over the auto setting in a different substitution context recently, but it is not documented anywhere.

@domenkozar
Copy link
Member

This feature really deserves a simple flag for users to flip it on/off.

@Ericson2314
Copy link
Member

@Profpatsch Yes, but my PR also didn't change the CLI :D: it simply made some existing things work where they didn't before.

@Ericson2314
Copy link
Member

#3946 and this are the same, as far as I can tell. Should one of them be closed?

@Profpatsch
Copy link
Member Author

I feel like the other one should have been closed, since this one had more discussion and a better worked out description.

@domenkozar
Copy link
Member

I always close the newest one, but I don't mind either way.

mupdt pushed a commit to mupdt/nix that referenced this issue Mar 1, 2023
This resolves the problems detailed here:
NixOS#4442
The `allowSubstitute=false` "feature" wreaks havoc on CI / AWS / container
type builds, because we *cannot* cache hit on AWS pkgs, and further features
like `nix-build-if-changed` do not work without really ugly wrapper hacks:
NixOS#4364
Seen in the wild, in addition to the above, we've hit two problems at PDT specifically:
1. User buildEnv results can't be substituted out of S3. This means that in
   pdtnix, we can't push them to S3 from CA and have hosts like mtsync pull
   them. This then results in mtsync failing, because it tries to build, but it
   doesn't have DTS-8 installed.
2. On AWS, there are >10,000 pkgs in texlive that say
   allowSubstitutes=false. These are all in S3, but if you cold start a host it
   will rebuild them all!

Moreover, our buildPDTEnv calls are sometimes very expensive (20,000+ symlinks
in turnover-tool style bundles!), so the whole notion that this is faster to
build locally than substitute out of S3 is flawed. We also run pytest
suites, which involve starting tests on NFS, etc., and do a bunch of `find`
filtering to find tests.

Per comment upstream, this is a controversial change and unlikely to be
accepted upstream without more discussion (currently underway in the
community), so we'll keep it internal for now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants