[RFC 0040] "Ret-cont" recursive Nix #40

Closed
wants to merge 21 commits into from
Changes from 10 commits
3a8338a
Initial draft "ret-cont" recursive Nix
Ericson2314 Feb 1, 2019
3b8422a
Fix typos and finish trailing sentance
Ericson2314 Feb 5, 2019
4da9193
Switch to advocating temp store rather than daemon socket
Ericson2314 Feb 5, 2019
800b5f3
ret-cont-recursive-nix: Fix typo
langston-barrett Feb 6, 2019
f708983
ret-cont-recursive-nix: Fix typo
Mic92 Feb 6, 2019
6a87c1b
ret-cont-recursive-nix: Fix typo
globin Feb 7, 2019
36193e5
ret-cont-recursive-nix: Fix typo
langston-barrett Feb 8, 2019
ffb9203
ret-cont-recursive-nix: Clean up motivation, adding examples
Ericson2314 Feb 10, 2019
5564fdb
ret-cont-recursive-nix: Improve syntax highlighting
Ericson2314 Feb 10, 2019
22f8322
Do a lousy job formalizing the detailed design
Ericson2314 Feb 11, 2019
7f5f854
ret-cont-recursive-nix: Mention `builtins.exec` in alternatives
Ericson2314 Feb 11, 2019
5c9f1fb
ret-cont-recursive-nix: Fix typo
Mic92 Feb 11, 2019
5e56f21
ret-cont-recursive-nix: Remove dangling "$o"
Ericson2314 Feb 25, 2019
ba7dcce
Update rfcs/0000-ret-cont-recursive-nix.md
Ericson2314 Aug 15, 2019
8bcb4e6
ret-cont-recursive: Fix typo
Ericson2314 Nov 2, 2019
baae1e6
ret-cont: Add examples and expand future work
Ericson2314 Nov 2, 2019
9448a2a
ret-cont: Fix syntax error
Ericson2314 Nov 2, 2019
37a643e
ret-cont: Mention Ninja's upcomming `dyndep` and C++ oppertunity
Ericson2314 Nov 2, 2019
14b134d
ret-cont: Fix missing explicit `outputs` and `__recursive`
Ericson2314 Nov 2, 2019
1b0a6a1
ret-cont: "caching builds" -> "caching evaluation"
Ericson2314 Nov 5, 2019
3fe1c3d
ret-cont: Improve formalism and reference #62
Ericson2314 Dec 12, 2019
245 changes: 245 additions & 0 deletions rfcs/0000-ret-cont-recursive-nix.md
@@ -0,0 +1,245 @@
---
feature: ret-cont-recursive-nix
start-date: 2019-02-01
author: John Ericson (@Ericson2314)
co-authors: (find a buddy later to help out with the RFC)
related-issues: (will contain links to implementation PRs)
---

# Summary
[summary]: #summary

"Ret-cont" recursive Nix is a restricted form of recursive Nix, where a build produces derivations instead of executing nested builds during the build.
This avoids some platform-specific contortions relating to nested sandboxing.
More importantly, it prevents imperative and overly linear direct-style build scripts,
which are easy to write but throw away the benefits of Nix.

# Motivation
[motivation]: #motivation

The benefits of recursive Nix have been described in many places.
One main reason is that if we want Nix to function as both a build system and a package manager, we need upstream packages to use Nix too, without duplicating their build systems in Nixpkgs.
For this case, people usually imagine derivations like
```nix
{ stdenv, pkgs, nix }:

stdenv.mkDerivation {
  name = "foo";
  version = "1.2.3";

  src = ...;

  nativeBuildInputs = [ nix ];
  NIX_PATH = "nixpkgs=${pkgs.path}";

  outputs = [ "out" "dev" ];

  # stdenv skips phases through its dont* flags.
  dontConfigure = true;
  dontBuild = true;

  # The bash indirection on $o is escaped below so that bash, not Nix, expands it.
  installPhase = ''
    for o in $outputs; do
      pkg=$(nix-build -E '((import <nixpkgs> {}).callPackage ./. {}).'"$o")
      cp -r $pkg ''${!o}
    done
  '';
}
```
The other main reason is that other build systems should be translated to Nix without vendoring tons of autogenerated code in Nixpkgs.
For this case, the one difference is that we need to generate some Nix first.
```nix
stdenv.mkDerivation {
  # ...
  # The bash indirection on $o is escaped below so that bash, not Nix, expands it.
  installPhase = ''
    bazel2nix # new bit
    for o in $outputs; do
      pkg=$(nix-build -E '((import <nixpkgs> {}).callPackage ./. {}).'"$o")
      cp -r $pkg ''${!o}
    done
  '';
}
```

"Ret-cont" recursive Nix, short for "return-continuation" recursive Nix, is a different take on recursive Nix.
The normal variant in the examples above might be termed "direct-style" recursive Nix.
Consider what happens with the recursive "nix-build" in those examples:
the outer build blocks while the inner one builds, and then the outer one continues.
Just as we can CPS-transform programs, reifying the context of a function call as another function (which is passed as an argument), so we can imagine splitting the derivation in two at this blocking point.
This gives the "continuation" part of the name.
But whereas the CPS transformation makes the continuation an argument, the Nix *derivation* language is first order.
Instead, we can produce a derivation which has the callee as a dependency, and a continuation drv downstream depending on it.
Since the outer derivation evaluates (builds) the inner derivation rather than calling anything, I deem that it returns the derivation.
This gives the "return" part of the name.
Putting both differences together, the first example becomes something like:
Contributor

But the first example had two outputs - this one has only one. Does this mean I can only recurse once?

Member Author

@Ericson2314 Feb 21, 2019

You can recurse as many times as you want. That one output is the derivation itself, not some output of it. After you've been rewritten into the final derivation (i.e. a non-recursive one), you produce all of its outputs.

Contributor

Ok, I suppose my question more meant: can you provide behaviour equivalent to your first example. If so, could that be demonstrated in this RFC?

Member Author

@Ericson2314 Feb 25, 2019

The later ones are equivalent. Or at least intended to be :). Perhaps my typo I fixed in 5e56f21 was causing confusion?

```nix
{ stdenv, pkgs, nix }:

stdenv.mkDerivation {
  name = "foo";
  version = "1.2.3";

  src = ...;

  nativeBuildInputs = [ nix ];
  NIX_PATH = "nixpkgs=${pkgs.path}";

  __recursive = true;

  outputs = [ "drv" ];

  # stdenv skips phases through its dont* flags.
  dontConfigure = true;
  dontBuild = true;

  # Instantiate the inner package and hand its .drv back as this build's result.
  installPhase = ''
    mv $(nix-instantiate -E '(import <nixpkgs> {}).callPackage ./. {}') $drv
  '';
}
```
Note how in this case we don't need to do any "post-processing" of the produced derivation.
When the outer derivation can just "become" the inner derivation, explicitly copying the derivation outputs like before becomes unnecessary.
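
To make the analogy concrete, here is a minimal sketch in Python (not Nix) contrasting direct style, continuation-passing style, and the "return a plan" style this RFC is named after. All function names and the plan layout are hypothetical and exist only for illustration; `run_plan` stands in for whatever Nix does between build steps.

```python
# Direct style: the caller blocks on the callee, then continues.
def inner_build():
    return "inner-output"

def outer_direct():
    result = inner_build()        # the outer "build" blocks here
    return f"outer({result})"     # ...and then continues with the result

# Continuation-passing style: the rest of the caller is reified as a
# function (the continuation) and passed to the callee as an argument.
def inner_build_cps(k):
    return k("inner-output")

def outer_cps(k):
    return inner_build_cps(lambda result: k(f"outer({result})"))

# "Ret-cont" style: nothing is called. The outer step *returns* a first-order
# plan in which the continuation node depends on the callee node.
def outer_ret_cont():
    return {
        "root": "continuation",
        "nodes": {
            "callee": {"deps": [], "run": lambda: "inner-output"},
            "continuation": {"deps": ["callee"],
                             "run": lambda r: f"outer({r})"},
        },
    }

# A separate scheduler runs the plan, dependencies first.
def run_plan(plan, name=None):
    name = plan["root"] if name is None else name
    node = plan["nodes"][name]
    args = [run_plan(plan, dep) for dep in node["deps"]]
    return node["run"](*args)
```

All three styles compute the same value; only the last one leaves scheduling entirely to the code that interprets the plan.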

So why prefer this variation of the standard design?
I've always been concerned with the ease with which someone can just "nix-build ...; nix-build ...; nix-build ..." within a derivation with recursive Nix.
This creates a linear chain of dependencies, which isn't terribly performant: shorter critical paths are crucial for parallelism and incrementality, and a linear chain fails on both.
Building derivations is less convenient, but makes linear chains and the proper dependency graph *equally* less convenient, removing the perverse incentive.
And in general, dynamism in the dependency graph, which is the essence of what recursive Nix provides, is only a feature of last resort, so making it more difficult across the board isn't concerning.
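
The cost of a linear chain can be illustrated with a small critical-path calculation. This is a sketch in Python with made-up unit durations and hypothetical step names; it only shows why three sequential steps are worse than three independent ones feeding a final step.

```python
from functools import lru_cache

def critical_path_length(deps, durations):
    """Length of the longest dependency chain, i.e. the best possible
    wall-clock time given unlimited parallelism."""
    @lru_cache(maxsize=None)
    def finish(node):
        # A node can start once its slowest dependency has finished.
        start = max((finish(d) for d in deps.get(node, ())), default=0)
        return start + durations[node]
    return max(finish(n) for n in durations)

# Hypothetical steps, one time unit each.
durations = {"a": 1, "b": 1, "c": 1, "final": 1}

# Sequential nix-build calls inside one build script behave like a chain.
linear = {"b": ["a"], "c": ["b"], "final": ["c"]}

# The same steps expressed as independent dependencies of `final`.
parallel = {"final": ["a", "b", "c"]}

print(critical_path_length(linear, durations))    # 4
print(critical_path_length(parallel, durations))  # 2
```

The same reasoning applies to incrementality: in the chain, a change to `a` invalidates everything after it, whereas in the second graph only `final` needs rebuilding.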

Additionally, see https://github.com/edolstra/nix/commit/1a27aa7d64ffe6fc36cfca4d82bdf51c4d8cf717 for Eelco's draft implementation of recursive Nix, and the Darwin sandboxing restrictions that make it a Linux-only feature.
Sandboxing and Darwin are crucial to Nix today, and we shouldn't sacrifice either of them.
With "ret-cont" recursive Nix, actual builds are never nested, so we don't need any fancy constraints on the derivation "runtime" (i.e. the code that actually performs and isolates builds).
Furthermore, we can skip needing to talk to the daemon by just producing a local store:
```nix
stdenv.mkDerivation {
  # ...
  outputs = [ "drv" "store" ];
  # Instantiate into the local store output instead of talking to the daemon.
  installPhase = ''
    mv $(nix-instantiate --store $store -E '(import <nixpkgs> {}).callPackage ./. {}') $drv
  '';
}
```
This further simplifies the implementation.
Derivations remain built exactly as today; only the logic *between* build steps changes, and that logic is entirely platform-agnostic.

# Detailed design
[design]: #detailed-design

Derivations today build outputs, and are associated to those outputs.
We extend the derivation language by allowing a derivation to indicate that its output is more derivations, and to ultimately be associated with the outputs of one of *those* derivations.
Derivations that do so indicate this with some special attribute, say `__recursive`.
Such derivations must have two outputs, `store` and `drv`.
`store` would be a local Nix store limited to just drvs and fixed output builds.
`drv` would contain a symlink to one of the derivations in the store, the root.
Member

Note that ideally for recursive drvs you also have the actual outputs of the final derivation specified, so the nix-lang usage of them can just reference outputs appropriately

Member Author

Ah yes that's true.

After the build completes, Nix verifies all the drv files and fixed outputs are valid (contents match hashes, etc.) and merges the built store into the ambient store.
Finally, any uses of the original derivation can be substituted to instead use the symlinked derivation.
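
The verify-then-merge step might look roughly like the following Python sketch. It is a simplified model, not Nix's implementation: the store is a plain dictionary, the entry layout is hypothetical, and a bare SHA-256 of the contents stands in for Nix's real hashing and store path scheme.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def is_trustless_store(store):
    """Accept a built store only if every entry is a drv file or
    fixed-output data whose contents match the declared hash.

    `store` maps a path to an entry of a hypothetical shape:
      {"kind": "drv", "content": bytes}
      {"kind": "fixed-output", "content": bytes, "sha256": hex string}
    """
    for entry in store.values():
        if entry["kind"] == "drv":
            continue  # a real check would also parse and validate the drv
        if entry["kind"] == "fixed-output":
            if sha256_hex(entry["content"]) != entry["sha256"]:
                return False
            continue
        return False  # ordinary build outputs are not allowed
    return True

def merge_into_ambient(ambient, built):
    """Merge a verified store into the ambient store, or refuse."""
    if not is_trustless_store(built):
        raise ValueError("built store failed verification")
    merged = dict(ambient)
    merged.update(built)
    return merged

built = {
    "/store/foo.drv": {"kind": "drv", "content": b"Derive(...)"},
    "/store/data": {"kind": "fixed-output", "content": b"hello",
                    "sha256": sha256_hex(b"hello")},
}
ambient = merge_into_ambient({}, built)
```

Every check here is cheap and needs no rebuild, which is the point of restricting the store's contents.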

To faux-formalize everything in the vein of a small-step semantics:
```
immediatelyDependsOn(drv0, drv1)
immediatelyDependsOn(drv1, drv2)
-------------------------------------------------- deps-trans
transitivelyDependsOn(drv0, drv2)
```
```
∀<d : transitivelyDependsOn(drv, -)>
d ? __recursive == false
-------------------------------------------------- build-readiness
isReadyToBuild(drv)
```
```
drv0 : Drv
∀<o : outputs(drv0)> build_o : StorePath
∀<o : outputs(drv0)> build(drv0) = { ${o} = build_o; }
isReadyToBuild(drv0)
drv0 ? __recursive == false
-------------------------------------------------- normal-build
∀<o : outputs(drv0)> assoc(drv0, o, build_o)
```
```
drv0, drv1 : Drv
drv1path : RelativePath
∀<o : outputs(drv0)> build0_o : StorePath
isReadyToBuild(drv0)
drv0 ? __recursive == true
drv0.outputs = { "store" = ...; "drv" = ...; }
build(drv0) = { succeeded = build0; }
isTrustlessStore(build0_store)
drv1 = read(build0_store + drv1path)
readlink(build0_drv) = build0_store + drv1path
-------------------------------------------------- immediate-drv-delegation
reducesTo(drv0, drv1)
```
```
drv0, drv1, drv2 : Drv
reducesTo(drv1, drv2)
immediatelyDependsOn(drv0, drv1)
-------------------------------------------------- transitive-drv-delegation
reducesTo(drv0, drv0[drv2/drv1])
```
```
drv0, drv1 : Drv
reducesTo(drv0, drv1)
∀<o : outputs(drv1)> build1_o : StorePath
∀<o : outputs(drv1)> assoc(drv1, o, build1_o)
-------------------------------------------------- delegative-build
∀<o : outputs(drv1)> assoc(drv0, o, build1_o)
```
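
As a reading aid, the rules above can be approximated by a small executable model. This Python sketch makes several simplifying assumptions: derivations are identified by hypothetical names, the `produces` field stands in for actually building a `__recursive` derivation and reading its `drv` symlink, and a fully reduced derivation is represented as a nested tuple `(name, resolved_deps)`. It does not guard against non-terminating reductions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Drv:
    name: str
    deps: tuple = ()         # names of immediate dependencies
    recursive: bool = False  # stands in for `__recursive`
    produces: str = ""       # recursive drvs only: the drv their build yields

def transitive_deps(drvs, name):
    seen, stack = set(), list(drvs[name].deps)
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(drvs[d].deps)
    return seen

def is_ready_to_build(drvs, name):
    # build-readiness: no transitive dependency is still recursive.
    return all(not drvs[d].recursive for d in transitive_deps(drvs, name))

def resolve(drvs, name):
    """Reduce `name` to a tree of non-recursive derivations."""
    drv = drvs[name]
    # transitive-drv-delegation: substitute each dependency by its reduct.
    deps = tuple(resolve(drvs, d) for d in drv.deps)
    if drv.recursive:
        # immediate-drv-delegation: with its deps reduced, the drv can be
        # built; its build yields another drv, which we reduce in turn.
        return resolve(drvs, drv.produces)
    return (drv.name, deps)

drvs = {
    "gcc": Drv("gcc"),
    "plan": Drv("plan", deps=("gcc",), recursive=True, produces="foo"),
    "foo": Drv("foo", deps=("gcc",)),
    "app": Drv("app", deps=("plan",)),
}
print(is_ready_to_build(drvs, "app"))  # False
print(resolve(drvs, "app"))  # ('app', (('foo', (('gcc', ()),)),))
```

In this example `app` is not ready to build until `plan` has been replaced by `foo`, after which the whole tree is ordinary, non-recursive derivations.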

## Design Notes

There are a few things we can call out from the faux-formalization.

- `isTrustlessStore` is called that because the restriction on the contents (fixed-output builds / plain data and drvs) is fully and cheaply verifiable.
This is in contrast to normal builds,
where the relationship between the derivation and build can only be verified by redoing the build,
and where even then there's no way to know whether to blame the output for being actually malicious, or the derivation for merely being non-deterministic.

- The substitution of drvs in a downstream derivation reminds me of the substitution of drvs for content hashes with the intensional store.
Member

Yes, this was one of my comments on that PR. We should figure out how to handle reductions more generally: In the continuation-based recursive nix case, we have foo.drv → foo'.drv whereas in the intensional store case we have (foo.drv)!out → cashash-foo, but the principle is the same.

We should muse on this point, and hopefully write a small-step semantics for both together that is more elegant than the above.

- The *building* itself of derivations is unchanged.
All the magic happens through the `reducesTo` relation.

- Because drvs can produce plans of drvs producing more drvs ad infinitum, it's possible to never terminate (no `reducesTo` from a derivation to an `isReadyToBuild` derivation), but that's the user's fault.
We can detect simple cycles analogous to black holes in thunks: if a derivation produces a redirected derivation depending on the original, a cycle is effectively recreated even though we don't have a hash fixed point.
Nix should raise an error rather than looping, but either behavior is permissible.
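
The black-hole check mentioned in the last note could be sketched as follows. This is a Python illustration with hypothetical names, not a description of Nix's internals: `deps` maps a derivation to its immediate dependencies, and `produces` maps a recursive derivation to the derivation its build yields. Only simple cycles are caught; an endless chain of fresh derivations would still loop.

```python
class BlackHole(Exception):
    """Raised when reducing a derivation requires reducing itself."""

def reduce_drv(name, deps, produces, in_progress=None):
    """Return the name of the non-recursive derivation `name` reduces to."""
    in_progress = set() if in_progress is None else in_progress
    if name in in_progress:
        # We re-entered a derivation that is still being reduced.
        raise BlackHole(name)
    in_progress.add(name)
    try:
        # Dependencies must be reduced before `name` can be built.
        for dep in deps.get(name, ()):
            reduce_drv(dep, deps, produces, in_progress)
        if name in produces:
            # Recursive drv: continue with the drv its build yields.
            return reduce_drv(produces[name], deps, produces, in_progress)
        return name
    finally:
        in_progress.discard(name)

# Well-founded: `plan` yields `foo`, and `app` depends on `plan`.
print(reduce_drv("plan", {"app": ["plan"]}, {"plan": "foo"}))  # foo

# Cyclic: `a` yields `b`, but `b` depends on `a`.
try:
    reduce_drv("a", {"b": ["a"]}, {"a": "b"})
except BlackHole as err:
    print("cycle through", err)  # cycle through a
```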

# Drawbacks
[drawbacks]: #drawbacks

- The opinionated nature may put off those who think Nix is too hard to learn already, and think simple recursive "nix-build" is good for newcomers.

- If we ever want full recursive Nix, this doesn't really build in that direction.
It sidesteps the bulk of the difficulty which is in making the nested sandboxing and daemon communication secure.
To me though, this is a feature not a bug; I don't want to go in that direction just yet.

# Alternatives
[alternatives]: #alternatives

- Don't allow fixed-output builds.
All data can be stuck inside the drv file, so this can be cut without limiting expressive power.
But this is much less efficient, and more cumbersome for whatever produces the data.

- Use a socket to talk to the host daemon.
Member

I'm hugely in favor of this approach. Using nested stores and all that is kind of hacky, and we have lots of reasons besides actually nested builds to want recursive nix (e.g. store querying, dynamic dependency fetching, etc.). In general I'm in favor of moving toward a much more structured way for builds that are nix-aware to talk to the store; builds that aren't can then use generic functionality built on top of that.

Member Author

Yeah I don't really care soooo much on this point. I guess I am more interested in pointing out that we can skip the daemon socket, as proof of the simpler computational model that is this version, than insisting that we must skip the daemon socket.

https://github.com/edolstra/nix/commit/1a27aa7d64ffe6fc36cfca4d82bdf51c4d8cf717, a draft implementation of full recursive Nix, has done this and we can take the details from that.
This might be slightly more efficient by reducing the moving of files, but is conceptual overkill given this design.
No direct access to the host daemon rules out a bunch of security concerns, and simplifies the interface for non-Nix tools producing derivations.
The latter I very much hope will happen, just as Ninja is currently used with CMake, Meson, etc., today.

- Full recursive Nix (builds within builds)

- Import from derivation.
This has been traditionally considered an alternative to this, but I will soon propose an implementation of that relying on this; I no longer consider the two in conflict.

- Keep the status quo and use vendoring.
But then Nix will never scale to bridging the package manager and build system divide.

# Unresolved questions
[unresolved]: #unresolved-questions

The exact way the outputs refer to the replacement derivations / their outputs is subject to bikeshedding.

# Future work
[future]: #future-work

A version of IFD that delays evaluation within a derivation to keep evaluation non-blocking.
This works on the same principle as this RFC, which keeps all derivations non-blocking (be they higher order or not).