-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
primops: make nature of foldl' strictness clearer #7158
primops: make nature of foldl' strictness clearer #7158
Conversation
It does behave correctly for all the other accumulator values.
For the purpose of illustration only, we can look at the lazy foldl in Nixpkgs lib:
I don't believe laziness in the first accumulator value is going to be a performance issue in the real world, whereas making it strict may cause real world issues, potentially breaking old expressions. While I judge that chance of breaking to be slim, I weigh this to be a bad risk compared to the even slimmer strictness/performance benefit. {
foldl' = f: acc0: l: builtins.seq acc0 (builtins.foldl' f acc0 l);
} In conclusion, I'd recommend to add the above test case instead and leave the implementation as is, and even add a test case so we don't break |
I did a simple benchmark using hyperfine and > hyperfine -w 5 -L commit ac0fb38e8a5a25a84fa17704bd31b453211263eb,aee7373357f20caaf14f02b72136b7950c55405d -s 'git checkout {commit} && make install' './out/bin/nix-instantiate --readonly-mode -A nixosTests.simple ~/src/nix/nixpkgs/'
Benchmark 1: ./out/bin/nix-instantiate --readonly-mode -A nixosTests.simple ~/src/nix/nixpkgs/
Time (mean ± σ): 2.360 s ± 0.052 s [User: 2.232 s, System: 0.245 s]
Range (min … max): 2.321 s … 2.493 s 10 runs
Benchmark 2: ./out/bin/nix-instantiate --readonly-mode -A nixosTests.simple ~/src/nix/nixpkgs/
Time (mean ± σ): 2.376 s ± 0.075 s [User: 2.245 s, System: 0.241 s]
Range (min … max): 2.329 s … 2.583 s 10 runs
Summary
'./out/bin/nix-instantiate --readonly-mode -A nixosTests.simple ~/src/nix/nixpkgs/' ran
1.01 ± 0.04 times faster than './out/bin/nix-instantiate --readonly-mode -A nixosTests.simple ~/src/nix/nixpkgs/' So seems being lazier is slightly faster, but I'm not sure if it is significant enough to violate user's expectation. |
This is also fair enough, we can just make it clearer in the documentation that this fold is lazy. Another argument for keeping it as is would be that implementing a strict fold based on the lazy builtin is easy, but not the other way round, potentially leaving us with a much worse performing lazy fold in nixpkgs. |
The measurement difference is within σ, so it is not significant. |
This is not a sufficient characterization of this fold. It is only lazy in the initial accumulator, and only if
This gets a bit "academic" so to speak. I don't think we have much of a choice, because we don't want to break existing expressions. |
Also note: lib.nix currently defines
and
where |
I believe this line to be a very old compatibility shim. Ideally it would wrap |
Of note: since attrsets force their keys in any case, adding the additional strictness to
|
It does if This isn't new information though. We should define weak head normal form aka WHNF in the Nix manual so that we can reference it and not feel like we have document things through sporadic github comments. |
Not just that, but we should also document the strictness behavior of the builtin types (mostly lists and attrsets) |
But that’s a different case. Of course if you never force the thing in the first place (by ignoring it), it won’t be looked at. |
Or maybe I don’t understand what you mean |
Seems like the only way in which |
It does force intermediate thunks, except the initial one that you pass to
That would have been a sensible thought, but it doesn't apply to the linked observation. |
This is not due to the final force, but seems due to the fact that we use Edit: What I mean to say, using plain Consider this diff: diff --git a/src/libexpr/primops.cc b/src/libexpr/primops.cc
index 28b998474..3c2d589b3 100644
--- a/src/libexpr/primops.cc
+++ b/src/libexpr/primops.cc
@@ -2891,13 +2891,16 @@ static void prim_foldlStrict(EvalState & state, const PosIdx pos, Value * * args
Value * vCur = args[1];
for (auto [n, elem] : enumerate(args[2]->listItems())) {
Value * vs []{vCur, elem};
vCur = n == args[2]->listSize() - 1 ? &v : state.allocValue();
state.callFunction(*args[0], 2, vs, *vCur, pos);
}
- state.forceValue(v, pos);
+ //state.forceValue(v, pos);
} else {
- state.forceValue(*args[1], pos);
+ //state.forceValue(*args[1], pos);
v = *args[1];
}
} We can still replicate your example:
You can observe the difference by manually translating the expression into the equivalent calls (also with the above patch applied):
Consequently, my originally proposed change hardly makes any difference in practice, though. Edit: So I guess a way forward would be keep things as is, add more tests for this and try making the documentation a bit more accurate. Not sure if we can drop the final force before return, I have the suspicion it doesn't do anything? |
This is actually a bit flawed. A # foldl' forces the accumulator values produced by the function. This must throw.
builtins.foldl' (_acc: item: item null) {} [ (_: throw "even though this accumulator value isn't used, it must be forced") (_: 1) ] But an extra test case would make the same point # foldl' is lazy in the list items when the function is lazy in the list item. This must return true.
builtins.foldl' (acc: item: acc) {} [ (throw "must be lazy in list items") ] == {}
sgtm
In theory the evaluator could apply a function and return a thunk for the body. I think this would check the set pattern, if any, but stop after that. I'd have to look up in the code what exactly happens though. In general, I think not forcing a thunk may not always cause a problem because often there will be other opportunities for it to be forced anyway, but in a rare case it may not be. In other words, it's not easy to be sure.
|
aee7373
to
4edbeba
Compare
@roberth I updated this change to only update the documentation of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented test cases are much appreciated ❤️
src/libexpr/primops.cc
Outdated
evaluated first. For example, `foldl' (x: y: x + y) 0 [1 2 3]` | ||
evaluates to 6. | ||
...`. For example, `foldl' (x: y: x + y) 0 [1 2 3]` evaluates to 6. | ||
The return value of each application of `op` is evaluated strictly, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be easier if the Nix docs defined weak head normal form for Nix. Lacking such a precise definition, perhaps we could call it the root?
A bit nitpicky perhaps, but to stay accurate, I'd recommend to avoid "evaluate strictly", because the Nix language does not switch between lazy and strict evaluation, as "strictly" might suggest.
The return value of each application of `op` is evaluated strictly, | |
The root of the return value of each application of `op` is evaluated immediately, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll use “immediately”, but I think root is too confusing. I'd much rather fix that in one go with seq
and deepSeq
. In the Nix manual currently evaluate always implies only the root, anything beyond that is evaluate deeply. With the removal of the “strictly” I think it is relatively clear in context.
Let's do the WHNF stuff in a separate PR.
* Clarify the documentation of foldl': That the arguments are forced before application (?) of `op` is necessarily true. What is important to stress is that we force every application of `op`, even when the value turns out to be unused. * Move the example before the comment about strictness to make it less confusing: It is a general example and doesn't really showcase anything about foldl' strictness. * Add test cases which nail down aspects of foldl' strictness: * The initial accumulator value is not forced unconditionally. * Applications of op are forced. * The list elements are not forced unconditionally.
4edbeba
to
d0f2da2
Compare
Any blockers? |
None as far as I'm concerned, but would like to discuss with the team before merging this. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-02-10-nix-team-meeting-minutes-31/25438/1 |
This is approved, the conversation is safe to mostly ignore since the work ended up being much simpler. The tests are surely better than non-tests, and if we are unsure about the documentation part we should split out the tests to merge that first. |
Haskell actually also had this problem originally, and it was only accidentally changed to the more intuitive behavior in 2015! See also Which |
Btw I had to work around this here. |
Edit: Outdated change description!
I was a bit confused by the fact that
primop_foldlStrict
only (shallowly) forces the input list. To determine what a strict left fold “should” do, I looked at Haskell's foldl' which calls seq on the accumulation value (which you can see by thez :: b
type annotation in the arguably confusing code). I think this is fair enough, since Haskell is where most people's intuition aboutfoldl'
will come from.Forcing the accumulation value
vCur
has the benefit that it limits how nested the intermediate thunks in a fold can become which hopefully saves on used stack size in larger folds (I done any testing on the impact so far). The list elements are only forced insofar, as it is necessary when forcingvCur
which means we still save on unnecessary computations.I added two new test cases which demonstrate that a)
vCur
is forced and b) list elements are not necessarily forced. Additionally I did a sanity check, evaluating a nontrivial CI pipeline as well as multiple NixOS configurations which did not turn up any obvious regressions.