Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate failing PR check (check-by-name) #289649

Closed
mfrw opened this issue Feb 18, 2024 · 12 comments
Closed

Investigate failing PR check (check-by-name) #289649

mfrw opened this issue Feb 18, 2024 · 12 comments
Labels
0.kind: bug Something is broken 0.kind: build failure A package fails to build
Milestone

Comments

@mfrw
Copy link
Member

mfrw commented Feb 18, 2024

Steps To Reproduce

Steps to reproduce the behavior:
Any PR created after the merge of:

Seems to fail PR checks.

Logs:

pkgs/test/nixpkgs-check-by-name/scripts/run-local.sh master
this path will be fetched (0.00 MiB download, 0.01 MiB unpacked):
  /nix/store/hks7d5aa2wcw0nbh67g0s7z3dq5vvxva-jq-1.7-dev                                         
copying path '/nix/store/hks7d5aa2wcw0nbh67g0s7z3dq5vvxva-jq-1.7-dev' from 'https://cache.nixos.org'...
Using HEAD commit 8bd36b1c91283672d04bb47f52a67476de0ef584
Creating Git worktree for the HEAD commit in /run/user/1000/tmp.iFFGOrvnqf/merged.. Done
Fetching base branch master to compare against.. 573cead3dd6f4ca10ed1bc7dc3589bc85e150ab5
Creating Git worktree for the base branch in /run/user/1000/tmp.iFFGOrvnqf/base.. Done
Merging base branch into the HEAD commit in /run/user/1000/tmp.iFFGOrvnqf/merged.. 8bd36b1c91283672d04bb47f52a67476de0ef584
Reading pinned nixpkgs-check-by-name revision from pinned-tool.json.. d934204a0f8d9198e1e4515dd6fec76a139c87f0
Creating Git worktree for the nixpkgs-check-by-name revision in /run/user/1000/tmp.iFFGOrvnqf/tool-nixpkgs.. Done
Building/fetching nixpkgs-check-by-name..
this path will be fetched (0.92 MiB download, 3.78 MiB unpacked):
  /nix/store/5fjdmbiziyp47gfc9kmfgvxdlzd6bba1-nixpkgs-check-by-name
copying path '/nix/store/5fjdmbiziyp47gfc9kmfgvxdlzd6bba1-nixpkgs-check-by-name' from 'https://cache.nixos.org'...
/nix/store/5fjdmbiziyp47gfc9kmfgvxdlzd6bba1-nixpkgs-check-by-name
Running nixpkgs-check-by-name..
pkgs.openobserve: This attribute is manually defined (most likely in pkgs/top-level/all-packages.nix), which is only allowed if the definition is of the form `pkgs.callPackage pkgs/by-name/op/openobserve/packag
e.nix { ... }` with a non-empty second argument.
Validation failed, see above errors

Add a 👍 reaction to issues you find important.

@mfrw mfrw added 0.kind: build failure A package fails to build 0.kind: bug Something is broken labels Feb 18, 2024
@mfrw
Copy link
Member Author

mfrw commented Feb 18, 2024

@7c6f434c
Copy link
Member

@infinisil is the ratchet absolute not relative so this breaks all PRs in the meantime and doesn't say that there is already a failure on the target? I would guess ideally «relative increase» — including the things currently in the absolute exception list → red, «absolute increase but fine relatively» → gray, «absolute and relative are both fine» → green.

It's another question whether this should have been a failure at all with non-default callPackage and some argument passing.

@infinisil
Copy link
Member

See #289751

In this case it wasn't a ratchet check (relative) failure, but rather an absolute failure, the PR shouldn't have been merged at all.

@7c6f434c
Copy link
Member

Oh, in the sense that the initially-hoped-to-be-avoided «need to move out of by-name» cases are actually checked not in comparison to target branch but in absolute?

@infinisil
Copy link
Member

Yeah changing it to a non-pkgs.callPackage still requires moving out of pkgs/by-name, irrelevant of the base branch.

@7c6f434c
Copy link
Member

I ask about keeping the wrong version from the base branch here (but apparently this is not taken into account).

Could we get gray instead of red if the base branch fails relative to itself? Hopefully this could shift the balance from blind reverts to looking a bit longer for an easy fix, and also matches the semantics better (red should be when a PR breaks something…)

@infinisil
Copy link
Member

Ah I see what you mean. I don't think it's a good idea though. If master is broken like that (which btw can happen for all CI checks), it's generally impossible to know whether PRs are valid by themselves anymore. So if you merge PRs while the base branch is broken, you risk breaking it even more! Marking such base branch failures as red makes sure that it has to get fixed first.

We should make sure that we never get to such a state in the first place, e.g. by enforcing that CI is green before merging. But in the absence of that, if PRs do still get merged with failing CI and it causes master to break (as was the case here), it just has to get fixed ASAP, and reverting is the fastest way to do that.

@mfrw
Copy link
Member Author

mfrw commented Feb 20, 2024

@infinisil @7c6f434c Should we reopen this issue to have more discussion ?

@7c6f434c
Copy link
Member

We should make sure that we never get to such a state in the first place,

You have enabled the red checks before having data whether all the corner cases behave like you would expect. This was a mistake. By now, I am not sure you can quickly give a summary of what should be done in each corner case (via general rules, not by looking at a specific case) — and I am sure nobody else can. So I think at least «not caused by the local PR» problems should be gray. From all the writing, the problems are per-top-level-attribute anyway, so if the list of problematic attributes has not changed…

you risk breaking it even more!

Sorry, but for check-by-name I am not buying this logic.

The hard part is basically figuring out and telling everyone what to do in the newly discovered non-obvious corner case. Fixing the things is supposed to be a zero-rebuild (ofborg checks this!) moving files around, moving a few easy cases around doesn't really add that much to it.

And, really — «This can make some security fixes more complicated to get right» was not a discussed drawback in the accepted RFC, so reducing the amount of security fixes reverted for trivial naming reasons is not less important than being slightly closer to the full migration while nobody is even sure the tooling is currently correct. Gray for «probably not your fault» failures encourages fix-forward.

@infinisil
Copy link
Member

I think you're confusing something here, this failure was not an edge case. The RFC clearly specifies that only pkgs.callPackage can be used for packages in pkgs/by-name. And it also mentions the drawback that it won't work for alternate callPackage functions. This is also documented in Nixpkgs (also here).

This was caught in the original PRs CI check, properly causing it to fail. And I could've confirmed that the failure is legitimate if I got a ping beforehand (which I didn't). The PR was merged regardless, probably because of there being false positives in the past, but these were all fixed already, so this was not the case here. If failing CI is ignored, I expect the one merging it to have made sure that it won't cause problems, which was not done here. I can't take the blame for this.

So really, this is simply a case of the master branch being broken. And just like all other CI checks, the pkgs/by-name check also fails for all PRs in such a case, even if those PRs don't have anything to do with the failure. There is nothing special about the pkgs/by-name check in that regard. Now because pkgs/by-name is one of the few checks that even has access to the base branch, it can give a better error message if the base branch already fails. I even implemented this in the past, though it was later dropped for other reasons, this could be implemented again though.

The only things I do take the blame for are the past false positives which caused some confusion, but by now these are all fixed, and I actively informed all affected PRs of this. I also take the blame for the confusion regarding error messages and the one unanticipated edge case (totally unrelated to master failing fyi). The CI check should indicate whether merging a PR would break master, and what one can do to make CI succeed. This is already tracked and I already started working on it this week. It's really tricky to get right though, so please be patient.

@infinisil infinisil added this to the RFC 140 milestone Feb 20, 2024
@7c6f434c
Copy link
Member

If failing CI is ignored, I expect the one merging it to have made sure that it won't cause problems, which was not done here.

I'll bite the bait: as long as the mass-renaming has not landed anywhere, violating by-name rules does not cause real problems yet. Unlike eval errors, that usually mean cryptic errors for a priori plausible attempts to install something.

but by now these are all fixed, and I actively informed all affected PRs of this.

You are handling the error reports very well and impressively quickly. Sorry if I had given the impression I doubt that part.

However, not many false positives are needed to burn the confidence temporarily. So yeah, people will ignore some of the complaints from CI until the classes of CI check complaints get to be common knowledge. And it is hard to predict which ones.

(I am not sure what is the true situation with the multi-versioned packages, to be honest, but I am kind of waiting for what the mass-migration PR will do with them… because I am not 100% sure everything is settled right now)

@infinisil
Copy link
Member

as long as the mass-renaming has not landed anywhere, violating by-name rules does not cause real problems yet.

That's entirely orthogonal. The automated renaming won't have any effect on the checks.

(I am not sure what is the true situation with the multi-versioned packages, to be honest, but I am kind of waiting for what the mass-migration PR will do with them… because I am not 100% sure everything is settled right now)

That is the one unanticipated edge case. My recommendation for now is to refactor the multiple versions such that the pkgs/by-name check doesn't trigger, see #287918 (comment).

The automated migration won't touch such packages, it's smart enough to detect when two packages depend on the same file. Once the automated migration code is in place though, I'm considering weakening the new package check to only complain when the new package could be migrated automatically. So this means new versions of packages would still be allowed in all-packages.nix, therefore resolving this well enough.


For now, you can check out #290743 for some improved error messages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug Something is broken 0.kind: build failure A package fails to build
Projects
None yet
Development

No branches or pull requests

3 participants