Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetchgit: fetching the same repo from different places causes a different FOD, incompatible with other fetchers #347212

Open
Atemu opened this issue Oct 8, 2024 · 4 comments

Comments

@Atemu
Copy link
Member

Atemu commented Oct 8, 2024

Issue description

A few days ago I when attempting to review an ffmpeg bump, the ffmpeg git host was down. That's what we have mirrors for though, so I went ahead and simply nix-prefetch-github'd the GH mirror. But building ffmpeg would still try to fetch from the ffmpeg git host.

The reason is that fetchgit uses this function on the given url and rev parameters to determine the name of the FOD:

urlToName = url: rev: let
inherit (lib) removeSuffix splitString last;
base = last (splitString ":" (baseNameOf (removeSuffix "/" url)));
matched = builtins.match "(.*)\\.git" base;
short = builtins.substring 0 7 rev;
appendShort = lib.optionalString ((builtins.match "[a-f0-9]*" rev) != null) "-${short}";
in "${if matched == null then base else builtins.head matched}${appendShort}";

In the case of the ffmpeg git host this results in name = "ffmpeg".

All other relevant fetchers for directory structures (fetchFrom(*GitHost), fetchzip) use source as the name however and a FOD with the same hash but different names is a different derivation in Nix.

The other fetchers choose name = "source" to facilitate replacing the FOD with any other that shares the same contents, no matter where you get them from. You could fetchzip a tarball and it'd be the same FOD as fetching some rev from GitHub as long as the contents are the same.

Proposal

fetchgit should also use name = "source" by default instead of the git repo name.

This actually also confused me when I was watching the nom output as Nix was getting stuck on building the "ffmpeg" derivation...

Difficulties

Changing the default name would change all FODs that do not override name. A hydra eval would then have to re-fetch every fetchgit in Nixpkgs again which could be a significant load we'd put on the git hosts services.

OTOH, according to my grep, there only appear to be ~400 files that mention fetchgit in Nixpkgs which isn't that many if we assume each file to contain one fetchgit FOD. This does not in any way account for automated use of fetchgit; i.e. applying to an entire package set.

Alternative

We could instead introduce fetchgit2 which uses name = "source" by default and let its use be introduced organically rather than all at once.

While we're at it, we could also use that opportunity to deprecate/change leaveDotGit as it's fundamentally broken right now: #8567

@TomaSajt
Copy link
Contributor

TomaSajt commented Oct 8, 2024

I remembered there was a PR that was the opposite of this: #153386
(it proposes to not use "source", even for fetchFromGitHub)

@Frontear
Copy link
Member

Frontear commented Oct 8, 2024

That would be honestly really annoying for FODs because as Atemu has pointed out, the resulting drv will always be considered different and will force a rebuild. The whole purpose of FODs is to determine content consistency, not consistency of the mirror url.

I do agree that naming it source can oftentimes be very misleading or just plain confusing to see in the /nix/store, but I feel like thats a trivial detail.

What we could do is directly expose the name attribute and allow maintainers to set it. That way, even if the source changes, so long as you provide the right name and content, the FOD should produce identical output.

@Atemu
Copy link
Member Author

Atemu commented Oct 9, 2024

We could actually have our cake and eat it too here when CA-drvs became a thing. I'm not sure how their current incantation works in this regard but they could be made to consider any FOD with the same hash an instance of the same CA-drv. Although I don't know whether I'd like them to be invalidated by that as that'd imply any new instance of the FOD would need to be realised first before being considered an instance which wouldn't solve my initial problem which spurred this issue.

I do agree that naming it source can oftentimes be very misleading or just plain confusing to see in the /nix/store, but I feel like thats a trivial detail.

I think so too. The source name is bad for easy discoverability but you're not supposed to discover based on store paths as we have the high-level Nix expressions to process drvs with much richer information than what could be expressed in a path name.

What we could do is directly expose the name attribute and allow maintainers to set it. That way, even if the source changes, so long as you provide the right name and content, the FOD should produce identical output.

This is the status quo: You can already pass any name you like as the caller of fetchgit.

What this issue is about is changing the default for the name parameter in fetchgit. Perhaps I should have been more explicit about that.

@Frontear
Copy link
Member

Frontear commented Oct 9, 2024

What this issue is about is changing the default for the name parameter in fetchgit. Perhaps I should have been more explicit about that.

Thanks for the clarification! That's actually exactly what I was thinking, I just didn't know fetchgit actually did expose its name attribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants