Collapse command & control when a build farm is used #11

Open
blaggacao opened this issue Dec 9, 2022 · 8 comments

@blaggacao
Collaborator

... into a single job.

Having a job matrix is really useful for speeding up the total cycle time of the CI.

However, when a remote build farm already does the parallel scheduling for us, the GH runners, acting as the command & control plane, consume resources while mostly idling.

To reduce waste, we need to collapse the command & control plane into a single resource claim (i.e. job) under these circumstances.

The log output should be designed so that it substitutes for the UI as well as possible.
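
A minimal sketch of what such a collapsed job could look like, written as a plain GitHub Actions workflow rather than the actual std-action ci.yaml. The workflow name, the job name, and the target names (toolchain, foo, bar) are placeholders, and the sketch assumes that Nix with flakes enabled and a remote builder are already configured on the runner. Passing --max-jobs 0 keeps builds off the runner itself, and the ::group:: workflow commands fold the build output into a collapsible section of the job log.

```yaml
# Hypothetical workflow: a single job is the only command & control runner.
name: ci-collapsed
on: [push]

jobs:
  build-all:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Assumption: Nix installation and remote builder setup happen in
      # earlier steps that are omitted from this sketch.
      - name: Build every target in one invocation
        run: |
          # Log groups render as collapsible sections in the job log.
          echo "::group::build all targets on the remote builders"
          # One invocation for all targets, so scheduling is left to the farm.
          # --max-jobs 0 disables local builds; the target names are placeholders.
          nix build .#toolchain .#foo .#bar --max-jobs 0 --print-build-logs
          echo "::endgroup::"
```

With a single invocation, only one runner waits on the farm, at the price of losing the per-job status view that the matrix provides.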

@nrdxp
Contributor

nrdxp commented Dec 9, 2022

The downside is that we won't have a way to track build progress and spawn each task run as soon as its closure is ready on the builder. Instead, we would have to wait for everything to build for all jobs, which is why I decided against doing this from the beginning. In my own testing, sending multiple builds to nixbuild.net individually from the given task runners isn't really inefficient at all, and the service already works out the redundancies for me.

@blaggacao
Collaborator Author

blaggacao commented Dec 9, 2022

> redundancies

I assume you mean #12, right?

We'll have to assess cost impacts a bit ahead of time.

@rickynils

There is a similar discussion for nixbuild-action: nixbuild/nixbuild-action#28.

While the builds will be deduplicated on remote builders (at least on nixbuild.net they will), there might be several GHA runners just idling waiting for the same build to finish. I guess that if you are aware that you have certain "low-level" packages that will trigger rebuilds in all GHA jobs, you could perhaps create some jobs in GHA that will run first, making sure low-level stuff is built. Then the other GHA jobs could depend on those jobs. I understand that it is not always practical or even possible to do something like that, though.

@blaggacao
Collaborator Author

blaggacao commented Dec 12, 2022

> making sure low-level stuff is built

An often-used pattern in Standard is:

./automation/toolchain.nix
./foo/packages.nix

It's actually trivial, via the top-level composition of Block Types in the std-action's ci.yaml, to ensure that packages: { needs: [ toolchain ] }. That should already get us quite far, or even far enough. 🚀
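
For illustration, the ordering described above could look roughly like this in plain GitHub Actions syntax. This is a sketch, not the actual std-action ci.yaml: the workflow name, the job bodies, and the flake output names are placeholders, and Nix plus remote builder setup are assumed to happen in steps that are left out.

```yaml
# Hypothetical workflow: low-level builds run first, dependents wait for them.
name: ci-staged
on: [push]

jobs:
  toolchain:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Placeholder target standing in for ./automation/toolchain.nix
      - run: nix build .#toolchain --max-jobs 0

  packages:
    # `needs` keeps this job from claiming a runner until `toolchain` succeeds.
    needs: [toolchain]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Placeholder target standing in for ./foo/packages.nix
      - run: nix build .#foo --max-jobs 0
```

Because the packages job is not scheduled until toolchain has finished, no runner sits idle waiting for the shared low-level build.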

@nrdxp
Contributor

nrdxp commented Dec 12, 2022

Currently each task runner has a preliminary build step, so from my perspective at least, this might already be the best way to handle it. As you say, the build step will simply idle until nixbuild.net is done building its dependencies. The plus side, though, is that the task stage begins as soon as those builds are finished, without having to wait for unrelated builds, which is what would happen if we collapsed all builds together.

A better optimization would probably be to skip the Nix installation if the cache was restored successfully. In my preliminary testing at least, the duration of the install Nix action seems to correlate with the size of the /nix/store before the install script is called: the bigger the store, the longer the installation takes. Since we already have Nix installed anyway (from the restored cache), we could simply skip that wait entirely. We might need Nix 2.12 and its new automatic build-user allocation (the experimental auto-allocate-uids setting) to make that easy, though.
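
A rough sketch of the skip-on-cache-hit idea, using the cache-hit output of actions/cache to guard the install step. The cache path, the cache key, and the choice of cachix/install-nix-action as the installer are assumptions made for illustration. Whether a restored /nix is usable without running the installer at all is exactly the open question raised above, so this shows only the conditional wiring.

```yaml
# Hypothetical workflow: install Nix only when the cache was not restored.
name: ci-skip-install
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Restore /nix from cache
        id: nix-cache
        uses: actions/cache@v3
        with:
          path: /nix                     # assumed cache location
          key: nix-${{ runner.os }}-${{ hashFiles('flake.lock') }}

      - name: Install Nix (skipped on an exact cache hit)
        # cache-hit is 'true' only when the key matched exactly.
        if: steps.nix-cache.outputs.cache-hit != 'true'
        uses: cachix/install-nix-action@v18   # stand-in for the install action in use
```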

@blaggacao
Collaborator Author

blaggacao commented Dec 19, 2022

  • we know the build action is a no-op
  • we can leverage this, at the very least, to collapse build-type matrices into a single command & control job per matrix
  • we expect heavy builds to be predominantly of this build type, so this should give us a good 80/20 balance
  • the user can still help with optimization via GH Actions' needs directive
I opened a ticket on ways to avoid the Nix installer (and restore Nix instead, either from cache or from DISCOVERY_SSH).

@nrdxp
Contributor

nrdxp commented Jan 25, 2023

What would be really nice is something like the keep-builds-running setting that nixbuild.net offers on its side, but implemented directly in Nix, so that users can still use any remote build machine. Then we could spawn the jobs directly in discovery without having to wait for them all to finish before spawning the build matrices.

@blaggacao
Collaborator Author

[Image: Untitled Diagram]
The problem is the red arrow.
