Collapse command & control when a build farm is used #11

blaggacao · 2022-12-09T21:08:41Z

... into a single job.

Having a job matrix is really useful for speeding up the total cycle time of the CI.

However, when a remote build farm already does the parallel scheduling for us, the GH runners, acting as the command & control plane, consume resources while mostly idling.

To reduce waste, we need to collapse the command & control plane into a single resource claim (i.e. job) under these circumstances.

The log output should be designed so that it substitutes for the UI as well as possible.

nrdxp · 2022-12-09T21:56:44Z

The downside is that we would no longer have a way to track build progress and spawn each task run as soon as its closure is ready on the builder; we would have to wait for everything to build for all jobs. That is why I decided against doing this from the beginning. In my own testing, sending multiple builds to nixbuild.net individually from the respective task runners isn't really inefficient at all, and the service already works out the redundancies for me.

blaggacao · 2022-12-09T23:20:40Z

"redundancies"

I assume you mean #12, right?

We'll have to assess cost impacts a bit ahead of time.

rickynils · 2022-12-11T21:10:49Z

There is a similar discussion for nixbuild-action: nixbuild/nixbuild-action#28.

While the builds will be deduplicated on remote builders (at least on nixbuild.net they will), there might be several GHA runners just idling waiting for the same build to finish. I guess that if you are aware that you have certain "low-level" packages that will trigger rebuilds in all GHA jobs, you could perhaps create some jobs in GHA that will run first, making sure low-level stuff is built. Then the other GHA jobs could depend on those jobs. I understand that it is not always practical or even possible to do something like that, though.

blaggacao · 2022-12-12T02:40:57Z

"making sure low-level stuff is built"

An often-used pattern in Standard is:

./automation/toolchain.nix
./foo/packages.nix

It's actually trivial, via the top-level composition of Block Types in the std-action's ci.yaml, to ensure that packages: { needs: [ toolchain ] }. That should already get us quite far, or even far enough. 🚀
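As a rough illustration of that ordering, here is a minimal GitHub Actions sketch. The job names mirror the two files above, but the step contents are placeholders and this is not the actual layout of the std-action's ci.yaml, which is not shown in this thread.

```yaml
# Hypothetical workflow excerpt: job names and the echo steps are
# illustrative assumptions, not the real std-action ci.yaml.
jobs:
  toolchain:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Placeholder: build the targets from ./automation/toolchain.nix here.
      - run: echo "build toolchain targets"

  packages:
    # Start only after the toolchain job has succeeded, so the shared
    # low-level dependencies are already built on the remote builder.
    needs: [toolchain]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Placeholder: build the targets from ./foo/packages.nix here.
      - run: echo "build package targets"
```

With this shape, runners for packages are only claimed once toolchain has finished, instead of idling while the shared dependencies build.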

nrdxp · 2022-12-12T15:27:12Z

Currently each task runner has a preliminary build step, so from my perspective at least, this might already be the best way to handle it. As you say, the build step will simply idle until nixbuild.net is done building its dependencies. The plus side, though, is that the task stage begins as soon as those builds are finished, without having to wait for unrelated builds, which is what would happen if we collapsed all builds together.

A better optimization would probably be to skip the nix installation if the cache was restored successfully. In my preliminary testing at least, the duration of the install Nix action seems to correlate with the size of the /nix/store before the install script is called. The bigger the store the longer the installation takes. Since we already have Nix installed anyway (from the restored cache), we could simply skip that wait entirely. We might need Nix 2.12 to use the new auto-build-users to make that easy though.
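The idea of skipping the installer on a cache hit could look roughly like the sketch below. It assumes the store is cached with actions/cache and that Nix is installed with cachix/install-nix-action; the cache path, the key, and the step id are illustrative assumptions, and whether a restored /nix is usable without rerunning the installer is exactly the open question raised above.

```yaml
# Hypothetical steps excerpt: path, key, and id are assumptions.
steps:
  - uses: actions/checkout@v3

  - name: Restore Nix store
    id: nix-cache
    uses: actions/cache@v3
    with:
      path: /nix
      key: nix-${{ runner.os }}-${{ hashFiles('flake.lock') }}

  - name: Install Nix
    # cache-hit is 'true' only on an exact key match, so the installer
    # still runs on a miss or on a partial restore.
    if: steps.nix-cache.outputs.cache-hit != 'true'
    uses: cachix/install-nix-action@v18

  # Note: on a cache hit, later steps would still need the restored Nix
  # on PATH and working build users; that setup is not shown here.
```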

blaggacao · 2022-12-19T04:41:54Z

- we know build action is a no-op
- we can leverage this at the very least to collapse build type matrices into a single command & control job per matrix
- we expect heavy builds to be predominantly of this build type, so this should give us a good 80/20 balance
- the user can still help optimization via the GH Actions needs directive

I opened a ticket on ways to avoid the Nix installer (and restore it either from the cache or from DISCOVERY_SSH).

nrdxp · 2023-01-25T19:45:19Z

What would be really nice is if we could simply have something like the keep-builds-running setting that nixbuild.net offers on its side, but implemented directly in Nix, so users can still use any remote build machine. Then we could spawn the jobs directly in discovery without having to wait for them all to finish before spawning the build matrices.

blaggacao · 2023-01-25T20:06:38Z

The problem is the red arrow.

blaggacao mentioned this issue Dec 9, 2022: Fall back build on GH runner workers #12 (Open)

nrdxp mentioned this issue Dec 12, 2022: Filter out concluded targets #2 (Closed)

blaggacao mentioned this issue Dec 19, 2022: dogfood #16 (Open)

nrdxp mentioned this issue Jan 25, 2023: detachable builds NixOS/nix#7693 (Open)
