
Replacing ofBorg with GitHub Actions #355847

Open
4 of 7 tasks
Mic92 opened this issue Nov 14, 2024 · 64 comments
Labels
5. scope: tracking Long-lived issue tracking long-term fixes or multiple sub-problems 6.topic: continuous integration Affects continuous integration (CI) in Nixpkgs, including Ofborg and GitHub Actions 6.topic: developer experience

Comments

@Mic92
Member

Mic92 commented Nov 14, 2024

This is one of the two plans to ensure we can also perform github evaluation checks in the future.

See https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025
for more information.

To replace OfBorg’s functions with GitHub Actions the following tasks need to be implemented:

  • Running evaluation checks on Nixpkgs
  • Evaluating NixOS options
  • Identifying package rebuilds and adding appropriate labels to the repository
  • Notifying package maintainers
  • (Optional) Rebuilding selected packages for Linux/macOS
  • Building lib-tests if ./lib changes
  • Performance report equivalent

I already created a proof of concept pull request here: #352808

Update

We have our first Jitsi meeting to coordinate the migration on 14 November (today) at 17:00 UTC (18:00 Berlin time) at https://jitsi.lassul.us/nixos-infra

@Mic92 Mic92 added the 0.kind: bug Something is broken label Nov 14, 2024
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025/2

@Bot-wxt1221
Member

Evaluation checks take too many resources. I'm worried about whether GitHub Actions machines can run them in a reasonable time.

@Mic92
Member Author

Mic92 commented Nov 14, 2024

@Bot-wxt1221 I managed to run it in 5 minutes for naive nix-env evaluation based on the default.nix entry point and 15 minutes using the same logic that ofborg uses: https://github.com/Mic92/nixpkgs/actions/workflows/eval.yml

Both seem already faster compared to the hours of waiting for the ofborg queue that we experience today.

Also this is not yet the end of the line of optimizations. We still have https://github.com/Mic92/nixpkgs/blob/main/pkgs/top-level/release-attrpaths-superset.nix to split evaluation in smaller parts that can run even in parallel.

@JohnRTitor
Contributor

Will PR commands like @ofborg build hello be supported with GitHub Actions?

@JohnRTitor
Contributor

#352808 (comment) and

I worry that bot accounts like ryantm-r can easily hit the limit of CI. CC @ryantm

@Mic92
Member Author

Mic92 commented Nov 14, 2024

@JohnRTitor

Yes it's possible:

name: Trigger on PR Comment

on:
  issue_comment:
    types: [created]

jobs:
  run-on-comment:
    if: github.event.issue.pull_request != null && contains(github.event.comment.body, '/build')
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v3
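
One detail worth keeping in mind with this approach: workflows triggered by issue_comment run in the context of the default branch, so a plain checkout does not fetch the pull request's code. A minimal sketch of a variant that checks out the PR head is shown here. The "/build" command, the job name, and the 60-minute timeout are assumptions for illustration, not something ofborg or Nixpkgs defines, and running untrusted PR code from a comment trigger needs a careful look at token permissions.

```yaml
name: Build on PR comment (sketch)

on:
  issue_comment:
    types: [created]

# Keep the token read-only, since the job touches untrusted PR code.
permissions:
  contents: read

jobs:
  build-on-comment:
    # React only to comments on pull requests that start with the
    # hypothetical "/build " command.
    if: github.event.issue.pull_request && startsWith(github.event.comment.body, '/build ')
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - name: Check out the pull request head
        uses: actions/checkout@v4
        with:
          # issue_comment runs against the default branch, so the PR ref
          # has to be requested explicitly.
          ref: refs/pull/${{ github.event.issue.number }}/head
      - name: Show the requested command
        env:
          # Pass the untrusted comment text through the environment
          # instead of interpolating it into the script.
          COMMENT_BODY: ${{ github.event.comment.body }}
        run: printf '%s\n' "$COMMENT_BODY"
```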

@FliegendeWurst FliegendeWurst added the 6.topic: continuous integration Affects continuous integration (CI) in Nixpkgs, including Ofborg and GitHub Actions label Nov 14, 2024
@Mic92
Member Author

Mic92 commented Nov 14, 2024

#352808 (comment) and

I worry that bot accounts like ryantm-r can easily hit the limit of CI. CC @ryantm

Well. We have to try and see. Right now it's speculation whether it works or not.

@JohnRTitor
Contributor

Good to know, though huge builds like the kernel and its modules, chromium and firefox will obviously not work. And we'll possibly have to set up a blacklist, or else even individual contributors will hit their limits.

@Bot-wxt1221
Member

According to the GitHub docs:

https://docs.github.com/en/billing/managing-billing-for-your-products/managing-billing-for-github-actions/about-billing-for-github-actions

GitHub Actions usage is free for standard GitHub-hosted runners in public repositories, and for self-hosted runners. For private repositories, each GitHub account receives a certain amount of free minutes and storage for use with GitHub-hosted runners, depending on the account's plan. Any usage beyond the included amounts is controlled by spending limits.

So maybe we don't need to worry about time?

@Mic92
Member Author

Mic92 commented Nov 14, 2024

Good to know, though huge builds like the kernel and its modules, chromium and firefox will obviously not work. And we'll possibly have to set up a blacklist, or else even individual contributors will hit their limits.

You can run builds for 12h. Obviously we should establish some reasonable timeouts to be a good citizen in the ecosystem.
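
A timeout of this kind can be expressed directly in the workflow with timeout-minutes, which exists at both the job and the step level. The sketch below uses made-up budgets of 120 and 90 minutes and a placeholder build command. A later comment in this thread cites a 6-hour maximum per run, so whatever limit the platform enforces is the upper bound in any case.

```yaml
name: Build with timeouts (sketch)

on: pull_request

jobs:
  build:
    runs-on: ubuntu-latest
    # Hypothetical budget: cancel the whole job after 2 hours.
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v4
      - name: Build (placeholder command)
        # A single step can carry a tighter limit than the job.
        timeout-minutes: 90
        run: echo "the package build would run here"
```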

@Mic92
Member Author

Mic92 commented Nov 14, 2024

Added a ^ meeting date for this.

@ibizaman

Maybe of interest for this issue, at least just for inspiration, but I've also (ab)used GitHub actions to build tests in my project using a dynamically generated matrix. My project uses flakes but this should be adaptable to non-flakes https://github.com/ibizaman/selfhostblocks/blob/main/.github/workflows/build.yaml
This matrix then produces a big list of jobs, one job per test: https://github.com/ibizaman/selfhostblocks/actions/runs/11502502422
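
For readers unfamiliar with the pattern, a dynamically generated matrix generally works by having one job emit a JSON list as an output and a second job expand it with fromJSON. The following is a generic sketch of that mechanism, not a copy of the linked workflow. The job names and the hard-coded target list are placeholders, and a real setup would compute the list (for example from a Nix evaluation). Note that GitHub caps a matrix at 256 jobs per workflow run.

```yaml
name: Dynamic matrix (sketch)

on: pull_request

jobs:
  list-targets:
    runs-on: ubuntu-latest
    outputs:
      targets: ${{ steps.list.outputs.targets }}
    steps:
      - id: list
        # Placeholder list; a real job would generate this JSON array.
        run: echo 'targets=["hello","jq"]' >> "$GITHUB_OUTPUT"

  build:
    needs: list-targets
    runs-on: ubuntu-latest
    strategy:
      # Let the other matrix jobs continue when one of them fails.
      fail-fast: false
      matrix:
        target: ${{ fromJSON(needs.list-targets.outputs.targets) }}
    steps:
      - run: echo "would build ${{ matrix.target }}"
```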

@Mic92
Member Author

Mic92 commented Nov 14, 2024

See the meeting notes for today's infra meeting where we mainly discussed the CI situation: https://github.com/NixOS/infra/blob/7688f20babbeb27a10e4d8669fffe4b0ed00e17f/docs/meeting-notes/2024-11-14.md

Here is the high-level plan:

  • Infinisil wants to take a look at evaluating nixpkgs in github actions to compute the number of changed paths
  • Independently we will take a look how we can build packages.
  • To begin with, we will just run GitHub Actions as they are designed, on the pull_request event. This is the most straightforward way, and we have not actually validated whether we can just build everything fast enough without resorting to my initial strategy.

Independently of the meeting, we also have other discussions about how we can develop ofborg in the future. However, this might not happen before February, so we need some alternative solution in the meantime, if not longer.

@infinisil
Member

I've opened a draft PR here for evaluating Nixpkgs using GitHub Actions: #356023. For just evaluation (and those only taking 5 minutes on each arch) instead of also building, I don't think we need to do the running-on-forks dance. Building is harder to get, but it's arguably also less important (and very orthogonal to evaluation).

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025/27

@adisbladis
Member

One important aspect that ofborg currently provides, and that this issue doesn't mention, is the performance report.
This currently works by evaluating nixpkgs twice, once before the PR and once after.

For the majority of PRs the performance report is not important, but for work on lib & stdenv, it can be very important.

The report currently does not report the impact of checkMeta, something that has led to a less than stellar review experience since contributors & reviewers don't actually understand the real performance impact.

@JohnRTitor JohnRTitor removed the 0.kind: bug Something is broken label Nov 15, 2024
@JohnRTitor JohnRTitor pinned this issue Nov 15, 2024
@Mic92
Member Author

Mic92 commented Nov 15, 2024

One important aspect that ofborg currently provides, and that this issue doesn't mention, is the performance report. This currently works by evaluating nixpkgs twice, once before the PR and once after.

For the majority of PRs the performance report is not important, but for work on lib & stdenv, it can be very important.

The report currently does not report the impact of checkMeta, something that has led to a less than stellar review experience since contributors & reviewers don't actually understand the real performance impact.

Could that be another on-demand GitHub Actions job? We could even run it automatically if certain paths have been changed.
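
Both halves of this idea map onto standard workflow triggers: workflow_dispatch for on-demand runs and a paths filter for automatic ones. The sketch below assumes lib/** and pkgs/stdenv/** as the interesting paths (taken from the "lib & stdenv" remark above, not from any existing Nixpkgs workflow) and uses a placeholder in place of the actual before/after evaluation.

```yaml
name: Performance report (sketch)

on:
  # Manual, on-demand runs from the Actions tab.
  workflow_dispatch:
  # Automatic runs, but only when performance-sensitive paths change.
  pull_request:
    paths:
      - 'lib/**'
      - 'pkgs/stdenv/**'

jobs:
  perf-report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Compare evaluation before and after (placeholder)
        run: echo "evaluate the base and head commits here and diff the stats"
```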

@azuwis
Contributor

azuwis commented Nov 15, 2024

Good to know, though huge builds like the kernel and its modules, chromium and firefox will obviously not work. And we'll possibly have to set up a blacklist, or else even individual contributors will hit their limits.

Building linux kernel is fine on Github Actions, the CPU time is sufficient, it takes less than 2 hours to build Jovian-NixOS linux kernel, and Github Actions offer max 6 hours per run.

The only concern is disk space, workarounds:

  1. Bind mount /mnt/nix to /nix, /mnt is 66G free by default.
  2. Set build-dir = /nix/var in nix.conf. By default Nix uses /tmp to hold /build in the sandbox, which takes up disk space in / (20G free, not enough for building the Linux kernel).
  3. Remove files we don't need, docker images, /usr/local, /usr/share/swift, etc. It's possible to get more than 63G free disk space in / without affecting nix.
  4. Use BTRFS RAID0 to combine / and /mnt, and enable zstd compression, it's possible to get total 126G free disk space, and should be sufficient for most build tasks.

All of the above workarounds are implemented in https://github.com/azuwis/actions/blob/main/nix/prepare.sh.

Well, except for 2), which can be set by:

    - uses: cachix/install-nix-action@v30
      with:
        extra_nix_config: |
          build-dir = /nix/var
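
Workaround 1) can be combined with this setting in a single job. The steps below are an untested sketch written from the description above rather than taken from prepare.sh. It assumes the bind mount is created before Nix is installed and that the installer runs in multi-user (daemon) mode, so a root-owned /nix is acceptable. The final build step is a placeholder.

```yaml
name: Build with more disk space (sketch)

on: pull_request

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Put the Nix store on the larger /mnt volume
        run: |
          # Create both directories, then make /nix an alias for /mnt/nix.
          sudo mkdir -p /mnt/nix /nix
          sudo mount --bind /mnt/nix /nix
      - uses: cachix/install-nix-action@v30
        with:
          extra_nix_config: |
            build-dir = /nix/var
      - name: Check free space
        run: df -h / /mnt /nix
```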

@adisbladis
Member

One important aspect that ofborg currently provides, and that this issue doesn't mention, is the performance report. This currently works by evaluating nixpkgs twice, once before the PR and once after.
...

Could that be another on-demand GitHub Actions job? We could even run it automatically if certain paths have been changed.

Sounds good to me.

@JohnRTitor
Contributor

Building linux kernel is fine on Github Actions, the CPU time is sufficient, it takes less than 2 hours to build Jovian-NixOS linux kernel, and Github Actions offer max 6 hours per run.

I am concerned about building the kernel modules (both in tree and out of tree).

@Mic92
Member Author

Mic92 commented Nov 16, 2024

Building linux kernel is fine on Github Actions, the CPU time is sufficient, it takes less than 2 hours to build Jovian-NixOS linux kernel, and Github Actions offer max 6 hours per run.

I am concerned about building the kernel modules (both in tree and out of tree).

Well. We should be quickly able to filter out and blacklist packages we don't want to build once the source of truth lives in the repository? Also we can actually stop github actions, which was not possible with ofborg builds.
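
Besides cancelling a run by hand, stopping superseded runs can be automated with a concurrency group. This is a generic sketch and not the configuration Nixpkgs uses. The group key is one common convention that keeps at most one active run per workflow and pull request.

```yaml
name: Build (sketch)

on: pull_request

# A new push to the same PR cancels the run that is still in progress.
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "the build would run here"
```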

@Kamillaova
Contributor

Kamillaova commented Nov 19, 2024

Maybe of interest for this issue, at least just for inspiration, but I've also (ab)used GitHub actions to build tests in my project using a dynamically generated matrix. My project uses flakes but this should be adaptable to non-flakes https://github.com/ibizaman/selfhostblocks/blob/main/.github/workflows/build.yaml This matrix then produces a big list of jobs, one job per test https://github.com/ibizaman/selfhostblocks/actions/runs/11502502422 like so:

@ibizaman did you see this? https://github.com/thecaralice/flake-gha

@Mic92
Member Author

Mic92 commented Dec 18, 2024

@FliegendeWurst is it because of unfree

@FliegendeWurst
Member

@FliegendeWurst is it because of unfree

The package has license = lib.licenses.gpl2Only;.

@Mic92
Member Author

Mic92 commented Dec 18, 2024

@FliegendeWurst is it because of unfree

The package has license = lib.licenses.gpl2Only;.

Any unfree dependencies?

@FliegendeWurst
Member

None: jdk21, openjfx21, maven, fetchFromGitHub, makeDesktopItem, copyDesktopItems, wrapGAppsHook3, gtk3.

@wolfgangwalther
Contributor

How reproducible is this? We could also try to increase swap if it happens rarely. But also I haven't seen this in a while. Is your PR maybe massively regressing on memory usage?

Yes, it is my PR's fault, see: #303849 (comment)

Looking at the "master" column in the table in that comment, and taking into account that we run 4 chunks in parallel in CI, we could end up with a worst case of ~18.5 GB of memory used if we're unlucky enough to have chunks 2 and 4 finish before any of 0, 1 and 3 do. We currently have pretty much exactly 18.5 GB available:

Available memory: 14405 MiB, free swap: 4095 MiB

So without major regressions like my PR, this should only be a problem very rarely. Of course, the memory usage can shift between chunks over time, so it could still become one.

@lucasew

This comment was marked as off-topic.

@doronbehar
Contributor

How reproducible is this? We could also try to increase swap if it happens rarely. But also I haven't seen this in a while. Is your PR maybe massively regressing on memory usage?

Another eval failure on memory:

https://github.com/NixOS/nixpkgs/actions/runs/12468993409/job/34801318849

@wolfgangwalther
Contributor

Another eval failure on memory:

https://github.com/NixOS/nixpkgs/actions/runs/12468993409/job/34801318849

The error I am seeing in your link is:

       >        error: attribute 'swig3' missing
       >        at /nix/store/6mzhh1mpc9w6xggfhw0mfibcwv7whvm7-source/pkgs/applications/video/kodi/unwrapped.nix:193:28:
       >           192|       "-DLIRC_DEVICE=/run/lirc/lircd"
       >           193|       "-DSWIG_EXECUTABLE=${buildPackages.swig3}/bin/swig"
       >              |                            ^
       >           194|       "-DFLATBUFFERS_FLATC_EXECUTABLE=${buildPackages.flatbuffers}/bin/flatc"
       >        Did you mean one of swig, sig, sigi, stig or swego?

That doesn't seem to be memory related.

@doronbehar
Contributor

Oh thank you - I just saw the message at the bottom stating how much memory is available and I thought it was because of this. Thank you and sorry for the noise :).

@getchoo getchoo mentioned this issue Dec 28, 2024
13 tasks
@misuzu
Contributor

misuzu commented Dec 29, 2024

The "This PR is is targeting a channel branch" action fails even when the target branch was corrected:
https://github.com/NixOS/nixpkgs/actions/runs/12534682843/job/34956016297?pr=369067

@JohnRTitor
Contributor

Looks like GitHub is caching some variables (like branch info), even in subsequent re-runs.

Should be fixed by a (force) push.

We can also configure the yml to run on edited events (which include base branch changes and PR body updates), like https://github.com/NixOS/nixpkgs/blob/adaa9f280329b5f814e8dc83eceddd42b20f72f4/.github/workflows/nixpkgs-vet.yml#L14C1-L14C51

Wish there was just a base_changed event though, but that's up to GitHub https://github.com/orgs/community/discussions/35058
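
A sketch of what such a trigger could look like follows. The types list and the nixos-*/nixpkgs-* patterns are assumptions based on this discussion and are not copied from the linked file. The job reads only the event payload and never checks out PR code, which is what makes pull_request_target reasonable here.

```yaml
name: No channel PR (sketch)

on:
  pull_request_target:
    # "edited" also fires when the base branch of the PR is changed.
    types: [opened, synchronize, reopened, edited]

jobs:
  check-base:
    runs-on: ubuntu-latest
    steps:
      - name: Fail when the PR targets a channel branch
        env:
          BASE_REF: ${{ github.event.pull_request.base.ref }}
        run: |
          case "$BASE_REF" in
            nixos-*|nixpkgs-*)
              echo "::error::This PR targets the channel branch $BASE_REF"
              exit 1
              ;;
          esac
```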

@GGG-KILLER
Contributor

GGG-KILLER commented Dec 30, 2024

The "This PR is is targeting a channel branch" action fails even when the target branch was corrected:
https://github.com/NixOS/nixpkgs/actions/runs/12534682843/job/34956016297?pr=369067

Couldn't this be solved by using GitHub branch protection rules/rulesets instead of a workflow?
Having to run a whole CI pipeline just to check that it's not pointing to a branch seems really overkill when we could configure a rule to point to nixos-* and nixpkgs-* and then create a bot account that can bypass those rules for the automated merges.

@misuzu misuzu unpinned this issue Dec 30, 2024
@misuzu misuzu pinned this issue Dec 30, 2024
@paparodeo
Contributor

paparodeo commented Dec 31, 2024

https://github.com/NixOS/nixpkgs/actions/runs/12563383510/job/35025186249

out of memory (push to master -- subsequent count calculations will fail)

@JohnRTitor
Contributor

The eval "processing" seems to fail for some PRs, which then fails the subsequent Tag job.

Example: https://github.com/NixOS/nixpkgs/actions/runs/12563389748/job/35025265089?pr=369760 (#369760)

@paparodeo
Contributor

The eval "processing" seems to fail for some PRs, which then fails the subsequent Tag job.

Example: https://github.com/NixOS/nixpkgs/actions/runs/12563389748/job/35025265089?pr=369760 (#369760)

Looks like it's due to the out-of-memory failure in #355847 (comment)

@JohnRTitor
Contributor

https://github.com/NixOS/nixpkgs/actions/runs/12563383510/job/35025186249

out of memory (push to master -- subsequent count calculations will fail)

@paparodeo Perhaps we need to increase the swap size/add a swap. #356023 (comment)
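
Adding swap on a hosted runner can be done in an ordinary step before the evaluation starts. The sketch below is an assumption-laden example, not what the eval workflow does. The 8G size and the /mnt/extra-swapfile path are made up, and it relies on /mnt having enough free space on the runner image.

```yaml
name: Eval with extra swap (sketch)

on: pull_request

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - name: Add extra swap (hypothetical size and path)
        run: |
          sudo fallocate -l 8G /mnt/extra-swapfile
          sudo chmod 600 /mnt/extra-swapfile
          sudo mkswap /mnt/extra-swapfile
          sudo swapon /mnt/extra-swapfile
          # Print the resulting memory and swap totals for the job log.
          free -m
      - name: Evaluate (placeholder)
        run: echo "the memory-hungry evaluation would run here"
```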

@wolfgangwalther
Contributor

Failure to add the PR's author, who is a maintainer as well, as a reviewer:

https://github.com/NixOS/nixpkgs/actions/runs/12580336431/job/35062292601

Introduced in #366046, I guess.

@Mic92
Member Author

Mic92 commented Jan 2, 2025

@wolfgangwalther potential fix: #370186

@infinisil
Member

#370456

@emilazy
Member

emilazy commented Jan 5, 2025

The eval action doesn’t properly fail on warnings, which is a regression from ofborg; see #371223. (It’d be great to get some insight into what attribute is causing the warnings, as that was a consistent pain point of ofborg-eval, but not failing at all is the bigger issue.)
