ARM64 docker images take 6h+ to build #879
Comments
Thanks for reporting! Yup, I tried running the workflow 2 more times, and they all failed: https://github.com/orhun/git-cliff/actions/runs/10978847295 I think it is timing out due to QEMU/AMD builds :/ According to the docs, it is not possible to increase this limit.
Not sure what I can do except disabling them temporarily, which I did in cde2a8e |
@orhun Looks like an issue with GitHub Actions; the Docker action is running now (4 completed)? (Could you rerun the 2.6.0 Docker action?) |
Yup, I need to create a new release soon. I don't think it is possible to trigger the pipeline again with the new changes in the workflow file :) |
In the meantime you can probably use one of the |
Yup, I'll close this one as the issue seems resolved :) |
I refactored the Docker builds in #888 based on these docs - it should be faster in theory but I feel like there are still some rough edges in the workflow. |
It takes 6hr+ even with the build matrix: https://github.com/orhun/git-cliff/actions/runs/11053756749/job/30708900152 💀 |
Hey! As I have the impression I caused some trouble with my comment (which I deleted... which might have been a mistake), I thought I'd share my rationale for removing it. When I saw the change removing the arm64 image build, I thought it was due to the fact that building the images and publishing them was all done on the same node. But then I found the setup-buildx-action docs, which mention that this already creates one node for each architecture. Which is basically what this suggests as well. Thus, I deleted my comment in the hope that I had not done any harm 😅 Sorry 'bout that! Also, I guess in the current state this is not yet achievable, as GitHub has not yet released its official arm64 support as of this comment. |
From the log I'm reading, it looks like there is an issue with shellexpand, right? It takes one hour from the start to reach shellexpand, and the compilation takes at least 5 hours (it looks really long from my POV) ? Maybe we could add more But maybe as @LtdSauce pointed out it's just the QEMU build that is really slow as expected :/ For now as a quick fix we could disable the ARM build until we find a solution. |
Maybe docker buildx on a macOS ARM runner (for the ARM build) would be better? |
Just to share some insights: on my DAYJOB we recently had to switch from qemu emulation for building arm64 images as well. In our case we were only doing |
No worries!
Ah, thanks for sharing the links. I subscribed to those issues and I hope it happens one day 🙏🏼
I think the issue is not about a specific crate, it is related to QEMU emulated builds.
I tried your suggestion in #888 but I guess |
I just find it strange at first glance that such a package takes more than 5 hours to build compared to others.
Sad, yes I saw the same error in setup-qemu-action repository discussion. |
Can you send a link of that discussion? Man... all I wanted was to build arm64 images... |
I thought QEMU is not needed when using the arm64 macOS runner? The architecture is already native, and afaik you should already be able to do Docker builds on macOS that produce a Linux image? |
Yes, with two different jobs instead of a matrix 👍 |
Ah damn, right... |
FTR: This seems to be the way it is until GitHub launches their linux/arm64 runners. There seems to be no way for macos runners on arm64 to get docker to build. |
TL;DR
In-Depth
I had another look at the build logs of the failing actions and noticed something weird: Although the
The following build times are on my local machine. So as a benchmark the current HEAD took

Try 1 (Pass same arguments to cargo-chef cook)
So then I basically applied the following patch to the current head to try to get caching going again:
The build with just

Try 2 (remove cargo-chef cook)
So my next idea: if the dependency caching is not actually caching anything, it is better to remove it:
This build took

Try 3 (Toolchain file)
Now I noticed that this project is using a toolchain file... and cargo-chef added a commit emitting the toolchain file in a version not yet used... so let's try our luck by "just" bumping the cargo-chef version:
But that yields the same result... so let's also pass the cargo arguments to chef, as done in my first try above. But it still did not work...

Try 4 (copy toolchain file by hand)
Then I stumbled across LukeMathWalker/cargo-chef#271, which looks exactly like the issue from my 3rd try. So, as cargo-chef cook does not use the toolchain file to build the dependencies, let's just copy it by hand. I applied the following patch to the unmodified head again:
Aaaand 🥁 it reduced the build time to
(Some dependencies are rebuilt... I suspect this is connected with macro expansion or something like that, which is not cacheable afaik.)

Conclusion
The build times in the docker build are quite high because cargo-chef in its current state builds the dependencies twice. By injecting the toolchain file manually, the caching works again and reduces the build times. This might not be enough to no longer hit the timeout, but as the
Building the dependencies with the nightly compiler took roughly the same time as with the installed 1.76 from the image. So I suggest just copying the toolchain file, and hopefully the arm64 images can be built again. |
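For readers without access to the patches, the manual toolchain copy from Try 4 might look roughly like the sketch below. It follows the multi-stage layout from the cargo-chef README rather than this project's actual Dockerfile; the image tag, the `/app` path and the file name `rust-toolchain.toml` are assumptions for illustration.

```dockerfile
# Sketch only: image tag, paths and the toolchain file name are assumptions,
# not taken from the project's actual Dockerfile.
FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
WORKDIR /app

FROM chef AS planner
COPY . .
# Compute the dependency "recipe" from the manifests.
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
# Copy the toolchain file by hand, so that `cargo chef cook` and the later
# `cargo build` resolve the same toolchain and the cooked layer can be reused.
COPY rust-toolchain.toml rust-toolchain.toml
RUN cargo chef cook --release --recipe-path recipe.json
# Only now copy the sources, so source changes do not invalidate the layer above.
COPY . .
RUN cargo build --release
```

The idea is simply that the cook step and the final build step see the same toolchain, so the dependencies compiled while cooking are not compiled a second time.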
FTR: I submitted a PR in cargo-chef that hopefully solves the double build of dependencies and would make my suggested workaround unnecessary. |
I guess I found an easier way to solve this and opened a PR to fix this. Let me know if you agree. |
Wow man, you rock! Thanks for investigating it once again. |
Merged your PR, let's see how it'll go 🤞🏼 https://github.com/orhun/git-cliff/actions/runs/11380858481/job/31661022271 |
* chore(docker): ignore rust toolchain in docker builds

This commit adds the known names of the rust-toolchain files to the .dockerignore file. There are two reasons why this makes sense:

- The initial docker layer already has a set-up rust toolchain that is sufficient to build the project. Thus, by providing a toolchain file, the toolchain would be installed again during docker build.
- Currently cargo-chef only copies the toolchain files during cooking, but they are not used while building the dependencies in the cook call, see LukeMathWalker/cargo-chef#271.

With this in mind, the dependencies were actually built twice: once with the installed toolchain from the image itself, and then in the actual cargo build call with the toolchain specified in the toolchain file. Building them twice resulted in timeouts when building the arm64 images, as they are emulated using qemu, which is itself already slower than building natively.

Now one could argue that, as soon as the mentioned issue is solved, using the toolchain again would be fine. But then it would still be necessary to assemble the Dockerfile in a way that the toolchain is not built twice, because the current structure of the Dockerfile builds the toolchain once in the cargo-chef prepare step and once during the cargo build step (and would later build it during the cargo-chef cook instead of cargo build).

With all this in mind, using no toolchain file but instead just using the sufficient rust installation from the base image makes sense.

* Revert "chore(docker): disable building arm64 docker images temporarily (#879)"

This reverts commit cde2a8e. Commit 73f75d5 made it possible to build the arm64 image again without running into timeouts.
GitHub released Linux arm64 hosted runners for everyone :) |
That's great! I guess we don't want to switch to that unless we absolutely need it. Right now everything seems fine in the Docker builds. |
Is there an existing issue for this?
Description of the bug
https://github.com/orhun/git-cliff/actions/runs/10978847295/job/30482488830
The GitHub action failed to build and push the 2.6.0 tag to ghcr.io.
The Docker job times out after 360 minutes, as described in the GitHub documentation.
Steps To Reproduce
.
Expected behavior
.
Screenshots / Logs
No response
Software information
Github action
Additional context
No response