Cannot create a new builder instance in [Set up Docker Buildx] #893
Comments
@nmiculinic Hey! Could you read this? I don't know what the latest situation is, but when I checked last time I had to patch the action or set up buildx with my own commands (without using any premade action). In other words, I have no idea how we could fix this on our end. Apparently, it isn't that easy and straightforward to keep parity with hosted GitHub Actions runners. |
Thanks for the link! I've used mumoshu/actions-runner-controller-ci@e91c8c0 and got it working. Can't you expose some environment variables to make it work seamlessly? |
This would be great to document too, since it's a pretty common use case for self-hosted runners. |
@nmiculinic Hey! What do you mean, exactly? Do you think we can improve anything other than documentation on our end to enhance the user experience here? If you're talking about a potential enhancement to |
@nmiculinic A documentation improvement would definitely be welcomed! I would review it if you could send a PR for that. |
I tried adding the step listed in #893 (comment), but I'm running into a problem where the setup-buildx-action is just hanging... I don't know how to debug it. The runner logs in k8s don't tell me anything further about what's going on.

```yaml
- name: Set up QEMU
  uses: docker/setup-qemu-action@v1
- name: Set up Docker Context for Buildx
  id: buildx-context
  run: |
    docker context create builders
- name: Set up Docker Buildx
  id: buildx
  uses: docker/setup-buildx-action@v1
  with:
    version: latest
    endpoint: builders
```
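One way to narrow down a hang like this (a debugging sketch, not from the original comment; the `builders` context name simply mirrors the steps above and the step name is made up) is to run roughly the same commands the action runs in a plain `run:` step, so every command's output lands in the job log:

```yaml
- name: Debug buildx setup manually
  run: |
    # Show which contexts exist and which one is active
    docker context ls
    # Create the extra context if it is not there yet
    docker context create builders || true
    # Roughly what setup-buildx-action does: create and bootstrap a builder on that context
    docker buildx create --name debug-builder --driver docker-container --use builders
    docker buildx inspect --bootstrap
```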
oh.. I just realized, this might be related to this: |
I would like to be able to switch workflows from GitHub-hosted runners to self-hosted runners without any modifications. Unfortunately this issue prevents that, as the docker build steps need to be updated as mentioned in this thread, the reason being that the runner's default docker context has a value injected by the controller. The workflow ends up needing extra steps like these:

```yaml
- name: Set up Docker Context for Buildx
  id: buildx-context
  run: |
    docker context create builders
- name: Set up Docker Buildx
  id: buildx
  uses: docker/setup-buildx-action@v1
  with:
    version: latest
    endpoint: builders
```

The controller's code indicates that, when a new runner is created, it injects several environment variables, and one of those is `DOCKER_HOST`. |
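A quick way to see what the controller has injected (a small check added here for illustration, not part of the original comment) is to print the Docker-related environment and contexts at the start of a job:

```yaml
- name: Inspect Docker environment on the self-hosted runner
  run: |
    echo "DOCKER_HOST=${DOCKER_HOST:-<unset>}"
    docker context ls
    docker context inspect default
```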
Docker host is only set to |
I am using:

```yaml
- run: docker context create builders
- uses: docker/setup-buildx-action@v1
  with:
    version: latest
    endpoint: builders
```
I am running into the same issue @ghostsquad is facing, where the `Set up Docker Buildx` step hangs. I have tried the following solutions and am still facing the issue:

Any other suggestions? Thanks!

```yaml
name: GitHub Actions Demo
on: [push]
jobs:
  Explore-GitHub-Actions:
    runs-on: [self-hosted, linux]
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
```

Logs (includes manually cancelling the job due to the hang):

```text
Download and install buildx
##[debug]Release v0.8.2 found
##[debug]isExplicit: 0.8.2
##[debug]explicit? true
##[debug]checking cache: /opt/hostedtoolcache/buildx/0.8.2/x64
##[debug]not found
Downloading https://github.com/docker/buildx/releases/download/v0.8.2/buildx-v0.8.2.linux-amd64
##[debug]Downloading https://github.com/docker/buildx/releases/download/v0.8.2/buildx-v0.8.2.linux-amd64
##[debug]Destination /runner/_work/_temp/e56573ea-bb4e-46e4-a3e3-136a0b3b2001
##[debug]download complete
##[debug]Downloaded to /runner/_work/_temp/e56573ea-bb4e-46e4-a3e3-136a0b3b2001
##[debug]Caching tool buildx 0.8.2 x64
##[debug]source file: /runner/_work/_temp/e56573ea-bb4e-46e4-a3e3-136a0b3b2001
##[debug]destination /opt/hostedtoolcache/buildx/0.8.2/x64
##[debug]destination file /opt/hostedtoolcache/buildx/0.8.2/x64/docker-buildx
##[debug]finished caching tool
Docker plugin mode
##[debug]Plugins dir is /home/runner/.docker/cli-plugins
##[debug]Plugin path is /home/runner/.docker/cli-plugins/docker-buildx
##[debug]Re-evaluate condition on job cancellation for step: 'Set up Docker Buildx'.
Error: The operation was canceled.
##[debug]System.OperationCanceledException: The operation was canceled.
##[debug] at System.Threading.CancellationToken.ThrowOperationCanceledException()
##[debug] at GitHub.Runner.Sdk.ProcessInvoker.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
##[debug] at GitHub.Runner.Common.ProcessInvokerWrapper.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
##[debug] at GitHub.Runner.Worker.Handlers.DefaultStepHost.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, CancellationToken cancellationToken)
##[debug] at GitHub.Runner.Worker.Handlers.NodeScriptActionHandler.RunAsync(ActionRunStage stage)
##[debug] at GitHub.Runner.Worker.ActionRunner.RunAsync()
##[debug] at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
##[debug]Finishing: Set up Docker Buildx
```
@rlinstorres are you running the self-hosted runners in Kubernetes? I tried this solution as well and got the same result. |
FWIW, what worked for me was:
Perhaps the key difference is that I had |
Hi @john-yacuta-submittable, let me send you more information about my environment to clarify and also help you!

```dockerfile
FROM summerwind/actions-runner:latest

ENV BUILDX_VERSION=v0.8.2
ENV DOCKER_COMPOSE_VERSION=v2.5.1

# Docker Plugins
RUN mkdir -p "${HOME}/.docker/cli-plugins" \
    && curl -SsL "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-amd64" -o "${HOME}/.docker/cli-plugins/docker-buildx" \
    && curl -SsL "https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-linux-x86_64" -o "${HOME}/.docker/cli-plugins/docker-compose" \
    && chmod +x "${HOME}/.docker/cli-plugins/docker-buildx" \
    && chmod +x "${HOME}/.docker/cli-plugins/docker-compose"
```

```yaml
jobs:
  build:
    name: Build
    runs-on: fh-ubuntu-small-prod
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up Docker Context for Buildx
        run: docker context create builders
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: latest
          endpoint: builders
```

Also some screenshots (not reproduced here). I hope this information can help you solve your problem.
Thanks @rlinstorres! I managed to resolve my issue. It was an interesting case: I redeployed the node groups in the cluster, and after redeployment everything worked fine. Perhaps it could work for someone else too. I typically don't like this kind of fix, but we did see that the CI step was getting stuck at the filesystem/kernel level, so it was possible that the hosts the self-hosted runner pods were running on (in this case the nodes) were running too hot. My CI step for "Set up Docker Buildx":

```yaml
- name: Set up QEMU
  uses: docker/setup-qemu-action@v1
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v1
  with:
    driver: docker
```
Sorry, I was wrong. This workaround doesn't work. |
@yuanying Hey! Thanks a lot for sharing. Does it look like the below?
Then the key takeaway here might be that the default docker context is somehow invisible to the setup-buildx-action and therefore we have to explicitly specify it via either |
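For reference, the two workarounds that appear in this thread look roughly like this (a sketch; the step names and the `builders` context name are simply the ones used in earlier comments):

```yaml
# Option 1: create a named context and pass it via the endpoint input
- name: Set up Docker Context for Buildx
  run: docker context create builders
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v1
  with:
    endpoint: builders

# Option 2: skip the docker-container driver entirely and build with the docker driver
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v1
  with:
    driver: docker
```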
Hello, first of all thank you for sharing this topic, because it affects me too. I have the same problem as you, but I can't use the workaround you mention in this post. Here is how I use my pipeline:

The pipeline gets stuck. |
By default, the `docker:dind` entrypoint will auto-generate mTLS certs and run with TCP on `0.0.0.0`. This is handy for accessing the running Docker Engine remotely by then publishing the ports. For the runner, we don't need (or want) that behavior, so a Unix socket lets us rely on filesystem permissions. This also has the benefit of eliminating the need for mTLS, which will speed up Pod start slightly (no need to generate CA & client certs), and will fix actions#893 and generally improve compatibility with apps that interact with the Docker API without requiring a custom Docker context to be initialized.
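In pod terms, the change described in that commit message corresponds roughly to a dind sidecar configured like the sketch below (hypothetical container names and socket path; it assumes the standard `docker:dind` image, whose entrypoint skips mTLS cert generation when `DOCKER_TLS_CERTDIR` is empty, and depending on the entrypoint version you may also need to drop its default TCP host):

```yaml
containers:
  - name: docker
    image: docker:dind
    securityContext:
      privileged: true
    env:
      # An empty value tells the dind entrypoint not to generate mTLS certificates
      - name: DOCKER_TLS_CERTDIR
        value: ""
    # Listen only on a Unix socket instead of TCP on 0.0.0.0
    args: ["dockerd", "--host=unix:///run/docker/docker.sock"]
    volumeMounts:
      - name: docker-sock
        mountPath: /run/docker
  - name: runner
    image: summerwind/actions-runner:latest
    env:
      # Point the runner's default Docker context at the shared socket
      - name: DOCKER_HOST
        value: unix:///run/docker/docker.sock
    volumeMounts:
      - name: docker-sock
        mountPath: /run/docker
volumes:
  - name: docker-sock
    emptyDir: {}
```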
Exactly the same setup and issue here. Did you get it to work? |
At this point in time, shouldn't this now be done via the buildx `kubernetes` driver? My question is: what Kubernetes RBAC permissions do the self-hosted runners have by default, are they sufficient to launch builder nodes, and if not, how do we change that? @mumoshu ? |
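If the intent is indeed buildx's `kubernetes` driver, a minimal sketch might look like the step below (hypothetical step name and namespace; the exact RBAC rules needed by the runner pod's ServiceAccount depend on the buildx version, but at minimum it has to be able to manage the buildkit Deployment and its Pods in that namespace):

```yaml
- name: Create an in-cluster builder
  run: |
    # Launch buildkitd as pods in the cluster instead of docker-container on the dind sidecar
    docker buildx create \
      --name kube-builder \
      --driver kubernetes \
      --driver-opt namespace=actions-runner-system,replicas=1 \
      --use
    docker buildx inspect --bootstrap
```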
@mumoshu #2324 fixes this - were you interested in that change? If not, would you accept a change to the |
How exactly was this fixed? |
Describe the bug
Checks
To Reproduce
Expected behavior
It will work the same as on hosted GitHub runners
Environment (please complete the following information):
Helm values yaml: