Cannot create a new builder instance in [Set up Docker Buildx] #893

Closed · 1 of 2 tasks
nmiculinic opened this issue Oct 14, 2021 · 22 comments · Fixed by #2324
Labels
documentation (Improvements or additions to documentation), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@nmiculinic

Describe the bug

  • An action that works correctly on hosted GitHub runners does not work on self-hosted runners

Checks

  • My actions-runner-controller version (v0.x.y) does support the feature
  • I'm using an unreleased version of the controller I built from HEAD of the default branch

To Reproduce

      - name: Checkout
        uses: actions/checkout@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
  /usr/local/bin/docker buildx create --name builder-3367d142-667f-46da-9e5a-56a8706f3c86 --driver docker-container --buildkitd-flags --allow-insecure-entitlement security.insecure --allow-insecure-entitlement network.host --use
  error: could not create a builder instance with TLS data loaded from environment. Please use `docker context create <context-name>` to create a context for current environment and then create a builder instance with `docker buildx create <context-name>`
  Error: The process '/usr/local/bin/docker' failed with exit code 1
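
For reference, the manual workaround that the error message itself points at looks roughly like this (a sketch only; "builders" and "builder" are illustrative names):

      # Create a context for the current environment, then create and select
      # a builder on top of it, as the error message suggests.
      - name: Create Docker context and builder manually
        run: |
          docker context create builders
          docker buildx create builders --name builder --driver docker-container --use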

Expected behavior
It works the same as on hosted GitHub runners.

Environment (please complete the following information):

  • Controller Version: app.kubernetes.io/version=0.20.2
  • Deployment Method: Helm
  • Helm Chart Version: helm.sh/chart=actions-runner-controller-0.13.2

Helm values yaml:

# helm upgrade --install --namespace actions-runner-system --create-namespace actions-runner-controller actions-runner-controller/actions-runner-controller -f ~/Desktop/grid/infra/staging/gh.yaml
authSecret:
  create: true
  <redacted>

scope:
  singleNamespace: true

githubWebhookServer:
  enabled: true
  secret:
    create: true
    name: "github-webhook-server"
    github_webhook_secret_token: "<redacted>"

metrics:
  serviceMonitor: true
@mumoshu
Collaborator

mumoshu commented Oct 14, 2021

@nmiculinic Hey! Could you read this?

I don't know what the latest situation is, but when I checked last time I had to patch the action or set up buildx with my own commands (without using any premade action).

In other words, I have no idea how we could fix this on our end. Apparently, it isn't that easy or straightforward to keep parity with hosted GitHub Actions runners.
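
For anyone curious, setting up buildx without the premade action looks roughly like this (a sketch, assuming the buildx CLI plugin is already installed on the runner image; the context and builder names are illustrative):

      - name: Set up buildx manually
        run: |
          # Create a context pointing at the local daemon, then create and
          # select a docker-container builder on top of it.
          docker context create builders
          docker buildx create builders --name builder --driver docker-container --use
          docker buildx inspect --bootstrap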

@nmiculinic
Author

Thanks for the link!

I've used this mumoshu/actions-runner-controller-ci@e91c8c0 and got it working.

Can't you expose some environment variables to make it work seamlessly?

@nmiculinic
Author

This would be great to document too, since it's a pretty common use case for self-hosted runners.

@mumoshu
Collaborator

mumoshu commented Oct 23, 2021

Can't you expose some environment variables to make it work seamlessly?

@nmiculinic Hey! What do you mean, exactly? Do you think we can change anything other than documentation on our end to improve the user experience here?

If you're talking about a potential enhancement to docker/setup-buildx-action, I think you'd better file an issue there.

@mumoshu added the documentation, good first issue, and help wanted labels on Oct 23, 2021
@mumoshu
Collaborator

mumoshu commented Oct 23, 2021

@nmiculinic A documentation improvement would definitely be welcomed! I would review it if you could send a PR for that.

@ghostsquad

I tried adding the step listed in #893 (comment),

but I'm running into a problem where the setup-buildx-action just hangs, and I don't know how to debug it. The runner logs in k8s don't tell me anything further about what's going on.

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      - name: Set up Docker Context for Buildx
        id: buildx-context
        run: |
          docker context create builders
      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: latest
          endpoint: builders
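
If anyone else hits this hang, a step along these lines (placed near the buildx setup, with if: always() so it still runs after a cancel) can help capture state; it is only a debugging sketch, and the dind sidecar container name in the comment is an assumption that may differ per cluster:

      - name: Dump Docker/buildx state
        if: always()
        run: |
          docker context ls
          docker buildx ls
          docker info
      # Outside the workflow, the dind sidecar logs are often the best signal,
      # e.g. kubectl logs <runner-pod> -c docker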


@ghostsquad

Oh, I just realized this might be related to docker/setup-buildx-action#117.

@cdivitotawela

I would like to be able to switch workflows from GitHub-hosted runners to self-hosted runners without any modifications. Unfortunately this issue prevents that, as the docker build steps need to be updated as mentioned in this thread. The reason is that the runner's default docker context points to tcp://localhost:2376, while running the following creates a new context that points to unix:///var/run/docker.sock and uses it.

      - name: Set up Docker Context for Buildx
        id: buildx-context
        run: |
          docker context create builders

      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: latest
          endpoint: builders

The following code indicates that, when a new runner is created, the controller injects environment variables, one of which is DOCKER_HOST=tcp://localhost:2376. I am not sure why this is needed, and I believe removing this environment variable setting would fix the issue:
https://github.com/actions-runner-controller/actions-runner-controller/blob/master/controllers/runner_controller.go#L1034
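
To see this on a concrete runner, something like the following prints what DOCKER_HOST is set to and which endpoint each context resolves to (a sketch; it assumes the "builders" context from the snippet above has already been created):

      - name: Inspect Docker contexts and DOCKER_HOST
        run: |
          echo "DOCKER_HOST=${DOCKER_HOST:-<unset>}"
          docker context ls
          docker context inspect --format '{{ .Endpoints.docker.Host }}' default builders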

@cdivitotawela

DOCKER_HOST is only set to tcp://localhost:2376 when DinD is run, so I don't think my suggestion is correct. I'm still searching for what I need to do to make the same docker build steps work on both GitHub-hosted and self-hosted runners. :(

@rlinstorres

I am using a self-hosted runner and building a Docker image using summerwind/actions-runner:latest as a base image, but I needed to install the Docker plugins buildx and docker compose. During the workflow, I am using the steps below and everything is working fine.

- run: docker context create builders

- uses: docker/setup-buildx-action@v1
  with:
    version: latest
    endpoint: builders

@john-yacuta-submittable

john-yacuta-submittable commented May 20, 2022

I am running into the same issue @ghostsquad is facing, where the Set up Docker Buildx step hangs. Below is the workflow I am running on self-hosted runners in Kubernetes, which I believe uses the Docker image summerwind/actions-runner:latest. I am also unable to see any logs, even when including the flag option buildkitd-flags: --debug.

I have tried the following solutions and am still facing the issue:

  • Disabled auto-updating of self-hosted runners in the deployment
  • Updated the “Set up Docker Buildx” step to v2
  • Updated “Set up Docker Buildx” to try both the docker-container and kubernetes drivers
  • Updated “Set up Docker Buildx” to use custom commands for buildx, such as using curl to download it

Any other suggestions? Thanks!

name: GitHub Actions Demo
on: [push]
jobs:
  Explore-GitHub-Actions:
    runs-on: [self-hosted, linux]
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

Logs (including my manual cancellation of the run due to the hang):

Download and install buildx
  ##[debug]Release v0.8.2 found
  ##[debug]isExplicit: 0.8.2
  ##[debug]explicit? true
  ##[debug]checking cache: /opt/hostedtoolcache/buildx/0.8.2/x64
  ##[debug]not found
  Downloading https://github.com/docker/buildx/releases/download/v0.8.2/buildx-v0.8.2.linux-amd64
  ##[debug]Downloading https://github.com/docker/buildx/releases/download/v0.8.2/buildx-v0.8.2.linux-amd64
  ##[debug]Destination /runner/_work/_temp/e56573ea-bb4e-46e4-a3e3-136a0b3b2001
  ##[debug]download complete
  ##[debug]Downloaded to /runner/_work/_temp/e56573ea-bb4e-46e4-a3e3-136a0b3b2001
  ##[debug]Caching tool buildx 0.8.2 x64
  ##[debug]source file: /runner/_work/_temp/e56573ea-bb4e-46e4-a3e3-136a0b3b2001
  ##[debug]destination /opt/hostedtoolcache/buildx/0.8.2/x64
  ##[debug]destination file /opt/hostedtoolcache/buildx/0.8.2/x64/docker-buildx
  ##[debug]finished caching tool
  Docker plugin mode
  ##[debug]Plugins dir is /home/runner/.docker/cli-plugins
  ##[debug]Plugin path is /home/runner/.docker/cli-plugins/docker-buildx
  ##[debug]Re-evaluate condition on job cancellation for step: 'Set up Docker Buildx'.
  Error: The operation was canceled.
  ##[debug]System.OperationCanceledException: The operation was canceled.
  ##[debug]   at System.Threading.CancellationToken.ThrowOperationCanceledException()
  ##[debug]   at GitHub.Runner.Sdk.ProcessInvoker.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
  ##[debug]   at GitHub.Runner.Common.ProcessInvokerWrapper.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
  ##[debug]   at GitHub.Runner.Worker.Handlers.DefaultStepHost.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, CancellationToken cancellationToken)
  ##[debug]   at GitHub.Runner.Worker.Handlers.NodeScriptActionHandler.RunAsync(ActionRunStage stage)
  ##[debug]   at GitHub.Runner.Worker.ActionRunner.RunAsync()
  ##[debug]   at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
  ##[debug]Finishing: Set up Docker Buildx

@john-yacuta-submittable

@rlinstorres are you running the self-hosted runners in Kubernetes? I tried this solution as well and got the same result.

@mumoshu
Collaborator

mumoshu commented May 25, 2022

FWIW, what worked for me was:

    - run: docker context create mycontext
    - run: docker context use mycontext
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v1
      with:
        buildkitd-flags: --debug
        endpoint: mycontext

Perhaps the key difference is that I had docker context use mycontext? 🤔

@rlinstorres

rlinstorres commented May 25, 2022

Hi @john-yacuta-submittable, let me share more information about my environment to clarify things and hopefully help you!

  • A snippet of my Dockerfile:
FROM summerwind/actions-runner:latest

ENV BUILDX_VERSION=v0.8.2
ENV DOCKER_COMPOSE_VERSION=v2.5.1

# Docker Plugins
RUN mkdir -p "${HOME}/.docker/cli-plugins" \
  && curl -SsL "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-amd64" -o "${HOME}/.docker/cli-plugins/docker-buildx" \
  && curl -SsL "https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-linux-x86_64" -o "${HOME}/.docker/cli-plugins/docker-compose" \
  && chmod +x "${HOME}/.docker/cli-plugins/docker-buildx" \
  && chmod +x "${HOME}/.docker/cli-plugins/docker-compose"
  • EKS version: v1.21.9 (--enable-docker-bridge true --container-runtime containerd)
  • actions-runner-controller helm chart version 0.17.3
  • RunnerDeployment and HorizontalRunnerAutoscaler manifest files using my docker image
  • A snippet of my workflow:
jobs:
  build:
    name: Build
    runs-on: fh-ubuntu-small-prod
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up Docker Context for Buildx
        run: docker context create builders
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: latest
          endpoint: builders

I hope this information can help you solve your problem.

@john-yacuta-submittable

Thanks @rlinstorres! I managed to resolve my issue. It was an interesting case: I redeployed the node groups in the cluster, and after redeployment the runners worked just fine. Perhaps that could work for someone else too.

I typically don't like this kind of solution, but we did see that the CI step was getting stuck at the file system/kernel level, so it's possible the hosts the self-hosted runner pods were running on (in this case the nodes) were running too hot.

My CI step for "Set up Docker Buildx":

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          driver: docker

@yuanying

yuanying commented Aug 22, 2022

I found that setting the DOCKER_CONTEXT: default environment variable resolves this issue too.
We can add this env to RunnerDeployment.spec.template.spec.env.

Maybe we can add this value to ARC itself.

Sorry, I was wrong. This workaround doesn't work.
I will look into it further.
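
For reference, this is roughly where such an env var would sit in a RunnerDeployment (a sketch only; the names are illustrative, and as noted above the author found this particular value did not fix the issue):

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy   # illustrative name
spec:
  template:
    spec:
      repository: my-org/my-repo   # illustrative repository
      env:
        - name: DOCKER_CONTEXT
          value: default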

@mumoshu
Collaborator

mumoshu commented Aug 22, 2022

@yuanying Hey! Thanks a lot for sharing.
Still curious, but what does your workflow definition look like?

Does it look like the below?

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

Then the key takeaway here might be that the default docker context is somehow invisible to the setup-buildx-action and therefore we have to explicitly specify it via either DOCKER_CONTEXT or the endpoint option? 🤔
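
A sketch of the two explicit options being discussed, assuming a "builders" context has already been created in an earlier step (whether the action honors DOCKER_CONTEXT is exactly the open question here):

      # Option 1: point the action at the context via its endpoint input
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          endpoint: builders

      # Option 2: point the Docker CLI at the context for this step
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        env:
          DOCKER_CONTEXT: builders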

@metabsd

metabsd commented Feb 13, 2023

Hello! First of all, thank you for sharing this topic, because it affects me too. I have the same problem as you, but I can't get the workaround mentioned in this post to work.

Here is my pipeline:

    name: Build and push latest tag from devel and on new commits
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1

      - name: Set up Docker Context for Buildx
        shell: bash
        id: buildx-context
        run: |
          docker context create buildx-context || true

      - name: Use Docker Context for Buildx
        shell: bash
        id: use-buildx-context
        run: |
          docker context use buildx-context || true

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          buildkitd-flags: --debug
          endpoint: buildx-context

The pipeline gets stuck at Set up Docker Buildx.


milas added a commit to milas/actions-runner-controller that referenced this issue Feb 25, 2023
By default, the `docker:dind` entrypoint will auto-generate mTLS certs
and run with TCP on `0.0.0.0`. This is handy for accessing the running
Docker Engine remotely by then publishing the ports. For the runner,
we don't need (or want) that behavior, so a Unix socket lets us rely
on filesystem permissions.

This also has the benefit of eliminating the need for mTLS, which will
speed up Pod start slightly (no need to generate CA & client certs),
and will fix actions#893 and generally improve compatibility with apps that
interact with the Docker API without requiring a custom Docker context
to be initialized.
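
A rough sketch of that idea in pod-spec terms (illustrative only, not the actual change in the referenced commit; the container names, socket path, and shared emptyDir volume are assumptions):

  # Runner pod fragment: dockerd listens on a unix socket shared with the runner
  containers:
    - name: docker
      image: docker:dind
      args: ["dockerd", "--host=unix:///run/docker/docker.sock"]
      securityContext:
        privileged: true
      volumeMounts:
        - name: docker-sock
          mountPath: /run/docker
    - name: runner
      image: summerwind/actions-runner:latest
      env:
        - name: DOCKER_HOST
          value: unix:///run/docker/docker.sock
      volumeMounts:
        - name: docker-sock
          mountPath: /run/docker
  volumes:
    - name: docker-sock
      emptyDir: {}
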
@philipsabri

Exactly the same setup and issue here. Did you get it to work?

@Nuru
Contributor

Nuru commented Mar 17, 2023

At this point in time, shouldn't this now be done via the buildx kubernetes driver?

My question is: what Kubernetes RBAC permissions do the self-hosted runners have by default, are they sufficient to launch builder nodes, and if not, how do we change that? @mumoshu?
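
For concreteness, using the kubernetes driver from a workflow looks roughly like this (a sketch; the namespace and replica values are illustrative, and the runner's service account still needs RBAC to create and manage the builder Deployment and its pods, which is exactly the open question above):

      - name: Set up Docker Buildx (kubernetes driver)
        uses: docker/setup-buildx-action@v2
        with:
          driver: kubernetes
          driver-opts: |
            namespace=buildkit
            replicas=1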

@milas
Contributor

milas commented Mar 18, 2023

@mumoshu #2324 fixes this - were you interested in that change?

If not, would you accept a change to the runner/entrypoint.sh to automatically initialize and activate a Docker context? I think that should unblock buildx, but there's no need for both that and #2324, so I'm checking in before I make another PR.

mumoshu pushed a commit to milas/actions-runner-controller that referenced this issue Mar 25, 2023
mumoshu pushed a commit to milas/actions-runner-controller that referenced this issue Mar 25, 2023
mumoshu pushed a commit to milas/actions-runner-controller that referenced this issue Mar 25, 2023
mumoshu pushed a commit to milas/actions-runner-controller that referenced this issue Mar 28, 2023
@ryanpeach

How exactly was this fixed?
