Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #2809 : replace TCP docker daemon connection with unix socket #2833

Merged
merged 3 commits into from
Sep 22, 2023

Conversation

dm3ch
Copy link
Contributor

@dm3ch dm3ch commented Aug 23, 2023

TCP docker connection causes the following problem:

dind: fails because of "ERROR: could not create a builder instance with TLS data loaded from environment."
kubernetes: fails because of "/usr/bin/tar: ../../../.docker: Cannot mkdir: Permission denied"

This issue have been workaround on legacy work mode by using unix socket for interaction with docker daemon ( #2324 )

This PR reimplements same approach for RunnerSets.

P.S. I preferred this approach cause in this case RunnerSets can work with workflows wrote for public action runners without any changes in workflow to make it work (without adding steps to create buildx context properly)

/fix #2809

@dm3ch dm3ch requested review from mumoshu, toast-gear, nikola-jokic and a team as code owners August 23, 2023 16:07
@Link- Link- added gha-runner-scale-set Related to the gha-runner-scale-set mode attention Requires attention labels Aug 24, 2023
@Link- Link- added the needs testing Requires the e2e tests to pass and/or manual tests to verify label Sep 7, 2023
nikola-jokic
nikola-jokic previously approved these changes Sep 13, 2023
Copy link
Collaborator

@nikola-jokic nikola-jokic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dm3ch
Copy link
Contributor Author

dm3ch commented Sep 13, 2023

Do I need to fix something or merge fresh master?

@Link- Link- self-requested a review September 14, 2023 07:54
@Link- Link- added this to the gha-runner-scale-set-0.6.0 milestone Sep 14, 2023
Copy link
Member

@Link- Link- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to address the concern of group id before we merge this

@Link-
Copy link
Member

Link- commented Sep 20, 2023

As we plan to upgrade the Docker engine of the runner image to 24.0.6 this fix is going to be a necessity for DinD mode to work.

It resolves the error from running docker/setup-buildx-action@v3:

 ERROR: could not create a builder instance with TLS data loaded from environment. Please use `docker context create <context-name>` to create a context for current environment and then create a builder instance with `docker buildx create <context-name>`

Other workarounds are discussed here: #893

@Link-
Copy link
Member

Link- commented Sep 21, 2023

TODO before releasing

@shreelola
Copy link

Running containers inside containers is not working with this change.
Do you have any workaround for this?

docker: Cannot connect to the Docker daemon at unix:///run/docker.sock. Is the docker daemon running?.

@nikola-jokic
Copy link
Collaborator

Hey @shreelola,

Did you configure your autoscaling runner set with dind mode?

@shreelola
Copy link

shreelola commented Nov 2, 2023

Hi @nikola-jokic

Yes.
Standard docker pulls and run works with this change inside the runner pod, but when we try to run the container inside
the container, it fails.

Edit:

I delved further and discovered that the issue is not related to this change; it appears that the problem stems from the controller modification.

The old controller version was using TCP here: https://github.com/actions/actions-runner-controller/blob/v0.26.0/controllers/runner_controller.go#L1011
with the new runner version, it is using unix socket: https://github.com/actions/actions-runner-controller/blob/v0.27.5/controllers/actions.summerwind.net/runner_controller.go#L1072.

We started experiencing this issue after upgrading the runner version. Any insights would be highly appreciated. Thank you for your time.

@nikola-jokic
Copy link
Collaborator

The latest version uses unix sockets instead of tcp sockets. This issue references the same problem. As a workaround, may I suggest commenting out the entire containerMode object and providing the dind spec as it was described in prior releases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
attention Requires attention gha-runner-scale-set Related to the gha-runner-scale-set mode needs testing Requires the e2e tests to pass and/or manual tests to verify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Working non-trivial example for gha-runner-scale-set
4 participants