Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: clean up ssh agent after ourselves #51

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

blaggacao
Copy link
Collaborator

Context

The ssh connection to a discovery host leaves a dangling ssh agent.
If the workflow runner itself runs on a persistent runner, that process might just accumulate.

Solution

Clean up after ourselves. Unfortunately, only js actions allow to trigger a post action which is why a slight refactor needed to be done to wrap the bash action inside a js action.

@blaggacao blaggacao requested a review from nrdxp August 31, 2023 10:28
@nrdxp
Copy link
Contributor

nrdxp commented Aug 31, 2023

Is this something you have observed in practice?

@blaggacao
Copy link
Collaborator Author

Is this something you have observed in practice?

Yes, this had reportedly happend in some scenarios. Although, now, on second thought, I doubt it since the discovery is the only persistent runner and it regularly would not set up a ssh connection to itself.

Workers are regularly not persistent so they shouldn't suffer from dangling ssh agents.


Only in the scenario that a worker is serviced by a hosted runner can those be left dangling.

@michalrus do you happen to have more context on what we had observed, in this context?

@michalrus
Copy link

If the workflow runner itself runs on a persistent runner, that process might just accumulate.

I don’t think this is the case, since if an ssh-agent stayed running from a previous job, it will be reused

@michalrus do you happen to have more context on what we had observed, in this context?

Yes, so we’re using NixOS on the discovery host, and the standard GitHub Runner service (services.github-runners.runnerN = { … }defined here).

We’re not setting the .ephemeral option, which causes systemd service restart (and clean-up of not only processes, but also of the work dir), because several other runners are faster without cleaning their work dir.

So the ssh-agent process that this action (used to) launch in the background stayed there between runs, and for whatever reason didn’t react to SIGTERM from systemd when I requested the restart of the systemd service, and I had to kill it manually (or wait 90 s for SIGKILL – this value could also be tweaked).

But now that I think of it, I could just set .ephemeral = true in discovery runners?

Copy link

@michalrus michalrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it works, then it’s looking good =) Thanks!

@@ -31,6 +31,8 @@ if ssh-keygen -y -f "$ssh_key_file" &>/dev/null; then
ssh-add -q "$ssh_key_file" && rm "$ssh_key_file"
# Auth agent socket to ssh config
echo "IdentityAgent $SSH_AUTH_SOCK" >> "$SSH_CONFIG_FILE"
# Save pid to cleanup in post step
echo "SSH_AGENT_PID=$SSH_AGENT_PID" >> "$GITHUB_STATE"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So later GHA export that automatically?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants