fix: clean up ssh agent after ourselves #51

blaggacao · 2023-08-31T10:23:19Z

Context

The ssh connection to a discovery host leaves a dangling ssh agent.
If the workflow runner itself runs on a persistent runner, that process might just accumulate.

Solution

Clean up after ourselves. Unfortunately, only JS actions can trigger a post action, so a slight refactor was needed to wrap the bash action inside a JS action.

nrdxp · 2023-08-31T16:29:46Z

Is this something you have observed in practice?

blaggacao · 2023-08-31T18:11:14Z

> Is this something you have observed in practice?

Yes, this had reportedly happened in some scenarios. Although now, on second thought, I doubt it, since the discovery host is the only persistent runner and it normally would not set up an ssh connection to itself.

Workers are normally not persistent, so they shouldn't suffer from dangling ssh agents.

Only when a worker is serviced by a hosted runner can agents be left dangling.

@michalrus do you happen to have more context on what we had observed, in this context?

michalrus · 2023-09-01T11:27:23Z

> If the workflow runner itself runs on a persistent runner, that process might just accumulate.

I don’t think this is the case: if an ssh-agent stayed running from a previous job, it would be reused.

> @michalrus do you happen to have more context on what we had observed, in this context?

Yes, so we’re using NixOS on the discovery host, and the standard GitHub Runner service (services.github-runners.runnerN = { … } – defined here).

We’re not setting the .ephemeral option, which would cause a systemd service restart (and clean-up of not only processes, but also the work dir), because several other runners are faster without cleaning their work dir.

So the ssh-agent process that this action launches (or used to launch) in the background stayed there between runs and, for whatever reason, didn’t react to SIGTERM from systemd when I requested a restart of the systemd service. I had to kill it manually (or wait 90 s for SIGKILL; this value could also be tweaked).

But now that I think of it, I could just set .ephemeral = true in discovery runners?
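The SIGTERM-then-SIGKILL escalation described above can also be done by hand instead of waiting for systemd. A minimal sketch, assuming a hypothetical helper named `terminate_with_grace` (not part of this PR) and a grace period given in whole seconds:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from this PR): ask a process to exit with SIGTERM,
# then escalate to SIGKILL if it is still around after a grace period.
terminate_with_grace() {
  local pid="$1" grace="${2:-5}" waited=0

  # Nothing to do if the process is already gone.
  kill -0 "$pid" 2>/dev/null || return 0

  # Polite request first.
  kill -TERM "$pid" 2>/dev/null || true

  # Poll once per second until the process disappears or the grace period ends.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$grace" ]; then
      # Grace period exhausted: force the process to stop.
      kill -KILL "$pid" 2>/dev/null || true
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, `terminate_with_grace "$SSH_AGENT_PID" 10` would give the agent ten seconds to exit before it is killed.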

michalrus

If it works, then it’s looking good =) Thanks!

michalrus · 2023-09-01T11:33:32Z

setup-discovery-ssh/setup.sh

@@ -31,6 +31,8 @@ if ssh-keygen -y -f "$ssh_key_file" &>/dev/null; then
  ssh-add -q "$ssh_key_file" && rm "$ssh_key_file"
  # Auth agent socket to ssh config
  echo "IdentityAgent $SSH_AUTH_SOCK" >> "$SSH_CONFIG_FILE"
+  # Save pid to cleanup in post step
+  echo "SSH_AGENT_PID=$SSH_AGENT_PID" >> "$GITHUB_STATE"


So GHA exports that automatically later?
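On the question above: GitHub Actions documents that a value written to `$GITHUB_STATE` is exposed to the same action's `post` step as an environment variable with a `STATE_` prefix, so the pid saved here would arrive as `STATE_SSH_AGENT_PID`. A minimal sketch of a cleanup script, assuming a hypothetical function name `cleanup_ssh_agent` and that the JS wrapper passes its environment through to bash (neither is taken from this PR):

```shell
#!/usr/bin/env bash
# Sketch of a post-step cleanup script (hypothetical; not the PR's actual code).
# Assumes the main step wrote SSH_AGENT_PID=<pid> to "$GITHUB_STATE", which
# GitHub Actions exposes to this action's post step as STATE_SSH_AGENT_PID.
cleanup_ssh_agent() {
  local pid="${STATE_SSH_AGENT_PID:-}"

  # No saved pid: the agent was never started, so there is nothing to clean up.
  if [ -z "$pid" ]; then
    echo "no ssh-agent pid saved; skipping"
    return 0
  fi

  # Signal the agent only if the process still exists.
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid" && echo "stopped ssh-agent (pid $pid)"
  else
    echo "ssh-agent (pid $pid) already gone"
  fi
}

cleanup_ssh_agent
```

The sketch does not check that the saved pid still belongs to an ssh-agent, so a stricter version might compare the process name before sending the signal.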

fix: clean up ssh agent after ourselves

b5fc670

blaggacao requested a review from nrdxp August 31, 2023 10:28

michalrus approved these changes Sep 1, 2023

michalrus reviewed Sep 1, 2023
