Ephemeral (single use) runner registrations #510
Comments
@MatisseHack Yeah, this is why it's not part of the documentation or command line help: the partially implemented code doesn't work and has races, so we didn't ship it. We have work on the backlog to register a runner with the service as ephemeral. It has to be a service-side feature so the service is cooperative and doesn't assign the runner another job in the window when the runner, or outside automation, decides to tear it down. |
I've used the --once flag already. Makes sense why it wasn't shipped 😄. I look forward to using it when it's supported though! |
We solved this issue by adding a timeout to our ephemeral workers. The timeout will deregister and kill the worker:
#!/bin/bash -x
set -eux
# Register this machine as a self-hosted runner.
/usr/bin/actions/register
# Safety net: schedule an automatic de-register and shutdown in 30 minutes
# in case the job hangs.
echo "sudo /usr/bin/actions/deregister && sudo shutdown -h now" | at now + 30 minutes
# Accept exactly one job, then fall through to cleanup.
sudo -i -u actions /home/actions/run.sh --once
# Normal path: de-register and power off as soon as the job finishes.
/usr/bin/actions/deregister
shutdown -h now
Actions team: please support --once officially. We currently have ~16x parallelization on our CI to keep builds under 20 minutes. When moving to self-hosted, we considered a few options.
Unfortunately, the second strategy only works well with --once. So please consider supporting it! |
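(For reference, a minimal sketch of the "pool of permanent runners on one machine" approach; REPO_URL, REG_TOKEN and the tarball path below are placeholders, not anything from this thread. Each runner instance needs its own directory and name.)
#!/bin/bash
set -eu
REPO_URL="https://github.com/OWNER/REPO"   # placeholder repo URL
REG_TOKEN="..."                            # registration token from the repo's runner settings
for i in $(seq 1 16); do
  dir="/opt/actions-runner-$i"
  mkdir -p "$dir"
  tar -xzf /tmp/actions-runner-linux-x64.tar.gz -C "$dir"
  # Each instance gets a distinct name; install as a service so it restarts on reboot.
  (cd "$dir" && ./config.sh --unattended --url "$REPO_URL" --token "$REG_TOKEN" --name "runner-$i")
  (cd "$dir" && sudo ./svc.sh install && sudo ./svc.sh start)
done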
We also use the --once flag. |
Also @bryanmacfarlane - two more requests that would make ephemeral workers easier:
Our hackish solution is to keep a registered worker that does no work: it simply re-registers itself once a day to avoid the 30-day cleanup 😂 Thanks! |
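(A sketch of that keep-alive hack, assuming the same hypothetical /usr/bin/actions/register and /usr/bin/actions/deregister wrappers as the script above:)
# Re-register the idle placeholder runner once a day so it never hits
# the 30-day cleanup. Wrapper paths are hypothetical, as above.
cat >/etc/cron.d/runner-keepalive <<'EOF'
0 4 * * * root /usr/bin/actions/deregister && /usr/bin/actions/register
EOF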
👋 there, do we have a date for when the ephemeral registration feature will be supported? |
This really sucks. It's the last nail in the coffin for ephemeral self-hosted workers. We're going to switch to permanent self-hosted workers and then probably switch to another CI provider :( |
@rclmenezes ack on #1 and #2. We're currently designing and working on it. The plan is exactly what you laid out: register the runner as ephemeral with the backend service, so the service auto-cleans it up after the job is complete and the runner / container exits. |
That's amazing - thanks Bryan! 😄 We can definitely use permanent workers for the next few weeks until that's pushed. Thanks again! |
@bryanmacfarlane Just to verify, does this mean that single-use runners won't accept more than one job? Right now I'm using a custom orchestrator that keeps a certain number of single-use runners running, removing them via the GitHub API when the containers exit. |
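(For anyone building a similar orchestrator: the cleanup step can use the documented self-hosted runner REST endpoints. OWNER/REPO and the GH_TOKEN personal access token with repo admin rights are placeholders; jq is assumed for JSON parsing.)
# List the repo's runners, pick the offline ones, and delete each by id.
curl -s -H "Authorization: token $GH_TOKEN" \
  "https://api.github.com/repos/OWNER/REPO/actions/runners" |
  jq -r '.runners[] | select(.status == "offline") | .id' |
  while read -r id; do
    curl -s -X DELETE -H "Authorization: token $GH_TOKEN" \
      "https://api.github.com/repos/OWNER/REPO/actions/runners/$id"
  done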
We're looking at trying to use self-hosted runners with a cluster scheduler (Slurm in our case). It would also be useful for us if there were an additional option that exits immediately when no jobs are queued. This could be generalized as a timeout feature (i.e., quit if no job is received within X minutes). |
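(Until something like that exists, a rough approximation is to wrap the single-use run in coreutils timeout. This is only a sketch: it also kills a job that is still running at the deadline, so the limit has to be generous.)
# Give up if the runner hasn't finished its one job within 15 minutes.
timeout --signal=SIGINT 15m ./run.sh --once || true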
Looking forward to this! I take it that when this is implemented, the backend won't error out anymore when zero suitable runners are present, but will instead just wait until one has registered itself (e.g. in response to a webhook)? It would be quite difficult to start the first runner otherwise... |
Any updates on this? Perhaps it will be solved by #660 |
We (https://github.com/Brightspace) are eagerly awaiting this feature 😄 |
We (https://github.com/airslateinc) too |
I am waiting for this too.. |
https://github.com/esp8266/Arduino/ would love this, too. Single-use self-hosted support would let us safely run PRs on actual embedded hardware, giving better coverage than our simulated host-based environment. |
We'd love to make use of those for https://github.com/lxc including providing tooling so someone can easily run a fixed number of instances (containers or VMs) on a LXD cluster, having each of those handle a single job before getting destroyed and replaced by a new one. This would allow for a safe, low-cost, self-hosted set of runners running a wide variety of Linux distributions. |
Forgot to mention, but this would also make it easy to run on non-x86 architectures. LXD was similarly used to build up the foreign-architecture support of Travis-CI, where it's used to run arm64, ppc64el and s390x instances. Something similar could easily be done with GitHub Actions if it weren't for this limitation :) |
Any updates on this? |
@bryanmacfarlane any updates on this? |
We would like to use this feature as part of CML for running ephemeral continuous integration jobs. |
Not having a deterministic build environment is a show stopper for us 👎 |
Help us @bryanmacfarlane, you're our only hope! 🙏 |
We are experiencing this issue on a bare-metal server with 10+ runners. Any ideas how to fix it? |
Is this resolved with the merge of #660? |
I believe so. |
Closing this as #660 has been merged, and the --ephemeral flag is now available as a replacement for --once. |
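(A minimal sketch of the new flow; the URL and REG_TOKEN are placeholders, and each registration needs a fresh token:)
# Register a single-use runner; the service de-registers it after one job.
./config.sh --unattended --ephemeral \
  --url https://github.com/OWNER/REPO \
  --token "$REG_TOKEN"
./run.sh   # exits once its single job has completed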
Implement changes for ephemeral fix of actions runner (actions/runner#510):
- Change setup script to use --ephemeral, which replaces --once
- Support multiple configurations mapped by workflow labels
- Consume workflow_job instead of check_run events
- Update README and sample config
- Support remote images and LXD server
- Fix max_workers limit
- Add systemd user service unit
- Add dev dependencies to setup.cfg/py
- Update Makefile to install via setup and pip
- Show version via help
- Set log level via CLI argument
- Add Alpine image build script and OpenRC startup
Changes to AppConf:
- RunnerConf: add default values, max_workers, and a semaphore for limiting
- Add Remote object
- AppConf: add dirs object with common paths
RunnerEvent:
- Generate instname on object creation
LXDRunner:
- Remove _thread_launch; use ThreadPoolExecutor
- Name threads with instname: lxdrunner-xxxxxx
- Add LXD event listener to watch for completion events
RunManager:
- Use semaphore for runner conf limits
- Submit to thread pool
- Switch to multiple queues, one per config
- Fix cleanup schedule
Update workflow:
- Build wheel
- Build LXD image
- Push to releases
It seems like the ephemeral flow can't reuse an existing registration: registering an ephemeral runner requires a fresh registration token every time. |
That doesn't work with the ephemeral system because the registration token expires, whereas in the persistent case you don't need a new registration token, since the registration stays valid. We're using this to run the same workflows across both cloud-hosted and self-hosted runners, because some builds are tied into the enterprise network (for the time being). |
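(On the token point: registration tokens are short-lived, so ephemeral automation typically mints a fresh one right before each registration via the documented REST endpoint. OWNER/REPO and GH_TOKEN are placeholders; jq is assumed.)
# Request a fresh registration token (expires after about an hour),
# then feed it to config.sh --ephemeral as shown above.
REG_TOKEN=$(curl -s -X POST -H "Authorization: token $GH_TOKEN" \
  "https://api.github.com/repos/OWNER/REPO/actions/runners/registration-token" |
  jq -r '.token')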
@pje I'd like to confirm that there is no way to guarantee which workflow an ephemeral runner will pick up in case several runners are starting simultaneously for several queued jobs. |
Describe the bug
When starting a self-hosted runner with ./run.cmd --once, the runner sometimes accepts a second job before shutting down, which causes that second job to fail with a timeout message (shown in the repro steps below). This looks like the same issue recently fixed here: microsoft/azure-pipelines-agent#2728
To Reproduce
Steps to reproduce the behavior:
Create a repo, enable GitHub Actions, and add a new workflow
Configure a new runner on your machine
Run the runner with ./run.cmd --once
Queue two runs of your workflow
The first job will run and the runner will go offline
(Optionally) configure and start a second runner
The second job will time out after several minutes with a message referencing [runner-name], where [runner-name] is the name of the first runner.
Also: trying to remove the first runner with the command ./config.cmd remove --token [token] will result in an error until the second job times out.
Expected behavior
The second job should run on (and wait for) any new runner that comes online rather than being assigned to the now-offline original runner.
Runner Version and Platform
2.262.1 on Windows
Runner and Worker's Diagnostic Logs
_diag.zip