make jupyterhub idle server more easily configurable #390
I've tested it using some low values (eg: I've opted for a default 1-day timeout so that servers using default or minimally overridden configs have cleanup enabled with a reasonable delay: users can experiment for some time without being impacted by shutdown, while servers stop automatically when they are clearly not being used anymore.
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2156/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : jupyterhub-idle-timeout-config
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-118.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1368/
NOTEBOOK TEST RESULTS
LGTM. Please update env.local.example with the 3 new vars and their documentation, and delete the corresponding sample for JUPYTERHUB_CONFIG_OVERRIDE.
jupyter_idle_kernel_cull_interval = jupyter_idle_kernel_cull_timeout / 2
c.Spawner.args.extend([
    '--MappingKernelManager.cull_idle_timeout={}'.format(jupyter_idle_kernel_cull_timeout),
    '--MappingKernelManager.cull_interval={}'.format(jupyter_idle_kernel_cull_interval),
Funny, I didn't have to manually set the cull_interval to half of the cull_idle_timeout and it was working fine. What is the default value of the 2 cull_interval settings if not set?
300 seconds I believe
Indeed, there is the default of 300 from jupyter-server, but with my proposed changes at L205, the value is automatically halved from the timeout if not specified. I had to do this, otherwise the interval between updates took at least 300s regardless of my timeout=10 test.
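To illustrate the halving described in this comment, here is a minimal sketch of how the kernel culling arguments could be derived. The helper name `kernel_cull_args` is hypothetical and not taken from the PR; only the two `MappingKernelManager` flags come from the diff above. Integer division is a choice made in this sketch so the interval stays a whole number of seconds.

```python
from typing import List, Optional


def kernel_cull_args(timeout: int, interval: Optional[int] = None) -> List[str]:
    """Build single-user server arguments for idle-kernel culling.

    Hypothetical helper: a timeout of 0 (or less) means "do not configure
    culling here", and a missing interval falls back to half of the timeout.
    """
    if timeout <= 0:
        # Culling is left unconfigured, e.g. when an older override handles it.
        return []
    if interval is None or interval <= 0:
        # Half of the timeout, but never below one second.
        interval = max(1, timeout // 2)
    return [
        "--MappingKernelManager.cull_idle_timeout={}".format(timeout),
        "--MappingKernelManager.cull_interval={}".format(interval),
    ]


if __name__ == "__main__":
    # With timeout=10 the check runs every 5 seconds instead of every 300.
    for arg in kernel_cull_args(10):
        print(arg)
```

In an actual `jupyterhub_config.py`, the returned list would presumably be passed to `c.Spawner.args.extend(...)`, as in the reviewed hunk.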
I assume you also set The server cull timeout starts only when all kernels are down. For us at Ouranos, we set the server timeout to 4 days and the kernel timeout to 1 day. Together, they account for 5 days of complete inactivity before the server is actually gone, so any 4-day long weekend is covered.
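The arithmetic behind the Ouranos settings can be written out as a small sketch. It assumes, as stated in the comment, that the server countdown only begins once the kernels have been culled, and it ignores the extra delay introduced by the culling check intervals. The function name is hypothetical.

```python
DAY = 24 * 60 * 60  # seconds in one day


def total_idle_before_shutdown(kernel_timeout: int, server_timeout: int) -> int:
    """Approximate inactivity (in seconds) before the user server is removed.

    Assumes the server timeout starts counting only after idle kernels
    have been culled, so the two timeouts add up.
    """
    return kernel_timeout + server_timeout


if __name__ == "__main__":
    kernel_timeout = 1 * DAY   # kernel timeout of 1 day
    server_timeout = 4 * DAY   # server timeout of 4 days
    total = total_idle_before_shutdown(kernel_timeout, server_timeout)
    print(total // DAY)  # prints 5
```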
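Given this new config, if JUPYTER_IDLE_SERVER_CULL_TIMEOUT is not set or is zero (the default), the server would not be killed at all. I think we should put some value to avoid the server running forever. Maybe a bigger value like a week would be reasonable?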
Looks good to me.
I agree with this though:
Given this new config, if JUPYTER_IDLE_SERVER_CULL_TIMEOUT is not set or is zero (the default), the server would not be killed at all.
I think we should put some value to avoid the server running forever. Maybe a bigger value like a week would be reasonable?
So I'd also like to see that default changed before we merge this
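As a rough sketch of the behaviour being requested here, the timeout could be resolved so that an unset variable falls back to a non-zero default while an explicit 0 still disables culling. The helper `idle_timeout` and the one-week default are assumptions for illustration (the week is only the value floated in this thread); the PR's real implementation may differ.

```python
import os
from typing import Mapping, Optional

WEEK = 7 * 24 * 60 * 60  # hypothetical default, in seconds


def idle_timeout(name: str, default: int,
                 env: Optional[Mapping[str, str]] = None) -> int:
    """Read an idle timeout (seconds) from the environment.

    Unset or empty -> `default`; an explicit 0 -> culling disabled.
    """
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)
    if value < 0:
        raise ValueError("{} must be >= 0, got {}".format(name, value))
    return value


if __name__ == "__main__":
    timeout = idle_timeout("JUPYTER_IDLE_SERVER_CULL_TIMEOUT", WEEK)
    if timeout == 0:
        print("idle server culling disabled")
    else:
        print("idle servers culled after {} seconds".format(timeout))
```

With this shape, a node admin who wants servers to run forever has to opt out explicitly rather than get that behaviour by default.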
Indeed. Sorry for the confusion. Both variables were set at 10s.
…ver culling timeout
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2159/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : jupyterhub-idle-timeout-config
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-216.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1370/
NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2158/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : jupyterhub-idle-timeout-config
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-118.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1369/
NOTEBOOK TEST RESULTS
LGTM. Maybe set the server timeout to 4 days instead of 3 days, so that someone coming back from a 4-day long weekend has a chance to resume their work?
I'd consider that an edge case. Also, they should be able to resume work from a notebook saved in the user-workspace that would be re-mounted on server restart.
Yes, it's an edge case. It's not about preserving the notebooks, it's about preserving any custom installs the user had made without properly recording them in a It's fine, 3 or 4, the node admin will adjust if they receive complaints over a long weekend.
Overview
Add new variables to easily configure idle jupyter user instances.
Changes
Non-breaking changes
New variables JUPYTER_IDLE_SERVER_CULL_TIMEOUT, JUPYTER_IDLE_KERNEL_CULL_TIMEOUT and JUPYTER_IDLE_KERNEL_CULL_INTERVAL that allow fine-grained configuration of user-kernel and server-wide docker image culling when their activity status reaches a certain idle timeout threshold.
JUPYTERHUB_CONFIG_OVERRIDE specifically for idle server culling.
If similar argument parameters should be defined using an older JUPYTERHUB_CONFIG_OVERRIDE definition, the new configuration strategy can be skipped by setting JUPYTER_IDLE_KERNEL_CULL_TIMEOUT=0.
Breaking changes
Related Issue / Discussion
optional-components/jupyterhub-stop-idle
#389 (replaces)