Set singleuser.args to jupyterhub-singleuser #2805

yuvipanda · 2022-07-19T05:34:40Z

After writing
https://github.com/pangeo-data/pangeo-docker-images/pull/355/files#diff-a77643b43a7be453fa8556937bf32b27907e152a10d4c693f3e7670c66a44378,
I would like to ask that we reconsider requiring entrypoint be set to
a specific thing (jupyterhub-singleuser) in the images users are
required to produce. This makes it quite difficult to produce images
that work across multiple tools without having to configure each of
those tools. In this case, we want to use the exact same image in both
binder and apache beam. Apache beam requires an entrypoint be
specified, while at least as of now binder does not - so I wrote a
script that uses heuristics to determine if the caller is apache beam or
something else, and change behavior accordingly. But if I want this
image to also work with JupyterHub in the future, I'll have to modify my
script to detect if JupyterHub is calling it, and exec
jupyterhub-singleuser. Now if dask also wants to use this image, I'd
need to have a code path for that as well. I think this strong coupling
causes a lot of unnecessary friction here.

Another point of confusion here is that cmd in helm / k8s is actually
ENTRYPOINT in docker, while args in helm / k8s is CMD in docker.
Earlier, we were setting cmd to jupyterhub-singleuser, which
overrode any entrypoint the docker builder had set up - causing the
issues we were trying to fix. However, now that KubeSpawner doesn't
actually use args (k8s / helm) in any meaningful fashion, can we set
the default args to jupyterhub-singleuser? This has a few
advantages:

ENTRYPOINT set in Dockerfile is not overriden by what we set in
helm, which is the current problem. This goes away
All existing docker images continue to work without admins having to
set extra config or the image having to be rebuilt.
The same image can be used in multiple places without having to be
aware of who is calling it.

I think this is an approach to solving the original problem that is both
easier to document, easier for our users (existing and new) and
makes life a lot less difficult for people building images.

Re-implementation of #2449

After writing https://github.com/pangeo-data/pangeo-docker-images/pull/355/files#diff-a77643b43a7be453fa8556937bf32b27907e152a10d4c693f3e7670c66a44378, I would like to ask that we reconsider *requiring* entrypoint be set to a specific thing (`jupyterhub-singleuser`) in the images users are required to produce. This makes it quite difficult to produce images that *work across multiple tools* without having to configure each of those tools. In this case, we want to use the *exact same image* in both binder and apache beam. Apache beam *requires* an entrypoint be specified, while at least as of now binder does not - so I wrote a script that uses heuristics to determine if the caller is apache beam or something else, and change behavior accordingly. But if I want this image to also work with JupyterHub in the future, I'll have to modify my script to detect if JupyterHub is calling it, and exec `jupyterhub-singleuser`. Now if dask also wants to use this image, I'd need to have a code path for that as well. I think this strong coupling causes a lot of unnecessary friction here. Another point of confusion here is that `cmd` in helm / k8s is actually *ENTRYPOINT* in docker, while `args` in helm / k8s is *CMD* in docker. Earlier, we were setting `cmd` to `jupyterhub-singleuser`, which overrode any entrypoint the docker builder had set up - causing the issues we were trying to fix. However, now that KubeSpawner doesn't actually use `args` (k8s / helm) in any meaningful fashion, can we set the default `args` to `jupyterhub-singleuser`? This has a few advantages: 1. *ENTRYPOINT* set in Dockerfile is *not* overriden by what we set in helm, which is the current problem. This goes away 2. All existing docker images continue to work without admins having to set extra config or the image having to be rebuilt. 3. The same image can be used in multiple places without having to be aware of who is calling it. I think this is an approach to solving the original problem that is both *easier to document*, *easier for our users (existing and new)* and makes life a lot less difficult for people building images.

yuvipanda · 2022-07-19T05:39:32Z

I can investigate the test failures and try getting this to work if y'all think the general approach is workable, @manics @minrk @consideRatio

yuvipanda · 2022-07-19T06:04:35Z

Here's the somewhat tortured test I had to write to validate my 'multi dispatch entrypoint' works ok https://github.com/pangeo-data/pangeo-docker-images/pull/355/files#diff-6027b5fe95c80f9fe0bcead4ae1288105dfee85005ad12d2aa07d58c0d2fa8d2R35

yuvipanda · 2022-07-19T06:22:22Z

/cc @minrk @consideRatio @manics

manics · 2022-07-19T08:02:05Z

The aim of the original PR was to ensure images that run scripts with custom setup will work without changes. For example, the docker-stacks images run some hooks to customise the environment before running jupyterhub-singleuser:
https://discourse.jupyter.org/t/pyspark-library-is-missing-from-jupyter-pyspark-notebook-when-running-with-jupyterhub-zero-to-jupyterhub-k8s/8450/2

If we override the entrypoint or command this will break the docker-stacks images again:
https://github.com/jupyter/docker-stacks/blob/57ebe55c07d06df70f6fa1166bc5d8ce22b6da34/base-notebook/Dockerfile#L144

consideRatio · 2022-07-19T10:51:57Z

About docker-stacks images

Docker stacks relies on the startup scripts to do misc logic, but will call jupyterhub-singleuser if a JUPYTERHUB_API_TOKEN env var is found. This is done in three steps: step 1 - start-notebook.sh, step 2 - start-singleuser.sh, step 3 - start.sh.

From a docker-stacks perspective, such images should be started with the start-notebook.sh script if users also want to rely on various environment variable influencing those scripts. They can start like that either by default via their Dockerfile's declared ENTRYPOINT+CMD, or via z2jh's configuration of kubespawner, which in turn configures the k8s args (but not the k8s command aka. Dockerfile's ENTRYPOINT).

About reverting #2449

There is a key part that I know @minrk wants resolved, and that is the possibility to configuring z2jh in a way that makes the Dockerfile's CMD be respected. Can we do that? I recall an issue, so I'll look into this...

Currently, singleuser.cmd configuration in z2jh is used like this:

zero-to-jupyterhub-k8s/jupyterhub/files/hub/jupyterhub_config.py

Lines 412 to 413 in 58d7d67

    
           set_config_if_not_none(c.Spawner, "cmd", "singleuser.cmd") 
        
           set_config_if_not_none(c.Spawner, "default_url", "singleuser.defaultUrl")

The default value of cmd in KubeSpawner is None. Good!

Okay, I think we should be good. Users that want the behavior of relying on the CMD in the Dockerfiles can specify singleuser.cmd=null and by doing so override the z2jh default configuration of jupyterhub-singleuser. If a user wants certain profile options to rely on the image, they can also do that by setting the KubeSpawner instance's cmd traitlet to None as well.

My current opinion

I think either choice, of defaulting to jupyterhub-singleuser as a args/CMD, or defaulting to what the image specifies is fine, and I'm not sure what I'd prefer most if this was setup from scratch as a new project.

Given that this isn't a new project, and we have a current default and have forced users that wanted to rely on CMD from their image to declare singleuser.cmd=null, I suggest that we stick with it and revert #2449. Why? Because its a breaking change that forces us to communicate and explain etc. For users that want to rely on CMD in image, it should be made clear that they can if they set singleuser.cmd or the KubeSpawner instance's cmd traitlet to null (YAML) or None (Python).

Example:

singleuser:
  cmd:
  # declaring cmd like above is the same as declaring it like below
  # cmd: null

consideRatio · 2022-07-19T11:24:03Z

jupyterhub/values.yaml

  cmd:
+  args:
+    - jupyterhub-singleuser


For singleuser, there is no args - this is Helm chart config mimicing the jupyterhub configuration options for the spawner, not directly mapped to k8s config specification (command + args)

So, singleuser.cmd is the JupyterHub Spawner cmd, which is in the end rendered to the k8s args by KubeSpawner.

I think looking at #2449 is needed to better revert that.

yuvipanda · 2022-07-19T18:36:41Z

Given that this isn't a new project, and we have a current default and have forced users that wanted to rely on CMD from their image to declare singleuser.cmd=null, I suggest that we stick with it and revert #2449. Why? Because its a breaking change that forces us to communicate and explain etc. For users that want to rely on CMD in image, it should be made clear that they can if they set singleuser.cmd or the KubeSpawner instance's cmd traitlet to null (YAML) or None (Python).

I'm very much +1 to this too.

minrk · 2022-07-20T04:45:59Z

I've lost track of where it was discussed recently, but as long as set-to-null in a user chart to opt-in to the image's custom command is easy and doesn't require replicating the command itself (the original motivation of #2138, I think), I'm happy. As @consideRatio mentioned, changing None to be the default in KubeSpawner seeems to have made this possible, so with this change (after fixing singleuser.args to singleuser.cmd), we have:

default behavior: always works, matches other JupyterHub contexts
custom image CMD (i.e. docker stacks) is easy and clear to opt-in to: singleuser.cmd = null (solves BREAKING: singleuser.cmd defaults to Docker image default #2138 without breaking other images or changing default behavior)

This seems the best balance to me.

manics · 2022-07-20T08:17:20Z

OK, let's revert the breaking change then. Can you update https://zero-to-jupyterhub.readthedocs.io/en/latest/jupyterhub/customizing/user-environment.html#choose-and-use-an-existing-docker-image to include singleuser.cmd: null in the config?

consideRatio reviewed Jul 19, 2022

View reviewed changes

minrk mentioned this pull request Aug 1, 2022

Restore jupyterhub-singleuser as the default command #2820

Merged

manics closed this in #2820 Aug 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set singleuser.args to jupyterhub-singleuser #2805

Set singleuser.args to jupyterhub-singleuser #2805

yuvipanda commented Jul 19, 2022 •

edited

Loading

yuvipanda commented Jul 19, 2022

yuvipanda commented Jul 19, 2022

yuvipanda commented Jul 19, 2022

manics commented Jul 19, 2022

consideRatio commented Jul 19, 2022 •

edited

Loading

consideRatio Jul 19, 2022 •

edited

Loading

yuvipanda commented Jul 19, 2022

minrk commented Jul 20, 2022

manics commented Jul 20, 2022 •

edited by consideRatio

Loading

Set singleuser.args to jupyterhub-singleuser #2805

Set singleuser.args to jupyterhub-singleuser #2805

Conversation

yuvipanda commented Jul 19, 2022 • edited Loading

yuvipanda commented Jul 19, 2022

yuvipanda commented Jul 19, 2022

yuvipanda commented Jul 19, 2022

manics commented Jul 19, 2022

consideRatio commented Jul 19, 2022 • edited Loading

About docker-stacks images

About reverting #2449

My current opinion

consideRatio Jul 19, 2022 • edited Loading

Choose a reason for hiding this comment

yuvipanda commented Jul 19, 2022

minrk commented Jul 20, 2022

manics commented Jul 20, 2022 • edited by consideRatio Loading

yuvipanda commented Jul 19, 2022 •

edited

Loading

consideRatio commented Jul 19, 2022 •

edited

Loading

consideRatio Jul 19, 2022 •

edited

Loading

manics commented Jul 20, 2022 •

edited by consideRatio

Loading