Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set singleuser.args to jupyterhub-singleuser #2805

Closed
wants to merge 1 commit into from

Conversation

yuvipanda
Copy link
Collaborator

@yuvipanda yuvipanda commented Jul 19, 2022

After writing
https://github.com/pangeo-data/pangeo-docker-images/pull/355/files#diff-a77643b43a7be453fa8556937bf32b27907e152a10d4c693f3e7670c66a44378,
I would like to ask that we reconsider requiring entrypoint be set to
a specific thing (jupyterhub-singleuser) in the images users are
required to produce. This makes it quite difficult to produce images
that work across multiple tools without having to configure each of
those tools. In this case, we want to use the exact same image in both
binder and apache beam. Apache beam requires an entrypoint be
specified, while at least as of now binder does not - so I wrote a
script that uses heuristics to determine if the caller is apache beam or
something else, and change behavior accordingly. But if I want this
image to also work with JupyterHub in the future, I'll have to modify my
script to detect if JupyterHub is calling it, and exec
jupyterhub-singleuser. Now if dask also wants to use this image, I'd
need to have a code path for that as well. I think this strong coupling
causes a lot of unnecessary friction here.

Another point of confusion here is that cmd in helm / k8s is actually
ENTRYPOINT in docker, while args in helm / k8s is CMD in docker.
Earlier, we were setting cmd to jupyterhub-singleuser, which
overrode any entrypoint the docker builder had set up - causing the
issues we were trying to fix. However, now that KubeSpawner doesn't
actually use args (k8s / helm) in any meaningful fashion, can we set
the default args to jupyterhub-singleuser? This has a few
advantages:

  1. ENTRYPOINT set in Dockerfile is not overriden by what we set in
    helm, which is the current problem. This goes away
  2. All existing docker images continue to work without admins having to
    set extra config or the image having to be rebuilt.
  3. The same image can be used in multiple places without having to be
    aware of who is calling it.

I think this is an approach to solving the original problem that is both
easier to document, easier for our users (existing and new) and
makes life a lot less difficult for people building images.

Re-implementation of #2449

After writing
https://github.com/pangeo-data/pangeo-docker-images/pull/355/files#diff-a77643b43a7be453fa8556937bf32b27907e152a10d4c693f3e7670c66a44378,
I would like to ask that we reconsider *requiring* entrypoint be set to
a specific thing (`jupyterhub-singleuser`) in the images users are
required to produce. This makes it quite difficult to produce images
that *work across multiple tools* without having to configure each of
those tools. In this case, we want to use the *exact same image* in both
binder and apache beam. Apache beam *requires* an entrypoint be
specified, while at least as of now binder does not - so I wrote a
script that uses heuristics to determine if the caller is apache beam or
something else, and change behavior accordingly. But if I want this
image to also work with JupyterHub in the future, I'll have to modify my
script to detect if JupyterHub is calling it, and exec
`jupyterhub-singleuser`. Now if dask also wants to use this image, I'd
need to have a code path for that as well. I think this strong coupling
causes a lot of unnecessary friction here.

Another point of confusion here is that `cmd` in helm / k8s is actually
*ENTRYPOINT* in docker, while `args` in helm / k8s is *CMD* in docker.
Earlier, we were setting `cmd` to `jupyterhub-singleuser`, which
overrode any entrypoint the docker builder had set up - causing the
issues we were trying to fix. However, now that KubeSpawner doesn't
actually use `args` (k8s / helm) in any meaningful fashion, can we set
the default `args` to `jupyterhub-singleuser`? This has a few
advantages:

1. *ENTRYPOINT* set in Dockerfile is *not* overriden by what we set in
   helm, which is the current problem. This goes away
2. All existing docker images continue to work without admins having to
   set extra config or the image having to be rebuilt.
3. The same image can be used in multiple places without having to be
   aware of who is calling it.

I think this is an approach to solving the original problem that is both
*easier to document*, *easier for our users (existing and new)* and
makes life a lot less difficult for people building images.
@yuvipanda
Copy link
Collaborator Author

I can investigate the test failures and try getting this to work if y'all think the general approach is workable, @manics @minrk @consideRatio

@yuvipanda
Copy link
Collaborator Author

Here's the somewhat tortured test I had to write to validate my 'multi dispatch entrypoint' works ok https://github.com/pangeo-data/pangeo-docker-images/pull/355/files#diff-6027b5fe95c80f9fe0bcead4ae1288105dfee85005ad12d2aa07d58c0d2fa8d2R35

@yuvipanda
Copy link
Collaborator Author

/cc @minrk @consideRatio @manics

@manics
Copy link
Member

manics commented Jul 19, 2022

The aim of the original PR was to ensure images that run scripts with custom setup will work without changes. For example, the docker-stacks images run some hooks to customise the environment before running jupyterhub-singleuser:
https://discourse.jupyter.org/t/pyspark-library-is-missing-from-jupyter-pyspark-notebook-when-running-with-jupyterhub-zero-to-jupyterhub-k8s/8450/2

If we override the entrypoint or command this will break the docker-stacks images again:
https://github.com/jupyter/docker-stacks/blob/57ebe55c07d06df70f6fa1166bc5d8ce22b6da34/base-notebook/Dockerfile#L144

@consideRatio
Copy link
Member

consideRatio commented Jul 19, 2022

About docker-stacks images

Docker stacks relies on the startup scripts to do misc logic, but will call jupyterhub-singleuser if a JUPYTERHUB_API_TOKEN env var is found. This is done in three steps: step 1 - start-notebook.sh, step 2 - start-singleuser.sh, step 3 - start.sh.

From a docker-stacks perspective, such images should be started with the start-notebook.sh script if users also want to rely on various environment variable influencing those scripts. They can start like that either by default via their Dockerfile's declared ENTRYPOINT+CMD, or via z2jh's configuration of kubespawner, which in turn configures the k8s args (but not the k8s command aka. Dockerfile's ENTRYPOINT).

About reverting #2449

There is a key part that I know @minrk wants resolved, and that is the possibility to configuring z2jh in a way that makes the Dockerfile's CMD be respected. Can we do that? I recall an issue, so I'll look into this...

Currently, singleuser.cmd configuration in z2jh is used like this:

set_config_if_not_none(c.Spawner, "cmd", "singleuser.cmd")
set_config_if_not_none(c.Spawner, "default_url", "singleuser.defaultUrl")

The default value of cmd in KubeSpawner is None. Good!

Okay, I think we should be good. Users that want the behavior of relying on the CMD in the Dockerfiles can specify singleuser.cmd=null and by doing so override the z2jh default configuration of jupyterhub-singleuser. If a user wants certain profile options to rely on the image, they can also do that by setting the KubeSpawner instance's cmd traitlet to None as well.

My current opinion

I think either choice, of defaulting to jupyterhub-singleuser as a args/CMD, or defaulting to what the image specifies is fine, and I'm not sure what I'd prefer most if this was setup from scratch as a new project.

Given that this isn't a new project, and we have a current default and have forced users that wanted to rely on CMD from their image to declare singleuser.cmd=null, I suggest that we stick with it and revert #2449. Why? Because its a breaking change that forces us to communicate and explain etc. For users that want to rely on CMD in image, it should be made clear that they can if they set singleuser.cmd or the KubeSpawner instance's cmd traitlet to null (YAML) or None (Python).

Example:

singleuser:
  cmd:
  # declaring cmd like above is the same as declaring it like below
  # cmd: null

Comment on lines 397 to +399
cmd:
args:
- jupyterhub-singleuser
Copy link
Member

@consideRatio consideRatio Jul 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For singleuser, there is no args - this is Helm chart config mimicing the jupyterhub configuration options for the spawner, not directly mapped to k8s config specification (command + args)

So, singleuser.cmd is the JupyterHub Spawner cmd, which is in the end rendered to the k8s args by KubeSpawner.

I think looking at #2449 is needed to better revert that.

@yuvipanda
Copy link
Collaborator Author

Given that this isn't a new project, and we have a current default and have forced users that wanted to rely on CMD from their image to declare singleuser.cmd=null, I suggest that we stick with it and revert #2449. Why? Because its a breaking change that forces us to communicate and explain etc. For users that want to rely on CMD in image, it should be made clear that they can if they set singleuser.cmd or the KubeSpawner instance's cmd traitlet to null (YAML) or None (Python).

I'm very much +1 to this too.

@minrk
Copy link
Member

minrk commented Jul 20, 2022

I've lost track of where it was discussed recently, but as long as set-to-null in a user chart to opt-in to the image's custom command is easy and doesn't require replicating the command itself (the original motivation of #2138, I think), I'm happy. As @consideRatio mentioned, changing None to be the default in KubeSpawner seeems to have made this possible, so with this change (after fixing singleuser.args to singleuser.cmd), we have:

This seems the best balance to me.

@manics
Copy link
Member

manics commented Jul 20, 2022

OK, let's revert the breaking change then. Can you update https://zero-to-jupyterhub.readthedocs.io/en/latest/jupyterhub/customizing/user-environment.html#choose-and-use-an-existing-docker-image to include singleuser.cmd: null in the config?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants