
k3s cluster - jupyter notebook not starting/accessible #3298

Closed
ChrisEdel opened this issue May 11, 2021 · 3 comments

@ChrisEdel

Environmental Info:
K3s Version:
k3s version v1.20.6+k3s1 (8d04328)
go version go1.15.10

Node(s) CPU architecture, OS, and Version:
Linux 5.4.0-72-generic #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
1 server, 1 agent

Describe the bug:
My cluster consists of 1 server and 1 agent, both behind different NATs. Creating the cluster worked without any problems, and access to all GPUs is working. I installed kubeflow (https://github.com/kubeflow/kubeflow) on top of k3s, and that also works. However, creating a Jupyter notebook in the kubeflow dashboard only works on the server: I can create one notebook that uses the server's GPU, but when I create another GPU notebook (which has to be scheduled on the agent, since the server only has one GPU), the notebook starts but is not accessible.

notebook-4 runs on the server (working fine), and notebook-3 runs on the agent (not accessible).

[screenshot: kubeflow dashboard notebook list showing notebook-3 and notebook-4]

kubectl describe pod notebook-3-0 -n kubeflow-user-example-com

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m33s                  default-scheduler  Successfully assigned kubeflow-user-example-com/notebook-3-0 to untersberg
  Normal   Pulling    5m33s                  kubelet            Pulling image "docker.io/istio/proxyv2:1.9.0"
  Normal   Pulled     5m32s                  kubelet            Successfully pulled image "docker.io/istio/proxyv2:1.9.0" in 1.492170458s
  Normal   Created    5m32s                  kubelet            Created container istio-init
  Normal   Started    5m32s                  kubelet            Started container istio-init
  Normal   Pulling    5m31s                  kubelet            Pulling image "docker.io/istio/proxyv2:1.9.0"
  Normal   Pulled     5m31s                  kubelet            Container image "gcr.io/arrikto-public/tensorflow-1.15.2-notebook-gpu:1.0.0.arr1" already present on machine
  Normal   Created    5m31s                  kubelet            Created container notebook-3
  Normal   Started    5m31s                  kubelet            Started container notebook-3
  Normal   Pulled     5m30s                  kubelet            Successfully pulled image "docker.io/istio/proxyv2:1.9.0" in 1.366426061s
  Normal   Created    5m30s                  kubelet            Created container istio-proxy
  Normal   Started    5m30s                  kubelet            Started container istio-proxy
  Warning  Unhealthy  5m2s (x14 over 5m28s)  kubelet            Readiness probe failed: Get "http://10.42.1.16:15021/healthz/ready": dial tcp 10.42.1.16:15021: connect: connection refused
  Warning  Unhealthy  32s (x96 over 3m42s)   kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

kubectl logs notebook-3-0 -n kubeflow-user-example-com

Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/notebook-3-0. Please use `kubectl.kubernetes.io/default-container` instead
[I 15:49:10.702 LabApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
[W 15:49:10.839 LabApp] All authentication is disabled.  Anyone who can connect to this server will be able to run code.
[I 15:49:11.009 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.6/dist-packages/jupyterlab
[I 15:49:11.009 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 15:49:11.172 LabApp] Serving notebooks from local directory: /home/jovyan
[I 15:49:11.172 LabApp] The Jupyter Notebook is running at:
[I 15:49:11.172 LabApp] http://notebook-3-0:8888/notebook/kubeflow-user-example-com/notebook-3/
[I 15:49:11.172 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Steps To Reproduce:

  • Installed K3s:

    On the server:
    export K3S_EXTERNAL_IP=<server_public_ip>
    curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--write-kubeconfig ~/.kube/config --write-kubeconfig-mode 666 --tls-san $K3S_EXTERNAL_IP --node-external-ip=$K3S_EXTERNAL_IP" sh -

    On the agent:
    export K3S_TOKEN=<token>
    export K3S_URL=https://<server_public_ip>:6443
    export INSTALL_K3S_EXEC="--token $K3S_TOKEN --server $K3S_URL"
    curl -sfL https://get.k3s.io | sh -

Afterwards, I installed kubeflow as described here: https://github.com/kubeflow/manifests#install-with-a-single-command
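
For reference, a quick post-install sanity check (generic kubectl commands, not part of the original report) to confirm that both nodes joined, advertise the expected external IPs, and to see which node each pod lands on:

    kubectl get nodes -o wide     # shows INTERNAL-IP and EXTERNAL-IP for each node
    kubectl get pods -A -o wide   # shows which node each pod was scheduled to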

Expected behavior:
I expected both notebooks to be accessible in the dashboard.

Actual behavior:
The notebook notebook-3 (on the agent) is not accessible.

Additional context / logs:
Logs from notebook-4

kubectl logs notebook-4-0 -n kubeflow-user-example-com

Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/notebook-4-0. Please use `kubectl.kubernetes.io/default-container` instead
[I 15:49:58.940 LabApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
[W 15:49:59.093 LabApp] All authentication is disabled.  Anyone who can connect to this server will be able to run code.
[I 15:49:59.263 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.6/dist-packages/jupyterlab
[I 15:49:59.263 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 15:49:59.434 LabApp] Serving notebooks from local directory: /home/jovyan
[I 15:49:59.434 LabApp] The Jupyter Notebook is running at:
[I 15:49:59.434 LabApp] http://notebook-4-0:8888/notebook/kubeflow-user-example-com/notebook-4/
[I 15:49:59.434 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

kubectl describe pod notebook-4-0 -n kubeflow-user-example-com

Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  6m11s  default-scheduler  Successfully assigned kubeflow-user-example-com/notebook-4-0 to gaisberg
  Normal  Pulling    6m10s  kubelet            Pulling image "docker.io/istio/proxyv2:1.9.0"
  Normal  Pulled     6m9s   kubelet            Successfully pulled image "docker.io/istio/proxyv2:1.9.0" in 1.23111517s
  Normal  Created    6m9s   kubelet            Created container istio-init
  Normal  Started    6m9s   kubelet            Started container istio-init
  Normal  Pulled     6m8s   kubelet            Container image "gcr.io/arrikto-public/tensorflow-1.15.2-notebook-gpu:1.0.0.arr1" already present on machine
  Normal  Created    6m8s   kubelet            Created container notebook-4
  Normal  Started    6m8s   kubelet            Started container notebook-4
  Normal  Pulling    6m8s   kubelet            Pulling image "docker.io/istio/proxyv2:1.9.0"
  Normal  Pulled     6m7s   kubelet            Successfully pulled image "docker.io/istio/proxyv2:1.9.0" in 1.224361895s
  Normal  Created    6m7s   kubelet            Created container istio-proxy
  Normal  Started    6m7s   kubelet            Started container istio-proxy
@brandond
Member

brandond commented May 11, 2021

It sounds like the agent and server can communicate with each other, but pods on different nodes cannot. What flannel backend are you using? You may have better luck with wireguard compared to vxlan if both nodes are behind different firewalls.
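
For reference, one way to check which backend is in use (assuming the default k3s file locations; vxlan is the default when no --flannel-backend flag is passed):

    # Flags k3s was started with:
    systemctl cat k3s | grep -i flannel

    # k3s writes its generated flannel config here by default; "Type" is the backend:
    sudo cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json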

@ChrisEdel
Author

Yes, I thought so, too. They are behind different firewalls. I did not change anything, so I guess I am using whatever backend comes with the standard installation? I am pretty new to k3s; do you have any idea what I could specifically do? That would be very much appreciated!

Also, this is what is in my /run/flannel/subnet.env:

FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
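
A minimal sketch of how to confirm that cross-node pod traffic is the problem (the pod name nettest is hypothetical; the IP 10.42.1.16 and node name gaisberg are taken from the events above):

    # Launch a throwaway pod pinned to the server node and try to reach the stuck
    # notebook pod's istio-proxy readiness endpoint on the agent:
    kubectl run nettest --rm -it --restart=Never --image=busybox \
      --overrides='{"apiVersion":"v1","spec":{"nodeName":"gaisberg"}}' -- sh

    # Inside the pod:
    wget -qO- http://10.42.1.16:15021/healthz/ready
    ping -c 3 10.42.1.16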

@brandond
Member

brandond commented May 11, 2021

You might try wireguard instead so that it can tunnel between the two networks; you'll need to start both nodes with --node-external-ip=$PUBLIC_IP and --flannel-backend=wireguard. Most CNI backends assume your nodes are on a flat network and can reach each other directly. Even with wireguard you may have a hard time getting this working, depending on how restrictive your firewalls are.
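
A sketch of what that might look like here, based on the flags mentioned above (the exact options are assumptions, not a tested recipe; on k3s v1.20 the wireguard backend also needs the wireguard package on both hosts, and flannel's wireguard tunnel port, typically UDP 51820, must be reachable through both NATs):

    # On both hosts (Ubuntu):
    sudo apt install wireguard

    # On the server (original flags plus the wireguard backend):
    export K3S_EXTERNAL_IP=<server_public_ip>
    curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--write-kubeconfig ~/.kube/config --write-kubeconfig-mode 666 --tls-san $K3S_EXTERNAL_IP --node-external-ip=$K3S_EXTERNAL_IP --flannel-backend=wireguard" sh -

    # On the agent (--flannel-backend is a server-only flag; the agent just needs its own external IP):
    export K3S_TOKEN=<token>
    export K3S_URL=https://<server_public_ip>:6443
    curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--node-external-ip=<agent_public_ip>" sh -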

k3s-io locked and limited conversation to collaborators May 11, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
