
k3s cluster - jupyter notebook not starting/accessible #3298

Closed
ChrisEdel opened this issue May 11, 2021 · 3 comments

@ChrisEdel

Environmental Info:
K3s Version:
k3s version v1.20.6+k3s1 (8d04328)
go version go1.15.10

Node(s) CPU architecture, OS, and Version:
Linux 5.4.0-72-generic #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
1 server, 1 agent

Describe the bug:
My cluster consists of 1 server and 1 agent, both behind different NATs. Creating the cluster worked without any problems, and access to all GPUs is working. I installed kubeflow (https://github.com/kubeflow/kubeflow) on top of k3s, and that also works. However, creating a Jupyter notebook in the kubeflow dashboard only works on the server: I can create one notebook that uses the server's GPU, but when I create another GPU notebook (which has to be scheduled on the agent, since the server only has one GPU), the notebook starts but is not accessible.

notebook-4 runs on the server (working fine), and notebook-3 runs on the agent (not accessible).

[screenshot: kubeflow dashboard notebook list showing notebook-3 and notebook-4]

kubectl describe pod notebook-3-0 -n kubeflow-user-example-com

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m33s                  default-scheduler  Successfully assigned kubeflow-user-example-com/notebook-3-0 to untersberg
  Normal   Pulling    5m33s                  kubelet            Pulling image "docker.io/istio/proxyv2:1.9.0"
  Normal   Pulled     5m32s                  kubelet            Successfully pulled image "docker.io/istio/proxyv2:1.9.0" in 1.492170458s
  Normal   Created    5m32s                  kubelet            Created container istio-init
  Normal   Started    5m32s                  kubelet            Started container istio-init
  Normal   Pulling    5m31s                  kubelet            Pulling image "docker.io/istio/proxyv2:1.9.0"
  Normal   Pulled     5m31s                  kubelet            Container image "gcr.io/arrikto-public/tensorflow-1.15.2-notebook-gpu:1.0.0.arr1" already present on machine
  Normal   Created    5m31s                  kubelet            Created container notebook-3
  Normal   Started    5m31s                  kubelet            Started container notebook-3
  Normal   Pulled     5m30s                  kubelet            Successfully pulled image "docker.io/istio/proxyv2:1.9.0" in 1.366426061s
  Normal   Created    5m30s                  kubelet            Created container istio-proxy
  Normal   Started    5m30s                  kubelet            Started container istio-proxy
  Warning  Unhealthy  5m2s (x14 over 5m28s)  kubelet            Readiness probe failed: Get "http://10.42.1.16:15021/healthz/ready": dial tcp 10.42.1.16:15021: connect: connection refused
  Warning  Unhealthy  32s (x96 over 3m42s)   kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

kubectl logs notebook-3-0 -n kubeflow-user-example-com

Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/notebook-3-0. Please use `kubectl.kubernetes.io/default-container` instead
[I 15:49:10.702 LabApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
[W 15:49:10.839 LabApp] All authentication is disabled.  Anyone who can connect to this server will be able to run code.
[I 15:49:11.009 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.6/dist-packages/jupyterlab
[I 15:49:11.009 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 15:49:11.172 LabApp] Serving notebooks from local directory: /home/jovyan
[I 15:49:11.172 LabApp] The Jupyter Notebook is running at:
[I 15:49:11.172 LabApp] http://notebook-3-0:8888/notebook/kubeflow-user-example-com/notebook-3/
[I 15:49:11.172 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Steps To Reproduce:

  • Installed K3s:

    On the server:
    export K3S_EXTERNAL_IP=<server_public_ip>
    curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--write-kubeconfig ~/.kube/config --write-kubeconfig-mode 666 --tls-san $K3S_EXTERNAL_IP --node-external-ip=$K3S_EXTERNAL_IP" sh -

    On the agent:
    export K3S_TOKEN=<token>
    export K3S_URL=https://<server_public_ip>:6443
    export INSTALL_K3S_EXEC="--token $K3S_TOKEN --server $K3S_URL"
    curl -sfL https://get.k3s.io | sh -

Afterwards, I installed kubeflow as described here: https://github.com/kubeflow/manifests#install-with-a-single-command
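
For reference, a quick post-install sanity check (generic kubectl commands, not part of the original report) to confirm that both nodes joined, advertise the expected external IPs, and to see which node each pod lands on:

    kubectl get nodes -o wide     # shows INTERNAL-IP and EXTERNAL-IP for each node
    kubectl get pods -A -o wide   # shows which node each pod was scheduled to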

Expected behavior:
I expected both notebooks to be accessible in the dashboard.

Actual behavior:
The notebook notebook-3 (on the agent) is not accessible.

Additional context / logs:
Logs from notebook-4

kubectl logs notebook-4-0 -n kubeflow-user-example-com

Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/notebook-4-0. Please use `kubectl.kubernetes.io/default-container` instead
[I 15:49:58.940 LabApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
[W 15:49:59.093 LabApp] All authentication is disabled.  Anyone who can connect to this server will be able to run code.
[I 15:49:59.263 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.6/dist-packages/jupyterlab
[I 15:49:59.263 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 15:49:59.434 LabApp] Serving notebooks from local directory: /home/jovyan
[I 15:49:59.434 LabApp] The Jupyter Notebook is running at:
[I 15:49:59.434 LabApp] http://notebook-4-0:8888/notebook/kubeflow-user-example-com/notebook-4/
[I 15:49:59.434 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

kubectl describe pod notebook-4-0 -n kubeflow-user-example-com

Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  6m11s  default-scheduler  Successfully assigned kubeflow-user-example-com/notebook-4-0 to gaisberg
  Normal  Pulling    6m10s  kubelet            Pulling image "docker.io/istio/proxyv2:1.9.0"
  Normal  Pulled     6m9s   kubelet            Successfully pulled image "docker.io/istio/proxyv2:1.9.0" in 1.23111517s
  Normal  Created    6m9s   kubelet            Created container istio-init
  Normal  Started    6m9s   kubelet            Started container istio-init
  Normal  Pulled     6m8s   kubelet            Container image "gcr.io/arrikto-public/tensorflow-1.15.2-notebook-gpu:1.0.0.arr1" already present on machine
  Normal  Created    6m8s   kubelet            Created container notebook-4
  Normal  Started    6m8s   kubelet            Started container notebook-4
  Normal  Pulling    6m8s   kubelet            Pulling image "docker.io/istio/proxyv2:1.9.0"
  Normal  Pulled     6m7s   kubelet            Successfully pulled image "docker.io/istio/proxyv2:1.9.0" in 1.224361895s
  Normal  Created    6m7s   kubelet            Created container istio-proxy
  Normal  Started    6m7s   kubelet            Started container istio-proxy
@brandond
Member

brandond commented May 11, 2021

It sounds like the agent and server can communicate with each other, but pods on different nodes cannot. What flannel backend are you using? You may have better luck with wireguard compared to vxlan if both nodes are behind different firewalls.
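
For reference, one way to check which backend is in use (assuming the default k3s file locations; vxlan is the default when no --flannel-backend flag is passed):

    # Flags k3s was started with:
    systemctl cat k3s | grep -i flannel

    # k3s writes its generated flannel config here by default; "Type" is the backend:
    sudo cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json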

@ChrisEdel
Author

Yes, I thought so, too. They are behind different firewalls. I did not change anything, so I guess I am using whatever backend comes with the standard installation? I am pretty new to k3s; do you have any idea what I could specifically do? That would be very much appreciated!

Also, this is what is in my /run/flannel/subnet.env:

FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
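
A minimal sketch of how to confirm that cross-node pod traffic is the problem (the pod name nettest is hypothetical; the IP 10.42.1.16 and node name gaisberg are taken from the events above):

    # Launch a throwaway pod pinned to the server node and try to reach the stuck
    # notebook pod's istio-proxy readiness endpoint on the agent:
    kubectl run nettest --rm -it --restart=Never --image=busybox \
      --overrides='{"apiVersion":"v1","spec":{"nodeName":"gaisberg"}}' -- sh

    # Inside the pod:
    wget -qO- http://10.42.1.16:15021/healthz/ready
    ping -c 3 10.42.1.16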

@brandond
Member

brandond commented May 11, 2021

You might try wireguard instead so that it can tunnel between the two networks; you'll need to start both nodes with --node-external-ip=$PUBLIC_IP and --flannel-backend=wireguard. Most CNI backends assume your nodes are on a flat network and can reach each other directly. Even with wireguard you may have a hard time getting this working, depending on how restrictive your firewalls are.
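
A sketch of what that might look like here, based on the flags mentioned above (the exact options are assumptions, not a tested recipe; on k3s v1.20 the wireguard backend also needs the wireguard package on both hosts, and flannel's wireguard tunnel port, typically UDP 51820, must be reachable through both NATs):

    # On both hosts (Ubuntu):
    sudo apt install wireguard

    # On the server (original flags plus the wireguard backend):
    export K3S_EXTERNAL_IP=<server_public_ip>
    curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--write-kubeconfig ~/.kube/config --write-kubeconfig-mode 666 --tls-san $K3S_EXTERNAL_IP --node-external-ip=$K3S_EXTERNAL_IP --flannel-backend=wireguard" sh -

    # On the agent (--flannel-backend is a server-only flag; the agent just needs its own external IP):
    export K3S_TOKEN=<token>
    export K3S_URL=https://<server_public_ip>:6443
    curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--node-external-ip=<agent_public_ip>" sh -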

k3s-io locked and limited conversation to collaborators May 11, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
