Kubernetes ingress #626

Closed
witten opened this issue Apr 3, 2019 · 18 comments

Comments

@witten
Contributor

witten commented Apr 3, 2019

Based on the documentation, it sounds like the current approach for reaching the Jupyter Enterprise Gateway running on Kubernetes is via a NodePort. However, on most clusters, this does not actually make the port available to the outside world, but rather just to internal cluster IPs.

So this ticket is a request to add support for including an optional Kubernetes Ingress resource when deploying Jupyter Enterprise Gateway. The idea is that if the cluster is configured for ingress, then the Ingress resource will thereby make the Gateway accessible from the outside world. This makes it much more convenient to reach the Gateway, say from a developer's local laptop. (Without ingress, a developer may need to maintain a kubectl port-forward proxy or similar.)

Note that making ingress optional may be easier to do once a Helm chart is available (#625).

@lresende
Member

lresende commented Apr 3, 2019

You can find an ingress template in our Ansible scripts. The issue I see with providing the YAML in the EG repo is that there are several ingress implementations, and from playing with a couple of them, they all require different configuration. I don't think we can easily maintain and test against all of the available ingress implementations, but I am open to suggestions. Maybe we just add this example to the documentation, and users translate/configure it for the ingress they are using?

@witten
Contributor Author

witten commented Apr 3, 2019

One way I've seen around this, besides making ingress itself optional, is to support configurable ingress annotation keys/values. Here's an example of an arbitrary Helm template that takes this approach (note the .Values.ingress.annotations that are injected into the template): https://github.com/helm/charts/blob/master/incubator/istio/templates/deployment/ingress.yaml

That way, it's up to the user deploying (in this case, deploying the Helm chart) to make the Ingress resource work with their own ingress implementation.
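
As a rough illustration of the configurable-annotations idea, a chart's values could look something like the sketch below. The key names (`ingress.enabled`, `ingress.path`, `ingress.annotations`) are hypothetical and are not taken from any existing Enterprise Gateway chart; the annotations are only sample values that a user of the Nginx ingress controller might supply.

```yaml
# Hypothetical values.yaml fragment; none of these keys come from an existing chart.
ingress:
  # When false, the chart would render no Ingress resource at all.
  enabled: true
  # Path prefix routed to the gateway service (assumed key name).
  path: /gateway
  # Copied verbatim into metadata.annotations of the Ingress, so each user
  # supplies whatever their own ingress controller expects.
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
```

The point of this design is that the chart stays controller-agnostic: it only passes the annotation map through and leaves the controller-specific details to whoever deploys it.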

But you make a fair point that maybe it's not worth trying to make this work for the variety of ingress implementations. As soon as you support Ingress resources, the next request will be for Certificate resources. Documentation examples may be sufficient, especially if there's a base Helm chart (as per #625) that users can derive from to add their own custom resources like Ingresses and Certificates.

@witten
Contributor Author

witten commented Apr 5, 2019

Some additional thoughts from @kevin-bates on the topic of ingress from #625:

Also, in working with this, I found the k8s-master-ip thing to be virtually useless unless you're in a single-node env (for POCs, etc.). As a result, I'm thinking it would be really helpful if we could add a template (in another PR), that produces the equivalent of what our ansible scripts produce for a traefik ingress. Of course, we'd want to make the ingress 'provider' a variable, along with the path (which is likely a provider-specific variable). But I see that as another great thing to add from an example standpoint for others to glean from.

@lresende
Member

Here is an example using Traefik:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: enterprise-gateway
  name: enterprise-gateway-ingress
  annotations:
    kubernetes.io/ingress.class: traefik
    traefik.frontend.rule.type: PathPrefixStrip
spec:
  rules:
  - host: {{ groups['master'][0] }}.{{ ansible_domain }}
    http:
      paths:
      - path: /gateway
        backend:
          serviceName: enterprise-gateway
          servicePort: 8888

Here is an example using Nginx:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: enterprise-gateway
  name: enterprise-gateway-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "false"
spec:
  rules:
  - host:
    http:
      paths:
      - path: /gateway/?(.*)
        backend:
          serviceName: enterprise-gateway
          servicePort: 8888
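
Both manifests above use the extensions/v1beta1 Ingress API, which was current in 2019 but is no longer served as of Kubernetes 1.22. On newer clusters, the Nginx example might translate to networking.k8s.io/v1 roughly as follows. This is a sketch rather than a tested configuration: it assumes an ingress-nginx controller registered under the IngressClass name `nginx`, and the `use-regex` annotation is added on the assumption that the capture-group path needs it.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: enterprise-gateway
  name: enterprise-gateway-ingress
  annotations:
    # Annotation values must be strings, hence the quotes around booleans.
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "false"
spec:
  # Replaces the kubernetes.io/ingress.class annotation; "nginx" is an assumed class name.
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /gateway/?(.*)
        # Regex paths are controller-specific, so Exact/Prefix do not apply.
        pathType: ImplementationSpecific
        backend:
          service:
            name: enterprise-gateway
            port:
              number: 8888
```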

@esevan
Contributor

esevan commented Apr 20, 2019

BTW, why should EG use NodePort, LoadBalancer, or Ingress? I wonder about the cases where EG needs to be exposed outside the cluster.

If EG is designed to run as a backend service, then keeping it on the cluster network (i.e., as a headless service) looks reasonable to me.

@kevin-bates
Member

@esevan - I see your point. I suppose that works if the notebook instance is also running in the same k8s cluster (i.e., the combined Hub/EG scenario). However, I view our primary configuration as more of a Kernel As A Service model where the notebook instance is running external to the EG k8s cluster. For that, we need to provide access into EG.

Would those descriptions match where your question is coming from? I suppose the KG_URL would be something like what's in the helm chart for KIP's KIP_GATEWAY_HOST.

I wonder if my view of KAAS is not appropriate for K8s? The JupyterHub/EG scenario is very compelling and I could see that being the primary use case.

@esevan
Contributor

esevan commented Apr 22, 2019

@kevin-bates Thanks for your response. I'm only using the JupyterHub/EG scenario, which is an in-cluster scenario, and I admit I have a narrow view of the EG use cases in k8s.

I think this issue comes down to "What is our default scenario?" and "How do we support other options?".
For the latter, IMHO, ClusterIP can be another option to support.

@kevin-bates
Member

@esevan - yes, I agree that ClusterIP should be an option to support and, because that avoids the ingress setup completely, I wonder if it shouldn't be the default. Please confirm, but wouldn't this make the KG_URL configured for Notebook/NB2KG essentially become http://enterprise-gateway.enterprise-gateway:8888?
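
For reference, a Service along these lines would yield that in-cluster address. It is a minimal sketch: the service name, namespace, and port are the ones used in this thread, while the `app: enterprise-gateway` selector label is an assumption and would need to match whatever labels the EG pods actually carry.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: enterprise-gateway
  namespace: enterprise-gateway
spec:
  # ClusterIP is also what Kubernetes uses when type is omitted.
  type: ClusterIP
  selector:
    # Assumed pod label; match whatever the EG deployment actually sets.
    app: enterprise-gateway
  ports:
  - name: http
    port: 8888
    targetPort: 8888
```

With this in place, cluster DNS resolves `enterprise-gateway.enterprise-gateway` to the service's cluster IP, and nothing is opened on the nodes themselves.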

In the typical Hub setup, where Notebooks are spawned, how is each spawned Notebook server exposed outside the cluster? Does the Hub essentially act like a proxy such that its port is the only thing exposed? (Sorry, I haven't spent enough time with Hub.)

@witten
Contributor Author

witten commented Apr 22, 2019

For what it's worth, I did file this ticket originally with the non-Hub, local Jupyter use case in mind. Meaning, Jupyter is running locally on a dev's laptop, and connecting to a remote in-cluster Enterprise Gateway via nb2kg. A dev with access to the cluster can always set up a manual kubectl port-forward in order to connect to Enterprise Gateway, but that's kind of a PITA. I would very much like the "kernel as a service" approach, where our devs can connect to Enterprise Gateway via a public internet-accessible URL.

@esevan
Contributor

esevan commented Apr 22, 2019

@kevin-bates I'm using the hub + kubespawner combination, and the hub spawns notebooks in the hub namespace. In the hub configuration, we can easily specify env for notebook pods, so I add KG_URL to all notebook instances as you described.

The notebook is exposed via the hub's ingress or LoadBalancer under a sub-path, i.e. {{ hub url }}/user/{{ username }}. The hub works like an L7 proxy once the corresponding user's notebook is up, so there is no need to expose the notebook, EG, or kernel ports outside the cluster.

Anyhow, if the hub scenario is truly compelling, we can set ClusterIP as the default because it works fine.
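
If the hub is deployed with the Zero to JupyterHub Helm chart (an assumption; the setup described here may configure KubeSpawner directly instead), setting KG_URL on every spawned notebook pod could look roughly like this. The URL assumes EG is reachable in-cluster under the service name and namespace `enterprise-gateway` on port 8888, as discussed earlier in this thread.

```yaml
# Hypothetical Zero to JupyterHub values fragment.
singleuser:
  extraEnv:
    # In-cluster address of the EG service: <service-name>.<namespace>:<port>.
    KG_URL: http://enterprise-gateway.enterprise-gateway:8888
```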

@esevan
Contributor

esevan commented Apr 22, 2019

@witten Interesting! I hadn't considered that scenario because I started from Hub. Running the notebook instance locally sounds great: with that use case, a user can use server resources and local resources at the same time.

Now I fully understand EG has lots of deployment views, thanks!

@kevin-bates
Member

I've found myself flip/flopping around whether ClusterIP or NodePort should be the default, but believe ClusterIP is probably best for the following reasons. I'd love to hear what others think...

  1. Using NodePort OOTB could be considered security fragile in that it opens a port on the EG node - wherever that is.
  2. Using NodePort isn't very useful unless you're on a single-node cluster.
  3. ClusterIP is the Kubernetes default.
  4. We want to recommend using an ingress or LoadBalancer anyway - neither of which depend on NodePort.

Regardless of either value, internal access can still be obtained using <service-name>.<namespace-name>:<service-port>, so this doesn't affect things like kernel-image-puller or hub/notebook configs.

Regarding the ingress entries that @lresende posted above... Would anyone want to create an ingress.yaml template that includes each of these (with the idea that others could be added later)? We'd want to parameterize their general enablement, ingress-type, namespace, port and path prefix (e.g., just the /gateway portion). (enablement and ingress-type could be combined such that no ingress-type indicates non-use.) Note that #641 parameterizes the EG port already. Do others feel we should have KIP and can merge #641?

@witten
Contributor Author

witten commented Apr 23, 2019

I agree that ClusterIP probably makes sense over NodePort!

@kevin-bates
Member

@akchinSTC - isn't this closed by #651?

@akchinSTC
Collaborator

Yes, it should be resolved by #651. Agreed that EG should use ClusterIP/internal service DNS and be exposed via Ingress (ingress controller).

@xxlest

xxlest commented Aug 9, 2019

Sorry, I can't follow all of this. I want to set ingress paths to access different notebook instances, but that failed; only using the ingress path "/" to access a single notebook works. I'm actually developing a PaaS that allows different users to create different notebooks, and I don't want to use NodePort, which isn't sufficient. How can I use EG to solve this?

@esevan
Contributor

esevan commented Aug 9, 2019

@xxlest For notebook instances, I recommend using the JupyterHub project. It automates authentication and proxies each user to their own notebook instance.

I'm not sure what enterprise scheduler you're using, but this blog post should be very helpful for working with EG:
https://blog.jupyter.org/on-demand-notebooks-with-jupyterhub-jupyter-enterprise-gateway-and-kubernetes-e8e423695cbf

@xxlest

xxlest commented Aug 9, 2019

@esevan Thanks very much, I will try the material you showed. I'm using k8s as the scheduler, with notebook images for TensorFlow, MXNet, PyTorch, Caffe… Users can log in to our PaaS and select a resource quota (CPU/GPU/MEM/Storage) for training and serving models. We've also integrated Istio as a service mesh for monitoring. With so many users' notebooks, we need to provide a web link to each Jupyter notebook, so we need a proxy behind the ingress.
