
[Bug?][Documentation] nginx Ingress for client times out #729

Closed
1 of 2 tasks
tbukic opened this issue Nov 15, 2022 · 13 comments · Fixed by #1051
Labels: bug (Something isn't working), P3 (Nice-to-have, low urgency, won't-do until priority is increased)

Comments


tbukic commented Nov 15, 2022

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

Others

What happened + What you expected to happen

Wrapping up this discussion:

I'm still not sure whether this is a bug or just me misunderstanding the documentation, but when I try to access the Ray Client through an Ingress at client.[hostname] (i.e. `ray.init("ray://client.[hostname]")`), I get timeout errors.

Reproduction script

For comparison, this configuration serves the dashboard just fine:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Values.name }}-ingress-dashboard 
  namespace: {{ .Values.namespace }}
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: dashboard.{{ .Values.hostRoot }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ .Values.name }}-head-svc
                port:
                  name: dashboard 

The configurations that fail are:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Values.name }}-ingress-client
  namespace: {{ .Values.namespace }}
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/server-snippet: |
      underscores_in_headers on;
      ignore_invalid_headers on;
spec:
  rules:
    - host: client.{{ .Values.hostRoot }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ .Values.name }}-head-svc
                port:
                  name: client

and

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Values.name }}-ingress-client
  namespace: {{ .Values.namespace }}
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/server-snippet: |
      underscores_in_headers on;
      ignore_invalid_headers on;
spec:
  ingressClassName: nginx
  rules:
    - host: client.{{ .Values.hostRoot }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ .Values.name }}-head-svc
                port:
                  number: 10001
  tls:
  - hosts:
      - client.{{ .Values.hostRoot }}
    secretName: {{ .Values.httpsSecretName }}

(the same happens without TLS as well :) ). My efforts so far were inspired by this example - which is basically this paragraph from the documentation - and by @kevin85421's comments on Slack.

Anything else

I'd prefer to access the client at client.[host], which is why I don't follow the existing example for exposing the dashboard. Also, my dashboard works fine, as does the client when its port is forwarded from either the pod or the service and accessed via localhost:[port]. I want to simplify client access as much as possible for DS colleagues who don't use k8s, so I'm trying to avoid kubectl and port-forwarding.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@tbukic tbukic added the bug Something isn't working label Nov 15, 2022
@kevin85421 kevin85421 self-assigned this Nov 16, 2022
kevin85421 (Member) commented:

Thanks @tbukic for pointing this out!

DmitriGekhtman (Collaborator) commented:

I want to simplify client access as much as possible for DS colleagues who don't use k8s, so I'm trying to avoid kubectl and port-forwarding.

One pattern we've seen from users is

  • Setting up a Jupyter notebook environment in the K8s cluster
  • Connecting via Ray Client in the notebook
  • Exposing the Jupyter notebook to external traffic, which is a fairly common thing to do.

Related:
#725

DmitriGekhtman (Collaborator) commented Nov 16, 2022

Actually, the experience of setting up networking to expose Ray Client running in a K8s cluster was hell for the Ray maintainers at Anyscale.
For a data science platform, I wouldn't recommend going down the route of exposing Ray client directly.

DmitriGekhtman (Collaborator) commented:

@ckw017 are there any obvious fixes for the ingress config that you can see after a quick glance?

ckw017 (Member) commented Nov 16, 2022

Hmm, nothing obvious stands out. The main gotcha we saw before was this, where reconfiguring ingresses would cause them to close active connections incorrectly.

One thing to check here is if the service that the ingress is pointing to has the ray client port exposed in its configuration, and to try connecting ray client through that service from another node inside that cluster (using the service's DNS). This would rule out problems with the service configuration.
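As a rough first step, plain TCP reachability of the client port can be verified from another pod before involving Ray or gRPC at all. Here's a minimal stdlib sketch; the service DNS name and port in the usage comment are placeholders, not values from this issue:

```python
import socket

def port_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False

# Run from a pod inside the cluster, using the service's DNS name (placeholder):
# port_reachable("raycluster-head-svc.my-namespace", 10001)
```

If this returns False from inside the cluster, the problem is in the Service definition rather than the Ingress.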

@DmitriGekhtman DmitriGekhtman added the P2 Important issue, but not time critical label Nov 16, 2022
tbukic (Author) commented Nov 17, 2022

Thanks to everybody!

@ckw017 I've checked, and it works:

  • if I connect using ray://[service_name].[namespace]:10001 from another pod
  • if I connect to the service's ADDRESS and IP (obtained from the pod's env) from another pod
  • if I port-forward the head's port 10001 and connect to ray://localhost:[forwarded_port]

Connecting through the ingress host still doesn't work.

If it means anything, I get multiple entries like this in ingress logs:

I1117 18:55:47.583755       1 status.go:299] "updating Ingress status" namespace="[namespace]" ingress="[client-ingress-pod-name]" currentValue=[] newValue=[{IP:[ingress-addr-1] Hostname: Ports:[]} {IP:[ingress-addr-2] Hostname: Ports:[]}]
I1117 18:55:47.597219       1 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"[namespace]", Name:"[client-ingress-pod-name]", UID:"[UID]", APIVersion:"networking.k8s.io/v1", ResourceVersion:"1096235631", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I1117 18:55:47.608717       1 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"[namespace]", Name:"[client-ingress-pod-name]", UID:"[UID]", APIVersion:"networking.k8s.io/v1", ResourceVersion:"1096235636", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync

This is the describe output for the current version of my ingress:

Name:             [release-name]-ingress-client
Labels:           app.kubernetes.io/managed-by=Helm
Namespace:        [namespace]
Address:          [ingress-addr-1],[ingress-addr-2]
Ingress Class:    nginx
Default backend:  <default>
TLS:
  [namespace]-tls terminates client.[host]
Rules:
  Host                                                      Path  Backends
  ----                                                      ----  --------
  client.[host]  
                                                            /   [service-name]:client ([correct-ip-of-head-node]:10001)
Annotations:                                                field.cattle.io/publicEndpoints:
                                                              [{"addresses":["[ingress-addr-1]","[ingress-addr-2]"],"port":443,"protocol":"HTTPS","serviceName":"[namespace]:[service-name]"...
                                                            ginx.ingress.kubernetes.io/rewrite-target: /
                                                            meta.helm.sh/release-name: [release-name]
                                                            meta.helm.sh/release-namespace: [namespace]
                                                            nginx.ingress.kubernetes.io/backend-protocol: GRPC
                                                            nginx.ingress.kubernetes.io/server-snippet:
                                                              underscores_in_headers on;
                                                              ignore_invalid_headers on;
Events:
  Type    Reason  Age                From                      Message
  ----    ------  ----               ----                      -------
  Normal  Sync    36m (x3 over 36m)  nginx-ingress-controller  Scheduled for sync
  Normal  Sync    36m (x3 over 36m)  nginx-ingress-controller  Scheduled for sync

@DmitriGekhtman, thank you for your reply! We're already thinking about Jupyter as a long-term solution. After seeing Kubeflow mentioned in #725, I'm interested in exploring how it works with KubeRay as well. Both are somewhat longer-term goals; at the moment it'd be ideal if we just had client access.

The temporary hack is a LoadBalancer exposed outside of k8s plus a hardcoded DNS name, but ingress is the next goal. I understand it may not be possible, or may be very hard. I hope not having to care about authenticating users will make life easier for the DS team.

ckw017 (Member) commented Nov 17, 2022

Hm, shot in the dark, but can you try pathType: ImplementationSpecific instead of pathType: Prefix?

tbukic (Author) commented Nov 17, 2022

Good idea, but it still times out. :/

ckw017 (Member) commented Nov 17, 2022

Hmm, can you try with these annotations:

    # note: also try dropping rewrite-target
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/server-snippet: underscores_in_headers on; ignore_invalid_headers on;

I think if there's a way to sanity-check whether gRPC works in general (as opposed to Ray Client specifically), that would help narrow this down.
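One way to sanity-check raw gRPC connectivity independently of Ray Client is to wait for a bare channel to become ready. This is a sketch assuming the `grpcio` package is installed; the target address in the usage comment is a placeholder:

```python
import grpc

def grpc_reachable(target: str, timeout_s: float = 5.0, credentials=None) -> bool:
    """Return True if a gRPC channel to `target` becomes ready within timeout_s."""
    if credentials is None:
        channel = grpc.insecure_channel(target)
    else:
        channel = grpc.secure_channel(target, credentials)
    try:
        # Blocks until the channel leaves the connecting state or times out.
        grpc.channel_ready_future(channel).result(timeout=timeout_s)
        return True
    except grpc.FutureTimeoutError:
        return False
    finally:
        channel.close()

# e.g., through a TLS-terminating ingress (placeholder host):
# grpc_reachable("client.example.com:443", credentials=grpc.ssl_channel_credentials())
```

If this fails against the ingress host while succeeding against the in-cluster service, the problem lies in the ingress's gRPC handling rather than in Ray.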

@DmitriGekhtman DmitriGekhtman added P3 Nice-to-have, low urgency, won't-do until priority is increased and removed P2 Important issue, but not time critical labels Dec 9, 2022
jacobdanovitch commented:

The issue is that ingress-nginx only supports secure gRPC (kubernetes/ingress-nginx#4095 (comment)). I was able to connect as a client using cert-manager with the following Ingress definition:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # some of these may not be strictly necessary, haven't tried
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/server-snippet: |
      underscores_in_headers on;
      ignore_invalid_headers on;
  name: ray-cluster
spec:
  ingressClassName: nginx 
  rules:
    - host: ray.example.com
      http:
        paths:
        - backend:
            service: 
              name: ray-head-svc
              port: 
                number: 10001
          path: /
          pathType: ImplementationSpecific
  tls:
    - hosts:
      - ray.example.com
      secretName: ray-cluster-cert

Once cert-manager provisions your certificate, get the cert and key:

# (k is an alias for kubectl)
k get secrets/ray-cluster-cert -o json | jq -r '.data."tls.crt"' | base64 -d > /tmp/ray.crt
k get secrets/ray-cluster-cert -o json | jq -r '.data."tls.key"' | base64 -d > /tmp/ray.key
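For environments without `jq`, the same extraction can be sketched in Python. The layout below mirrors what `kubectl get -o json` returns for a TLS Secret (the `data` values are base64-encoded); the secret name in the usage comment is the one from the Ingress above:

```python
import base64
import json

def decode_tls_secret(secret_json: str) -> tuple[bytes, bytes]:
    """Return (cert, key) bytes from a kubectl-style TLS Secret JSON string."""
    data = json.loads(secret_json)["data"]
    cert = base64.b64decode(data["tls.crt"])
    key = base64.b64decode(data["tls.key"])
    return cert, key

# Usage (fetch the secret with kubectl, as in the shell commands above):
# import subprocess
# raw = subprocess.check_output(
#     ["kubectl", "get", "secrets/ray-cluster-cert", "-o", "json"])
# cert, key = decode_tls_secret(raw)
```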

Then you can make a secure connection in client mode:

import ray
import grpc

with open('/tmp/ray.key', 'rb') as key, open('/tmp/ray.crt', 'rb') as crt:
    credentials = grpc.ssl_channel_credentials(private_key=key.read(), certificate_chain=crt.read())

print('Connecting to Ray...')
ray.init(address='ray://ray.example.com:443', _credentials=credentials)
print('Connected.')
print(ray.available_resources())

kevin85421 (Member) commented:

Thanks @jacobdanovitch! This is very helpful. I will add it to the documentation in #955.

kevin85421 (Member) commented:

@tedhtchang will take this issue.

[Note]:
As @DmitriGekhtman said (#729 (comment), #729 (comment)), the KubeRay / Ray communities do not encourage users to expose the Ray client directly. However, there are still some use cases where users need to do so. In such cases, it would be beneficial to have documentation that outlines the ingress setup and the related pros and cons.

tedhtchang (Contributor) commented:

@kevin85421 I can provide an example and steps to set up an Nginx Ingress controller on a local Kind cluster for use with ray.init().
