[Bug?][Documentation] nginx Ingress for client times out #729
Comments
Thanks @tbukic for pointing this out!
One pattern we've seen from users is
Related:
Actually, the experience of setting up networking to expose Ray Client running in a K8s cluster was hell for the Ray maintainers at Anyscale.
@ckw017 are there any obvious fixes for the ingress config that you can see after a quick glance?
Hmm, nothing obvious stands out. The main gotcha we saw before was this, where reconfiguring ingresses would cause them to close active connections incorrectly. One thing to check here is if the service that the ingress is pointing to has the ray client port exposed in its configuration, and to try connecting ray client through that service from another node inside that cluster (using the service's DNS). This would rule out problems with the service configuration.
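For the in-cluster check suggested above, a minimal sketch might look like the following. It assumes the default cluster domain `cluster.local`, and `service_address` and `tcp_reachable` are hypothetical helper names, not part of Ray or KubeRay. It only confirms that the service resolves and the port accepts TCP connections. It does not prove that Ray Client itself works.

```python
import socket


def service_host(name, namespace, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a Kubernetes Service.

    Assumes the cluster uses the default `cluster.local` domain.
    """
    return f"{name}.{namespace}.svc.{cluster_domain}"


def service_address(name, namespace, port=10001, cluster_domain="cluster.local"):
    """Build a ray:// address that points at the Service from inside the cluster."""
    return f"ray://{service_host(name, namespace, cluster_domain)}:{port}"


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run from a pod inside the cluster, `tcp_reachable(service_host("ray-head-svc", "default"), 10001)` returning True would suggest the service exposes the client port, after which `ray.init(service_address("ray-head-svc", "default"))` is the real test. The namespace `default` is a placeholder.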
Thanks to everybody! @ckw017 I've checked, it works:
Using ingress host still doesn't work. If it means anything, I get multiple entries like this in ingress logs:
This is the current version of my ingress:
@DmitriGekhtman, thank you for your reply! We're already thinking about Jupyter as a long-term solution. After seeing Kubeflow mentioned in #725, I'm interested in exploring how it works with KubeRay as well. Both are somewhat longer-term goals, atm it'd be ideal if we'd have client and just Temporary hack is exposed LoadBalancer to outside of k8s + hardcoded DNS name, but ingress is the next goal. I understand it may not be possible, or may be very hard. I hope that not having to care about authenticating users will make life easier for the DS team's scope.
Hm, shot in the dark but can you try
Good idea, but it still times out for me. :/
Hmm, can you try with these annotations:
I think a way to sanity-check whether gRPC works in general (as opposed to Ray Client specifically) would help narrow this down.
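One rough way to do that sanity check without involving Ray at all: gRPC runs over HTTP/2, so if the ingress endpoint does not negotiate `h2` during the TLS handshake, no gRPC client will get through. The sketch below uses only the Python standard library. `negotiated_alpn` and `speaks_http2` are hypothetical helper names, and the check covers only the TLS and HTTP/2 negotiation, not an actual gRPC call.

```python
import socket
import ssl


def negotiated_alpn(host, port=443, timeout=5.0):
    """Open a TLS connection offering only HTTP/2 and return the ALPN protocol chosen.

    Returns None if the connection or handshake fails, or if the server
    selects no protocol. Certificates are verified against the system trust
    store, so a self-signed certificate also results in None.
    """
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2"])  # gRPC requires HTTP/2
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.selected_alpn_protocol()
    except OSError:
        # ssl.SSLError, timeouts, DNS errors and refused connections are all OSError.
        return None


def speaks_http2(host, port=443, timeout=5.0):
    """Return True if the endpoint negotiated HTTP/2 over TLS."""
    return negotiated_alpn(host, port, timeout) == "h2"
```

For example, `speaks_http2("client.[hostname]")` with the real ingress host substituted in. A False result points at the ingress or TLS setup rather than at Ray Client. A True result means the transport is in place and the problem is further up the stack.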
The issue is that ingress-nginx only supports secure GRPC (kubernetes/ingress-nginx#4095 (comment)). I was able to connect as a client using:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
        # some of these may not be strictly necessary, haven't tried
        nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/ssl-passthrough: "true"
        nginx.ingress.kubernetes.io/backend-protocol: GRPC
        nginx.ingress.kubernetes.io/server-snippet: |
          underscores_in_headers on;
          ignore_invalid_headers on;
      name: ray-cluster
    spec:
      ingressClassName: nginx
      rules:
      - host: ray.example.com
        http:
          paths:
          - backend:
              service:
                name: ray-head-svc
                port:
                  number: 10001
            path: /
            pathType: ImplementationSpecific
      tls:
      - hosts:
        - ray.example.com
        secretName: ray-cluster-cert

Once

    k get secrets/ray-cluster-cert -o json | jq -r '.data."tls.crt"' | base64 -d > /tmp/ray.crt
    k get secrets/ray-cluster-cert -o json | jq -r '.data."tls.key"' | base64 -d > /tmp/ray.key

Then you can make a secure connection in client mode:

    import ray
    import grpc

    with open('/tmp/ray.key', 'rb') as key, open('/tmp/ray.crt', 'rb') as crt:
        credentials = grpc.ssl_channel_credentials(private_key=key.read(), certificate_chain=crt.read())

    print('Connecting to Ray...')
    ray.init(address='ray://ray.example.com:443', _credentials=credentials)
    print('Connected.')
    print(ray.available_resources())
Thanks @jacobdanovitch! This is very helpful. I will add it into a document #955.
@tedhtchang will take this issue. [Note]:
@kevin85421 I can provide an example and steps to set up the Nginx Ingress controller on a local Kind cluster for use with ray.init().
Search before asking
KubeRay Component
Others
What happened + What you expected to happen
Wrapping up this discussion:
I'm still not sure if this is a bug or just me misunderstanding the documentation, but when I try to access Ray Client via the Ingress at client.[hostname] (command: ray.init("ray://client.[hostname]")), I get timeout errors.
Reproduction script
For the comparison, this configuration renders dashboard just fine:
The configurations that fail are
and
(and without tls as well :) ). My efforts up to now were inspired by this example - which is basically this paragraph from the documentation - and by @kevin85421 's comments on Slack.
Anything else
I'd prefer to access the client in the form of client.[host], thus I don't follow the existing example for exposing the dashboard. Also, my dashboard works fine, as does the client when it's port-forwarded from either the pod or the service and accessed via localhost:[port]. I want to simplify access to the client as much as possible for DS colleagues who don't use k8s, so I'm trying to skip kubectl and port-forwarding.
Are you willing to submit a PR?