Example Pod to connect the Ray client to a remote Ray cluster with TLS enabled #994

Merged


@tedhtchang (Contributor) commented Mar 28, 2023

Why are these changes needed?

Provide an example of connecting the Ray client to a remote Ray cluster that has TLS enabled on the Ray client port.

Related issue number

Closes #992

Checks

Set up the environment

# Create k8s cluster
kind create cluster
kubectl cluster-info --context kind-kind

# Deploy kuberay
export KUBERAY_VERSION=v0.5.0
kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=${KUBERAY_VERSION}&timeout=90s"

# Get this example
git clone https://github.com/tedhtchang/kuberay
cd kuberay
git checkout connect-remote-raycluster-tls

# Start a cluster with TLS enabled
kubectl apply -f ray-operator/config/samples/ray-cluster.tls.yaml

# Create client pod and connect to cluster
kubectl apply -f ray-operator/config/samples/ray-pod.tls.yaml
kubectl logs ray-client-tls

Verify that the log contains the IP address of the head node.

Defaulted container "client" out of: client, gen-cert (init)
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
{'node:10.244.0.26': 1.0, 'object_store_memory': 521180774.0, 'CPU': 1.0, 'memory': 2000000000.0}
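The dict in the log has the shape of Ray's `ray.cluster_resources()` output (an assumption about what the client Pod's script prints). Ray reports one `node:<ip>` resource per node, so the IP check can be scripted instead of eyeballed. A minimal sketch; `node_ips` is a hypothetical helper, not part of Ray or KubeRay:

```python
import ast
from typing import Dict, List


def node_ips(resources: Dict[str, float]) -> List[str]:
    """Return the node IPs encoded in Ray's 'node:<ip>' resource keys, sorted."""
    prefix = "node:"
    ips = []
    for key in resources:
        # Some Ray versions also add marker keys such as 'node:__internal_head__';
        # skip anything whose suffix starts with an underscore.
        if key.startswith(prefix) and not key[len(prefix):].startswith("_"):
            ips.append(key[len(prefix):])
    return sorted(ips)


# Resource dict copied verbatim from the client Pod log above.
LOG_LINE = "{'node:10.244.0.26': 1.0, 'object_store_memory': 521180774.0, 'CPU': 1.0, 'memory': 2000000000.0}"

if __name__ == "__main__":
    # literal_eval parses the printed Python dict without executing any code.
    print(node_ips(ast.literal_eval(LOG_LINE)))  # prints ['10.244.0.26']
```

The returned IP can then be compared with the IP column of `kubectl get pods -o wide` for the head Pod.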


@tedhtchang (Contributor, Author)

@kevin85421 Please review whether this example is what we want.

@kevin85421 self-requested a review March 28, 2023 20:00
@kevin85421 (Member)

Thank you for this contribution! I will review this PR after I finish the release process.

@kevin85421 (Member) left a comment

Thank you for this contribution! Could you also add an optional step in tls.md? In addition, maybe we can rename the file to ray-pod.tls.yaml? RayCluster is a CRD in KubeRay, but this YAML file is not a RayCluster.

In addition, it would be great to add detailed reproduction instructions and screenshots (optional) to the PR description. Because this document is not tested in CI, I need to verify this PR manually. Thanks!

  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: FQ_RAY_IP
Review comment (Member):

You can either:

(1) Update ray.init()

# ray.init(\"ray://raycluster-tls-head-svc.default.svc.cluster.local:10001\")
ray.init(\"ray://$FQ_RAY_IP:10001\")

(2) Remove the FQ_RAY_IP environment variable.
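Option (1) can also be expressed in plain Python instead of relying on shell expansion of `$FQ_RAY_IP` inside the YAML `args` string. A minimal sketch, assuming the script reads `FQ_RAY_IP` from the container environment and falls back to the head service name from the commented-out line; `ray_client_address` is a hypothetical helper:

```python
import os
from typing import Mapping

# Head service DNS name taken from the commented-out ray.init() line above.
DEFAULT_HEAD_HOST = "raycluster-tls-head-svc.default.svc.cluster.local"
RAY_CLIENT_PORT = 10001  # Ray client port used in the example


def ray_client_address(env: Mapping[str, str] = os.environ,
                       port: int = RAY_CLIENT_PORT) -> str:
    """Build the ray:// address from FQ_RAY_IP, falling back to the head service name."""
    host = env.get("FQ_RAY_IP") or DEFAULT_HEAD_HOST
    return f"ray://{host}:{port}"


if __name__ == "__main__":
    address = ray_client_address()
    print(address)
    # In the client container you would then call (requires Ray and the TLS env vars):
    # import ray; ray.init(address)
```

Keeping the fallback means the same script works whether or not the environment variable from option (2) is removed.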

initContainers:
- name: gen-cert
  image: rayproject/ray:2.3.0
  args: ["/bin/sh", "-c", "cp -R /etc/ca/tls /etc/ray && /etc/gen/tls/gencert_head.sh"]
Review comment (Member):

Use gencert_worker.sh instead.
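Whichever gencert script the init container runs, Ray itself reads its TLS settings from environment variables. Below is a pre-flight check the client container could run before `ray.init()`, assuming the variable names from Ray's TLS authentication documentation (`RAY_USE_TLS`, `RAY_TLS_SERVER_CERT`, `RAY_TLS_SERVER_KEY`, `RAY_TLS_CA_CERT`). `tls_config_problems` is hypothetical, and the paths it checks are whatever ray-pod.tls.yaml assigns to those variables:

```python
import os
from typing import List, Mapping

# TLS variables documented for Ray; each should point at a file in the container.
TLS_FILE_VARS = ("RAY_TLS_SERVER_CERT", "RAY_TLS_SERVER_KEY", "RAY_TLS_CA_CERT")


def tls_config_problems(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems with the Ray TLS settings; an empty list means OK."""
    problems = []
    # Assumption: treat "1" or "true" (any case) as enabled.
    if env.get("RAY_USE_TLS", "0").strip().lower() not in ("1", "true"):
        problems.append("RAY_USE_TLS is not enabled")
    for var in TLS_FILE_VARS:
        path = env.get(var)
        if not path:
            problems.append(f"{var} is not set")
        elif not os.path.isfile(path):
            problems.append(f"{var} points to a missing file: {path}")
    return problems


if __name__ == "__main__":
    # Print each problem so it shows up in `kubectl logs` before ray.init() fails.
    for issue in tls_config_problems():
        print("TLS check:", issue)
```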

@tedhtchang (Contributor, Author)

Thanks for the detailed feedback. I will add a step to tls.md. It will be something like:
Step 5: Connect to the cluster with the Ray client using TLS for interactive development...

@tedhtchang (Contributor, Author)

@kevin85421 Added a step to the doc. Please take a look.

@kevin85421 (Member) left a comment

The only comment is to remove the Jupyter section. Others look good to me. Thank you for your contribution!

{'CPU': 2.0, 'node:10.254.20.20': 1.0, 'object_store_memory': 771128524.0, 'memory': 3000000000.0, 'node:10.254.16.25': 1.0}

For instructions on connecting to the Ray cluster from a Jupyter Notebook on your local machine, follow the instructions below:
Review comment (Member):

Would you mind removing the following section? The Ray client has some limitations. In addition, the Ray community wants to de-emphasize the Ray client and promote Ray job submission instead.

Reply (Contributor, Author):

Removed.

@tedhtchang force-pushed the connect-remote-raycluster-tls branch from 3a55e68 to 534503a April 24, 2023 21:12
@tedhtchang (Contributor, Author)

@kevin85421 Could you provide some context about the Ray client limitations and the reasons for de-emphasizing the Ray client in favor of Ray job submission?

@kevin85421 (Member)


  1. The Ray client is very sensitive to both the Ray version and the Python version used on your laptop and in the RayCluster.

  2. If you use the Ray client on your laptop to connect to a RayCluster, you will need to install all related dependencies on your laptop, such as TensorFlow and PyTorch, even if the libraries only run on the RayCluster.

  3. The network connection between your laptop and the RayCluster must be very stable.

On the other hand, Ray job submission copies your Python script to the head Pod and executes it there, which solves (1), (2), and (3). However, Ray job submission may not fulfill the requirement of "interactive development".
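Limitation (1) can at least be caught before connecting. A minimal sketch under a conservative assumption (the same Python major.minor and the exact same Ray version on both sides); `client_compatible` is hypothetical, and the cluster-side versions have to come from the image you deploy (for example the `rayproject/ray:2.3.0` tag in the gen-cert snippet above):

```python
import platform
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. '2.4.0rc1' -> (2, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def client_compatible(local_python: str, cluster_python: str,
                      local_ray: str, cluster_ray: str) -> bool:
    """Conservative check: same Python major.minor and identical Ray version."""
    same_python = parse_version(local_python)[:2] == parse_version(cluster_python)[:2]
    same_ray = parse_version(local_ray) == parse_version(cluster_ray)
    return same_python and same_ray


if __name__ == "__main__":
    local_py = platform.python_version()
    # Hypothetical cluster-side values; read them from your RayCluster image.
    cluster_py, cluster_ray = local_py, "2.3.0"
    print(client_compatible(local_py, cluster_py, "2.3.0", cluster_ray))
```

With job submission this check is unnecessary, because the script runs inside the head Pod with the cluster's own interpreter and Ray installation.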

@tedhtchang (Contributor, Author)

@kevin85421 Thanks. I will share this with my team.

@kevin85421 merged commit e93ebcc into ray-project:master Apr 24, 2023
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…abled (ray-project#994)

Example Pod to connect Ray client to remote a Ray cluster with TLS enabled
Linked issue closed by this pull request: [Feature] Instructions to connect to the RayCluster using TLS authentication from outside the Ray cluster