diff --git a/docs/guidance/tls.md b/docs/guidance/tls.md index 82e6d798f7..7cb3cef844 100644 --- a/docs/guidance/tls.md +++ b/docs/guidance/tls.md @@ -1,15 +1,15 @@ # TLS Authentication -Ray can be configured to use TLS on its gRPC channels. This means that -connecting to the Ray head will require an appropriate -set of credentials and also that data exchanged between various processes +Ray can be configured to use TLS on its gRPC channels. This means that +connecting to the Ray head will require an appropriate +set of credentials and also that data exchanged between various processes (client, head, workers) will be encrypted ([Ray's document](https://docs.ray.io/en/latest/ray-core/configure.html?highlight=tls#tls-authentication)). This document provides detailed instructions for generating a public-private key pair and CA certificate for configuring KubeRay. > Warning: Enabling TLS will cause a performance hit due to the extra -overhead of mutual authentication and encryption. Testing has shown that +overhead of mutual authentication and encryption. Testing has shown that this overhead is large for small workloads and becomes relatively smaller for large workloads. The exact overhead will depend on the nature of your workload. 
@@ -43,12 +43,12 @@ kubectl apply -f ray-operator/config/samples/ray-cluster.tls.yaml `ray-cluster.tls.yaml` will create: * A Kubernetes Secret containing the CA's private key (`ca.key`) and self-signed certificate (`ca.crt`) (**Step 1**) -* A Kubernetes ConfigMap containing the scripts `gencert_head.sh` and `gencert_worker.sh`, which allow Ray Pods to generate private keys +* A Kubernetes ConfigMap containing the scripts `gencert_head.sh` and `gencert_worker.sh`, which allow Ray Pods to generate private keys (`tls.key`) and self-signed certificates (`tls.crt`) (**Step 2**) * A RayCluster with proper TLS environment variables configurations (**Step 3**) The certificate (`tls.crt`) for a Ray Pod is encrypted using the CA's private key (`ca.key`). Additionally, all Ray Pods have the CA's public key included in `ca.crt`, which allows them to decrypt certificates from other Ray Pods. - + # Step 1: Generate a private key and self-signed certificate for CA In this document, a self-signed certificate is used, but users also have the @@ -76,7 +76,7 @@ openssl x509 -in ca.crt -noout -text kubectl create secret generic ca-tls --from-file=ca.key --from-file=ca.crt ``` * `ca.key`: CA's private key -* `ca.crt`: CA's self-signed certificate +* `ca.crt`: CA's self-signed certificate This step is optional because the `ca.key` and `ca.crt` files have already been included in the Kubernetes Secret specified in [ray-cluster.tls.yaml](../../ray-operator/config/samples/ray-cluster.tls.yaml). @@ -85,12 +85,12 @@ already been included in the Kubernetes Secret specified in [ray-cluster.tls.yam In [ray-cluster.tls.yaml](../../ray-operator/config/samples/ray-cluster.tls.yaml), each Ray Pod (both head and workers) generates its own private key file (`tls.key`) and self-signed -certificate file (`tls.crt`) in its init container. 
We generate separate files for each Pod -because worker Pods do not have deterministic DNS names, and we cannot use the same +certificate file (`tls.crt`) in its init container. We generate separate files for each Pod +because worker Pods do not have deterministic DNS names, and we cannot use the same certificate across different Pods. -In the YAML file, you'll find a ConfigMap named `tls` that contains two shell scripts: -`gencert_head.sh` and `gencert_worker.sh`. These scripts are used to generate the private key +In the YAML file, you'll find a ConfigMap named `tls` that contains two shell scripts: +`gencert_head.sh` and `gencert_worker.sh`. These scripts are used to generate the private key and self-signed certificate files (`tls.key` and `tls.crt`) for the Ray head and worker Pods. An alternative approach for users is to prebake the shell scripts directly into the docker image that's utilized by the init containers, rather than relying on a ConfigMap. @@ -99,13 +99,13 @@ Please find below a brief explanation of what happens in each of these scripts: 1. A 2048-bit RSA private key is generated and saved as `/etc/ray/tls/tls.key`. 2. A Certificate Signing Request (CSR) is generated using the private key file (`tls.key`) and the `csr.conf` configuration file. -3. A self-signed certificate (`tls.crt`) is generated using the private key of the +3. A self-signed certificate (`tls.crt`) is generated using the private key of the Certificate Authority (`ca.key`) and the previously generated CSR. The only difference between `gencert_head.sh` and `gencert_worker.sh` is the `[ alt_names ]` -section in `csr.conf` and `cert.conf`. The worker Pods use the fully qualified domain name -(FQDN) of the head Kubernetes Service to establish a connection with the head Pod. -Therefore, the `[alt_names]` section for the head Pod needs to include the FQDN of the head +section in `csr.conf` and `cert.conf`. 
The worker Pods use the fully qualified domain name +(FQDN) of the head Kubernetes Service to establish a connection with the head Pod. +Therefore, the `[alt_names]` section for the head Pod needs to include the FQDN of the head Kubernetes Service. By the way, the head Pod uses `$POD_IP` to communicate with worker Pods. ```sh @@ -131,7 +131,7 @@ To enable TLS authentication in your Ray cluster, set the following environment - `RAY_USE_TLS`: Either 1 or 0 to use/not-use TLS. If this is set to 1 then all of the environment variables below must be set. Default: 0. - `RAY_TLS_SERVER_CERT`: Location of a certificate file which is presented to other endpoints so as to achieve mutual authentication (i.e. `tls.crt`). -- `RAY_TLS_SERVER_KEY`: Location of a private key file which is the cryptographic means to prove to other endpoints that you are the authorized user of a given certificate (i.e. `tls.key`). +- `RAY_TLS_SERVER_KEY`: Location of a private key file which is the cryptographic means to prove to other endpoints that you are the authorized user of a given certificate (i.e. `tls.key`). - `RAY_TLS_CA_CERT`: Location of a CA certificate file which allows TLS to decide whether an endpoint’s certificate has been signed by the correct authority (i.e. `ca.crt`). For more information on how to configure Ray with TLS authentication, please refer to [Ray's document](https://docs.ray.io/en/latest/ray-core/configure.html#tls-authentication). @@ -142,8 +142,8 @@ For more information on how to configure Ray with TLS authentication, please ref # Log in to the worker Pod kubectl exec -it ${WORKER_POD} -- bash -# Since the head Pod has the certificate of $FQ_RAY_IP, the connection to the worker Pods -# will be established successfully, and the exit code of the ray health-check command +# Since the head Pod has the certificate of $FQ_RAY_IP, the connection to the worker Pods +# will be established successfully, and the exit code of the ray health-check command # should be 0. 
 ray health-check --address $FQ_RAY_IP:6379
 echo $? # 0
@@ -154,8 +154,22 @@ echo $? # 0
 ray health-check --address $RAY_IP:6379
 
 # If you add `DNS.3 = $RAY_IP` to the [alt_names] section in `gencert_head.sh`,
-# the head Pod will generate the certificate of $RAY_IP. 
-# 
+# the head Pod will generate the certificate of $RAY_IP.
+#
 # For KubeRay versions prior to 0.5.0, this step is necessary because Ray workers in earlier
 # versions use $RAY_IP to connect with Ray head.
 ```
+
+# Step 5: Connect to the cluster with Ray client using TLS for interactive development
+To learn more, see [interactive development](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/ray-client.html#ray-client-interactive-development) and [TLS authentication](https://docs.ray.io/en/latest/ray-core/configure.html?highlight=tls#tls-authentication).
+
+To connect to the Ray cluster from a Pod:
+```
+# Create a client pod and connect to cluster
+kubectl apply -f ray-operator/config/samples/ray-pod.tls.yaml
+kubectl logs ray-client-tls
+```
+Verify that the output is similar to:
+```
+{'CPU': 2.0, 'node:10.254.20.20': 1.0, 'object_store_memory': 771128524.0, 'memory': 3000000000.0, 'node:10.254.16.25': 1.0}
+```
diff --git a/ray-operator/config/samples/ray-pod.tls.yaml b/ray-operator/config/samples/ray-pod.tls.yaml
new file mode 100644
index 0000000000..eadd50600a
--- /dev/null
+++ b/ray-operator/config/samples/ray-pod.tls.yaml
@@ -0,0 +1,60 @@
+apiVersion: v1
+kind: Pod
+metadata:
+  name: ray-client-tls
+  namespace: default
+spec:
+  containers:
+  - name: client
+    image: rayproject/ray:2.3.0
+    args: [/bin/sh, -c, 'python -c "import ray; ray.init(\"ray://$FQ_RAY_IP:10001\"); print(ray.cluster_resources())" && sleep infinity']
+    env:
+    - name: RAY_USE_TLS
+      value: "1"
+    - name: RAY_TLS_SERVER_CERT
+      value: "/etc/ray/tls/tls.crt"
+    - name: RAY_TLS_SERVER_KEY
+      value: "/etc/ray/tls/tls.key"
+    - name: RAY_TLS_CA_CERT
+      value: "/etc/ca/tls/ca.crt"
+    # FQ_RAY_IP depends on ray-operator/config/samples/ray-cluster.tls.yaml
+    - name: FQ_RAY_IP
+      value: raycluster-tls-head-svc.default.svc.cluster.local
+    volumeMounts:
+    - mountPath: /etc/ca/tls
+      name: ca-tls
+      readOnly: true
+    - mountPath: /etc/ray/tls
+      name: ray-tls
+    - mountPath: /etc/gen/tls
+      name: gen-tls-script
+  initContainers:
+  - name: gen-cert
+    image: rayproject/ray:2.3.0
+    args: ["/bin/sh", "-c", "cp -R /etc/ca/tls /etc/ray && /etc/gen/tls/gencert_worker.sh"]
+    volumeMounts:
+    - mountPath: /etc/ca/tls
+      name: ca-tls
+      readOnly: true
+    - mountPath: /etc/ray/tls
+      name: ray-tls
+    - mountPath: /etc/gen/tls
+      name: gen-tls-script
+    env:
+    - name: POD_IP
+      valueFrom:
+        fieldRef:
+          fieldPath: status.podIP
+  volumes:
+  - name: gen-tls-script
+    configMap:
+      name: tls
+      defaultMode: 0777
+      items:
+      - key: gencert_worker.sh
+        path: gencert_worker.sh
+  - name: ca-tls
+    secret:
+      secretName: ca-tls
+  - name: ray-tls
+    emptyDir: {}
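
The guide states that once `RAY_USE_TLS` is 1, `RAY_TLS_SERVER_CERT`, `RAY_TLS_SERVER_KEY`, and `RAY_TLS_CA_CERT` must all be set. A client such as the one in `ray-pod.tls.yaml` could check this before calling `ray.init`. The following is a minimal sketch of such a check, not part of this change: `missing_tls_settings` and `REQUIRED_TLS_VARS` are hypothetical names that do not exist in Ray or KubeRay, and the sketch only inspects the variables, it does not validate the certificate files themselves.

```python
from typing import List, Mapping

# Variables the guide says must be set once RAY_USE_TLS is 1.
REQUIRED_TLS_VARS = (
    "RAY_TLS_SERVER_CERT",
    "RAY_TLS_SERVER_KEY",
    "RAY_TLS_CA_CERT",
)


def missing_tls_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required TLS variables that are unset or empty.

    Hypothetical helper for illustration; it is not a Ray or KubeRay API.
    """
    # RAY_USE_TLS defaults to 0, in which case nothing else is required.
    if env.get("RAY_USE_TLS", "0") != "1":
        return []
    return [name for name in REQUIRED_TLS_VARS if not env.get(name)]
```

One way to use it would be to call `missing_tls_settings(os.environ)` at the start of the client script and exit with a clear message if the returned list is non-empty, so that a misconfigured Pod fails early rather than at connection time.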