[RayService] Stable Diffusion example #1181

Merged · 43 commits · Jun 23, 2023

Commits:
- 922cd6c: stable diffusion (kevin85421, Jun 21, 2023)
- 755d29d: update (kevin85421, Jun 21, 2023)
- 2d2f693: add doc (kevin85421, Jun 22, 2023)
- 2ed88aa: fix (kevin85421, Jun 22, 2023)
- 0eb5db1: update mobilenet (kevin85421, Jun 22, 2023)
- 7e6988f: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 22, 2023)
- cec4e96: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 22, 2023)
- 969c500: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 22, 2023)
- 2316503: Update ray-operator/config/samples/ray-service.stable-diffusion.yaml (kevin85421, Jun 22, 2023)
- 942e280: Update ray-operator/config/samples/ray-service.stable-diffusion.yaml (kevin85421, Jun 22, 2023)
- 479b4d0: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 22, 2023)
- 4d018b2: update (kevin85421, Jun 22, 2023)
- fa89e62: update (kevin85421, Jun 22, 2023)
- 5670005: update (kevin85421, Jun 22, 2023)
- cb91250: update (kevin85421, Jun 22, 2023)
- ebe49bd: update (kevin85421, Jun 22, 2023)
- bf4b911: update (kevin85421, Jun 22, 2023)
- 59be339: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- 04d9409: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- a3b4759: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- 2914ed5: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- 4c262e2: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- c8e7db9: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- 6ed733d: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 23, 2023)
- 24347d3: Update ray-operator/config/samples/ray-service.stable-diffusion.yaml (kevin85421, Jun 23, 2023)
- 880e8dd: Update ray-operator/config/samples/ray-service.stable-diffusion.yaml (kevin85421, Jun 23, 2023)
- f64c7bc: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 23, 2023)
- 06cccc4: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- 08d3ecd: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- 64d0bca: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- 7edc49f: update (kevin85421, Jun 23, 2023)
- 419da4b: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 23, 2023)
- 3a672c6: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 23, 2023)
- 60422bc: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 23, 2023)
- 2f5ed4c: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 23, 2023)
- 700f3cb: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 23, 2023)
- 027ee9a: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 23, 2023)
- bb03165: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 23, 2023)
- f8d6191: Update docs/guidance/stable-diffusion-rayservice.md (kevin85421, Jun 23, 2023)
- afca969: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- 788033f: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- 81c82ca: Update docs/guidance/aws-eks-gpu-cluster.md (kevin85421, Jun 23, 2023)
- a071fe3: update (kevin85421, Jun 23, 2023)

74 changes: 74 additions & 0 deletions docs/guidance/aws-eks-gpu-cluster.md
@@ -0,0 +1,74 @@
# Start Amazon EKS Cluster with GPUs for KubeRay

## Step 1: Create a Kubernetes cluster on Amazon EKS

Follow the first two steps in [this AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#)
to: (1) create your Amazon EKS cluster and (2) configure your computer to communicate with your cluster.

## Step 2: Create node groups for the Amazon EKS cluster

Follow "Step 3: Create nodes" in [this AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#) to create node groups. The following section provides more detailed information.

### Create a CPU node group

Typically, avoid running GPU workloads on the Ray head. Create a CPU node group for all Pods except Ray GPU
workers, such as the KubeRay operator, Ray head, and CoreDNS Pods.

Here's a common configuration that works for most KubeRay examples in the docs:
* Instance type: [**m5.xlarge**](https://aws.amazon.com/ec2/instance-types/m5/) (4 vCPU; 16 GB RAM)
* Disk size: 256 GB
* Desired size: 1, Min size: 0, Max size: 1
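
If you prefer the CLI to the console, the same node group can be sketched as an `eksctl` config. This is a hedged example, not part of the original guide: the cluster name and region are placeholders, and the fields follow eksctl's `ClusterConfig` schema.

```yaml
# cpu-nodegroup.yaml: a sketch of the CPU node group described above.
# The metadata values are placeholders; adjust them to your cluster.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-eks-cluster   # placeholder: your EKS cluster name
  region: us-west-2      # placeholder: your AWS region
managedNodeGroups:
  - name: cpu-node-group
    instanceType: m5.xlarge
    volumeSize: 256       # disk size in GiB
    desiredCapacity: 1
    minSize: 0
    maxSize: 1
```

Create it with `eksctl create nodegroup --config-file=cpu-nodegroup.yaml`.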

### Create a GPU node group

Create a GPU node group for Ray GPU workers.

1. Here's a common configuration that works for most KubeRay examples in the docs:
* AMI type: Bottlerocket NVIDIA (BOTTLEROCKET_x86_64_NVIDIA)
* Instance type: [**g5.xlarge**](https://aws.amazon.com/ec2/instance-types/g5/) (1 GPU; 24 GB GPU Memory; 4 vCPUs; 16 GB RAM)
* Disk size: 1024 GB
* Desired size: 1, Min size: 0, Max size: 1

2. **Follow Step 4 to install the NVIDIA device plugin if your AMI type requires it.**
   * If you use `AMI type: Bottlerocket NVIDIA`, there is no need to install the NVIDIA device plugin.
   * For other AMI types, you may need to install the NVIDIA device plugin DaemonSet to run GPU-enabled containers in your Amazon EKS cluster.
     If the GPU nodes have taints, add `tolerations` to `nvidia-device-plugin.yml` so the DaemonSet can schedule Pods on the GPU nodes.

3. Add a Kubernetes taint to prevent scheduling CPU Pods on this GPU node group. For KubeRay examples, add the following taint to the GPU nodes: `Key: ray.io/node-type, Value: worker, Effect: NoSchedule`, and include the corresponding `tolerations` for GPU Ray worker Pods.
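
If the taint wasn't set when the node group was created, one way to add it after the fact is with `kubectl`; this is a sketch, and `<gpu-node-name>` is a placeholder. Note that for managed node groups, taints applied directly to nodes may not survive node replacement, so setting the taint on the node group itself is more durable.

```sh
# Taint each GPU node so only Pods with a matching toleration schedule there.
kubectl taint nodes <gpu-node-name> ray.io/node-type=worker:NoSchedule

# Verify the taint was applied.
kubectl describe node <gpu-node-name> | grep -A 2 Taints
```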

> Warning: GPU nodes are extremely expensive. Please remember to delete the cluster if you no longer need it.

## Step 3: Verify the node groups

> **Note:** If you encounter permission issues with `eksctl`, open the "Command line or programmatic access" page
for your AWS account and copy the credential environment variables, including `AWS_ACCESS_KEY_ID`,
`AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`.

```sh
eksctl get nodegroup --cluster ${YOUR_EKS_NAME}

# CLUSTER NODEGROUP STATUS CREATED MIN SIZE MAX SIZE DESIRED CAPACITY INSTANCE TYPE IMAGE ID ASG NAME TYPE
# ${YOUR_EKS_NAME} cpu-node-group ACTIVE 2023-06-05T21:31:49Z 0 1 1 m5.xlarge AL2_x86_64 eks-cpu-node-group-... managed
# ${YOUR_EKS_NAME} gpu-node-group ACTIVE 2023-06-05T22:01:44Z 0 1 1 g5.12xlarge BOTTLEROCKET_x86_64_NVIDIA eks-gpu-node-group-... managed
```

## Step 4: Install the NVIDIA device plugin DaemonSet

> **Note:** If you encounter permission issues with `kubectl`, follow "Step 2: Configure your computer to communicate with your cluster"
in the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#).

Install the NVIDIA device plugin DaemonSet to run GPU-enabled containers in your Amazon EKS cluster. See [Amazon EKS optimized accelerated Amazon Linux AMIs](https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html#gpu-ami)
or the [NVIDIA/k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin) repository for more details.

```sh
# Install the DaemonSet
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.9.0/nvidia-device-plugin.yml

# Verify that your nodes have allocatable GPUs
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

# Example output:
# NAME GPU
# ip-....us-west-2.compute.internal 4
# ip-....us-west-2.compute.internal <none>
```
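
As an optional sanity check (not part of the original guide), you can run a short-lived Pod that requests a GPU and prints `nvidia-smi` output; the CUDA image tag here is an assumption and may need updating.

```yaml
# gpu-test.yaml: a throwaway Pod to confirm GPU scheduling works end to end.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubuntu22.04  # assumed public image tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
  # Tolerate the taint added to the GPU node group in Step 2.
  tolerations:
    - key: "ray.io/node-type"
      operator: "Equal"
      value: "worker"
      effect: "NoSchedule"
```

Apply it with `kubectl apply -f gpu-test.yaml`, then inspect `kubectl logs gpu-test`; a populated `nvidia-smi` table means the device plugin and taints are working.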
5 changes: 3 additions & 2 deletions docs/guidance/mobilenet-rayservice.md
@@ -1,4 +1,4 @@
-# RayService: MobileNet example
+# Serve a MobileNet image classifier using RayService

> **Note:** The Python files for the Ray Serve application and its client are in the repository [ray-project/serve_config_examples](https://github.com/ray-project/serve_config_examples).

@@ -10,7 +10,8 @@ kind create cluster --image=kindest/node:v1.23.0

## Step 2: Install KubeRay operator

-Follow [this document](../../helm-chart/kuberay-operator/README.md) to install the latest stable KubeRay operator via Helm repository.
+Follow [this document](../../helm-chart/kuberay-operator/README.md) to install the nightly KubeRay operator via
+Helm. Note that the YAML file in Step 3 uses `serveConfigV2`, which is first supported by KubeRay v0.6.0.

> Review comment from kevin85421 (Member, Author): cc @zcin

## Step 3: Install a RayService

55 changes: 55 additions & 0 deletions docs/guidance/stable-diffusion-rayservice.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# Serve a StableDiffusion text-to-image model using RayService

> **Note:** The Python files for the Ray Serve application and its client are in the [ray-project/serve_config_examples](https://github.com/ray-project/serve_config_examples) repo
and [the Ray documentation](https://docs.ray.io/en/latest/serve/tutorials/stable-diffusion.html).

## Step 1: Create a Kubernetes cluster with GPUs

Follow [aws-eks-gpu-cluster.md](./aws-eks-gpu-cluster.md) to create an AWS EKS cluster with 1
CPU (`m5.xlarge`) node and 1 GPU (`g5.xlarge`) node.

## Step 2: Install KubeRay operator

Follow [this document](../../helm-chart/kuberay-operator/README.md) to install the nightly KubeRay operator via
Helm. Note that the YAML file in Step 3 uses `serveConfigV2`, which is first supported by KubeRay v0.6.0.
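
For reference, the nightly install described in that README roughly follows this shape; treat the paths and release name as a sketch, with the README as the source of truth.

```sh
# Clone KubeRay and install the operator chart from the working tree,
# which tracks the nightly operator (paths are a sketch; see the README).
git clone https://github.com/ray-project/kuberay.git
cd kuberay/helm-chart/kuberay-operator
helm install kuberay-operator .

# The KubeRay operator Pod should reach the Running state.
kubectl get pods
```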

## Step 3: Install a RayService

```sh
# path: ray-operator/config/samples/
kubectl apply -f ray-service.stable-diffusion.yaml
```

* The `tolerations` for workers must match the taints on the GPU node group. Without the tolerations, worker Pods won't be scheduled on GPU nodes.
```yaml
# These tolerations match the taint (ray.io/node-type=worker:NoSchedule) added to the GPU node group.
tolerations:
- key: "ray.io/node-type"
operator: "Equal"
value: "worker"
effect: "NoSchedule"
```
* Install `diffusers` in `runtime_env` as it is not included by default in the `ray-ml` image.
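
Before moving on, it can be worth confirming that the RayService and its Pods are healthy; this is a hedged sketch, and the exact status columns vary by KubeRay version.

```sh
# Check the RayService custom resource.
kubectl get rayservice stable-diffusion

# Watch the Ray Pods; the GPU worker should be scheduled on the g5.xlarge node.
kubectl get pods -o wide
```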

## Step 4: Forward the port of Serve

```sh
kubectl port-forward svc/stable-diffusion-serve-svc 8000
```

Note that the RayService's Kubernetes service will be created after the Serve applications are ready and running. This process may take approximately 1 minute after all Pods in the RayCluster are running.
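
Until the application is ready, the service won't exist, so a quick way to know when you can port-forward is to poll for it:

```sh
# Repeat until the service appears; before that, this returns NotFound.
kubectl get svc stable-diffusion-serve-svc
```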

## Step 5: Send a request to the text-to-image model

```sh
# Step 5.1: Download `stable_diffusion_req.py`
curl -LO https://raw.githubusercontent.com/ray-project/serve_config_examples/master/stable_diffusion/stable_diffusion_req.py

# Step 5.2: Update `prompt` in `stable_diffusion_req.py`.

# Step 5.3: Send a request to the Stable Diffusion model.
python stable_diffusion_req.py
# Check output.png
```
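
If you'd rather not edit the Python script, a `curl` equivalent can be sketched as follows; the `/imagine` endpoint and `prompt` query parameter are assumptions based on the linked Ray Serve tutorial.

```sh
# Hypothetical curl equivalent of stable_diffusion_req.py;
# the endpoint shape is assumed from the Ray Serve Stable Diffusion tutorial.
curl -G "http://127.0.0.1:8000/imagine" \
  --data-urlencode "prompt=a cute cat is dancing on the grass" \
  -o output.png
```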

![image](../images/stable_diffusion_example.png)
Binary file added docs/images/stable_diffusion_example.png
28 changes: 15 additions & 13 deletions ray-operator/config/samples/ray-service.mobilenet.yaml
@@ -5,11 +5,13 @@ metadata:
spec:
serviceUnhealthySecondThreshold: 300 # Config for the health check threshold for service. Default value is 60.
deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for deployments. Default value is 60.
-  serveConfig:
-    importPath: mobilenet.mobilenet:app
-    runtimeEnv: |
-      working_dir: "https://github.com/ray-project/serve_config_examples/archive/b393e77bbd6aba0881e3d94c05f968f05a387b96.zip"
-      pip: ["python-multipart==0.0.6"]
+  serveConfigV2: |
+    applications:
+      - name: mobilenet
+        import_path: mobilenet.mobilenet:app
+        runtime_env:
+          working_dir: "https://github.com/ray-project/serve_config_examples/archive/b393e77bbd6aba0881e3d94c05f968f05a387b96.zip"
+          pip: ["python-multipart==0.0.6"]
rayClusterConfig:
rayVersion: '2.5.0' # should match the Ray version in the image of the containers
######################headGroupSpecs#################################
@@ -28,11 +30,11 @@ spec:
image: rayproject/ray-ml:2.5.0
resources:
limits:
-            cpu: 2
-            memory: 8Gi
+            cpu: 1
+            memory: 4Gi
requests:
-            cpu: 2
-            memory: 8Gi
+            cpu: 1
+            memory: 4Gi
ports:
- containerPort: 6379
name: gcs-server
@@ -65,8 +67,8 @@ spec:
command: ["/bin/sh","-c","ray stop"]
resources:
limits:
-            cpu: "2"
-            memory: "8Gi"
+            cpu: 1
+            memory: 4Gi
requests:
-            cpu: "2"
-            memory: "8Gi"
+            cpu: 1
+            memory: 4Gi
80 changes: 80 additions & 0 deletions ray-operator/config/samples/ray-service.stable-diffusion.yaml
@@ -0,0 +1,80 @@
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
name: stable-diffusion
spec:
serviceUnhealthySecondThreshold: 300 # Config for the health check threshold for service. Default value is 60.
deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for deployments. Default value is 60.
serveConfigV2: |
applications:
- name: stable_diffusion
import_path: stable_diffusion.stable_diffusion:entrypoint
runtime_env:
working_dir: "https://github.com/ray-project/serve_config_examples/archive/d6acf9b99ef076a1848f506670e1290a11654ec2.zip"
pip: ["diffusers==0.12.1"]
rayClusterConfig:
rayVersion: '2.5.0' # Should match the Ray version in the image of the containers
######################headGroupSpecs#################################
# Ray head pod template.
headGroupSpec:
# The `rayStartParams` are used to configure the `ray start` command.
# See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
# See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
rayStartParams:
dashboard-host: '0.0.0.0'
# Pod template
template:
spec:
containers:
- name: ray-head
image: rayproject/ray-ml:2.5.0
ports:
- containerPort: 6379
name: gcs
- containerPort: 8265
name: dashboard
- containerPort: 10001
name: client
- containerPort: 8000
name: serve
volumeMounts:
- mountPath: /tmp/ray
name: ray-logs
resources:
limits:
cpu: "2"
memory: "8G"
requests:
cpu: "2"
memory: "8G"
volumes:
- name: ray-logs
emptyDir: {}
workerGroupSpecs:
# The number of Pod replicas in this worker group.
- replicas: 1
minReplicas: 1
maxReplicas: 10
groupName: gpu-group
rayStartParams: {}
# Pod template
template:
spec:
containers:
- name: ray-worker
image: rayproject/ray-ml:2.5.0
resources:
limits:
cpu: 4
memory: "16G"
nvidia.com/gpu: 1
requests:
cpu: 3
memory: "12G"
nvidia.com/gpu: 1
# These tolerations match the taint added to the GPU node group.
tolerations:
- key: "ray.io/node-type"
operator: "Equal"
value: "worker"
effect: "NoSchedule"