
Do not update pod labels if they haven't changed #1304

Merged (1 commit into ray-project:master on Aug 11, 2023)

Conversation

JoshKarpel (Contributor)

Why are these changes needed?

See #1303 for more context.

I decided on some light future-proofing by checking whether any of the labels have changed before updating the pod, even though only one label is being changed right now.
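As a rough illustration of the idea (a hedged sketch with made-up helper names, not the actual code in this PR), the guard amounts to comparing the desired label values against the pod's current labels and skipping the API call when nothing differs:

```go
package example

import (
	"context"
	"strconv"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// serveLabelNeedsUpdate reports whether any desired label value differs from
// what is already set on the pod. (Hypothetical helper name, for illustration.)
func serveLabelNeedsUpdate(pod *corev1.Pod, desired map[string]string) bool {
	for k, v := range desired {
		if pod.Labels[k] != v {
			return true
		}
	}
	return false
}

// reconcileServeLabel only calls Update when the ray.io/serve label actually
// needs to change, avoiding no-op pod updates. (Hypothetical helper name.)
func reconcileServeLabel(ctx context.Context, c client.Client, pod *corev1.Pod, shouldServe bool) error {
	desired := map[string]string{"ray.io/serve": strconv.FormatBool(shouldServe)}
	if !serveLabelNeedsUpdate(pod, desired) {
		return nil // labels unchanged; skip the update entirely
	}
	if pod.Labels == nil {
		pod.Labels = map[string]string{}
	}
	for k, v := range desired {
		pod.Labels[k] = v
	}
	return c.Update(ctx, pod)
}
```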

Related issue number

Closes #1303

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

I followed DEVELOPMENT.md and was able to stand up a RayService in a local kind cluster using an operator image built from this PR:

❯ k get po -n default
NAME                                                      READY   STATUS    RESTARTS   AGE
ervice-sample-raycluster-pn4cd-worker-small-group-qfntp   1/1     Running   0          6m46s
kuberay-operator-5987588ffc-dg2mn                         1/1     Running   0          23m
rayservice-sample-raycluster-pn4cd-head-fnsl6             1/1     Running   0          6m46s
❯ k get svc -n default
NAME                                          TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                   AGE
custom-ray-serve-service-name                 LoadBalancer   10.96.163.198   <pending>     8000:30266/TCP                                            5m54s
kuberay-operator                              ClusterIP      10.96.30.152    <none>        8080/TCP                                                  23m
kubernetes                                    ClusterIP      10.96.0.1       <none>        443/TCP                                                   25m
rayservice-sample-head-svc                    ClusterIP      10.96.40.11     <none>        10001/TCP,8265/TCP,52365/TCP,6379/TCP,8080/TCP,8000/TCP   5m54s
rayservice-sample-raycluster-pn4cd-head-svc   ClusterIP      10.96.26.140    <none>        10001/TCP,8265/TCP,52365/TCP,6379/TCP,8080/TCP,8000/TCP   6m29s
❯ k describe pod/ervice-sample-raycluster-pn4cd-worker-small-group-qfntp
Name:         ervice-sample-raycluster-pn4cd-worker-small-group-qfntp
Namespace:    default
Priority:     0
Node:         kind-control-plane/172.19.0.2
Start Time:   Tue, 08 Aug 2023 15:06:05 -0500
Labels:       app.kubernetes.io/created-by=kuberay-operator
              app.kubernetes.io/name=kuberay
              ray.io/cluster=rayservice-sample-raycluster-pn4cd
              ray.io/group=small-group
              ray.io/identifier=rayservice-sample-raycluster-pn4cd-worker
              ray.io/is-ray-node=yes
              ray.io/node-type=worker
              ray.io/serve=true
Annotations:  ray.io/ft-enabled: false
              ray.io/health-state:
Status:       Running
IP:           10.244.0.12
IPs:
  IP:           10.244.0.12
Controlled By:  RayCluster/rayservice-sample-raycluster-pn4cd
Init Containers:
  wait-gcs-ready:
    Container ID:  containerd://ee18aa5754c97cf592f10a24fc04b712354b6577a0a778de47ec9c5aa04e74a9
    Image:         rayproject/ray:2.5.0
    Image ID:      docker.io/rayproject/ray@sha256:cb53dcc21af8f913978fd2a3fc57c812f87d99e0b40db6a42ccd6f43eca11281
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -lc
      --
    Args:

                            SECONDS=0
                            while true; do
                              if (( SECONDS <= 120 )); then
                                if ray health-check --address rayservice-sample-raycluster-pn4cd-head-svc.default.svc.cluster.local:6379 > /dev/null 2>&1; then
                                  echo "GCS is ready."
                                  break
                                fi
                                echo "$SECONDS seconds elapsed: Waiting for GCS to be ready."
                              else
                                if ray health-check --address rayservice-sample-raycluster-pn4cd-head-svc.default.svc.cluster.local:6379; then
                                  echo "GCS is ready. Any error messages above can be safely ignored."
                                  break
                                fi
                                echo "$SECONDS seconds elapsed: Still waiting for GCS to be ready. For troubleshooting, refer to the FAQ at https://github.com/ray-project/kuberay/blob/master/docs/guidance/FAQ.md."
                              fi
                              sleep 5
                            done

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 08 Aug 2023 15:06:06 -0500
      Finished:     Tue, 08 Aug 2023 15:06:17 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:     500m
      memory:  2Gi
    Environment:
      FQ_RAY_IP:  rayservice-sample-raycluster-pn4cd-head-svc.default.svc.cluster.local
      RAY_IP:     rayservice-sample-raycluster-pn4cd-head-svc
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fftq7 (ro)
Containers:
  ray-worker:
    Container ID:  containerd://ed185f50ffaf8a90922efa821d143969eec271740f4780877890f498596ca56d
    Image:         rayproject/ray:2.5.0
    Image ID:      docker.io/rayproject/ray@sha256:cb53dcc21af8f913978fd2a3fc57c812f87d99e0b40db6a42ccd6f43eca11281
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      -lc
      --
    Args:
      ulimit -n 65536; ray start  --metrics-export-port=8080  --block  --dashboard-agent-listen-port=52365  --num-cpus=1  --memory=2147483648  --address=rayservice-sample-raycluster-pn4cd-head-svc.default.svc.cluster.local:6379
    State:          Running
      Started:      Tue, 08 Aug 2023 15:06:18 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:     500m
      memory:  2Gi
    Environment:
      FQ_RAY_IP:                                  rayservice-sample-raycluster-pn4cd-head-svc.default.svc.cluster.local
      RAY_IP:                                     rayservice-sample-raycluster-pn4cd-head-svc
      RAY_CLUSTER_NAME:                            (v1:metadata.labels['ray.io/cluster'])
      RAY_PORT:                                   6379
      RAY_timeout_ms_task_wait_for_death_info:    0
      RAY_gcs_server_request_timeout_seconds:     5
      RAY_SERVE_KV_TIMEOUT_S:                     5
      RAY_INTERNAL_SERVE_CONTROLLER_PIN_ON_NODE:  0
      RAY_ADDRESS:                                rayservice-sample-raycluster-pn4cd-head-svc.default.svc.cluster.local:6379
      RAY_USAGE_STATS_KUBERAY_IN_USE:             1
      REDIS_PASSWORD:
      RAY_DASHBOARD_ENABLE_K8S_DISK_USAGE:        1
    Mounts:
      /dev/shm from shared-mem (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fftq7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  shared-mem:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  2Gi
  kube-api-access-fftq7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  7m56s  default-scheduler  Successfully assigned default/ervice-sample-raycluster-pn4cd-worker-small-group-qfntp to kind-control-plane
  Normal  Pulled     7m55s  kubelet            Container image "rayproject/ray:2.5.0" already present on machine
  Normal  Created    7m55s  kubelet            Created container wait-gcs-ready
  Normal  Started    7m55s  kubelet            Started container wait-gcs-ready
  Normal  Pulled     7m44s  kubelet            Container image "rayproject/ray:2.5.0" already present on machine
  Normal  Created    7m43s  kubelet            Created container ray-worker
  Normal  Started    7m43s  kubelet            Started container ray-worker
❯ k describe svc/custom-ray-serve-service-name
Name:                     custom-ray-serve-service-name
Namespace:                default
Labels:                   custom-label=custom-ray-serve-service-label
                          ray.io/serve=rayservice-sample-serve
                          ray.io/service=rayservice-sample
Annotations:              custom-annotation: custom-ray-serve-service-annotation
Selector:                 ray.io/cluster=rayservice-sample-raycluster-pn4cd,ray.io/serve=true
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.96.163.198
IPs:                      10.96.163.198
Port:                     serve  8000/TCP
TargetPort:               8000/TCP
NodePort:                 serve  30266/TCP
Endpoints:                10.244.0.11:8000,10.244.0.12:8000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

So cluster formation looks good, but I'm not really sure how to test the opposite case where the label should go from true to false. Open to suggestions!
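One option for covering both directions without a live cluster could be a table-driven unit test against the comparison logic. Here is a hedged sketch reusing the hypothetical serveLabelNeedsUpdate helper from the description above (again, not the actual KubeRay code):

```go
package example

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func TestServeLabelNeedsUpdate(t *testing.T) {
	cases := []struct {
		name    string
		current string
		desired string
		want    bool
	}{
		{"already true", "true", "true", false},
		{"already false", "false", "false", false},
		{"flip false to true", "false", "true", true},
		{"flip true to false", "true", "false", true},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			// Build a pod that already carries the current label value.
			pod := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{
				Labels: map[string]string{"ray.io/serve": tc.current},
			}}
			if got := serveLabelNeedsUpdate(pod, map[string]string{"ray.io/serve": tc.desired}); got != tc.want {
				t.Fatalf("%s: got %v, want %v", tc.name, got, tc.want)
			}
		})
	}
}
```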

Also, I don't really know anything about Go, so please review carefully for style and correctness :)

@JoshKarpel force-pushed the unecessary-pod-updates branch from 74a4485 to 899e752 on August 8, 2023 20:17
@kevin85421 self-requested a review on August 9, 2023 18:27
@kevin85421 self-assigned this on August 9, 2023
@kevin85421 (Member)

Thank you for the PR! I will review it today.

@kevin85421 (Member) left a comment:

Thank you for the contribution! Would you mind running some e2e tests with the following command in the root directory of the KubeRay repository? Currently, these tests are not in the CI pipeline due to some CI infra issues.

RAY_IMAGE=rayproject/ray:2.5.0 OPERATOR_IMAGE=controller:latest pytest -vs tests/test_sample_rayservice_yamls.py --log-cli-level=INFO

This PR looks good to me, provided it can pass the e2e tests in your local environment. In the future, I might consider using a readiness probe to determine whether a worker Pod is ready to serve requests. Relying on the KubeRay operator to check the availability of each Pod doesn't seem scalable.

@JoshKarpel (Contributor, Author)

> Thank you for the contribution! Would you mind running some e2e tests with the following command in the root directory of the KubeRay repository? Currently, these tests are not in the CI pipeline due to some CI infra issues.
>
> RAY_IMAGE=rayproject/ray:2.5.0 OPERATOR_IMAGE=controller:latest pytest -vs tests/test_sample_rayservice_yamls.py --log-cli-level=INFO
>
> This PR looks good to me, provided it can pass the e2e tests in your local environment.

The autoscaling test is actually failing on both master and my branch, at least for me 😬

FAILED tests/test_sample_rayservice_yamls.py::TestRayService::test_service_autoscaling - TimeoutError: Cluster did not scale to the expected number of 1 worker pod(s) within 400 seconds. Cluster is currently at 5 worker pods.

So that doesn't seem related to my change... but maybe related to something else in my local setup? Thoughts?


> In the future, I might consider using a readiness probe to determine whether a worker Pod is ready to serve requests. Relying on the KubeRay operator to check the availability of each Pod doesn't seem scalable.

I was actually just asking about something related on Ray Slack and @edoakes was talking about some other changes to the Serve HTTP proxy related to autoscaling, where the proxy now returns a 503 error if the Ray worker doesn't have any other Serve replicas on it. We were seeing these 503s in the KubeRay operator, which triggered some of our alerting. So moving to a k8s readiness check would fix that (no 503s between the operator and the worker pods)... but then our pods would be unready and we also alert on that!

I haven't thought deeply about it, but it seems like the problem is that we need to distinguish between a variety of states:

  1. The worker pod is healthy (health)
  2. The worker pod is healthy and has a healthy Serve proxy on it (ready)
  3. The worker pod is healthy, has a healthy Serve proxy on it, and traffic should be routed to that pod (ready and... enabled? 🤔 )

But maybe those states will change with the work @edoakes was talking about?

@kevin85421 (Member)

> The autoscaling test is actually failing on both master and my branch, at least for me 😬
>
> FAILED tests/test_sample_rayservice_yamls.py::TestRayService::test_service_autoscaling - TimeoutError: Cluster did not scale to the expected number of 1 worker pod(s) within 400 seconds. Cluster is currently at 5 worker pods.
>
> So that doesn't seem related to my change... but maybe related to something else in my local setup? Thoughts?

cc @zcin do you have any thoughts?

@kevin85421 (Member)

> I was actually just asking about something related on Ray Slack and @edoakes was talking about some other changes to the Serve HTTP proxy related to autoscaling, where the proxy now returns a 503 error if the Ray worker doesn't have any other Serve replicas on it.

I'm confused about this. In my understanding, if a node lacks the requested replica, the HTTP Proxy Actor would forward the request to the node containing that replica rather than returning 503. I will check with the Ray Serve team and will keep you updated.

>   1. The worker pod is healthy (health)
>   2. The worker pod is healthy and has a healthy Serve proxy on it (ready)
>   3. The worker pod is healthy, has a healthy Serve proxy on it, and traffic should be routed to that pod (ready and... enabled? 🤔 )

Would you mind explaining the difference between (2) and (3)? From my understanding, a failed readiness probe will cause the container's service endpoint to be removed, and it will not receive any new traffic. Hence, (2) will not receive any new traffic.

@JoshKarpel (Contributor, Author)

Sorry, I should have linked more context - this is very specifically about the question I asked here: https://ray-distributed.slack.com/archives/CNCKBBRJL/p1691532723786239?thread_ts=1690284353.203859&cid=CNCKBBRJL.

Re: (2) vs (3): my thought was that a pod with nothing on it but the proxy actor would actually report unready and thus be taken out of the Service, as you say. But it's not really "unready" in a k8s sense (it could serve a request); we just don't want traffic flowing to it. That might mess up monitoring schemes that check for unready pods, which is a fairly common pattern (at least from what I've seen).

@kevin85421 (Member)

> That might mess up monitoring schemes that check for unready pods

Would you mind adding more details on how the "monitoring schemes" for unready Pods work? Thanks!

@kevin85421 (Member)

Btw, I can reproduce this error. I thought I broke something recently 😭. I will fix it ASAP.

@kevin85421 (Member)

[Update] "503 error"

When a Ray node is not properly serving traffic (either the node is unhealthy or there are no replicas on the node), health check queries to /-/healthz or /-/routes will get a 503 so that load balancers (e.g. ALB) avoid routing traffic to this node. The actual requests will still be forwarded.
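For what it's worth, here is a minimal sketch of what the readiness probe idea discussed above could look like against that health endpoint, if it were wired into the worker pod spec. The port, timing values, and function name are illustrative assumptions, not KubeRay's actual implementation:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// serveProxyReadinessProbe sketches a probe that would mark the pod unready
// while the Serve proxy health endpoint returns 503. Port 8000 matches the
// Serve port in the sample config shown earlier; the thresholds are guesses.
func serveProxyReadinessProbe() *corev1.Probe {
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{ // this embedded field is named Handler in older client-go releases
			HTTPGet: &corev1.HTTPGetAction{
				Path: "/-/healthz",
				Port: intstr.FromInt(8000),
			},
		},
		PeriodSeconds:    5,
		FailureThreshold: 3,
	}
}
```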

@kevin85421 (Member)

> Btw, I can reproduce this error. I thought I broke something recently 😭. I will fix it ASAP.

[Update]

I found that the tests failed because the image kuberay/operator:nightly was quite old (months old). I deleted the image, ran the tests again, and they passed.

RAY_IMAGE=rayproject/ray:2.5.0 OPERATOR_IMAGE=kuberay/operator:nightly pytest -vs tests/test_sample_rayservice_yamls.py --log-cli-level=INFO

I cloned your branch, built the image, and ran the tests:

RAY_IMAGE=rayproject/ray:2.5.0 OPERATOR_IMAGE=controller:latest pytest -vs tests/test_sample_rayservice_yamls.py --log-cli-level=INFO
[Screenshot: test run output, 2023-08-10]

@kevin85421 (Member) left a comment:

LGTM

@kevin85421 merged commit 197fcc2 into ray-project:master on Aug 11, 2023
@JoshKarpel deleted the unecessary-pod-updates branch on August 11, 2023
@JoshKarpel (Contributor, Author)

> That might mess up monitoring schemes that check for unready pods
>
> Would you mind adding more details on how the "monitoring schemes" for unready Pods work? Thanks!

Sure - I will DM you on Ray Slack so I can provide some more details

Referenced in later commits ("Do not update pod labels if they haven't changed"):
  • blublinsky pushed commits referencing this pull request to blublinsky/kuberay on Aug 15 (twice), Aug 28, and Aug 29, 2023
  • lowang-bh pushed a commit referencing this pull request to lowang-bh/kuberay on Sep 24, 2023
Successfully merging this pull request may close these issues.

[Bug] Only update ray.io/serve label on pods if its value changes