Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RayCluster updates status frequently #1211

Merged
merged 6 commits into from
Jul 5, 2023

Conversation

kevin85421
Copy link
Member

@kevin85421 kevin85421 commented Jun 30, 2023

Why are these changes needed?

This PR is similar to #1065. Without this PR, the RayCluster reconciler will update the CR's status in every reconciliation. Refer to the source code of ReplicaSet and StatefulSet. They will check the difference between the original status and the new status to determine whether update the status or not. Hence, this PR implements a similar function inconsistentRayClusterStatus to determine whether update status or not.

Related issue number

#1182

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(
  • I conducted an experiment similar to [DO NOT MERGE][Experiment] RayCluster updates status frequently #1214. In [DO NOT MERGE][Experiment] RayCluster updates status frequently #1214, the status update is called 8 times in 10 minutes. Let the number of reconciliations be N.

    • 6 => RayCluster initialization phase => O(1)
    • 2 => The reconciler will be triggered every 5 minutes because the operator will unconditionally requeue the resource event using ctrl.Result{RequeueAfter: time.Duration(requeueAfterSeconds) * time.Second} (with a default of 300 seconds). => O(N)
  • The CR status update is called 3 times in 10 minutes.

    • 3 => RayCluster initialization phase => O(1)
    • 0 => After RayCluster finishes initialization.
    2023-06-30T18:15:46.246Z	INFO	controllers.RayCluster	rayClusterReconcile	{"r.Status().Update()": "raycluster-complete", "status": {"desiredWorkerReplicas":1,"minWorkerReplicas":1,"maxWorkerReplicas":10,"lastUpdateTime":"2023-06-30T18:15:46Z","head":{},"observedGeneration":1}}
    2023-06-30T18:15:46.345Z	INFO	controllers.RayCluster	rayClusterReconcile	{"r.Status().Update()": "raycluster-complete", "status": {"desiredWorkerReplicas":1,"minWorkerReplicas":1,"maxWorkerReplicas":10,"lastUpdateTime":"2023-06-30T18:15:46Z","head":{},"observedGeneration":1}}
    2023-06-30T18:16:51.042Z	INFO	controllers.RayCluster	rayClusterReconcile	{"r.Status().Update()": "raycluster-complete", "status": {"state":"ready","availableWorkerReplicas":1,"desiredWorkerReplicas":1,"minWorkerReplicas":1,"maxWorkerReplicas":10,"lastUpdateTime":"2023-06-30T18:16:51Z","head":{},"observedGeneration":1}}
    
Screen Shot 2023-06-30 at 5 24 06 PM

@kevin85421 kevin85421 force-pushed the ray-cluster-status-update branch from 16aa211 to 8558a55 Compare July 1, 2023 00:11
@kevin85421 kevin85421 marked this pull request as ready for review July 1, 2023 00:29
@kevin85421
Copy link
Member Author

cc @msumitjain would you mind reviewing this PR? Thanks!

@kevin85421 kevin85421 requested a review from gvspraveen July 3, 2023 19:59
@kevin85421
Copy link
Member Author

cc @anshulomar would you mind reviewing this PR? Thanks!

// status update will not be triggered.
//
// TODO (kevin85421): The field `ObservedGeneration` is not being well-maintained at the moment. In the future,
// this field should be used to determine whether to update this CR or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have GH issue for this ObservedGeneration fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I will open an issue for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open an issue #1217.

oldStatus.MinWorkerReplicas, newStatus.MinWorkerReplicas, oldStatus.MaxWorkerReplicas, newStatus.MaxWorkerReplicas))
return true
}
if !reflect.DeepEqual(oldStatus.Endpoints, newStatus.Endpoints) || !reflect.DeepEqual(oldStatus.Head, newStatus.Head) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any specific reason to not include this as part of if statement in line 296?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge L290, L296, and L305 into a single if statement; however, the resulting log message will be very long and difficult for users to read.

@kevin85421 kevin85421 merged commit 15daa54 into ray-project:master Jul 5, 2023
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
RayCluster updates status frequently
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants